The document describes an Entity Registry System (ERS) that allows for decentralized, linked data storage in a document store. It was designed to work in environments with poor network connectivity. The ERS uses contributors to write data, bridges to connect isolated parts of the system, and an optional aggregator for high-performance read-only data retrieval. Testing showed the ERS could tolerate disconnects and poor networks as long as connections lasted at least half a second. It was tested with up to 40 nodes and was able to reliably synchronize data in real-world simulation scenarios like a conference social network and remote merchants updating prices between villages via a mobile bridge.
2. Linked data
- In computing, linked data (often capitalized as Linked Data) describes a method of publishing structured data so that it can be interlinked and become more useful through semantic queries (Wikipedia)
- Tim Berners-Lee
- DBpedia alone: 3.4 million concepts described by 1 billion triples
4. Example solutions - centralised
- rdf4j
- neo4j
- gremlin (apache tinkerpop)
- virtuoso
- many others
5. Entity Registry System
- decentralised
- linked data in a document store: (s, p1, o1), (s, p2, o2) -> {"id": s, "p1": o1, "p2": o2, ..}
- designed for poor network connectivity
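The triple-to-document mapping above can be sketched in a few lines. This is a minimal illustration, not the ERS implementation: the function name triples_to_documents is hypothetical, and turning repeated predicates into a list is an assumption the slides do not cover.

```python
def triples_to_documents(triples):
    """Group (subject, predicate, object) triples into one document per subject."""
    docs = {}
    for s, p, o in triples:
        # One document per subject, keyed by the subject identifier.
        doc = docs.setdefault(s, {"id": s})
        if p not in doc:
            doc[p] = o
        elif isinstance(doc[p], list):
            # Assumption: a predicate seen more than once becomes a list of values.
            doc[p].append(o)
        else:
            doc[p] = [doc[p], o]
    return docs


docs = triples_to_documents([("s", "p1", "o1"), ("s", "p2", "o2")])
print(docs["s"])
```

A predicate literally named "id" would clash with the subject key in this sketch; a real store would need to reserve or namespace that field.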
8. Contributor
- read and write content
- private, public and cache
- private never shared
- public can be searched by others and distributed
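The three stores can be pictured roughly as follows. This is a hypothetical sketch: the class and method names are invented for illustration, and the idea that the cache holds public data received from other nodes is an assumption, not something the slides state.

```python
class ContributorStore:
    """Toy model of a contributor with private, public and cache stores."""

    def __init__(self):
        self.private = {}  # never leaves this node
        self.public = {}   # searchable by others and distributed
        self.cache = {}    # assumption: copies of other nodes' public data

    def shareable(self):
        # Only public documents are ever handed to another node.
        return dict(self.public)

    def receive(self, remote_docs):
        # Data from other nodes lands in the cache, not in our own stores.
        self.cache.update(remote_docs)

    def search(self, entity_id):
        # Local lookup checks every store this node holds.
        for store in (self.private, self.public, self.cache):
            if entity_id in store:
                return store[entity_id]
        return None


alice, bob = ContributorStore(), ContributorStore()
alice.private["diary"] = {"id": "diary", "text": "secret"}
alice.public["profile"] = {"id": "profile", "name": "Alice"}
bob.receive(alice.shareable())
print(bob.search("profile"), bob.search("diary"))
```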
9. Bridge
- can connect isolated parts of the system
- similar to a cache (if the network goes down, data can still be read from the bridge)
- reduces the number of links from O(n^2) to O(n)
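The link-count claim can be checked with simple arithmetic. The sketch below assumes a full mesh without a bridge (every pair of nodes linked directly) and a star with a bridge (one link per node); both function names are hypothetical.

```python
def full_mesh_links(n):
    # Every pair of nodes holds a direct link: n * (n - 1) / 2, i.e. O(n^2).
    return n * (n - 1) // 2


def bridged_links(n):
    # Each node holds a single link to the bridge: O(n).
    return n


for n in (4, 40):
    print(n, full_mesh_links(n), bridged_links(n))
```

For 40 nodes this gives 780 direct links in a full mesh against 40 links through one bridge.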
10. Aggregator
- optional component
- high-performance node/cluster
- read-only entry retrieval of data
- contributors and bridges push public data to it
14. Similar solutions
Nintendo StreetPass and SpotPass (3DS only, no aggregator)
Transparent Inter-process Communication
- nodes, zones, cluster (only logical grouping)
- IP layer, bridges reduce links
Sugar Network (OLPC)
- in Sugar Network, clients can communicate with nodes (equivalent of bridges) or the master node (equivalent of the aggregator)
- ERS has more flexibility in the data format
15. Initial status
- not working
- very limited testing
- a bit frustrating to install
- not much investigation (real world tests)
Thus far, the highest number of concurrent users of ERS has been 4 XO laptops, with one bridge, all in the same geographical location.
16. Research Questions
- Can the Entity Registry System perform reliably in a real-world scenario (i.e. provide the required functionality in a robust manner)?
- Does the ERS scale to a large number of users?
- How does the ERS cope with poor network connectivity?
17. “local” tests
- unit tests for storage, daemon and API communication
- basic operations performance tests
- entity creation 5/s
- property edits 20/s
- value edits 20/s
18. Docker
- much larger number of nodes
- faster to start
- Docker Hub image -> ERS can be run within seconds on any x86 device (currently no Docker image for ARM)
19. “Real world” simulation
- simulations so far have had a more or less ideal network
- reality is a bit different
20. Simian army
- http://techblog.netflix.com/2011/07/netflix-simian-army.html
- December 2012: an Amazon employee launched a maintenance process against the running production system, which deleted the state information needed by load balancers
- problems began on Christmas Eve at 1:45 p.m.
- lasted until 9:41 a.m. on Christmas Day, an outage of about 20 hours
- no Netflix on Christmas - unhappy customers
21. Simian army - cont
- clunky to integrate
- only works on aws
24. Real world case - Conference deployment
- Simulate conference social network
- Think LinkedIn without central server
- Profile, skills
- Endorsements
- Fixed bridge, mobile contributors
26. Remote merchants
- vendors in different remote villages
- no network connectivity
- box on a truck that visits every farmer, provides up-to-date information on the prices in the other villages, and picks up any new information from the current one
- Fixed contributors, mobile bridge
27. Behavior
- As long as the network isn't too bad (will be detailed), if the truck stops for a couple of seconds we achieve synchronization
28. Network tolerance
Fixed 5-second wait time, contributor writing as fast as it can
- 100 ms each way
- 15% loss/corruption each way
- duplication is fine
- mostly binary progress
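A rough back-of-the-envelope reading of these figures, under assumptions the slides do not state: the 100 ms figure is one-way delay, losses in each direction are independent, and one synchronization exchange is a single request/acknowledgement round trip retried back to back. All function names are hypothetical.

```python
def round_trip_success(loss_each_way):
    # Both the request and the acknowledgement must survive.
    return (1 - loss_each_way) ** 2


def round_trips_in_window(window_ms, one_way_ms):
    # Number of sequential round trips that fit in the wait window.
    return window_ms // (2 * one_way_ms)


def prob_at_least_one(p_success, attempts):
    # Chance that at least one of the independent attempts gets through.
    return 1 - (1 - p_success) ** attempts


p = round_trip_success(0.15)            # about 0.72 per round trip
k = round_trips_in_window(5000, 100)    # 25 round trips in 5 seconds
print(p, k, prob_at_least_one(p, k))
```

Under these assumptions a single exchange is very likely to complete within the 5-second window, which is at least consistent with the mostly binary progress noted above.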
29. Research Questions Revisited
- Can the Entity Registry System perform reliably in a real-world scenario (i.e. provide the required functionality in a robust manner)? Suite deployed, tests indicate yes
- Does the ERS scale to a large number of users? Tested with 40 nodes
- How does the ERS cope with poor network connectivity? It copes if the connection lasts 0.5 s or longer