2. High Replication
Datastore
Ikai Lan - @ikai
Esto es Google
August 9th, 2011
3. About the speaker
• Developer Relations at Google based out
of San Francisco, CA
• Google+: http://plus.ikailan.com
• Twitter: @ikai
4. About the speaker
BIOGRAPHY: Ikai is a Developer Programs Engineer on
Google App Engine. Before Google, he worked as a
software engineer building mobile and social
networking applications at LinkedIn. Ikai is an avid
technology enthusiast who consumes volumes of material
about new programming languages, frameworks, and
services. In his free time he enjoys California,
winning Chinese-language karaoke contests, and playing
flag football. He currently lives in the San Francisco
Bay Area, where he agonizes watching his favorite team
implode season after season.
English original: http://code.google.com/team/
6. Agenda
• What is Google App Engine?
• Intro to High Replication Datastore
• How does High Replication work under
the hood?
7. If you’re not an App
Engine developer ..
• First off, shame on you
• The code in these examples might not
make sense
• The concepts in these slides are always
good to understand
22. Customer: The Royal Wedding
Peak: 32,000 requests a second with no disruption!
23. Core APIs
• Memcache
• Datastore
• URL Fetch
• Mail
• XMPP
• Task Queue
• Images
• Blobstore
• User Service
24. App Engine
Datastore
Schemaless, non-relational
datastore built on top of
Google’s Bigtable technology
Enables rapid development
and scalability
25. High Replication
• Strongly consistent
• Multi-data center
• Consistent
performance
• High Reliability
• No data loss
26. How do I use it?
• Create a new application! Just remember
the rules
• Fetch by key and ancestor queries exhibit
strongly consistent behavior
• Queries without an ancestor exhibit
eventually consistent behavior
27. Strong vs. Eventual
• Strong consistency means that as soon as the
datastore tells us a write has been
committed, the effects of that write are
visible to every subsequent read
• Eventual consistency means that after the
datastore tells us a write has been
committed, the effects of that write become
visible only after some time
30. This is strongly consistent
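// Get by key
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;

DatastoreService datastore = DatastoreServiceFactory
.getDatastoreService();
Entity item = new Entity("Item");
item.setProperty("data", 123);
Key key = datastore.put(item);
// This exhibits strong consistency.
// It should return the item we just saved.
// get() throws the checked EntityNotFoundException if the key is missing.
Entity result = datastore.get(key);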
31. This is strongly consistent
// Save the entity root
Entity root = new Entity("Root");
Key rootKey = datastore.put(root);
// Save the child
Entity childItem = new Entity("Item", rootKey);
childItem.setProperty("data", 123);
datastore.put(childItem);
// Ancestor query
Query strongConsistencyQuery = new Query("Item");
strongConsistencyQuery.setAncestor(rootKey);
strongConsistencyQuery.addFilter("data", FilterOperator.EQUAL, 123);
FetchOptions opts = FetchOptions.Builder.withDefaults();
// This query exhibits strong consistency.
// It will return the item we just saved.
List<Entity> results = datastore.prepare(strongConsistencyQuery)
.asList(opts);
32. This is eventually consistent
Entity item = new Entity("Item");
item.setProperty("data", 123);
datastore.put(item);
// Not an ancestor query
Query eventuallyConsistentQuery = new Query("Item");
eventuallyConsistentQuery.addFilter("data", FilterOperator.EQUAL, 123);
FetchOptions opts = FetchOptions.Builder.withDefaults();
// This query exhibits eventual consistency.
// It will likely return an empty list.
List<Entity> results = datastore.prepare(eventuallyConsistentQuery)
.asList(opts);
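The three snippets above follow one rule: reads that go to the row itself see a committed write at once, while reads that go through a global index may lag. The sketch below is a toy in-memory model of that rule, not the App Engine API and not how the datastore is actually implemented. The class `ConsistencyToy` and its methods are hypothetical names invented for this illustration, and `catchUpIndex()` stands in for "some time passes".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not the App Engine datastore API.
class ConsistencyToy {
    private final Map<String, Integer> rows = new HashMap<>();        // key -> data
    private final Map<String, Integer> globalIndex = new HashMap<>(); // may lag behind rows
    private final List<String> pendingIndexUpdates = new ArrayList<>();

    // A committed write is visible by key at once; the index update is deferred.
    void put(String key, int data) {
        rows.put(key, data);
        pendingIndexUpdates.add(key);
    }

    // Models a get by key: reads the row itself, so it is strongly consistent.
    Integer getByKey(String key) {
        return rows.get(key);
    }

    // Models a non-ancestor query: reads only the (possibly stale) index.
    List<String> queryByData(int data) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> e : globalIndex.entrySet()) {
            if (e.getValue() == data) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Stand-in for "some time passes": the index catches up with the rows.
    void catchUpIndex() {
        for (String key : pendingIndexUpdates) {
            globalIndex.put(key, rows.get(key));
        }
        pendingIndexUpdates.clear();
    }

    public static void main(String[] args) {
        ConsistencyToy store = new ConsistencyToy();
        store.put("item-1", 123);
        System.out.println(store.getByKey("item-1")); // 123
        System.out.println(store.queryByData(123));   // [] (index has not caught up)
        store.catchUpIndex();
        System.out.println(store.queryByData(123));   // [item-1]
    }
}
```

In this model the query returns an empty list right after the write, which mirrors the "will likely return an empty list" comment in the slide.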
33. Why?
• Reads are transactional
• On a read, we try to determine if we have
the latest version of data
• If not, we catch up the local node to the
latest version
34. To understand this ..
• We need some understanding of our
implementation of Paxos
• ... which necessitates some understanding of
transactions
• ... which necessitates some
understanding of entity groups
36. Entity Groups
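[Diagram: a tree of entities forming one entity group. A User entity is the
entity group root. Below it are Blog entities, below those are Entry
entities, and below those are Comment entities.]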
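One way to picture an entity group is as a key path: every entity's key carries the path from the group root down to itself, so two entities belong to the same group exactly when their paths start at the same root. The sketch below models that idea in plain Java. `KeyPath` and its methods are hypothetical names for illustration only, not the App Engine `Key` class, and the "kind:name" string encoding is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an ancestor path; not the App Engine Key class.
class KeyPath {
    // Path elements from the root down, e.g. ["User:ikai", "Blog:1", "Entry:7"].
    private final List<String> elements;

    private KeyPath(List<String> elements) {
        this.elements = elements;
    }

    // A key with no parent is the root of its own entity group.
    static KeyPath root(String kind, String name) {
        return new KeyPath(Collections.singletonList(kind + ":" + name));
    }

    // A child key extends its parent's path by one element.
    KeyPath child(String kind, String name) {
        List<String> path = new ArrayList<>(elements);
        path.add(kind + ":" + name);
        return new KeyPath(path);
    }

    // The entity group is identified by the first element of the path.
    String entityGroupRoot() {
        return elements.get(0);
    }

    boolean sameEntityGroup(KeyPath other) {
        return entityGroupRoot().equals(other.entityGroupRoot());
    }

    @Override
    public String toString() {
        return String.join("/", elements);
    }

    public static void main(String[] args) {
        KeyPath user = KeyPath.root("User", "ikai");
        KeyPath comment = user.child("Blog", "1").child("Entry", "7").child("Comment", "3");
        System.out.println(comment);                       // User:ikai/Blog:1/Entry:7/Comment:3
        System.out.println(comment.sameEntityGroup(user)); // true
    }
}
```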
37. Entity group root
// Save the entity root
Entity root = new Entity("Root");
Key rootKey = datastore.put(root);
39. Adding an entity child
// Save the child
Entity childItem = new Entity("Item", rootKey);
childItem.setProperty("data", 123);
datastore.put(childItem);
45. Transactional reads
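[Diagram: two app servers and one datastore.]
• One app server is writing version 12 of the data, which is still being
committed
• The other app server begins a data read
• The datastore holds version 11
• Version 11 of the data is returned: this is the fully committed version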
46. Optimistic locking
Optimistic because you assume that, most of the
time, no modifications will occur while you
have the object checked out. You only do extra
work when that assumption turns out to be false.
53. Optimistic Locking
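[Diagram: two app servers and one datastore, built up over several slides.]
1. The first app server begins a data read. The datastore is at version 11
and returns version 11 of the data.
2. The second app server finishes committing version 12 of the data. The
datastore now has version 12.
3. The first app server tries to write its data back to the datastore:
exception! Another client has modified the data.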
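The version check in the sequence above can be sketched as a compare-then-commit on a version number. This is a minimal single-process illustration, assuming one versioned value. `VersionedCell` is a hypothetical class, and throwing `java.util.ConcurrentModificationException` is a choice made for this sketch rather than a statement about the datastore's internals.

```java
import java.util.ConcurrentModificationException;

// Hypothetical single-value store that rejects writes based on a stale read.
class VersionedCell {
    private long version;
    private String data;

    VersionedCell(long version, String data) {
        this.version = version;
        this.data = data;
    }

    synchronized long getVersion() {
        return version;
    }

    synchronized String getData() {
        return data;
    }

    // Commit succeeds only if nobody else has committed since readVersion.
    synchronized void commit(long readVersion, String newData) {
        if (readVersion != version) {
            throw new ConcurrentModificationException(
                    "read version " + readVersion + " but store is at version " + version);
        }
        data = newData;
        version++;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(11, "original");
        long seenByFirst = cell.getVersion();              // first server reads version 11
        cell.commit(cell.getVersion(), "second server");   // second server commits version 12
        try {
            cell.commit(seenByFirst, "first server");      // stale write is rejected
        } catch (ConcurrentModificationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

No lock is held between the read and the write, which is why this is optimistic: a caller that gets the exception would typically re-read the latest version and retry.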
54. Life of a distributed write
[Diagram: one app server writing to several datastores.]
1. The app server writes to the journal of multiple datastores
2. The call returns once the datastores acknowledge receiving the write
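The two steps above can be sketched as appending one write to several replica journals and counting acknowledgements. The slide only says the call returns once datastores acknowledge. Treating "enough" as a majority is an assumption of this sketch, in the spirit of Paxos-style quorums. `ReplicaJournal` and `DistributedWrite` are hypothetical names, and the loop is sequential for simplicity where a real system would contact replicas in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one data center's write journal.
class ReplicaJournal {
    final List<String> entries = new ArrayList<>();
    boolean reachable = true; // set to false to simulate an unreachable data center

    // Returns true if this replica acknowledged the journal append.
    boolean append(String write) {
        if (!reachable) {
            return false;
        }
        entries.add(write);
        return true;
    }
}

class DistributedWrite {
    // ASSUMPTION: success means a majority of replicas acknowledged the append.
    // This sketch does not clean up entries left on a minority after a failure.
    static boolean write(List<ReplicaJournal> replicas, String write) {
        int acks = 0;
        for (ReplicaJournal replica : replicas) {
            if (replica.append(write)) {
                acks++;
            }
        }
        return acks > replicas.size() / 2;
    }

    public static void main(String[] args) {
        ReplicaJournal a = new ReplicaJournal();
        ReplicaJournal b = new ReplicaJournal();
        ReplicaJournal c = new ReplicaJournal();
        c.reachable = false;
        boolean committed = write(Arrays.asList(a, b, c), "Item 1 <data> Version 26");
        System.out.println(committed); // true: two of three replicas acknowledged
    }
}
```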
55. Life of a distributed write
(part 2)
The journal tracks writes that need to be applied

Status                          Item            Version
Write being applied             Item 1 <data>   25
(no label on slide)             Item 5 <data>   12
Write to be applied in future   Item 1 <data>   26
57. Transactional reads
Local datastore up to date
1. The app server tries to read from the local datastore
2. Is the item caught up to the latest journal write? Yes!
3. Return the data in the datastore

Journal
Applied.    Item 1 <data>   Version 25
Applied.    Item 5 <data>   Version 12
Applied.    Item 1 <data>   Version 26
59. Transactional reads
Local datastore not up to date - Step 1
1. The app server tries to read from the local datastore
2. Is the item caught up to the latest journal write? No.
3. Catch the local data up

Journal
Applied.    Item 1 <data>   Version 25
Applied.    Item 5 <data>   Version 12
Unapplied.  Item 1 <data>   Version 26
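The read path in these slides (check whether the item has unapplied journal writes, catch up if so, then return the data) can be sketched with an in-memory journal. This is a simplified single-replica model. `LocalReplica`, `JournalEntry`, and the "apply everything that is pending" catch-up strategy are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one local datastore and its write journal.
class LocalReplica {
    static final class JournalEntry {
        final String item;
        final String data;
        final long version;

        JournalEntry(String item, String data, long version) {
            this.item = item;
            this.data = data;
            this.version = version;
        }
    }

    private final List<JournalEntry> journal = new ArrayList<>();
    private final Map<String, JournalEntry> store = new HashMap<>();
    private int applied = 0; // journal entries [0, applied) are already in the store

    void append(String item, String data, long version) {
        journal.add(new JournalEntry(item, data, version));
    }

    // Applies the next unapplied journal entry; returns false if none is left.
    boolean applyNext() {
        if (applied == journal.size()) {
            return false;
        }
        JournalEntry entry = journal.get(applied++);
        store.put(entry.item, entry);
        return true;
    }

    // "Is the item caught up to the latest journal write?"
    boolean isCaughtUp(String item) {
        for (int i = applied; i < journal.size(); i++) {
            if (journal.get(i).item.equals(item)) {
                return false;
            }
        }
        return true;
    }

    // Transactional read: catch the local data up first if the item is behind.
    JournalEntry read(String item) {
        if (!isCaughtUp(item)) {
            while (applyNext()) {
                // keep applying until the journal is drained
            }
        }
        return store.get(item);
    }
}
```

With the journal from the slide (Item 1 at version 25 and Item 5 at version 12 applied, Item 1 at version 26 unapplied), a read of Item 5 returns immediately, while a read of Item 1 first applies version 26.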
60. Transactional reads
Local datastore not up to date - Step 2
• The app server waits for the local datastore to return
• The app server also requests the data from each remote datastore
• All datastores either return up-to-date data or force a catch up
• The app uses the data from the first datastore that responds
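"Use the data from the first datastore that responds" maps naturally onto racing several asynchronous reads. The sketch below uses the standard `CompletableFuture.anyOf` for that. `FirstResponder` and `firstOf` are hypothetical names, and the futures are completed by hand in `main` so the example stays deterministic. In a real client they would be completed by network calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper that races reads against several datastores.
class FirstResponder {
    // Completes with the result of whichever replica read finishes first.
    static CompletableFuture<String> firstOf(List<CompletableFuture<String>> replicaReads) {
        return CompletableFuture
                .anyOf(replicaReads.toArray(new CompletableFuture[0]))
                .thenApply(result -> (String) result);
    }

    public static void main(String[] args) {
        CompletableFuture<String> local = new CompletableFuture<>();
        CompletableFuture<String> remoteA = new CompletableFuture<>();
        CompletableFuture<String> remoteB = new CompletableFuture<>();

        CompletableFuture<String> first = firstOf(Arrays.asList(local, remoteA, remoteB));

        // Simulate a remote datastore answering before the local one catches up.
        remoteA.complete("Item 1 Version 26 (remote A)");
        local.complete("Item 1 Version 26 (local)");

        System.out.println(first.join()); // Item 1 Version 26 (remote A)
    }
}
```

Note that `anyOf` also completes exceptionally if the first future to finish fails, so a production version would need to decide how to handle a fast failure from one replica.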
61. More reading
• My example was grossly oversimplified
• More details can be found here:
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
62. Contradictory advice
• Entity groups must be as big as possible to
cover as much related data as you can
• Entity groups must be small enough such
that your write rate per entity group never
goes above one write/second
63. Summary
• Remember the rules of strong consistency
vs. eventual consistency
• Group your data into entity groups and use
ancestor queries when possible
• App Engine’s datastore gives you the best
of all worlds: high reliability and
strong data consistency
Does CLOUD COMPUTING just mean your servers are SOMEWHERE ELSE? Or is it SOMETHING MORE?
WHY put your servers in the cloud?
- Don't want to MANAGE servers?
- Or is it the ELASTICITY and SCALABILITY of the cloud?
- If so, you NEED: DISTRIBUTED cloud computing
 * TODAY we'll talk about why