8. On the Web side
Similar needs for Web giants:
• Huge amount of data
• High availability
• Fault tolerance
• Scalability on commodity hardware
Amazon
- Created Dynamo
- < 40 min of unavailability per year
Google
- Created BigTable & MapReduce
- Stores every web page of the Internet
9. Amazon : the birth of Dynamo
Requires complex requests, temporary unavailability is acceptable
Fill cart → Checkout → Payment → Process order → Prepare → Send
Requires high availability, key-value store is enough
10. On the Financial side
Needs within financial market:
• Very low latency
• Rich queries & transactions
• Scalability
• Data consistency
Coherence
- Released in 2001
- Started as a distributed cache
Gigaspaces XAP
- Released in 2001
- Routes the request inside the data
17. Partitioned Data Modeling
Typical relational data model
[Diagram: Seat (number, price), Booking (reduction), Passenger (name), Train (code, type), TrainStation (code, name), TrainStop (date)]
18. Partitioned Data Modeling
Find the root entity and denormalize
[Diagram: the same entities (Seat, Booking, Passenger, Train, TrainStation, TrainStop), annotated with "Root entity", "Partitioning ready entities tree" and "Reference data: duplicated in each partition"]
19. Partitioned Data Modeling
Remove unused data
[Diagram: Seat (number, price, booked), Booking (reduction), Passenger (name), Train (code, type), TrainStation (code, name), TrainStop (date)]
20. Partitioned Data Modeling
Sharding ready data structure
[Diagram: Seat (number, price, booked), Train (code, type), TrainStation (code, name), TrainStop (date)]
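A minimal sketch of how a sharding-ready tree might be routed, assuming Train is the root entity and its code is the partition key. The class names, the sample values and the hash-modulo routing are illustrative assumptions, not the API of any particular product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes mirroring the sharding-ready structure; not a product API.
class PartitionRoutingSketch {

    // Child entity: always stored in the same partition as its Train.
    static class Seat {
        final int number;
        final double price;
        boolean booked;

        Seat(int number, double price) {
            this.number = number;
            this.price = price;
        }
    }

    // Root entity: its code doubles as the partition key.
    static class Train {
        final String code;
        final String type;
        final List<Seat> seats = new ArrayList<>();

        Train(String code, String type) {
            this.code = code;
            this.type = type;
        }
    }

    // Maps a root key to a partition index in [0, partitionCount).
    static int partitionFor(String rootKey, int partitionCount) {
        return Math.floorMod(rootKey.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        Train train = new Train("T-6607", "high-speed"); // made-up sample values
        train.seats.add(new Seat(12, 59.0));
        System.out.println("Train " + train.code + " and its " + train.seats.size()
                + " seat(s) go to partition " + partitionFor(train.code, 4));
    }
}
```

Because the seats hang off the root entity, a request for one train touches a single partition.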
33. Request Driven Data Modeling
• Relational data modeling is business driven
Adaptation to requests comes with tuning
• With partitioning, data modeling has to be adapted for requests
Because network latency matters
• NoSQL & DataGrids data modeling is request driven
Two requests may require storing the same data twice
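The last point can be sketched with plain Java maps: one structure per request, each keyed by what that request looks up. The class, its methods and the sample keys are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: the same booking is stored once per access path.
class RequestDrivenSketch {
    private final Map<String, List<String>> seatsByTrain = new HashMap<>();
    private final Map<String, List<String>> bookingsByPassenger = new HashMap<>();

    // A single write updates both structures.
    void book(String trainCode, String passenger, String seat) {
        seatsByTrain.computeIfAbsent(trainCode, k -> new ArrayList<>()).add(seat);
        bookingsByPassenger.computeIfAbsent(passenger, k -> new ArrayList<>())
                .add(trainCode + ":" + seat);
    }

    // Request 1: which seats are booked on this train?
    List<String> bookedSeats(String trainCode) {
        return seatsByTrain.getOrDefault(trainCode, Collections.emptyList());
    }

    // Request 2: what has this passenger booked?
    List<String> bookingsOf(String passenger) {
        return bookingsByPassenger.getOrDefault(passenger, Collections.emptyList());
    }

    public static void main(String[] args) {
        RequestDrivenSketch store = new RequestDrivenSketch();
        store.book("T1", "johndoe", "12A");
        System.out.println(store.bookedSeats("T1"));
        System.out.println(store.bookingsOf("johndoe"));
    }
}
```

Each read is then a single lookup by key, at the cost of a double write.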
34. Key-Value Store
• In memory
• In memory with async persistence
• Persistent
35. Example with a user profile
johndoe → User profile as byte[]
Similar to a Java HashMap
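The HashMap analogy can be made concrete with a few lines of plain Java. This is an in-memory sketch only; the class name and the JSON-like payload are assumptions, and a real store adds distribution and persistence.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a key-value store: opaque bytes addressed by key.
class KeyValueSketch {
    private final Map<String, byte[]> store = new HashMap<>();

    void set(String key, byte[] value) {
        store.put(key, value);
    }

    // Returns null when the key is unknown, like HashMap.get.
    byte[] get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        KeyValueSketch kv = new KeyValueSketch();
        // The store never looks inside the value; serialization is the caller's job.
        byte[] profile = "{\"name\":\"John Doe\"}".getBytes(StandardCharsets.UTF_8);
        kv.set("johndoe", profile);
        System.out.println(new String(kv.get("johndoe"), StandardCharsets.UTF_8));
    }
}
```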
36. Write Example with Riak
RiakClient riak = new RiakClient("http://server1:8098/riak");
RiakObject userProfileObj =
    new RiakObject("bucket", "johndoe", serializer.serialize(userProfile));
riak.store(userProfileObj);
Inserts a user profile into Riak
37. Read Example with Riak
FetchResponse response = riak.fetch("bucket", "johndoe");
if (response.hasObject()) {
    userProfileObj = response.getObject();
}
Fetches a user profile using its key in Riak
39. Column Families Store
For each Row ID we have a list of key-value pairs
Key-value pairs are sorted by keys
[Diagram: Relational DB vs. Column families DB]
40. Example with a shopping cart
Row ID    | Columns (time → item)
johndoe   | 17:21 → Iphone      | 17:32 → DVD Player | 17:44 → MacBook
willsmith | 6:10 → Camera       | 8:29 → Ipad
pitdavis  | 14:45 → PlayStation | 15:01 → Asus EEE   | 15:03 → Iphone
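A rough in-memory picture of this layout, assuming each row is a map sorted by column name. The class and method names are hypothetical; this is not how Cassandra stores data internally, only a way to read the example above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: Row ID -> columns kept sorted by column name.
class ColumnFamilySketch {
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    void insert(String rowId, String column, String value) {
        rows.computeIfAbsent(rowId, k -> new TreeMap<>()).put(column, value);
    }

    // Returns the values of a row in column-name order, whatever the insertion order was.
    List<String> columnValues(String rowId) {
        TreeMap<String, String> columns = rows.get(rowId);
        if (columns == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(columns.values());
    }

    public static void main(String[] args) {
        ColumnFamilySketch cart = new ColumnFamilySketch();
        cart.insert("johndoe", "17:44", "MacBook");
        cart.insert("johndoe", "17:21", "Iphone");
        cart.insert("johndoe", "17:32", "DVD Player");
        System.out.println(cart.columnValues("johndoe"));
    }
}
```

Note that the column names here are strings, so they sort lexicographically rather than as times.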
41. Write Example with Cassandra
Cluster cluster =
    HFactory.getOrCreateCluster("cluster", new CassandraHostConfigurator("server1:9160"));
Keyspace keyspace = HFactory.createKeyspace("EcommerceKeyspace", cluster);
Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
mutator.insert("johndoe", "ShoppingCartColumnFamily",
    HFactory.createStringColumn("14:21", "Iphone"));
Inserts a column into the ShoppingCartColumnFamily
42. Read Example with Cassandra
SliceQuery<String, String, String> query =
    HFactory.createSliceQuery(keyspace,
        stringSerializer, stringSerializer, stringSerializer);
query.setColumnFamily("ShoppingCartColumnFamily")
    .setKey("johndoe")
    .setRange("", "", false, 10);
QueryResult<ColumnSlice<String, String>> result = query.execute();
Reads a slice of 10 columns from ShoppingCartColumnFamily
44. Example with an item of a catalog
item_1 →
{
  "name": "Iphone",
  "price": 559.0,
  "vendor": "Apple",
  "rating": 4.6,
  "tags": [ "phone", "touch" ]
}
The database is aware of the document’s fields and can offer complex queries
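To illustrate what field awareness buys over an opaque byte[] value, here is a small in-memory sketch that filters documents on a numeric field. The class, the method names and the second catalog item are hypothetical; a real document database would use indexes rather than a full scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document store: each document is a map of field name to value.
class DocumentStoreSketch {
    private final Map<String, Map<String, Object>> documents = new LinkedHashMap<>();

    void insert(String id, Map<String, Object> doc) {
        documents.put(id, doc);
    }

    // Returns the ids of documents whose numeric field is strictly below the limit.
    List<String> findLessThan(String field, double limit) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : documents.entrySet()) {
            Object value = entry.getValue().get(field);
            if (value instanceof Number && ((Number) value).doubleValue() < limit) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        DocumentStoreSketch catalog = new DocumentStoreSketch();
        Map<String, Object> phone = new HashMap<>();
        phone.put("name", "Iphone");
        phone.put("price", 559.0);
        catalog.insert("item_1", phone);
        Map<String, Object> laptop = new HashMap<>(); // made-up second item
        laptop.put("name", "MacBook");
        laptop.put("price", 1299.0);
        catalog.insert("item_2", laptop);
        System.out.println(catalog.findLessThan("price", 600));
    }
}
```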
45. Write Example with MongoDB
Mongo mongo = new Mongo("mongos_1", 27017);
DB db = mongo.getDB("Ecommerce");
DBCollection catalog = db.getCollection("Catalog");
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Iphone");
doc.put("price", 559.0);
catalog.insert(doc);
Inserts an item document into MongoDB
46. Read Example with MongoDB
BasicDBObject query = new BasicDBObject();
query.put("price", new BasicDBObject("$lt", 600));
DBCursor cursor = catalog.find(query);
while (cursor.hasNext()) {
    System.out.println(cursor.next());
}
Queries for all items with a price lower than 600
48. Example with train booking with IBM eXtreme Scale
@Entity(schemaRoot=true)
public class Train {
    @Id
    String code;

    @Index
    @Basic
    String name;

    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();

    @Version
    int version;
    ...
}
[Diagram: Seat (number, price, booked), Train (code, type), TrainStop (date)]
With Data Grids, sub entities can have cross relations
49. Write Example with IBM eXtreme Scale
eXtreme Scale provides a JPA style API
void persist(Train train) {
    entityManager.persist(train);
}
Inserts a train into eXtreme Scale
50. Read Example with IBM eXtreme Scale
/** Find by key */
Train findById(String id) {
    return (Train) entityManager.find(Train.class, id);
}
/** Query Language */
Train findByTrain(String code) {
    Query q = entityManager.createQuery("select t from Train t where t.code=:code");
    q.setParameter("code", code);
    return (Train) q.getSingleResult();
}
Simple and complex queries with eXtreme Scale
51. More APIs
• Another Java EE versus Spring battle? JSR 347 Data Grids vs. Spring Data
Unified API on top of relational, document, column, key-value?
Object to tuple projection API
64. Transactions with Manual Compensation
• Code “do” & “undo” & chain execution
• What about interrupted chain execution? Data corruption?
Data store managed transaction chain execution
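A minimal sketch of the "do" & "undo" chain, assuming each step exposes a compensating action. The Step interface and runChain method are hypothetical names. The sketch only covers a step that fails while the process keeps running; if the process itself dies mid-chain, nothing here runs the undo actions, which is exactly the interrupted-execution concern raised above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical manual compensation: each step knows how to undo itself.
class CompensationSketch {

    interface Step {
        void doIt() throws Exception;
        void undo();
    }

    // Runs steps in order; on the first failure, undoes the completed steps
    // in reverse order and reports false.
    static boolean runChain(List<Step> steps) {
        Deque<Step> done = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.doIt();
                done.push(step);
            } catch (Exception e) {
                while (!done.isEmpty()) {
                    done.pop().undo();
                }
                return false;
            }
        }
        return true;
    }
}
```

A caller would wrap each store update (for example, booking a seat and debiting an account) in a Step and pass the list to runChain.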
67. Key-Value Store
• Get and Set by key
Simple but enough for a lot of use cases
• Riak and Voldemort provide great scalability
Great to persist continuously growing datasets
• Memcached and Redis offer low overhead and latency
Great for cache and live data
68. Column Families Store
• Get and Set by key of a list of columns
Makes it possible to fetch and update partial data
• Queries are simple, but column slice fetching is possible
Great for pagination
• The data model is too low level for many complex data modeling needs
Should typically be used for the largest scalability needs
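Why slices suit pagination can be sketched with a sorted map: each page starts just after the last column name of the previous page. The class and method are hypothetical, and an empty string stands for "from the start", on the assumption that no column is named "".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical pagination over one row's columns, sorted by column name.
class SlicePaginationSketch {

    // Returns up to pageSize column names strictly after afterColumn.
    static List<String> page(TreeMap<String, String> columns, String afterColumn, int pageSize) {
        List<String> names = new ArrayList<>();
        for (String name : columns.tailMap(afterColumn, false).keySet()) {
            if (names.size() == pageSize) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        TreeMap<String, String> cart = new TreeMap<>();
        cart.put("17:21", "Iphone");
        cart.put("17:32", "DVD Player");
        cart.put("17:44", "MacBook");
        List<String> first = page(cart, "", 2);
        List<String> second = page(cart, first.get(first.size() - 1), 2);
        System.out.println(first + " then " + second);
    }
}
```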
69. Document Store
• Schemaless
Great for continuously updated schemas
• Complex queries are available
Necessary for filtering and search
• Scalability may be limited if not querying using partition key
Can be handled using multiple storage systems and limited queries
70. In Memory Data Grid
• Very Low Latency & eXtreme Transaction Processing (XTP)
Investment banking, booking & inventory systems
• In Memory - No Persistence
Most of the time backed with a database
• High budget and Developer skills required
Some Open Source alternatives are appearing
71. Polyglot storage for eCommerce
Application uses:
• Solr: products search
• MongoDB: product catalog
• Cassandra: user account and shopping cart
• Coherence: warehouse inventory
72. Why do NoSQL & DataGrids matter?
• Polyglot Storage: databases that fit the needs of every type of data
• Linear Scalability: being able to handle any further business requirements
• High Availability: multi-server and multi-datacenter
• Elasticity: natural integration with Cloud Computing philosophy
• Some new use cases now available