In Memory Data Grids in Action with Oracle Coherence, presented to NoSQL users.
The "transactions" chapter is missing as it has been rescheduled to another session.
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions chapter)
1. Transactions chapter will be presented
during another session
In Memory Data Grid in Action
with Oracle Coherence
for Paris NoSQL User Group
Cyrille Le Clerc
Wednesday, May 25, 2011
2. Speaker
@cyrilleleclerc
blog.xebia.fr
Cyrille Le Clerc
Large Scale
In Memory Data Grid
Open Source
(Apache CXF, ...)
“you build it, you run it”
4. On the Financial side
Needs within financial market:
• Very low latency
• Rich queries & transactions
• Scalability
• Data consistency
- Released Coherence in 2001
- Started as a distributed cache
- Released Gigaspaces XAP in 2001
- Started as a data grid
5. Let’s define an In Memory Data Grid ...
6. Let’s define an In Memory Data Grid
eXtreme Scale
This is an In Memory Data Grid
7. Let’s define an In Memory Data Grid
This is Network Attached Memory
8. Let’s define an In Memory Data Grid
Similarities with NoSQL document oriented
Partitioned, distributed hashtable, schema-less, value is not opaque, scale-out scalability
Very fast
In memory (persistence coming), business logic inside the data
Consistent and Available
Transactional, redundant
Written in Java, data are POJOs
Not necessary
Clients in Java, Microsoft, etc
9. Use cases for this presentation
10. Train Booking System
trains, stations,
seats, booking and
passengers
18. Data Access Patterns
This is not traditional Java EE coding style !
Can apply very complex business logic inside the
data
Stored Procedures Style
Change management challenge !
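The stored-procedure style can be sketched in plain Java. This is only an illustration under stated assumptions: `GridNode` and `SeatBookingExample` are hypothetical names standing in for the grid node that owns the entries, not the API of any product. The caller ships a small function to the data instead of fetching the value, changing it and putting it back.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical stand-in for the grid node that owns a set of entries. */
final class GridNode<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, value);
    }

    V get(K key) {
        return entries.get(key);
    }

    /**
     * Runs the given logic next to the data, atomically for that key.
     * Only the (small) function travels, not the (large) value.
     */
    V invoke(K key, BiFunction<K, V, V> logic) {
        return entries.compute(key, logic);
    }
}

final class SeatBookingExample {
    /** Books one seat if any is left; returns the remaining count, or -1 if the train is full or unknown. */
    static int bookSeat(GridNode<String, Integer> freeSeatsByTrain, String trainCode) {
        final int[] remaining = {-1};
        freeSeatsByTrain.invoke(trainCode, (code, free) -> {
            if (free == null || free == 0) {
                return free; // nothing to book, entry left unchanged
            }
            remaining[0] = free - 1;
            return free - 1;
        });
        return remaining[0];
    }
}
```

The check and the update run as one atomic step on the owning node, which is what makes this style differ from the usual read, modify, write sequence of Java EE code.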
25. Data Access Patterns
This is not traditional Java EE coding style
Change management
Don’t forget “Map Reduce” = “Distributed Table
Scan”
Use Indexes
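A minimal sketch of why indexes matter, assuming a hypothetical `TrainPartition` class that models one partition of the grid in plain Java. A query without an index has to visit every entry of every partition, while an indexed query is a single lookup per partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single partition keyed by train code, with an index on train name. */
final class TrainPartition {
    private final Map<String, String> nameByCode = new HashMap<>();
    private final Map<String, List<String>> codesByName = new HashMap<>(); // the index

    void put(String code, String name) {
        String previous = nameByCode.put(code, name);
        if (previous != null) {
            codesByName.get(previous).remove(code); // keep the index in sync on update
        }
        codesByName.computeIfAbsent(name, n -> new ArrayList<>()).add(code);
    }

    /** Table scan: visits every entry of the partition, O(n). */
    List<String> findCodesByNameWithScan(String name) {
        List<String> codes = new ArrayList<>();
        for (Map.Entry<String, String> entry : nameByCode.entrySet()) {
            if (entry.getValue().equals(name)) {
                codes.add(entry.getKey());
            }
        }
        return codes;
    }

    /** Index lookup: one hash lookup, independent of the partition size. */
    List<String> findCodesByNameWithIndex(String name) {
        return new ArrayList<>(codesByName.getOrDefault(name, Collections.emptyList()));
    }
}
```

Both methods return the same codes; the price of the index is the extra bookkeeping on every `put`.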
26. CAP Theorem & In Memory Data Grids
27. CAP Theorem and In Memory Data Grid
Only 2 of these 3 properties can be achieved at any given moment in time:
Consistency
Availability
Partition Tolerance
Brewer’s Conjecture
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
28. CAP Theorem and In Memory Data Grid
Data Grids
Only 2 of these 3 properties can be achieved at any given moment in time:
Consistency
Availability
Partition Tolerance
Brewer’s Conjecture
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
29. Cross Data Center Data Consistency
London
New York
Tokyo
World wide replication
for financial market
30. Cross Data Center Data Consistency
West Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
East Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
Warehouse stocks
31. Cross Data Center Data Consistency
set stock to 146
West Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
East Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
propagation delay !
32. Cross Data Center Data Consistency
set stock to 146
West Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
East Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
set weight 175
reconciliation API needed !
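What a reconciliation step could look like is sketched below as a field-level three-way merge. This is an assumption made for illustration, not the reconciliation API of any product: `Reconciler` and `ConflictException` are hypothetical names, and the sketch assumes each site still knows the last version both sites agreed on. It also assumes, from the slide layout, that one site changed the stock while the other changed the weight.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical field-level reconciliation of two replicas against their last common version. */
final class Reconciler {

    /** Thrown when both sites changed the same field to different values. */
    static final class ConflictException extends RuntimeException {
        ConflictException(String field) {
            super("Conflicting updates on field: " + field);
        }
    }

    static Map<String, Object> merge(Map<String, Object> base,
                                     Map<String, Object> west,
                                     Map<String, Object> east) {
        Set<String> fields = new HashSet<>(base.keySet());
        fields.addAll(west.keySet());
        fields.addAll(east.keySet());

        Map<String, Object> merged = new HashMap<>();
        for (String field : fields) {
            Object b = base.get(field);
            Object w = west.get(field);
            Object e = east.get(field);
            Object winner;
            if (Objects.equals(w, e)) {
                winner = w; // same value on both sides
            } else if (Objects.equals(b, w)) {
                winner = e; // only east changed this field
            } else if (Objects.equals(b, e)) {
                winner = w; // only west changed this field
            } else {
                throw new ConflictException(field); // needs a business rule or a human
            }
            if (winner != null) {
                merged.put(field, winner);
            }
        }
        return merged;
    }
}
```

With the values of the slide, merging a stock of 146 from one site and a weight of 175 from the other yields both changes. Two different stock values for the same product would raise a conflict, which is the case that needs a real business decision.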
33. Cross Data Center Data Consistency
set stock to 146
West Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
East Coast
{
  "name": "Barbie Computer",
  "stock": 147,
  "weight": 200
}
set weight 175
Network partitioning
35. Data Modeling
Dominant Question Driven Design
Opposite to Relational which is Domain Driven Design
Constrained Tree Schema
Because RPC matters
Denormalized
Due to dominant questions and CTS
36. Data Modeling
Train: code, type
Seat: number, price
Booking: reduction
Passenger: name
TrainStop: date
TrainStation: code, name
Typical relational data model
37. Data Modeling
Partitioning ready entities tree
Root entity: Train (code, type)
Seat: number, price
Booking: reduction
Passenger: name
TrainStop: date
Reference data, duplicated in each grid node: TrainStation (code, name)
Find the root entity and denormalize
38. Data Modeling
Remove unused data
Partitioned:
Train: code, type
Seat: number, price, booked
Booking: reduction
Passenger: name
TrainStop: date
Replicated:
TrainStation: code, name
39. Data Modeling
Partitioned:
Train: code, type
Seat: number, price, booked
TrainStop: date
Replicated:
TrainStation: code, name
Data Grid Ready data structure
41. Data Modeling is Hard !
Two Account root entities (number), each with its CashWithdrawal (date, amount)
MoneyTransfer (id, date, amount), linked from one Account to the other
Two root entities for the same MoneyTransfer !
42. Data Modeling is Hard !
Two Account root entities (number), each with its CashWithdrawal (date, amount)
MoneyTransferIn: id, date, amount
MoneyTransferOut: id, date, amount
Split MoneyTransfer
44. Data Modeling is Hard !
Account: number
CashWithdrawal: date, amount
MoneyTransferOut: id, date, amount
MoneyTransferIn: id, date, amount
Data Grid Ready data structure
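The split can be sketched in plain Java as follows. Class and field names follow the diagram, but the code itself is an illustrative assumption: the `transfer` helper is hypothetical, and it ignores atomicity across the two accounts, which belongs to the transactions chapter not covered in this session.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Outgoing half of a transfer, stored under the debited account. */
final class MoneyTransferOut {
    final String id;
    final LocalDate date;
    final BigDecimal amount;

    MoneyTransferOut(String id, LocalDate date, BigDecimal amount) {
        this.id = id;
        this.date = date;
        this.amount = amount;
    }
}

/** Incoming half of a transfer, stored under the credited account. */
final class MoneyTransferIn {
    final String id;
    final LocalDate date;
    final BigDecimal amount;

    MoneyTransferIn(String id, LocalDate date, BigDecimal amount) {
        this.id = id;
        this.date = date;
        this.amount = amount;
    }
}

/** Root entity: its children live in the same partition as the account itself. */
final class Account {
    final String number;
    final List<MoneyTransferOut> transfersOut = new ArrayList<>();
    final List<MoneyTransferIn> transfersIn = new ArrayList<>();

    Account(String number) {
        this.number = number;
    }

    /** Records one logical transfer as two halves sharing the same id, one under each root. */
    static void transfer(String id, LocalDate date, BigDecimal amount, Account from, Account to) {
        from.transfersOut.add(new MoneyTransferOut(id, date, amount));
        to.transfersIn.add(new MoneyTransferIn(id, date, amount));
    }
}
```

Each half has exactly one root entity, so each account and its history can be stored and queried inside a single partition; the shared id is what lets the two halves be matched again.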
46. Data Serialization
Used for data transfer and byte oriented storage
Must support evolvable data structure
Hot topic: see Apache Thrift, Apache Avro, Google Protocol Buffers
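To make "evolvable" concrete, here is a hedged sketch using only `java.io`. `TrainV1` and `TrainV2` are hypothetical classes invented for the example and the wire format is an assumption, not the format of Thrift, Avro, Protocol Buffers or any grid. The idea shown: a newer version appends fields at the end, and an older reader keeps the bytes it does not understand and writes them back untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical version 1 of a serialized Train: knows only code and name. */
final class TrainV1 {
    String code;
    String name;
    /** Bytes appended by a newer version that this version cannot interpret. */
    byte[] futureData = new byte[0];

    static TrainV1 read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            TrainV1 train = new TrainV1();
            train.code = in.readUTF();
            train.name = in.readUTF();
            train.futureData = new byte[in.available()];
            in.readFully(train.futureData); // keep unknown fields instead of dropping them
            return train;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] write() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(code);
            out.writeUTF(name);
            out.write(futureData); // write unknown fields back untouched
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical version 2: appends a type field after the version 1 fields. */
final class TrainV2 {
    String code;
    String name;
    String type;

    static TrainV2 read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            TrainV2 train = new TrainV2();
            train.code = in.readUTF();
            train.name = in.readUTF();
            // data written by version 1 has no type: fall back to a default
            train.type = in.available() > 0 ? in.readUTF() : "NORMAL";
            return train;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] write() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(code);
            out.writeUTF(name);
            out.writeUTF(type);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This matters in a grid because nodes and clients are rarely upgraded at the same instant, so old and new versions of a class read and write the same entries for a while.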
47. Data Storage
Store Java Beans in the grid
No need to unmarshal for in-process operations
Beware of garbage collector !
Store byte arrays in the grid
Pay for unmarshalling at each read and write
Low-level / byte-oriented APIs to read data
Slightly more garbage collector friendly
48. Communication Protocols
UDP multicast (Coherence, Gigaspaces)
TCP/IP (WebSphere eXtreme Scale)
49. Topology
Partitions made of shards (1 primary + 0..* backups)
Dynamic shards location (changes at runtime and at restart)
Can use dedicated “directory servers” or embed it in the “data nodes”
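The relation between keys, partitions and shards can be sketched as below. `Topology` is a hypothetical class and the round-robin placement is an assumption chosen for brevity; real grids move shards at runtime, which this static sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical static placement of shards on nodes; real grids rebalance at runtime. */
final class Topology {
    private final int partitionCount;
    private final List<String> nodes;
    private final int backupCount;

    Topology(int partitionCount, List<String> nodes, int backupCount) {
        if (backupCount >= nodes.size()) {
            throw new IllegalArgumentException("need more nodes than backups");
        }
        this.partitionCount = partitionCount;
        this.nodes = new ArrayList<>(nodes);
        this.backupCount = backupCount;
    }

    /** A key always maps to the same partition, whatever the number of nodes. */
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Index 0 hosts the primary shard, the following nodes host the backups. */
    List<String> nodesOf(int partition) {
        List<String> owners = new ArrayList<>();
        for (int i = 0; i <= backupCount; i++) {
            owners.add(nodes.get((partition + i) % nodes.size()));
        }
        return owners;
    }
}
```

Keeping the partition count fixed and moving only the partition-to-node mapping is what lets a directory (dedicated or embedded) relocate shards without rehashing every key.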
50. JVM and Memory
Many vendors recommend tiny 1.4 GB JVMs !
Garbage collector hell
More than ten JVMs per server
Management hell
More and more IMDGs support large heaps
52. Raw Java Mapping with Oracle Coherence
Model: Train (code, type) with Seat (number, price, booked) and TrainStop (date)

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.tangosol.io.AbstractEvolvable;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Train extends AbstractEvolvable implements PortableObject {

    enum Type {
        HIGH_SPEED, NORMAL
    }

    /** Key of the Cache */
    String code;

    /** Indexed */
    String name;

    Type type;

    List<Seat> seats = new ArrayList<Seat>();

    List<TrainStop> trainStops = new ArrayList<TrainStop>();

    int version;

    /** Version of the serialized form of this class. */
    @Override
    public int getImplVersion() {
        return 1;
    }

    /** Reads each field by its POF index; the indexes must match writeExternal. */
    @Override
    public void readExternal(PofReader pofReader) throws IOException {
        this.code = pofReader.readString(0);
        this.name = pofReader.readString(1);
        this.type = (Type) pofReader.readObject(2);
        pofReader.readCollection(3, this.seats);
        pofReader.readCollection(4, this.trainStops);
        this.version = pofReader.readInt(5);
    }

    /** Writes each field under the same POF index used by readExternal. */
    @Override
    public void writeExternal(PofWriter pofWriter) throws IOException {
        pofWriter.writeString(0, this.code);
        pofWriter.writeString(1, this.name);
        pofWriter.writeObject(2, this.type);
        pofWriter.writeCollection(3, this.seats, Seat.class);
        pofWriter.writeCollection(4, this.trainStops, TrainStop.class);
        pofWriter.writeInt(5, this.version);
    }
}
hand-coded serialization
JUnit is your friend !
53. JPA Style Mapping with WebSphere eXtreme Scale
Model: Train (code, type) with Seat (number, price, booked) and TrainStop (date)

@Entity(schemaRoot=true)
public class Train {

    @Id
    String code;

    @Index
    @Basic
    String name;

    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();

    @Version
    int version;
    ...
}
sub entities can have cross relations
54. Map API with Oracle Coherence
NamedCache trainCache = CacheFactory.getCache("train-cache");

/** Save */
void persist(Train train) {
    trainCache.put(train.getCode(), train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) trainCache.get(code);
}

/** Find by Query Language */
Train findByTrainName(String name) {
    Filter filter = QueryHelper.createFilter("name = :name",
            Collections.singletonMap("name", name));
    Set<Map.Entry<String, Train>> trainEntrySet = trainCache.entrySet(filter);
    if (trainEntrySet.isEmpty()) {
        return null;
    } else {
        return trainEntrySet.iterator().next().getValue();
    }
}
Map API
55. JPA Style with WebSphere eXtreme Scale
/** Save */
void persist(Train train) {
    entityManager.persist(train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) entityManager.find(Train.class, code);
}

/** Query Language */
Train findByTrainName(String name) {
    Query q = entityManager.createQuery("select t from Train t where t.name=:name");
    q.setParameter("name", name);
    return (Train) q.getSingleResult();
}
JPA Style Entity Manager
58. Indexes with WebSphere eXtreme Scale
@Entity(schemaRoot=true)
class Train {

    @Index
    @Basic
    String name;

    @Index
    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }
    ...
}

Query query = em.createQuery("select t from Train t where t.name=:name");
query.getPlan();

This is an execution plan:
for q2 in Train ObjectMap using INDEX on name = ( ?name)
filter ( q2.c[0] = ?name )
returning new Tuple( q2 )
59. More APIs
Another Java EE versus Spring battle ?
JSR 347 Data Grids vs. Spring Data
Serialization / Object to Tuple Mapping API ?
Unified API on top of NoSQL stores ?
60. Data Grid <-> Relational Database Interactions
61. Data Grid <-> Relational Database
Data Grids are “In Memory” -> we need to persist data on disk !
62. Data Grid <-> Relational Database
update / insert / delete
“select directly modified in DB”
63. Data Grid <-> Relational Database
Data Grid -> Relational Database
backend DB
Highly available write behind queues
+ SQL batched statements
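A write-behind queue with batching can be sketched as follows. `WriteBehindQueue` is a hypothetical class, the batch writer is passed in as a plain function standing for a batched SQL statement, and the sketch is deliberately not highly available: a real grid would also replicate the queue on backup nodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical write-behind queue: grid updates return immediately and the
 * database is written later, in batches.
 */
final class WriteBehindQueue<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>();
    private final int batchSize;
    private final Consumer<List<Map.Entry<K, V>>> batchWriter; // stands for a batched SQL statement

    WriteBehindQueue(int batchSize, Consumer<List<Map.Entry<K, V>>> batchWriter) {
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
    }

    /** Several updates of the same key collapse into one pending write. */
    synchronized void enqueue(K key, V value) {
        pending.remove(key); // re-insert so the latest update moves to the tail
        pending.put(key, value);
    }

    /**
     * Drains the queue, calling the writer once per batch, and returns the number of batches.
     * If the writer throws, nothing is removed, so the whole queue is retried on the next flush.
     */
    synchronized int flush() {
        int batches = 0;
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        for (Map.Entry<K, V> entry : pending.entrySet()) {
            batch.add(new AbstractMap.SimpleImmutableEntry<>(entry));
            if (batch.size() == batchSize) {
                batchWriter.accept(batch);
                batches++;
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            batchWriter.accept(batch);
            batches++;
        }
        pending.clear();
        return batches;
    }
}
```

Because a failed flush replays entries that may already have been written, the database statements behind `batchWriter` need to be idempotent, for example an upsert keyed on the same primary key as the grid entry.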
64. Data Grid <-> Relational Database
Data Grid -> Relational Database
Train: code, type
Seat: number, price, booked
TrainStop: date
TrainStation: code, name
Constrained Tree Schema <-> Relational Impedance Mismatch
65. Data Grid <-> Relational Database
DB writes MUST succeed !
Prefer raw SQL rather than reused business logic
Denormalize the database
Remove the foreign keys, use same PKs in DB and data grid
Support unordered SQL statements
Align the database on the Data Grid model !
66. Data Grid <-> Relational Database
Relational Database -> Data Grid
select * from train
where last_modif > ?
backend DB
Data Grid Originated Scheduled Refresh
(Oracle System Change Number, etc)
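The scheduled refresh can be sketched as a poller that remembers the highest modification marker it has seen. `TrainRow` and `ScheduledRefresher` are hypothetical names, and the query is passed in as a function standing for `select * from train where last_modif > ?`, so the sketch runs without a database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical row of the train table, with its last_modif column. */
final class TrainRow {
    final String code;
    final String name;
    final long lastModif;

    TrainRow(String code, String name, long lastModif) {
        this.code = code;
        this.name = name;
        this.lastModif = lastModif;
    }
}

/** Grid-originated refresh: each run asks only for rows changed since the previous run. */
final class ScheduledRefresher {
    private final Map<String, TrainRow> grid = new HashMap<>();
    private final LongFunction<List<TrainRow>> selectModifiedSince; // the "last_modif > ?" query
    private long highWaterMark = Long.MIN_VALUE;

    ScheduledRefresher(LongFunction<List<TrainRow>> selectModifiedSince) {
        this.selectModifiedSince = selectModifiedSince;
    }

    /** Loads the changed rows into the grid and returns how many were refreshed. */
    int refresh() {
        List<TrainRow> rows = selectModifiedSince.apply(highWaterMark);
        for (TrainRow row : rows) {
            grid.put(row.code, row);
            highWaterMark = Math.max(highWaterMark, row.lastModif);
        }
        return rows.size();
    }

    TrainRow get(String code) {
        return grid.get(code);
    }
}
```

In a running system `refresh()` would be called periodically, for instance from a `java.util.concurrent.ScheduledExecutorService`.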
67. Data Grid <-> Relational Database
Relational Database -> Data Grid
backend DB
Database Originated Push
JMS = durable subscription
(Oracle Database Change Notification, etc)
68. Data Grid <-> Relational Database
In Memory -> prepare for reloading after
maintenance operations !
Need for “graceful shutdown with disk persistence”
Prepare consistency checkers
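A consistency checker can start as small as the sketch below, which compares a snapshot of the grid with a snapshot of the database. `ConsistencyChecker` is a hypothetical helper and both sides are assumed to fit in plain maps, which only holds for small data sets or for one partition at a time.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical checker comparing a grid snapshot with a database snapshot. */
final class ConsistencyChecker {

    /** Returns the keys whose values differ, or that exist on one side only. */
    static <K extends Comparable<K>, V> Set<K> findMismatches(Map<K, V> grid, Map<K, V> database) {
        Set<K> keys = new TreeSet<>(grid.keySet());
        keys.addAll(database.keySet());

        Set<K> mismatches = new TreeSet<>();
        for (K key : keys) {
            if (!Objects.equals(grid.get(key), database.get(key))) {
                mismatches.add(key);
            }
        }
        return mismatches;
    }
}
```

The comparison relies on `equals`, so the values on both sides must be in the same representation, which is easier when the database is aligned on the data grid model.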
72. Data Grids and Operations
Standard packaging?
Do It Yourself (layout, scripts, etc)
Limited Management
Do It Yourself (stop/start, detecting data loss, etc)
Limited debugging tools
Do It Yourself (debugging consoles, troubleshooting agents)
JVM pandemic
Dozens of JVMs to manage !
73. Data Grids and Operations
Dev / Ops collaboration is required
Experts only !
74. The right tool for the right job
75. The right tool for the right job
Incredibly fast ! Even with transactions !
Scalable
If you solve the data loading issue
Good at data replication (when it implements it)
Reconciliation API, etc
Very geeky on both dev and ops side
Not an enterprise grade data store
Requires very skilled people + change management
“Quite” expensive