1. NoSQL: No sweat with JBoss Data Grid
Shane Johnson
Technical Marketing Manager
Tristan Tarrant
Principal Software Engineer
10/08/2012
26. CAP Theorem
Eric Brewer
27. CAP Theorem
● Consistency
● Availability
● Partition Tolerance
28. JBoss Data Grid + CAP Theorem
● No Physical Partition
  ● Consistent and Available (C + A)
● Physical Partition
  ● Available (A + P)
● Pseudo Partition (e.g. Unresponsive Node)
  ● Consistent or Available (C + P / A + P)
32. Caching and Data Grids for JEE
● Caching – JSR-107
● Data Grids – JSR-347
33. Caching in Java
● Developers have been doing it forever
  ● To increase performance
  ● To offload legacy data stores from unnecessary requests
● Home-brew approaches based on Hashtables and Maps
● Many free and commercial libraries, but...
● … no standard!
34. JSR-107: Caching for JEE
● Local (single JVM) and Distributed (multiple JVMs) caches
● CacheManager: a way to obtain caches
● Cache, “inspired” by the Map API with extensions for entry expiration and additional atomic operations (see the sketch below)
● A Cache Lifecycle (starting, stopping)
● Entry Listeners for specific events
● Optional features: JTA support and annotations
● One of the oldest JSRs, dormant for a long time, recently revived by JSR-347
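To make the API shape concrete, here is a minimal sketch against the javax.cache API (method names follow the JSR-107 draft of the time, e.g. Caching.getCacheManager(), and may differ in later revisions):

    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;

    public class Jsr107Sketch {
        public static void main(String[] args) {
            // A CacheManager is the way to obtain named caches.
            CacheManager manager = Caching.getCacheManager();
            Cache<String, String> cache = manager.getCache("accounts");

            // Map-inspired operations...
            cache.put("acct-1", "Alice");
            String owner = cache.get("acct-1");

            // ...plus additional atomic operations.
            cache.putIfAbsent("acct-2", "Bob");
        }
    }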
35. And now?
● Now that I've put a lot of data in my distributed cache, what can I do with it?
● And most importantly...
● HOW?
36. Multiple clustering options
● Replication
  ● All nodes have all of the data
  ● Grid Size == smallest node
● Distribution
  ● The Grid maintains n copies of each item of data on different nodes
  ● Grid Size == total size / n (configuration sketched below)
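Both modes are selected when the cache is configured. A minimal sketch with Infinispan's programmatic configuration (Infinispan 5.x underlies JBoss Data Grid 6; treat the exact builder methods as an assumption):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    public class ClusteringModes {
        public static void main(String[] args) {
            // Replication: every node holds a full copy of the data.
            Configuration replicated = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.REPL_SYNC)
                .build();

            // Distribution: each entry is kept on n owners (here n = 2).
            Configuration distributed = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                .hash().numOwners(2)
                .build();
        }
    }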
37. We like asynchronous
● So much that we want it in the API (usage sketched below):
● Future<V> getAsync(K key);
● Future<V> putAsync(K key, V value);
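A short usage sketch, assuming an Infinispan Cache obtained from a running CacheManager (Infinispan's async methods actually return NotifyingFuture, a Future subtype):

    import java.util.concurrent.Future;
    import org.infinispan.Cache;

    public class AsyncSketch {
        static String demo(Cache<String, String> cache) throws Exception {
            // The write proceeds in the background; the caller is not blocked.
            Future<String> pendingPut = cache.putAsync("key", "value");
            pendingPut.get();                 // block only when the outcome matters

            Future<String> pendingGet = cache.getAsync("key");
            return pendingGet.get();          // the value read from the grid
        }
    }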
38. Keeping things close together
● If I need to access semantically close data quickly, why not keep it on the same node?
● Grouping API (both styles sketched below)
  ● Distribution per group and not per key
  ● Via annotations
  ● Via a Grouper class
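A sketch of both styles, using the annotation and interface names from Infinispan's grouping API (exact signatures are an assumption on my part). Grouping must also be enabled in the cache configuration before either style takes effect:

    import org.infinispan.distribution.group.Group;
    import org.infinispan.distribution.group.Grouper;

    // Style 1: annotate a method on the key class.
    public class OrderKey {
        private final String customerId;
        private final long orderId;

        public OrderKey(String customerId, long orderId) {
            this.customerId = customerId;
            this.orderId = orderId;
        }

        // All orders of one customer land on the same node.
        @Group
        public String getCustomerGroup() {
            return customerId;
        }
    }

    // Style 2: an external Grouper, for key classes you cannot modify.
    class CustomerGrouper implements Grouper<OrderKey> {
        @Override
        public String computeGroup(OrderKey key, String group) {
            return key.getCustomerGroup();
        }

        @Override
        public Class<OrderKey> getKeyType() {
            return OrderKey.class;
        }
    }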
39. Eventual consistency
● One step further than asynchronous clustering, for higher performance
● Entries are tagged with a version (e.g. a timestamp or a time-based UUID): newer versions will eventually replace all older versions in the cluster (illustrated below)
● Applications retrieving data may get an older entry, which may be “good enough”
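To illustrate the versioning idea, a hypothetical last-write-wins sketch (an illustration of the concept, not JBoss Data Grid's actual reconciliation code):

    // A value tagged with a version; the higher version eventually wins.
    class VersionedValue<V> {
        final V value;
        final long version;   // e.g. a timestamp

        VersionedValue(V value, long version) {
            this.value = value;
            this.version = version;
        }

        // When two replicas disagree, keep the newer entry.
        static <V> VersionedValue<V> reconcile(VersionedValue<V> a, VersionedValue<V> b) {
            return a.version >= b.version ? a : b;
        }
    }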
43. Performing parallel computation
● Distributed Executors
  ● Run on all nodes where a cache exists
  ● Each executor works on the slice of data local to itself
  ● Fastest access
  ● Parallelization of operations
  ● Usually returns results as Futures (see the sketch below)
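A sketch using Infinispan's distributed executor (class names follow the Infinispan 5.x distexec API shipped with JBoss Data Grid 6; treat the exact signatures as an assumption):

    import java.io.Serializable;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.Future;
    import org.infinispan.Cache;
    import org.infinispan.distexec.DefaultExecutorService;
    import org.infinispan.distexec.DistributedCallable;
    import org.infinispan.distexec.DistributedExecutorService;

    // Counts the entries held locally on each node.
    class LocalCountTask implements DistributedCallable<String, String, Integer>, Serializable {
        private transient Cache<String, String> cache;

        @Override
        public void setEnvironment(Cache<String, String> cache, Set<String> inputKeys) {
            this.cache = cache;   // the slice of data local to this node
        }

        @Override
        public Integer call() {
            return cache.size();  // local container size in Infinispan 5.x
        }
    }

    public class DistExecSketch {
        static int totalEntries(Cache<String, String> cache) throws Exception {
            DistributedExecutorService des = new DefaultExecutorService(cache);
            // One Future per node in the cluster.
            List<Future<Integer>> futures = des.submitEverywhere(new LocalCountTask());
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        }
    }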
44. Map / Reduce
● A mapper function iterates through a set of key/values, transforming them and sending them to a collector
  void map(KIn, VIn, Collector<KOut, VOut>)
● A reducer works through the collected values for each key, returning a single value
  VOut reduce(KOut, Iterator<VOut>)
● Finally, a collator processes the reduced key/values and returns a result to the invoker (word count sketched below)
  R collate(Map<KOut, VOut> reducedResults)
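The classic word count, sketched with the Infinispan map/reduce classes behind these signatures (class names follow the Infinispan 5.x distexec API; treat them as an assumption):

    import java.io.Serializable;
    import java.util.Iterator;
    import java.util.Map;
    import org.infinispan.Cache;
    import org.infinispan.distexec.mapreduce.Collector;
    import org.infinispan.distexec.mapreduce.MapReduceTask;
    import org.infinispan.distexec.mapreduce.Mapper;
    import org.infinispan.distexec.mapreduce.Reducer;

    // Emits (word, 1) for every word in every stored sentence.
    class WordMapper implements Mapper<String, String, String, Integer>, Serializable {
        @Override
        public void map(String key, String value, Collector<String, Integer> collector) {
            for (String word : value.split("\\s+")) {
                collector.emit(word, 1);
            }
        }
    }

    // Sums the counts collected for each word.
    class WordReducer implements Reducer<String, Integer>, Serializable {
        @Override
        public Integer reduce(String word, Iterator<Integer> counts) {
            int sum = 0;
            while (counts.hasNext()) {
                sum += counts.next();
            }
            return sum;
        }
    }

    public class WordCountSketch {
        static Map<String, Integer> countWords(Cache<String, String> cache) {
            return new MapReduceTask<String, String, String, Integer>(cache)
                .mappedWith(new WordMapper())
                .reducedWith(new WordReducer())
                .execute();   // a Collator could instead fold this map into one result
        }
    }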
46. Replicated Use Case
● Finance
● Master / Slave
● High Availability
● Failover
● Performance + Consistency
● Data – Lifespan
● Servers – Few
● Memory – Medium
47. Distributed Use Case #1
● Telecom / Media
● Performance > Consistency
● Data
  ● Infinite
  ● Calculated
● Servers – Few
● Memory – Large
48. Distributed Use Case #2
● Telecom
● Consistency > Performance
● Data
  ● Continuous
  ● Limited Lifespan
● Servers – Many
● Memory – Normal
49. Q&A
Look for a follow-up on the howtojboss.com blog.