1. NoSQL: No sweat with JBoss Data Grid
Shane Johnson
Technical Marketing Manager
Tristan Tarrant
Principal Software Engineer
10/08/2012
26. CAP Theorem
Eric Brewer
27. CAP Theorem
● Consistency
● Availability
● Partition Tolerance
28. JBoss Data Grid + CAP Theorem
● No Physical Partition
  ● Consistent and Available (C + A)
● Physical Partition
  ● Available (A + P)
● Pseudo Partition (e.g. Unresponsive Node)
  ● Consistent or Available (C + P / A + P)
32. Caching and Data Grids for JEE
● Caching – JSR-107
● Data Grids – JSR-347
33. Caching in Java
● Developers have been doing it forever
  ● To increase performance
  ● To offload legacy data stores from unnecessary requests
● Home-brew approaches based on Hashtables and Maps
● Many free and commercial libraries, but...
● … no standard!
34. JSR-107: Caching for JEE
● Local (single JVM) and Distributed (multiple JVMs) caches
● CacheManager: a way to obtain caches
● Cache, “inspired” by the Map API with extensions for entry expiration and additional atomic operations (see the sketch below)
● A Cache Lifecycle (starting, stopping)
● Entry Listeners for specific events
● Optional features: JTA support and annotations
● One of the oldest JSRs, dormant for a long time, recently revived by JSR-347
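To make the API shape concrete, here is a minimal sketch against the javax.cache API (method names follow the JSR-107 draft of the time, e.g. Caching.getCacheManager(), and may differ in later revisions):

    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;

    public class Jsr107Sketch {
        public static void main(String[] args) {
            // A CacheManager is the way to obtain named caches.
            CacheManager manager = Caching.getCacheManager();
            Cache<String, String> cache = manager.getCache("accounts");

            // Map-inspired operations...
            cache.put("acct-1", "Alice");
            String owner = cache.get("acct-1");

            // ...plus additional atomic operations.
            cache.putIfAbsent("acct-2", "Bob");
        }
    }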
35. And now?
● Now that I've put a lot of data in my distributed cache, what can I do with it?
● And most importantly...
● HOW?
36. Multiple clustering options
● Replication
  ● All nodes have all of the data
  ● Grid Size == smallest node
● Distribution
  ● The Grid maintains n copies of each item of data on different nodes
  ● Grid Size == total size / n (configuration sketched below)
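Both modes are selected when the cache is configured. A minimal sketch with Infinispan's programmatic configuration (Infinispan 5.x underlies JBoss Data Grid 6; treat the exact builder methods as an assumption):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    public class ClusteringModes {
        public static void main(String[] args) {
            // Replication: every node holds a full copy of the data.
            Configuration replicated = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.REPL_SYNC)
                .build();

            // Distribution: each entry is kept on n owners (here n = 2).
            Configuration distributed = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                .hash().numOwners(2)
                .build();
        }
    }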
37. We like asynchronous
● So much that we want it in the API (usage sketched below):
● Future<V> getAsync(K key);
● Future<V> putAsync(K key, V value);
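A short usage sketch, assuming an Infinispan Cache obtained from a running CacheManager (Infinispan's async methods actually return NotifyingFuture, a Future subtype):

    import java.util.concurrent.Future;
    import org.infinispan.Cache;

    public class AsyncSketch {
        static String demo(Cache<String, String> cache) throws Exception {
            // The write proceeds in the background; the caller is not blocked.
            Future<String> pendingPut = cache.putAsync("key", "value");
            pendingPut.get();                 // block only when the outcome matters

            Future<String> pendingGet = cache.getAsync("key");
            return pendingGet.get();          // the value read from the grid
        }
    }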
38. Keeping things close together
● If I need to access semantically close data quickly, why not keep it on the same node?
● Grouping API (both styles sketched below)
  ● Distribution per group and not per key
  ● Via annotations
  ● Via a Grouper class
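A sketch of both styles, using the annotation and interface names from Infinispan's grouping API (exact signatures are an assumption on my part). Grouping must also be enabled in the cache configuration before either style takes effect:

    import org.infinispan.distribution.group.Group;
    import org.infinispan.distribution.group.Grouper;

    // Style 1: annotate a method on the key class.
    public class OrderKey {
        private final String customerId;
        private final long orderId;

        public OrderKey(String customerId, long orderId) {
            this.customerId = customerId;
            this.orderId = orderId;
        }

        // All orders of one customer land on the same node.
        @Group
        public String getCustomerGroup() {
            return customerId;
        }
    }

    // Style 2: an external Grouper, for key classes you cannot modify.
    class CustomerGrouper implements Grouper<OrderKey> {
        @Override
        public String computeGroup(OrderKey key, String group) {
            return key.getCustomerGroup();
        }

        @Override
        public Class<OrderKey> getKeyType() {
            return OrderKey.class;
        }
    }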
39. Eventual consistency
● One step further than asynchronous clustering, for higher performance
● Entries are tagged with a version (e.g. a timestamp or a time-based UUID): newer versions will eventually replace all older versions in the cluster (illustrated below)
● Applications retrieving data may get an older entry, which may be “good enough”
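To illustrate the versioning idea, a hypothetical last-write-wins sketch (an illustration of the concept, not JBoss Data Grid's actual reconciliation code):

    // A value tagged with a version; the higher version eventually wins.
    class VersionedValue<V> {
        final V value;
        final long version;   // e.g. a timestamp

        VersionedValue(V value, long version) {
            this.value = value;
            this.version = version;
        }

        // When two replicas disagree, keep the newer entry.
        static <V> VersionedValue<V> reconcile(VersionedValue<V> a, VersionedValue<V> b) {
            return a.version >= b.version ? a : b;
        }
    }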
43. Performing parallel computation
● Distributed Executors
  ● Run on all nodes where a cache exists
  ● Each executor works on the slice of data local to itself
  ● Fastest access
  ● Parallelization of operations
  ● Usually returns results as Futures (see the sketch below)
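A sketch using Infinispan's distributed executor (class names follow the Infinispan 5.x distexec API shipped with JBoss Data Grid 6; treat the exact signatures as an assumption):

    import java.io.Serializable;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.Future;
    import org.infinispan.Cache;
    import org.infinispan.distexec.DefaultExecutorService;
    import org.infinispan.distexec.DistributedCallable;
    import org.infinispan.distexec.DistributedExecutorService;

    // Counts the entries held locally on each node.
    class LocalCountTask implements DistributedCallable<String, String, Integer>, Serializable {
        private transient Cache<String, String> cache;

        @Override
        public void setEnvironment(Cache<String, String> cache, Set<String> inputKeys) {
            this.cache = cache;   // the slice of data local to this node
        }

        @Override
        public Integer call() {
            return cache.size();  // local container size in Infinispan 5.x
        }
    }

    public class DistExecSketch {
        static int totalEntries(Cache<String, String> cache) throws Exception {
            DistributedExecutorService des = new DefaultExecutorService(cache);
            // One Future per node in the cluster.
            List<Future<Integer>> futures = des.submitEverywhere(new LocalCountTask());
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        }
    }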
44. Map / Reduce
● A mapper function iterates through a set of key/values, transforming them and sending them to a collector
  void map(KIn, VIn, Collector<KOut, VOut>)
● A reducer works through the collected values for each key, returning a single value
  VOut reduce(KOut, Iterator<VOut>)
● Finally, a collator processes the reduced key/values and returns a result to the invoker (word count sketched below)
  R collate(Map<KOut, VOut> reducedResults)
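The classic word count, sketched with the Infinispan map/reduce classes behind these signatures (class names follow the Infinispan 5.x distexec API; treat them as an assumption):

    import java.io.Serializable;
    import java.util.Iterator;
    import java.util.Map;
    import org.infinispan.Cache;
    import org.infinispan.distexec.mapreduce.Collector;
    import org.infinispan.distexec.mapreduce.MapReduceTask;
    import org.infinispan.distexec.mapreduce.Mapper;
    import org.infinispan.distexec.mapreduce.Reducer;

    // Emits (word, 1) for every word in every stored sentence.
    class WordMapper implements Mapper<String, String, String, Integer>, Serializable {
        @Override
        public void map(String key, String value, Collector<String, Integer> collector) {
            for (String word : value.split("\\s+")) {
                collector.emit(word, 1);
            }
        }
    }

    // Sums the counts collected for each word.
    class WordReducer implements Reducer<String, Integer>, Serializable {
        @Override
        public Integer reduce(String word, Iterator<Integer> counts) {
            int sum = 0;
            while (counts.hasNext()) {
                sum += counts.next();
            }
            return sum;
        }
    }

    public class WordCountSketch {
        static Map<String, Integer> countWords(Cache<String, String> cache) {
            return new MapReduceTask<String, String, String, Integer>(cache)
                .mappedWith(new WordMapper())
                .reducedWith(new WordReducer())
                .execute();   // a Collator could instead fold this map into one result
        }
    }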
46. Replicated Use Case
● Finance
● Master / Slave
● High Availability
● Failover
● Performance + Consistency
● Data – Lifespan
● Servers – Few
● Memory – Medium
47. Distributed Use Case #1
● Telecom / Media
● Performance > Consistency
● Data
  ● Infinite
  ● Calculated
● Servers – Few
● Memory – Large
48. Distributed Use Case #2
● Telecom
● Consistency > Performance
● Data
  ● Continuous
  ● Limited Lifespan
● Servers – Many
● Memory – Normal
49. Q&A
Look for a follow-up on the howtojboss.com blog.