1. Scale over the limits:
an overview of modern distributed caching solutions
Davide Carnevali – Lorenzo Acerbo
Talk duration: 50 minutes
Talk slides: 40
JUG Lugano – Lugano (CH), October 5th 2010
2. What, Why and When about distributed caching?
What
Fast and efficient memory to store frequently used data, shared by several machines.
Why
• reduce database load
• increase the cluster's tolerance to failures
• enable horizontal scalability
When
• several machines need the same data
• computation is spread throughout many individual nodes
3. How does a local cache work?
Distinct JVMs have their own cache and access the database independently.
[Diagram: three JVMs, each with its own local cache, querying the database separately; X(C1) == X(C2) == X(C3)]
4. How does a replicated cache work?
Distinct JVMs have their own cache but share the same data.
[Diagram: three JVMs with replicated caches backed by the database; heap-based data is accessible by any JVM; X(C1) == X(C2) == X(C3)]
5. How does a distributed cache work?
Distinct JVMs see the caches and data as their own.
[Diagram: three JVMs sharing one logical cache partitioned across nodes, backed by the database; heap-based data is accessible by any JVM]
6. Replicated vs. Distributed
Replicated Mode
Pros: best choice for small clusters; all data is in local memory (high read performance)
Cons: does not scale in terms of memory; limited to the heap of a single JVM
Distributed Mode
Pros: scales (almost) linearly; no memory limit; resilient to server failure
Cons: not all data is in local memory; higher network traffic; performance lost on serialization/deserialization
7. Caching strategy – Write Behind with distributed mode
When an application puts (writes) information into the distributed cache, two different mechanisms can be used to synchronize the shared memory and write the data to the backing resource (such as a database).
The write-behind strategy, also known as the asynchronous strategy, means that updates to the cache store are performed by a separate thread from the client thread interacting with the cache.
[Diagram: a put() on JVM 1 replicates object A across JVMs 2–4; a separate thread later calls store() to persist A to the database]
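The write-behind flow above can be sketched in plain Java. This is an illustration of the pattern, not any framework's actual API; WriteBehindCache and BackingStore are hypothetical names:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal write-behind sketch: puts update the in-memory map immediately,
// while a single background thread drains a queue and writes to the
// backing store (e.g. a database) asynchronously.
public class WriteBehindCache {
    public interface BackingStore { void store(String key, Object value); }

    private final ConcurrentHashMap<String, Object> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final BackingStore store;

    public WriteBehindCache(BackingStore store) {
        this.store = store;
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String key = pending.take();       // blocks until a write is queued
                    store.store(key, memory.get(key)); // async write to the resource
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    public void put(String key, Object value) {
        memory.put(key, value); // client thread returns immediately
        pending.offer(key);     // store() happens later, on the writer thread
    }

    public Object get(String key) { return memory.get(key); }
}
```

The client thread never waits for the database, which is exactly why consistency is weaker in this mode (see the CAP slide later in the talk).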
8. Some Java frameworks
Many Java open source frameworks are available to build your distributed cache.
We cover some of the solutions with the most notable features and good open source communities.
Name | Company / Community | License | Link
EHCache 2.x | Terracotta | Apache License 2.0 | http://ehcache.org/
Infinispan 4.0 FINAL | JBoss – Red Hat | LGPL 2.1 | http://jboss.org/infinispan
HazelCast 1.9 | Hazelcast | Apache License 2.0 | http://www.hazelcast.com
Memcached [Server] | – | BSD License | http://memcached.org/
Xmemcached [Client] | – | Apache License 2.0 | http://code.google.com/p/xmemcached/
Terracotta Server | Terracotta | Commercial | http://terracotta.org/
9. EHCache Overview
• EHCache requires a Java 1.5 or 1.6 runtime
• Standards-based: JSR 107 API
• Replicated caching via JGroups TCP/IP, RMI or JMS
• Transactional support through JTA
• Dynamically modifying the cache configuration at runtime: the CacheManager is a singleton
• Fast integration with ORMs such as Hibernate
Two interesting cache decorators
• UnlockedReadsView
Normally a read lock must first be obtained to read data from the backing store. If there is an outstanding write lock, the read lock queues up. This is done so that the happens-before guarantee can be made.
If the business logic is happy to read stale data even when a write lock has been acquired in preparation for changing it, much higher speeds can be obtained.
• NonStopCache
Provides SLA-level control features for your cache. It automatically responds to cluster topology events by taking a pre-configured action.
• You're using a write-through cache and your DB hangs: use the NonStopCache decorator to keep it from hanging your entire application server.
• You have one cache that is accessed for multiple functions. For some of those functions you want operations to time out after 5 seconds, for others after 20 seconds. You can have multiple decorators on the same cache, each with different semantics.
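The NonStopCache idea — bounding cache operations so a hung store cannot hang the caller — can be illustrated with a plain decorator. TimedCache is a hypothetical sketch of the pattern, not Ehcache's actual API:

```java
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// A decorator that bounds every get() with a timeout, so an
// unresponsive backing cache cannot hang the calling thread.
public class TimedCache {
    private final Map<String, Object> delegate;
    private final long timeoutMillis;
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    public TimedCache(Map<String, Object> delegate, long timeoutMillis) {
        this.delegate = delegate;
        this.timeoutMillis = timeoutMillis;
    }

    // Returns null instead of blocking forever if the lookup hangs.
    public Object get(String key) {
        Future<Object> f = pool.submit(() -> delegate.get(key));
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            f.cancel(true);
            return null; // the pre-configured action: give up instead of hanging
        }
    }
}
```

Several such decorators with different timeouts can wrap the same underlying cache, mirroring the 5-second/20-second scenario above.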
11. Enterprise EHCache BigMemory (Beta)
A distributed cache backed by the MemoryStore uses a lot of heap memory, so you need to increase the heap size, but "garbage collection (GC) is like a ticking time bomb for Java".
To minimize the garbage collection penalty, organizations often limit heap size to 2–4 GB. This constrains cache size, limiting the performance benefits that can be achieved by caching.
BigMemory uses "off-heap" memory:
• Limited only by the amount of RAM on your hardware and the address space
• A 64-bit OS is strongly recommended
Off-heap data is stored as bytes, which has two implications:
• Only Serializable cache keys and values can be placed in the store
• Serialization and deserialization take place on every put and get, so the off-heap store is slower in an absolute sense
12. Enterprise EHCache BigMemory (Beta)
Configuration (with EHCache)
overflowToOffHeap : true | false, enables the off-heap memory store
maxMemoryOffHeap : sets the amount of off-heap memory available to the cache
maxElementsInMemory : maximum number of elements held in the on-heap memory store; at least 100 is recommended
maxOffHeapValueSize : maximum size (in MB) of each object; the default is 4 MB
DoNotHaltOnCriticalAllocationDelay : if memory is dramatically over-allocated for at least 3 seconds (1 GB), the application calls System.exit(1); with this property you force it to wait instead
diskPersistent : true | false, stores data asynchronously to the disk store, to survive JVM shutdown
diskSpoolBufferSizeMB : the maximum size of the disk store buffer
BigMemory is bundled with Terracotta Server Arrays.
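A sketch of how the cache-level attributes above might appear in ehcache.xml — the cache name and the sizes are illustrative, not from the talk:

```xml
<ehcache>
  <cache name="sampleOffHeapCache"
         maxElementsInMemory="10000"
         overflowToOffHeap="true"
         maxMemoryOffHeap="2g"
         diskPersistent="true"
         diskSpoolBufferSizeMB="30"/>
</ehcache>
```

Note that the off-heap store also requires sizing the JVM's direct memory (e.g. -XX:MaxDirectMemorySize) at least as large as the configured off-heap amount.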
13. Terracotta Scalability Platform
Distributed Shared Objects
• www.terracotta.org
• Open source, by Terracotta Inc.
• Terracotta acquired EHCache and Quartz and provides integration plugins
14. Terracotta Scalability Platform
How Terracotta works
• Data distribution and synchronization through Terracotta server arrays
• Bytecode weaving to relieve application code of distribution/synchronization details
• Client/server approach: clients are applications modified with AOP; the server maintains application state, with redundancy for fault tolerance
• Application state resides in the Terracotta Server
• Applications use object data as if it were local in-memory; TC replicates the changes
15. Terracotta Scalability Platform
Terracotta standard application layout
[Diagram: three application servers (AS 1–3), each running business logic on top of the TC libraries; all connect to a TC Server, with a backup TC Server for failover, alongside the database]
16. Terracotta Scalability Platform
[Diagram: two JVMs each holding an object graph in their heap; a client makes a modification to an object, the TC libraries transmit only the delta modification to the TC Server, and Terracotta replicates it to the other nodes]
17. Infinispan - Advanced Datagrid Platform (1/8)
• www.infinispan.org
• Open source, based on JBoss Cache
• Current version is 4.1.0 FINAL
• LGPL License
• Developed and supported by Red Hat and the JBoss community
• Entirely written in Java, works on Java 6 machines
18. Infinispan - Advanced Datagrid Platform (2/8)
What does Infinispan provide?
• Cache memory for distributed environments
• Supports three modes: Local, Replicated and Distributed
• Network communication based on JGroups
a) TCP / UDP network protocol
b) Multicast / Unicast
c) Auto discovery of cluster members
d) State recovery upon cluster partitioning
• Transaction support (JTA compliant)
• Eviction algorithms to control memory usage
• Persisting state to configurable cache stores
19. Infinispan - Advanced Datagrid Platform (3/8)
Abstraction: distributed key/value Map
Same interface, same semantics
// Plain HashMap
Map<String, Object> myMap = new HashMap<String, Object>();
myMap.put("id01", myObject);
MyObject x = (MyObject) myMap.get("id01");
assert myMap.size() == 1;

// Infinispan cache: same interface, same semantics
// (imports: org.infinispan.Cache, org.infinispan.manager.DefaultCacheManager)
CacheManager cacheManager = new DefaultCacheManager();
Cache<String, Object> cache = cacheManager.getCache(); // the default cache
cache.put("id01", myObject);
MyObject x = (MyObject) cache.get("id01");
assert cache.size() == 1;
Easy to extend features and capabilities to existing code
20. Infinispan - Advanced Datagrid Platform (4/8)
Data is saved in backup copies for fault tolerance.
[Diagram: three Infinispan nodes; each node owns its own data segment (DS1, DS2, DS3) and also holds a backup copy of a neighbor's segment]
21. Infinispan - Advanced Datagrid Platform (5/8)
Common issues in a distributed environment
• Where should objects be put in a cluster? Is there a "right node"?
• Uniform distribution across all nodes
• Where to look for a key? Multicast? Metadata? Routing?
• What if a node crashes?
• What if a new node joins the cluster?
Consistent Hashing
• A deterministic algorithm that maps keys to nodes
• No lookup, no multicast, no wasted network traffic
• Based on distance: each key maps to the nearest bucket
• Cluster changes involve transferring only a small subset of the data
• When a node joins, it takes over some of its neighbors' keys
• When a node leaves, its data is spread among its neighbors
22. Infinispan - Advanced Datagrid Platform (6/8)
Chord: a consistent hashing algorithm
[Diagram: nodes a–e placed on a hash ring; keys fall on the circle and belong to the next node clockwise; the ring is redrawn as nodes join and leave]
The hash function maps data to an integer 0…N.
Keys and nodes can be seen as points on the edge of a circle.
A key belongs to the next node clockwise.
When a new node enters, part of the keys are reassigned.
When a node crashes, its data goes to the next-in-line node.
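The ring described above can be sketched with a TreeMap. This is illustrative, not Infinispan's actual implementation; HashRing is a hypothetical name, and real implementations typically add virtual nodes per server to even out the distribution:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: nodes and keys are hashed onto the same
// integer circle; a key belongs to the next node clockwise.
public class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF); // first 8 bytes
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public void addNode(String node)    { ring.put(hash(node), node); }
    public void removeNode(String node) { ring.remove(hash(node)); }

    // Clockwise lookup: the first node at or after the key's position,
    // wrapping around to the start of the ring.
    public String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

Removing a node only remaps the keys that node owned — exactly the "small subset of data transfer" property claimed on the previous slide.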
23. Infinispan - Advanced Datagrid Platform (7/8)
Peer-to-Peer vs Client/Server
[Diagram: peer-to-peer — each app server embeds Infinispan directly, all nodes clustered; client/server — C++ and Java client apps reach clustered Infinispan servers through a load balancer]
24. Infinispan - Advanced Datagrid Platform (8/8)
Protocol comparison
Protocol | Type | Client Availability | Clustered | Smart Routing | Load Balancing / Failover
REST | Text | Tons | Yes | No | Any HTTP load balancer
Memcached | Text | Tons (the server mode supports different protocols, so it can be used with different, non-JVM languages) | Yes | No | Only with a predefined list of servers
Hot Rod | Binary | Right now, only Java | Yes | Yes | Yes, dynamic via the Hot Rod client
Web Socket | Text | JavaScript only | Yes | No | Any HTTP load balancer
Taken from http://community.jboss.org/wiki/InfinispanServerModules
25. Hazelcast - In-memory datagrid computing (1/4)
• www.hazelcast.com
• Open source, by Hazel Bilisim Ltd
• Current version is 1.9
• Apache License 2.0
• Hosted on Google Code at http://code.google.com/p/hazelcast/
• Entirely written in Java, works on Java 6 machines
26. Hazelcast - In-memory datagrid computing (2/4)
What does Hazelcast provide?
• Specifically targeted at distributed environments; works only in distributed mode
• Distributed API for Lists, Queues, (Multi)Maps, Sets
• Not only data: Locks, task execution, Events and Messages
• Ad hoc network communication with auto discovery and monitoring tools
• Easy configuration, easy to use; hides major aspects of distribution
• Configurable number of backups for replication and fault tolerance
• Concurrency, transaction support, state persistence
• Peer-to-peer communication only, with super clients (nodes without data)
27. Hazelcast - In-memory datagrid computing (3/4)
More than data = distributed computing
Distributed Object Queries
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.IMap;
import com.hazelcast.query.SqlPredicate;
import java.util.Collection;

IMap<String, Employee> map = Hazelcast.getMap("employees");
map.addIndex("active", false); // unordered index
map.addIndex("name", false);
map.addIndex("age", true);     // ordered index, for range queries
Collection<Employee> employees = map.values(new SqlPredicate("active AND age <= 30"));

Objects that satisfy the criteria are efficiently fetched from the cluster and returned in a collection.
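The query above assumes an Employee class with the indexed properties; a minimal sketch (the class name and fields follow the slide, the rest is illustrative — values stored in Hazelcast must be Serializable, and the predicate matches on the active/name/age properties):

```java
import java.io.Serializable;

// Minimal Employee value object for the distributed query example.
public class Employee implements Serializable {
    private final String name;
    private final int age;
    private final boolean active;

    public Employee(String name, int age, boolean active) {
        this.name = name;
        this.age = age;
        this.active = active;
    }

    public String getName()   { return name; }
    public int getAge()       { return age; }
    public boolean isActive() { return active; }
}
```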
28. Hazelcast - In-memory datagrid computing (4/4)
More than data = distributed computing
Distributed Task Execution
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.FutureTask;
import com.hazelcast.core.DistributedTask;
import com.hazelcast.core.Hazelcast;

FutureTask<String> task = new DistributedTask<String>(new Callable<String>() {
    @Override
    public String call() throws Exception {
        String result = null;
        // Do something useful here
        return result;
    }
});
ExecutorService executorService = Hazelcast.getExecutorService();
executorService.execute(task);
String result = task.get(); // blocks; may throw InterruptedException/ExecutionException

Execution is sent over the cluster.
29. Memcached Overview
• http://memcached.org/
• Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects)
• The Memcached server is written in C; it runs as a daemon service installable on various operating systems
• Its API is available for most popular languages
• It is designed to take advantage of free memory
Java client — xmemcached: http://code.google.com/p/xmemcached/
30. Memcached & Xmemcached Java client
Telnet Interface
> telnet localhost 11211
> set key 0 900 13
> data_to_store
STORED
> get key
VALUE key 0 13
data_to_store
END

Java Client Interface (xmemcached)
MemcachedClientBuilder builder = new XMemcachedClientBuilder(
        AddrUtil.getAddresses("localhost:11211"));
MemcachedClient memcachedClient = builder.build();
memcachedClient.add("key", 0, "Hello");
memcachedClient.get("key");
memcachedClient.shutdown();

• Supports connection pooling: you can create multiple connections to one memcached server via java.nio.*
• Dynamically add/remove servers
• Data compression (because Memcached is inefficient when you store large data)
• Fast integration with Hibernate-memcached
31. Memcached – Xmemcached Example
Weighted Server
MemcachedClientBuilder builder = new XMemcachedClientBuilder(
        AddrUtil.getAddresses("localhost:12000 localhost:12001"),
        new int[] { 1, 3 });
MemcachedClient memcachedClient = builder.build();
You can change the weight dynamically through JMX
public interface XMemcachedClientMBean {
public void setServerWeight(String server, int weight);
}
XMemcached can adjust a node's weight to balance the load across memcached servers: the higher the weight, the more data that server stores and the more load it receives.
Use a counter to increment / decrement
You can use MemcachedClient's incr/decr methods to increase or decrease a counter, but xmemcached also provides a Counter that encapsulates them, so you can use it just like an AtomicLong:
...
MemcachedClient memcachedClient = builder.build();
Counter counter = memcachedClient.getCounter("counter", 0);
counter.incrementAndGet();
counter.decrementAndGet();
counter.addAndGet(-10);
32. Memcached & MySQL (multiple Memcached servers and a stand-alone MySQL server)
MySQL Enterprise includes memcached for cluster configurations.
There are various ways to design scalable architectures using memcached and MySQL.
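The usual pattern behind these architectures is cache-aside: try memcached first, fall back to MySQL on a miss, and populate the cache for subsequent reads. A minimal sketch, decoupled from the concrete clients so it can be unit tested — CacheAside, KV and Db are hypothetical names standing in for an xmemcached MemcachedClient and a MySQL DAO:

```java
// Cache-aside read path: the cache absorbs repeat reads,
// the database is hit only on a miss.
public class CacheAside {
    public interface KV { Object get(String key); void set(String key, int expSeconds, Object value); }
    public interface Db { Object load(String key); }

    private final KV cache;
    private final Db db;

    public CacheAside(KV cache, Db db) { this.cache = cache; this.db = db; }

    public Object read(String key) {
        Object value = cache.get(key);
        if (value == null) {             // cache miss: go to MySQL
            value = db.load(key);
            cache.set(key, 3600, value); // populate, expire after an hour
        }
        return value;
    }
}
```

With the real client, KV.get/set would map onto MemcachedClient's get(key) and set(key, exp, value).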
33. Memcached & MySQL (multiple Memcached servers with a master and multiple slave MySQL servers)
This architecture allows scaling a read-intensive application.
34. Memcached & MySQL (sharding: multiple Memcached servers with a master and multiple slave MySQL servers)
With sharding (application partitioning) we partition data across multiple physical servers to gain read and write scalability.
35. Performance Reports
Items: 100,000 objects (String name, String surname, int age, String description)
Configuration: 2 nodes on the same physical machine (localhost)
Hardware: Intel Core2 Duo CPU P8400 @ 2.26 GHz – 3.48 GB RAM
[Bar chart: put() and get() times for EHCache 2.2, Infinispan, HazelCast and Xmemcached, on a 0–35 scale]
(*) Before making a choice it is really important to run realistic tests in your own environment, with different cluster sizes, network bandwidth, concurrent access patterns and application types: e-commerce-style web applications, data-intensive processing, near-real-time apps…
36. CAP Theorem and Cache Coherency
It states that, though it is desirable to have Consistency, High Availability and Partition Tolerance in every system, unfortunately no system can achieve all three at the same time.
This is true also for distributed cache solutions.
Consistency: all nodes see the same data at the same time.
Availability: node failures do not prevent survivors from continuing to operate.
Partition Tolerance: the system continues to operate despite arbitrary message loss.

Read and Write Through / Sync Mode:
Cache Mode          | Replicated | Distributed
Consistency         | Yes        | Yes
Availability        | Yes        | No
Partition Tolerance | No         | Yes

Write Behind / Async Mode:
Cache Mode          | Replicated | Distributed
Consistency         | No         | No
Availability        | Yes        | No
Partition Tolerance | No         | Yes
37. Conclusion
• Distributed caches are an interesting technology, but they come at a cost
• There is no "perfect solution": every choice must be evaluated
• They work great on mostly-read data
• In-memory state is more difficult to monitor than traditional solutions
• Replication fits best on small clusters
• Big environments need actual data distribution
38. Reference 1/2
Official documentation
• EHCache & Terracotta : http://ehcache.org/documentation/index.html
• Terracotta : http://www.terracotta.org/platform/
• Infinispan : http://jboss.org/infinispan/docs.html
• HazelCast : http://www.hazelcast.com/documentation.jsp
• Memcached : http://memcached.org/
• Xmemcached : http://code.google.com/p/xmemcached/
Articles & Blogs
• Intro to Caching,Caching algorithms and caching frameworks part 4 :
http://javalandscape.blogspot.com/2009/03/intro-to-cachingcaching-algorithms-and.html
• Comparison of the Grid/Cloud Computing Frameworks (Hadoop, GridGain, Hazelcast, DAC) - Part II
http://java.dzone.com/articles/comparison-gridcloud-computing-0
• Brewer's CAP Theorem on distributed systems
http://www.hazelcast.com/documentation.jsp
• Ehcache - A Java Distributed Cache
http://highscalability.com/ehcache-java-distributed-cache
• A Matter of Scale: The CAP Theorem and Memory Models
http://coverclock.blogspot.com/2010/05/matter-of-scale-cap-theorem-and-memory.html
• Consistent Hashing
http://www.lexemetech.com/2007/11/consistent-hashing.html
• A Couple Minutes With Non-Stop Ehcache
http://dsoguy.blogspot.com/2010/05/couple-minutes-with-non-stop-ehcache_07.html
• Designing and Implementing Scalable Applications with Memcached and MySQL
http://www.mysql.com/why-mysql/white-papers/mysql_wp_memcached.php
39. Reference 2/2
Presentations
• Shopzilla On Concurrency : http://www.slideshare.net/WillGage/shopzilla-on-concurrency-3872625
• Scaling your cache : http://www.slideshare.net/alexmiller/scaling-your-cache
• Caching in a Distributed Environment : http://www.slideshare.net/abhigad/7564192
• Infinispan by Jteam : http://www.jteam.nl/specials/techtalks/011110/attachment/Infinispan.pdf
Wikipedia
• Brewer’s theorem : http://en.wikipedia.org/wiki/CAP_theorem
• Performance tuning : http://en.wikipedia.org/wiki/Performance_tuning
40. Q&A and Thanks
Davide Carnevali Lorenzo Acerbo
Email : davide.carnevali at gmail.com Email : lorenzo.acerbo at gmail.com
Skype : davide.carnevali Skype : lorenzo.acerbo