1. Scale over the limits:
an overview of modern distributed caching solutions
Davide Carnevali – Lorenzo Acerbo
Talk duration: 50 minutes
Talk slides: 40
JUG Lugano – Lugano (CH), October 5th 2010
2. What, Why and When about distributed caching?
What
Fast and efficient memory to store frequently used data, shared by several machines.
Why
• reduce database load
• increase the cluster's tolerance to failures
• enable horizontal scalability
When
• several machines need the same data
• computation is spread throughout many individual nodes
3. How does a local cache work?
Distinct JVMs have their own cache and access the database independently.
[Diagram: three JVMs, each with its own local cache, querying the database separately; X(C1) == X(C2) == X(C3)]
4. How does a replicated cache work?
Distinct JVMs have their own cache but share the same data.
[Diagram: three JVMs with replicated caches backed by the database; heap-based data is accessible by any JVM; X(C1) == X(C2) == X(C3)]
5. How does a distributed cache work?
Distinct JVMs see the caches and data as their own.
[Diagram: three JVMs sharing one logical cache partitioned across nodes, backed by the database; heap-based data is accessible by any JVM]
6. Replicated vs. Distributed
Replicated Mode
Pros: best choice for small clusters; all data is in local memory (high read performance)
Cons: does not scale in terms of memory; limited to the heap of a single JVM
Distributed Mode
Pros: scales (almost) linearly; no memory limit; resilient to server failure
Cons: not all data is in local memory; higher network traffic; performance lost on serialization/deserialization
7. Caching strategy – Write Behind with distributed mode
When an application puts (writes) information into the distributed cache, two different mechanisms can be used to synchronize the shared memory and write the data to the backing resource (such as a database).
The write-behind strategy, also known as the asynchronous strategy, means that updates to the cache store are performed by a separate thread from the client thread interacting with the cache.
[Diagram: a put() on JVM 1 replicates object A across JVMs 2–4; a separate thread later calls store() to persist A to the database]
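The write-behind flow above can be sketched in plain Java. This is an illustration of the pattern, not any framework's actual API; WriteBehindCache and BackingStore are hypothetical names:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal write-behind sketch: puts update the in-memory map immediately,
// while a single background thread drains a queue and writes to the
// backing store (e.g. a database) asynchronously.
public class WriteBehindCache {
    public interface BackingStore { void store(String key, Object value); }

    private final ConcurrentHashMap<String, Object> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final BackingStore store;

    public WriteBehindCache(BackingStore store) {
        this.store = store;
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String key = pending.take();       // blocks until a write is queued
                    store.store(key, memory.get(key)); // async write to the resource
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    public void put(String key, Object value) {
        memory.put(key, value); // client thread returns immediately
        pending.offer(key);     // store() happens later, on the writer thread
    }

    public Object get(String key) { return memory.get(key); }
}
```

The client thread never waits for the database, which is exactly why consistency is weaker in this mode (see the CAP slide later in the talk).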
8. Some Java frameworks
Many Java open source frameworks are available to build your distributed cache.
We cover some of the solutions with the most notable features and good open source communities.
Name | Company / Community | License | Link
EHCache 2.x | Terracotta | Apache License 2.0 | http://ehcache.org/
Infinispan 4.0 FINAL | JBoss – Red Hat | LGPL 2.1 | http://jboss.org/infinispan
HazelCast 1.9 | Hazelcast | Apache License 2.0 | http://www.hazelcast.com
Memcached [Server] | – | BSD License | http://memcached.org/
Xmemcached [Client] | – | Apache License 2.0 | http://code.google.com/p/xmemcached/
Terracotta Server | Terracotta | Commercial | http://terracotta.org/
9. EHCache Overview
• EHCache requires a Java 1.5 or 1.6 runtime
• Standards-based: JSR 107 API
• Replicated caching via JGroups TCP/IP, RMI or JMS
• Transactional support through JTA
• Dynamically modifying the cache configuration at runtime: the CacheManager is a singleton
• Fast integration with ORMs such as Hibernate
Two interesting cache decorators
• UnlockedReadsView
Normally a read lock must first be obtained to read data from the backing store. If there is an outstanding write lock, the read lock queues up. This is done so that the happens-before guarantee can be made.
If the business logic is happy to read stale data even when a write lock has been acquired in preparation for changing it, much higher speeds can be obtained.
• NonStopCache
Provides SLA-level control features for your cache. It automatically responds to cluster topology events by taking a pre-configured action.
• You're using a write-through cache and your DB hangs: use the NonStopCache decorator to keep it from hanging your entire application server.
• You have one cache that is accessed for multiple functions. For some of those functions you want operations to time out after 5 seconds, for others after 20 seconds. You can have multiple decorators on the same cache, each with different semantics.
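The NonStopCache idea — bounding cache operations so a hung store cannot hang the caller — can be illustrated with a plain decorator. TimedCache is a hypothetical sketch of the pattern, not Ehcache's actual API:

```java
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// A decorator that bounds every get() with a timeout, so an
// unresponsive backing cache cannot hang the calling thread.
public class TimedCache {
    private final Map<String, Object> delegate;
    private final long timeoutMillis;
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    public TimedCache(Map<String, Object> delegate, long timeoutMillis) {
        this.delegate = delegate;
        this.timeoutMillis = timeoutMillis;
    }

    // Returns null instead of blocking forever if the lookup hangs.
    public Object get(String key) {
        Future<Object> f = pool.submit(() -> delegate.get(key));
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            f.cancel(true);
            return null; // the pre-configured action: give up instead of hanging
        }
    }
}
```

Several such decorators with different timeouts can wrap the same underlying cache, mirroring the 5-second/20-second scenario above.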
11. Enterprise EHCache BigMemory (Beta)
A distributed cache backed by the MemoryStore uses a lot of heap memory, so you need to increase the heap size, but "garbage collection (GC) is like a ticking time bomb for Java".
To minimize the garbage collection penalty, organizations often limit heap size to 2–4 GB. This constrains cache size, limiting the performance benefits that can be achieved by caching.
BigMemory uses "off-heap" memory:
• Limited only by the amount of RAM on your hardware and the address space
• A 64-bit OS is strongly recommended
Off-heap data is stored as bytes, which has two implications:
• Only Serializable cache keys and values can be placed in the store
• Serialization and deserialization take place on every put and get, so the off-heap store is slower in an absolute sense
12. Enterprise EHCache BigMemory (Beta)
Configuration (with EHCache)
overflowToOffHeap : true | false, enables the off-heap memory store
maxMemoryOffHeap : sets the amount of off-heap memory available to the cache
maxElementsInMemory : maximum number of elements held in the on-heap memory store; at least 100 is recommended
maxOffHeapValueSize : maximum size (in MB) of each object; the default is 4 MB
DoNotHaltOnCriticalAllocationDelay : if memory is dramatically over-allocated for at least 3 seconds (1 GB), the application calls System.exit(1); with this property you force it to wait instead
diskPersistent : true | false, stores data asynchronously to the disk store, to survive JVM shutdown
diskSpoolBufferSizeMB : the maximum size of the disk store buffer
BigMemory is bundled with Terracotta Server Arrays.
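A sketch of how the cache-level attributes above might appear in ehcache.xml — the cache name and the sizes are illustrative, not from the talk:

```xml
<ehcache>
  <cache name="sampleOffHeapCache"
         maxElementsInMemory="10000"
         overflowToOffHeap="true"
         maxMemoryOffHeap="2g"
         diskPersistent="true"
         diskSpoolBufferSizeMB="30"/>
</ehcache>
```

Note that the off-heap store also requires sizing the JVM's direct memory (e.g. -XX:MaxDirectMemorySize) at least as large as the configured off-heap amount.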
13. Terracotta Scalability Platform
Distributed Shared Objects
• www.terracotta.org
• Open source, by Terracotta Inc.
• Terracotta acquired EHCache and Quartz and provides integration plugins
14. Terracotta Scalability Platform
How Terracotta works
• Data distribution and synchronization through Terracotta server arrays
• Bytecode weaving to relieve application code of distribution/synchronization details
• Client/server approach: clients are applications modified with AOP; the server maintains application state, with redundancy for fault tolerance
• Application state resides in the Terracotta Server
• Applications use object data as if it were local in-memory; TC replicates the changes
15. Terracotta Scalability Platform
Terracotta standard application layout
[Diagram: three application servers (AS 1–3), each running business logic on top of the TC libraries; all connect to a TC Server, with a backup TC Server for failover, alongside the database]
16. Terracotta Scalability Platform
[Diagram: two JVMs each holding an object graph in their heap; a client makes a modification to an object, the TC libraries transmit only the delta modification to the TC Server, and Terracotta replicates it to the other nodes]
17. Infinispan - Advanced Datagrid Platform (1/8)
• www.infinispan.org
• Open source, based on JBoss Cache
• Current version is 4.1.0 FINAL
• LGPL License
• Developed and supported by Red Hat and the JBoss community
• Entirely written in Java, works on Java 6 machines
18. Infinispan - Advanced Datagrid Platform (2/8)
What does Infinispan provide?
• Cache memory for distributed environments
• Supports three modes: Local, Replicated and Distributed
• Network communication based on JGroups
a) TCP / UDP network protocol
b) Multicast / Unicast
c) Auto discovery of cluster members
d) State recovery upon cluster partitioning
• Transaction support (JTA compliant)
• Eviction algorithms to control memory usage
• Persisting state to configurable cache stores
19. Infinispan - Advanced Datagrid Platform (3/8)
Abstraction: distributed key/value Map
Same interface, same semantics
// Plain HashMap
Map<String, Object> myMap = new HashMap<String, Object>();
myMap.put("id01", myObject);
MyObject x = (MyObject) myMap.get("id01");
assert myMap.size() == 1;

// Infinispan cache: same interface, same semantics
// (imports: org.infinispan.Cache, org.infinispan.manager.DefaultCacheManager)
CacheManager cacheManager = new DefaultCacheManager();
Cache<String, Object> cache = cacheManager.getCache(); // the default cache
cache.put("id01", myObject);
MyObject x = (MyObject) cache.get("id01");
assert cache.size() == 1;
Easy to extend features and capabilities to existing code
20. Infinispan - Advanced Datagrid Platform (4/8)
Data is saved in backup copies for fault tolerance.
[Diagram: three Infinispan nodes; each node owns its own data segment (DS1, DS2, DS3) and also holds a backup copy of a neighbor's segment]
21. Infinispan - Advanced Datagrid Platform (5/8)
Common issues in a distributed environment
• Where should objects be put in a cluster? Is there a "right node"?
• Uniform distribution across all nodes
• Where to look for a key? Multicast? Metadata? Routing?
• What if a node crashes?
• What if a new node joins the cluster?
Consistent Hashing
• A deterministic algorithm that maps keys to nodes
• No lookup, no multicast, no wasted network traffic
• Based on distance: each key maps to the nearest bucket
• Cluster changes involve transferring only a small subset of the data
• When a node joins, it takes over some of its neighbors' keys
• When a node leaves, its data is spread among its neighbors
22. Infinispan - Advanced Datagrid Platform (6/8)
Chord: a consistent hashing algorithm
[Diagram: nodes a–e placed on a hash ring; keys fall on the circle and belong to the next node clockwise; the ring is redrawn as nodes join and leave]
The hash function maps data to an integer 0…N.
Keys and nodes can be seen as points on the edge of a circle.
A key belongs to the next node clockwise.
When a new node enters, part of the keys are reassigned.
When a node crashes, its data goes to the next-in-line node.
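The ring described above can be sketched with a TreeMap. This is illustrative, not Infinispan's actual implementation; HashRing is a hypothetical name, and real implementations typically add virtual nodes per server to even out the distribution:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: nodes and keys are hashed onto the same
// integer circle; a key belongs to the next node clockwise.
public class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF); // first 8 bytes
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public void addNode(String node)    { ring.put(hash(node), node); }
    public void removeNode(String node) { ring.remove(hash(node)); }

    // Clockwise lookup: the first node at or after the key's position,
    // wrapping around to the start of the ring.
    public String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

Removing a node only remaps the keys that node owned — exactly the "small subset of data transfer" property claimed on the previous slide.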
23. Infinispan - Advanced Datagrid Platform (7/8)
Peer-to-Peer vs Client/Server
[Diagram: peer-to-peer — each app server embeds Infinispan directly, all nodes clustered; client/server — C++ and Java client apps reach clustered Infinispan servers through a load balancer]
24. Infinispan - Advanced Datagrid Platform (8/8)
Protocol comparison
Protocol | Type | Client Availability | Clustered | Smart Routing | Load Balancing / Failover
REST | Text | Tons | Yes | No | Any HTTP load balancer
Memcached | Text | Tons (the server mode supports different protocols, so it can be used with different, non-JVM languages) | Yes | No | Only with a predefined list of servers
Hot Rod | Binary | Right now, only Java | Yes | Yes | Yes, dynamic via the Hot Rod client
Web Socket | Text | JavaScript only | Yes | No | Any HTTP load balancer
Taken from http://community.jboss.org/wiki/InfinispanServerModules
25. Hazelcast - In-memory datagrid computing (1/4)
• www.hazelcast.com
• Open source, by Hazel Bilisim Ltd
• Current version is 1.9
• Apache License 2.0
• Hosted on Google Code at http://code.google.com/p/hazelcast/
• Entirely written in Java, works on Java 6 machines
26. Hazelcast - In-memory datagrid computing (2/4)
What does Hazelcast provide?
• Specifically targeted at distributed environments; works only in distributed mode
• Distributed API for Lists, Queues, (Multi)Maps, Sets
• Not only data: Locks, task execution, Events and Messages
• Ad hoc network communication with auto discovery and monitoring tools
• Easy configuration, easy to use; hides major aspects of distribution
• Configurable number of backups for replication and fault tolerance
• Concurrency, transaction support, state persistence
• Peer-to-peer communication only, with super clients (nodes without data)
27. Hazelcast - In-memory datagrid computing (3/4)
More than data = distributed computing
Distributed Object Queries
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.IMap;
import com.hazelcast.query.SqlPredicate;
import java.util.Collection;

IMap<String, Employee> map = Hazelcast.getMap("employees");
map.addIndex("active", false); // unordered index
map.addIndex("name", false);
map.addIndex("age", true);     // ordered index, for range queries
Collection<Employee> employees = map.values(new SqlPredicate("active AND age <= 30"));

Objects that satisfy the criteria are efficiently fetched from the cluster and returned in a collection.
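The query above assumes an Employee class with the indexed properties; a minimal sketch (the class name and fields follow the slide, the rest is illustrative — values stored in Hazelcast must be Serializable, and the predicate matches on the active/name/age properties):

```java
import java.io.Serializable;

// Minimal Employee value object for the distributed query example.
public class Employee implements Serializable {
    private final String name;
    private final int age;
    private final boolean active;

    public Employee(String name, int age, boolean active) {
        this.name = name;
        this.age = age;
        this.active = active;
    }

    public String getName()   { return name; }
    public int getAge()       { return age; }
    public boolean isActive() { return active; }
}
```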
28. Hazelcast - In-memory datagrid computing (4/4)
More than data = distributed computing
Distributed Task Execution
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.FutureTask;
import com.hazelcast.core.DistributedTask;
import com.hazelcast.core.Hazelcast;

FutureTask<String> task = new DistributedTask<String>(new Callable<String>() {
    @Override
    public String call() throws Exception {
        String result = null;
        // Do something useful here
        return result;
    }
});
ExecutorService executorService = Hazelcast.getExecutorService();
executorService.execute(task);
String result = task.get(); // blocks; may throw InterruptedException/ExecutionException

Execution is sent over the cluster.
29. Memcached Overview
• http://memcached.org/
• Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects)
• The Memcached server is written in C; it runs as a daemon service installable on various operating systems
• Its API is available for most popular languages
• It is designed to take advantage of free memory
Java client — xmemcached: http://code.google.com/p/xmemcached/
30. Memcached & Xmemcached Java client
Telnet Interface
> telnet localhost 11211
> set key 0 900 13
> data_to_store
STORED
> get key
VALUE key 0 13
data_to_store
END

Java Client Interface (xmemcached)
MemcachedClientBuilder builder = new XMemcachedClientBuilder(
        AddrUtil.getAddresses("localhost:11211"));
MemcachedClient memcachedClient = builder.build();
memcachedClient.add("key", 0, "Hello");
memcachedClient.get("key");
memcachedClient.shutdown();

• Supports connection pooling: you can create multiple connections to one memcached server via java.nio.*
• Dynamically add/remove servers
• Data compression (because Memcached is inefficient when you store large data)
• Fast integration with Hibernate-memcached
31. Memcached – Xmemcached Example
Weighted Server
MemcachedClientBuilder builder = new XMemcachedClientBuilder(
        AddrUtil.getAddresses("localhost:12000 localhost:12001"),
        new int[] { 1, 3 });
MemcachedClient memcachedClient = builder.build();
You can change the weight dynamically through JMX
public interface XMemcachedClientMBean {
public void setServerWeight(String server, int weight);
}
XMemcached can adjust a node's weight to balance the load across memcached servers: the higher the weight, the more data that server stores and the more load it receives.
Use a counter to increment / decrement
You can use MemcachedClient's incr/decr methods to increase or decrease a counter, but xmemcached also provides a Counter that encapsulates them, so you can use it just like an AtomicLong:
...
MemcachedClient memcachedClient = builder.build();
Counter counter = memcachedClient.getCounter("counter", 0);
counter.incrementAndGet();
counter.decrementAndGet();
counter.addAndGet(-10);
32. Memcached & MySQL (multiple Memcached servers and a stand-alone MySQL server)
MySQL Enterprise includes memcached for cluster configurations.
There are various ways to design scalable architectures using memcached and MySQL.
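The usual pattern behind these architectures is cache-aside: try memcached first, fall back to MySQL on a miss, and populate the cache for subsequent reads. A minimal sketch, decoupled from the concrete clients so it can be unit tested — CacheAside, KV and Db are hypothetical names standing in for an xmemcached MemcachedClient and a MySQL DAO:

```java
// Cache-aside read path: the cache absorbs repeat reads,
// the database is hit only on a miss.
public class CacheAside {
    public interface KV { Object get(String key); void set(String key, int expSeconds, Object value); }
    public interface Db { Object load(String key); }

    private final KV cache;
    private final Db db;

    public CacheAside(KV cache, Db db) { this.cache = cache; this.db = db; }

    public Object read(String key) {
        Object value = cache.get(key);
        if (value == null) {             // cache miss: go to MySQL
            value = db.load(key);
            cache.set(key, 3600, value); // populate, expire after an hour
        }
        return value;
    }
}
```

With the real client, KV.get/set would map onto MemcachedClient's get(key) and set(key, exp, value).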
33. Memcached & MySQL (multiple Memcached servers with a master and multiple slave MySQL servers)
This architecture allows scaling a read-intensive application.
34. Memcached & MySQL (sharding: multiple Memcached servers with a master and multiple slave MySQL servers)
With sharding (application partitioning) we partition data across multiple physical servers to gain read and write scalability.
35. Performance Reports
Items: 100,000 objects (String name, String surname, int age, String description)
Configuration: 2 nodes on the same physical machine (localhost)
Hardware: Intel Core2 Duo CPU P8400 @ 2.26 GHz – 3.48 GB RAM
[Bar chart: put() and get() times for EHCache 2.2, Infinispan, HazelCast and Xmemcached, on a 0–35 scale]
(*) Before making a choice it is really important to run realistic tests in your own environment, with different cluster sizes, network bandwidth, concurrent access patterns and application types: e-commerce-style web applications, data-intensive processing, near-real-time apps…
36. CAP Theorem and Cache Coherency
It states that, though it is desirable to have Consistency, High Availability and Partition Tolerance in every system, unfortunately no system can achieve all three at the same time.
This is true also for distributed cache solutions.
Consistency: all nodes see the same data at the same time.
Availability: node failures do not prevent survivors from continuing to operate.
Partition Tolerance: the system continues to operate despite arbitrary message loss.

Read and Write Through / Sync Mode:
Cache Mode          | Replicated | Distributed
Consistency         | Yes        | Yes
Availability        | Yes        | No
Partition Tolerance | No         | Yes

Write Behind / Async Mode:
Cache Mode          | Replicated | Distributed
Consistency         | No         | No
Availability        | Yes        | No
Partition Tolerance | No         | Yes
37. Conclusion
• Distributed caches are an interesting technology, but they come at a cost
• There is no "perfect solution": every choice must be evaluated
• They work great on mostly-read data
• In-memory state is more difficult to monitor than traditional solutions
• Replication fits best on small clusters
• Big environments need actual data distribution
38. Reference 1/2
Official documentation
• EHCache & Terracotta : http://ehcache.org/documentation/index.html
• Terracotta : http://www.terracotta.org/platform/
• Infinispan : http://jboss.org/infinispan/docs.html
• HazelCast : http://www.hazelcast.com/documentation.jsp
• Memcached : http://memcached.org/
• Xmemcached : http://code.google.com/p/xmemcached/
Articles & Blogs
• Intro to Caching,Caching algorithms and caching frameworks part 4 :
http://javalandscape.blogspot.com/2009/03/intro-to-cachingcaching-algorithms-and.html
• Comparison of the Grid/Cloud Computing Frameworks (Hadoop, GridGain, Hazelcast, DAC) - Part II
http://java.dzone.com/articles/comparison-gridcloud-computing-0
• Brewer's CAP Theorem on distributed systems
http://www.hazelcast.com/documentation.jsp
• Ehcache - A Java Distributed Cache
http://highscalability.com/ehcache-java-distributed-cache
• A Matter of Scale: The CAP Theorem and Memory Models
http://coverclock.blogspot.com/2010/05/matter-of-scale-cap-theorem-and-memory.html
• Consistent Hashing
http://www.lexemetech.com/2007/11/consistent-hashing.html
• A Couple Minutes With Non-Stop Ehcache
http://dsoguy.blogspot.com/2010/05/couple-minutes-with-non-stop-ehcache_07.html
• Designing and Implementing Scalable Applications with Memcached and MySQL
http://www.mysql.com/why-mysql/white-papers/mysql_wp_memcached.php
39. Reference 2/2
Presentations
• Shopzilla On Concurrency : http://www.slideshare.net/WillGage/shopzilla-on-concurrency-3872625
• Scaling your cache : http://www.slideshare.net/alexmiller/scaling-your-cache
• Caching in a Distributed Environment : http://www.slideshare.net/abhigad/7564192
• Infinispan by Jteam : http://www.jteam.nl/specials/techtalks/011110/attachment/Infinispan.pdf
Wikipedia
• Brewer’s theorem : http://en.wikipedia.org/wiki/CAP_theorem
• Performance tuning : http://en.wikipedia.org/wiki/Performance_tuning
40. Q&A and Thanks
Davide Carnevali Lorenzo Acerbo
Email : davide.carnevali at gmail.com Email : lorenzo.acerbo at gmail.com
Skype : davide.carnevali Skype : lorenzo.acerbo