Scale over the limits:
an overview of modern distributed caching solutions
Davide Carnevali – Lorenzo Acerbo



Talk duration   50 min
Talk slides     40#                 JUG Lugano – Lugano (CH), October 5th 2010
What, Why and When about distributed caching?



  What

  Fast and efficient memory to store frequently used data, shared by several machines.

  Why

  • reduce database load

  • increase cluster tolerance to failures

  • enable horizontal scalability

  When

  • several machines need the same data

  • computation is spread throughout many individual nodes




Page  2
How does a local cache work?



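Distinct JVMs have their own cache, and access the database independently.

[Diagram: three JVMs with separate local caches (holding x, x and ?) on top of a shared database holding x]

 X(C1) == X(C2) == X(C3)       Database   x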
Page  3
How does a replicated cache work?




Distinct JVMs have their own cache, but share the same data.

[Diagram: three JVMs whose caches all hold x, connected through a component labelled C, on top of a database holding x]

 X(C1) == X(C2) == X(C3)

Heap-based data is accessible by any JVM.

Page  4
How does a distributed cache work?




Distinct JVMs see caches and data as their own.

[Diagram: three JVMs sharing cached values x through a component labelled C, on top of a database holding x]

Heap-based data is accessible by any JVM.
 Page  5
Replicated VS Distributed



                                            Replicated Mode
        Pros                                             Cons

        Best choice for small clusters                   Does not scale in terms of memory
        All data is in local memory                      Limited to the heap of a single JVM
         (high read performance)

                                            Distributed Mode
        Pros                                             Cons

        Scales (almost) linearly                         Not all data is in local memory
        No memory limit                                  Higher network traffic
        Resilience to server failure                     Performance lost on
                                                          serialization/deserialization

Page  6
Caching strategy – Write Behind with distributed mode



When an application puts (writes) data into the distributed cache, two different mechanisms can be used to
synchronize the shared memory and write the data to the underlying resource (such as a database).

The write-behind strategy, also known as the asynchronous strategy, means that updates to the cache store are done by a
thread separate from the client thread interacting with the cache.


                          put()                                                             store()
                   A


                                    A                                     A



           JVM 1                                         JVM 3
                                                                                                                Database




                                                                          A
           JVM 2                                         JVM 4
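
The put()/store() flow in the diagram can be sketched in plain Java. This is a minimal illustration, assuming an in-memory map stands in for the database; the class WriteBehindSketch and its methods are hypothetical names and do not belong to any of the frameworks covered in this talk.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical write-behind cache: put() returns immediately, while a
// background thread drains a queue and stores the values in the database.
class WriteBehindSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database;            // stand-in for the real resource
    private final BlockingQueue<String> dirtyKeys = new LinkedBlockingQueue<>();
    private final Thread writer;
    private volatile boolean running = true;

    WriteBehindSketch(Map<String, String> database) {
        this.database = database;
        this.writer = new Thread(this::drain, "write-behind");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    // Client thread: update the cache and enqueue the key; no database I/O here.
    void put(String key, String value) {
        cache.put(key, value);
        dirtyKeys.add(key);
    }

    String get(String key) {
        return cache.get(key);
    }

    // Writer thread: store() the latest cached value of each dirty key.
    private void drain() {
        try {
            while (running || !dirtyKeys.isEmpty()) {
                String key = dirtyKeys.poll(50, TimeUnit.MILLISECONDS);
                if (key != null) {
                    database.put(key, cache.get(key));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stop the writer and wait until all queued writes have reached the database.
    void shutdown() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, String> db = new ConcurrentHashMap<>();
        WriteBehindSketch cache = new WriteBehindSketch(db);
        cache.put("A", "1");          // returns before the database is touched
        cache.shutdown();             // flushes pending writes
        System.out.println(db.get("A"));
    }
}
```

The trade-off is visible in the sketch: between put() and the writer thread's store, the database lags behind the cache, so a crash in that window loses the update.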

Page  7
Some Java frameworks



Many open source Java frameworks are available to build a distributed cache.
We look at some of the solutions with the richest feature sets and active open source communities.


 Name                    Company / Community       License              Link

 EHCache 2.x             Terracotta                Apache License 2.0   http://ehcache.org/
 Infinispan 4.0 FINAL    JBoss / Red Hat           LGPL 2.1             http://jboss.org/infinispan
 Hazelcast 1.9           Hazelcast                 Apache License 2.0   http://www.hazelcast.com
 Memcached [Server]      -                         BSD License          http://memcached.org/
 Xmemcached [Client]     -                         Apache License 2.0   http://code.google.com/p/xmemcached/
 Terracotta Server       Terracotta                Commercial           http://terracotta.org/




Page  8
EHCache Overview



  •    EHCache requires a Java 1.5 or 1.6 runtime

  •    Standards-based: implements the JSR 107 API

  •    Replicated caching via JGroups (TCP/IP), RMI or JMS

  •    Transactional support through JTA

  •    Dynamic modification of the cache configuration at runtime; the CacheManager is a singleton

  •    Fast integration with ORMs such as Hibernate


  Two interesting cache decorators
  •        UnlockedReadsView
           Normally a read lock must first be obtained to read data from the backing store. If there is an outstanding write lock, the read
           lock queues up. This is done so that the happens-before guarantee can be made.
           If the business logic is happy to read stale data even if a write lock has been acquired in preparation for changing
           it, then much higher speeds can be obtained.

  •        NonStopCache
           Provides SLA-level control features for your cache. It automatically responds to cluster topology events by taking a
           pre-configured action.
            •     You're using a write-through cache and your DB hangs. Use the NonStopCache decorator to keep it from
                  hanging your entire application server.
            •     You have one cache that is accessed for multiple functions. For some of those functions you want operations
                  to time out after 5 seconds and for others after 20 seconds. You can have multiple decorators on the
                  same cache, each with different semantics defined.
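
The timeout idea behind NonStopCache can be sketched with the standard java.util.concurrent API. This is only an illustration of the concept, not Ehcache's implementation: TimeoutCacheSketch and call() are hypothetical names, and returning a fallback value on timeout is an assumption that loosely mirrors a "noop" timeout behavior.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical timeout decorator: bounds how long a caller waits for a cache operation.
class TimeoutCacheSketch {
    // Daemon worker threads, so a hung operation cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final long timeoutMillis;

    TimeoutCacheSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Runs the operation on a worker thread; if it does not finish in time,
    // the caller gets the fallback instead of hanging with the operation.
    <V> V call(Callable<V> operation, V fallback) {
        Future<V> future = pool.submit(operation);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);      // interrupt the stuck worker, best effort
            return fallback;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two decorators over the same (imaginary) cache, with different timeouts.
        TimeoutCacheSketch fiveSeconds = new TimeoutCacheSketch(5000);
        TimeoutCacheSketch twentySeconds = new TimeoutCacheSketch(20000);
        System.out.println(fiveSeconds.call(() -> "value from cache", "fallback"));
        System.out.println(twentySeconds.call(() -> "value from cache", "fallback"));
    }
}
```

Creating one decorator per timeout, as in main(), matches the second scenario above: the same cache is reached through wrappers with different semantics.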
Page  9
EHCache Configuration sample



<ehcache >
       <!-- JGroupsCacheReplicatorFactory -->
        <cacheManagerPeerProviderFactory
           class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
           properties="connect=TCP(bind_port=7800;start_port=7800):
               TCPPING(initial_hosts=localhost[7800];port_range=10;timeout=3000;
               num_initial_members=3;up_thread=true;down_thread=true):
               VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):
               pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
               pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;
               print_local_addr=false;down_thread=true;up_thread=true)"
           propertySeparator="::" />

       <cache name="myCustomCache_EhCache"
              maxElementsInMemory="1000000"
              eternal="true"
              overflowToDisk="false">
              <cacheEventListenerFactory class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
              properties="replicateAsynchronously=false,replicatePuts=true,replicateUpdates=true,
              replicateUpdatesViaCopy=true,replicateRemovals=true"
              />
              <bootstrapCacheLoaderFactory
              class="net.sf.ehcache.distribution.jgroups.JGroupsBootstrapCacheLoaderFactory"/>
       </cache>
</ehcache>



UnlockedReadsView
<cacheDecoratorFactory
    class="net.sf.ehcache.constructs.unlockedreadsview.UnlockedReadsViewDecoratorFactory"
    properties="name=unlockedReadsViewOne" />

NonStopCache
<cacheDecoratorFactory
    class="net.sf.ehcache.constructs.nonstop.NonStopCacheDecoratorFactory"
    properties="name=nonStopCacheName, timeoutMillis=3000, timeoutBehavior=exception | noop | localReads" />

Page  10
Beta
Enterprise EHCache BigMemory



 A distributed cache with a MemoryStore uses a lot of heap memory, so you need to increase the heap
 size, but “Garbage collection (GC) is like a ticking time bomb for Java”.

 To minimize the garbage collection penalty, organizations often limit heap size to 2–4 GB. This
 constrains cache size, limiting the performance benefits that can be achieved by caching.


 BigMemory uses “off-heap” memory:
      • Limited only by the amount of RAM
        on your hardware and address space

      • A 64-bit OS is strongly recommended.



Off-heap data is stored as bytes, which has two implications:
      • Only Serializable cache keys and values
        can be placed in the store

      • Serialization and deserialization take place on
        putting to and getting from the store. This means
        that the off-heap store is slower in an absolute sense.
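
The two implications can be seen in a small sketch built on java.nio direct buffers. It is an illustration only, not how BigMemory is implemented: OffHeapSketch is a hypothetical append-only store, it never frees space, and its key index still lives on the heap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical append-only off-heap store: values live as bytes in a direct buffer.
class OffHeapSketch {
    private final ByteBuffer offHeap;                          // memory outside the Java heap
    private final Map<String, int[]> index = new HashMap<>();  // key -> {offset, length}

    OffHeapSketch(int capacityBytes) {
        this.offHeap = ByteBuffer.allocateDirect(capacityBytes);
    }

    // put: only Serializable values are accepted, and each put pays for serialization.
    void put(String key, Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            byte[] data = bytes.toByteArray();
            index.put(key, new int[] {offHeap.position(), data.length});
            offHeap.put(data);        // throws BufferOverflowException when full
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // get: copy the bytes back and deserialize; every call builds a new object.
    Object get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] data = new byte[location[1]];
        ByteBuffer view = offHeap.duplicate();   // independent position, shared content
        view.position(location[0]);
        view.get(data);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        OffHeapSketch store = new OffHeapSketch(1024);
        store.put("id01", "hello");
        System.out.println(store.get("id01"));
    }
}
```

Because the garbage collector never scans the direct buffer, the stored values add no GC pressure; the price is the copy and (de)serialization work on every put and get.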




Page  11
Beta
Enterprise EHCache BigMemory


Configuration (with EHCache)

overflowToOffHeap                    : true | false; enables off-heap memory
maxMemoryOffHeap                     : Sets the amount of off-heap memory available to the cache
maxElementsInMemory                  : Max elements kept on-heap; at least 100 elements are recommended when off-heap is used.
maxOffHeapValueSize                  : Max size (in MB) of each object. Default is 4 MB
DoNotHaltOnCriticalAllocationDelay   : If memory is dramatically over-allocated (1 GB) for at least 3 seconds,
                                       the application calls System.exit(1); with this property set, it waits instead.
diskPersistent                       : true | false; stores data asynchronously in the DiskStore so it survives a JVM shutdown.
diskSpoolBufferSizeMB                : The max size of the disk store spool buffer




  … BigMemory is built-in with
   Terracotta Server Arrays.




Page  12
Terracotta Scalability Platform




                                                      Distributed Shared Objects



   •    www.terracotta.org

   •    Open source, by Terracotta Inc.

   •    Terracotta acquired EHCache and Quartz and provides integration plugins




Page  13
Terracotta Scalability Platform



       How Terracotta works



   •        Data distribution and synchronization through Terracotta server arrays

   •        Bytecode weaving relieves application code of distribution/synchronization
            details

   •        Client/server approach: clients are applications modified through AOP; servers
            maintain application state, with redundancy for fault tolerance

   •        Application state resides in the Terracotta Server

   •        The application uses object data as if it were local and in memory; TC replicates changes




Page  14
Terracotta Scalability Platform



     Terracotta standard application layout


            AS 1                      AS 2                AS 3
            Business                  Business            Business
             Logic                     Logic               Logic


            TC Libraries              TC Libraries       TC Libraries




                           Database                  TC Server
                                                                        TC Server
                                                                         Backup




Page  15
Terracotta Scalability Platform



[Diagram: JVM 1 and JVM 2 each run business logic on top of the TC libraries, with the same object graph (objects 1 to 7) in their heaps. A client makes a modification to an object (5); Terracotta replicates the delta modifications to the other clients through the TC Server, and the modification is transmitted to the other nodes.]

Page  16
Infinispan - Advanced Datagrid Platform (1/8)




   •        www.infinispan.org

   •        Open source, based on JBoss Cache

   •        Current version is 4.1.0 FINAL
   •        LGPL License

   •        Developed and supported by Red Hat and the JBoss community

   •        Entirely written in Java, works on Java 6 machines



Page  17
Infinispan - Advanced Datagrid Platform (2/8)



  What does Infinispan provide?

  •     Cache memory for distributed environments

  •     Supports three modes: Local, Replicated and Distributed

  •     Network communication based on JGroups
         a) TCP / UDP network protocol
         b) Multicast / Unicast
         c) Auto discovery of cluster members
         d) State recovery upon cluster partitioning

  •     Transaction support (JTA compliant)

  •     Eviction algorithms to control memory usage

  •     Persisting state to configurable cache stores


Page  18
Infinispan - Advanced Datagrid Platform (3/8)



      Abstraction: distributed key/value Map
      Same interface, same semantics

      Map<String, Object> myMap = new HashMap<String, Object>();
      myMap.put("id01", myObject);
      MyObject x = (MyObject) myMap.get("id01");
      assert myMap.size() == 1;



      CacheManager cacheManager = new DefaultCacheManager();
      Cache cache = cacheManager.getDefaultCache();
      cache.put("id01", myObject);
      MyObject x = (MyObject) cache.get("id01");
      assert cache.size() == 1;




      Easy to extend features and capabilities to existing code




Page  19
Infinispan - Advanced Datagrid Platform (4/8)



     Data is saved in backup copies for fault tolerance



            Infinispan node 1


              DS1               ds3                       Infinispan node 3


                                                            DS3               ds2



                                Infinispan node 2


                                  DS2               ds1


Page  20
Infinispan - Advanced Datagrid Platform (5/8)


                           Common issues in a distributed environment
       • Where to put objects in a cluster? Is there a “right node”?

       • Uniform distribution across all nodes

       • Where to look for a key? Multicast? Metadata? Routing?

       • What if a node crashes?

       • What if a new node joins the cluster?

                                           Consistent Hashing
       • A deterministic algorithm that maps keys to nodes

       • No lookup, no multicast, no wasted network traffic

       • Based on distance: each key maps to the nearest bucket

       • Cluster changes involve transferring only a small subset of the data

       • When a node joins, it takes over some of its neighbors' keys

       • When a node leaves, its data is spread among its neighbors
Page  21
Infinispan - Advanced Datagrid Platform (6/8)




                                    Chord: a consistent hashing algorithm

[Diagram: four snapshots of a hash ring with nodes a, b, c, d and e placed around the circle]




            Hash function maps data to an integer 0…N
            Keys and nodes can be seen as points on the edge of a circle

            A key belongs to the clockwise next node
            When a new node enters, part of the keys are reassigned
            When a node crashes its data goes to the next-in-line node
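
The ring rules above can be sketched with a sorted map. This is a simplified illustration and not the Chord protocol itself (no finger tables, no virtual nodes); HashRingSketch and the position() mixing function are assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hash ring: nodes and keys are points on a circle of integers.
class HashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Assumption: a light bit-mixing of String.hashCode(), mapped to 0..Integer.MAX_VALUE.
    // Real systems use stronger hash functions for a more uniform spread.
    static int position(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        return h & 0x7fffffff;
    }

    void addNode(String node) {
        ring.put(position(node), node);
    }

    void removeNode(String node) {
        ring.remove(position(node));
    }

    // A key belongs to the first node found clockwise from the key's position,
    // wrapping around to the first node on the ring when none lies ahead.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Integer, String> next = ring.ceilingEntry(position(key));
        return (next != null ? next : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        ring.addNode("node1");
        ring.addNode("node2");
        ring.addNode("node3");
        System.out.println("id01 -> " + ring.nodeFor("id01"));
        ring.addNode("node4");   // only keys between node4 and its predecessor move
        System.out.println("id01 -> " + ring.nodeFor("id01"));
    }
}
```

Any client holding the same node list computes the same owner locally, which is why no lookup or multicast is needed to find a key.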




Page  22
Infinispan - Advanced Datagrid Platform (7/8)


                                      Peer-to-Peer vs Client/Server

              User                    User

                                                               C++                        Java
                                                               App                         App
                     Load balancer
                                                               Client                     Client




       App                App                   App

                                                               Server                     Server

     Embedded           Embedded              Embedded       Infinispan                 Infinispan
     Infinispan         Infinispan            Infinispan


    App Server 1       App Server 2          App Server 3   App Server 1               App Server 2




                       Clustered                                           Clustered
Page  23
Infinispan - Advanced Datagrid Platform (8/8)



                                            Protocol comparison

     Protocol      Type      Client Availability     Clustered    Smart Routing    Load Balancing / Failover
     REST          Text      Tons                    Yes          No               Any HTTP load balancer
     Memcached     Text      Tons                    Yes          No               Only with predefined list of servers
     Hot Rod       Binary    Right now, only Java    Yes          Yes              Yes, dynamic via Hot Rod client
     Web Socket    Text      JavaScript only         Yes          No               Any HTTP load balancer

    In server mode Infinispan supports different protocols and can be used with different (non-JVM) languages.



    Taken from http://community.jboss.org/wiki/InfinispanServerModules




Page  24
Hazelcast - In-memory datagrid computing (1/4)




   •        www.hazelcast.com

   •        Open source, by Hazel Bilisim Ltd

   •        Current version is 1.9

   •        Apache License 2.0

   •        Hosted on google code @ http://code.google.com/p/hazelcast/

   •        Entirely written in Java, works on Java 6 machines

Page  25
Hazelcast - In-memory datagrid computing (2/4)




       What does Hazelcast provide?

   •        Specifically targeted for distributed environment, works only in distributed mode

   •        Distributed API for Lists, Queues, (Multi)Maps, Sets

   •        Not only data: Locks, Tasks execution, Events and Messages

   •        Ad hoc network communication with auto discovery and monitoring tools

   •        Easy configuration, easy to use. Hides major aspects of distribution

   •        Configurable number of backups for replication and fault-tolerance

   •        Concurrency, Transaction support, state persistency

   •        Peer to peer communication only, with super clients (nodes without data)


Page  26
Hazelcast - In-memory datagrid computing (3/4)




     More than data = distributed computing

                                          Distributed Object Queries

   import com.hazelcast.core.Hazelcast;
   import com.hazelcast.core.IMap;
   import com.hazelcast.query.SqlPredicate;
   import java.util.Collection;

   IMap map = Hazelcast.getMap("employees");
   map.addIndex("active", false);
   map.addIndex("name", false);
   map.addIndex("age", true);

   Collection<Employee> employees = map.values(new SqlPredicate("active AND age <= 30"));



   Objects that satisfy the criteria are efficiently fetched from the cluster and returned by reference
   in a collection



Page  27
Hazelcast - In-memory datagrid computing (4/4)




     More than data = distributed computing

                                   Distributed Task Execution
   import   java.util.concurrent.Callable;
   import   java.util.concurrent.ExecutorService;
   import   java.util.concurrent.FutureTask;
   import   com.hazelcast.core.DistributedTask;
   import   com.hazelcast.core.Hazelcast;

   FutureTask<String> task = new DistributedTask<String>(new Callable<String>() {
       @Override
       public String call() throws Exception {
         String result = null;
         // Do something useful here
         return result;
       }
   });

   ExecutorService executorService = Hazelcast.getExecutorService();
   executorService.execute(task);
   String result = task.get();


   Execution is sent over the cluster

Page  28
Memcached Overview




     • http://memcached.org/

     • Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects)

     • The Memcached server is written in C and runs as a daemon service installable on several operating systems.

     • Client APIs are available for most popular languages.

     • It’s designed to take advantage of free memory.



     Java Client xmemcached : http://code.google.com/p/xmemcached/



Page  29
Memcached & Xmemcached Java client



Telnet Interface

> telnet localhost 11211
> set key 0 900 13
> data_to_store
STORED

> get key
VALUE key 0 13
data_to_store
END

Java Client Interface (xmemcached)

MemcachedClientBuilder builder = new XMemcachedClientBuilder(
    AddrUtil.getAddresses("localhost:11211"));
MemcachedClient memcachedClient = builder.build();

memcachedClient.add("key", 0, "Hello");
memcachedClient.get("key");
memcachedClient.shutdown();
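
The fields of the telnet `set` line are key, flags, expiration time in seconds and the length of the data in bytes. A small sketch of how a client could build that command follows; MemcachedTextSketch and setCommand() are hypothetical helper names, not part of xmemcached.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that formats a memcached text-protocol "set" command.
class MemcachedTextSketch {
    // Format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    static String setCommand(String key, int flags, int exptimeSeconds, String data) {
        // The protocol wants the length in bytes, not in characters.
        int length = data.getBytes(StandardCharsets.UTF_8).length;
        return "set " + key + " " + flags + " " + exptimeSeconds + " " + length
                + "\r\n" + data + "\r\n";
    }

    public static void main(String[] args) {
        // Same request as the telnet session: 13 is the byte length of "data_to_store".
        System.out.print(setCommand("key", 0, 900, "data_to_store"));
    }
}
```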




• Supports connection pooling: you can open several connections to one memcached server with java.nio.*

• Dynamically add/remove server

• Data compression (because Memcached is inefficient when you store large data)

• Fast Integration with Hibernate-memcached



Page  30
Memcached – Xmemcached Example


Weighted Server
MemcachedClientBuilder builder = new
XMemcachedClientBuilder(AddrUtil.getAddresses("localhost:12000 localhost:12001"),
                                               new int[]{1,3});
MemcachedClient memcachedClient=builder.build();

You can change the weight dynamically through JMX
public interface XMemcachedClientMBean {
      public void setServerWeight(String server, int weight);
}




XMemcached can adjust the weight of a node to balance the load across memcached servers: the higher the weight,
the more data the memcached server stores and the more load it receives.

Use counter to increment / decrement
 ...
 MemcachedClient memcachedClient=builder.build();

 Counter counter= memcachedClient.getCounter("counter",0);
 counter.incrementAndGet();
 counter.decrementAndGet();
 counter.addAndGet(-10);


You can use MemcachedClient's incr/decr methods to increase or decrease a counter, but xmemcached also provides a
Counter that encapsulates the incr/decr methods; you can use it just like an AtomicLong.
 Page  31
Memcached & MySQL (multiple Memcached servers and a stand-alone MySQL server)



MySQL Enterprise includes memcached for cluster configurations.
There are various ways to design scalable architectures using memcached and MySQL.




Page  32
Memcached & MySQL (Multiple Memcached Servers with a Master and multiple Slave MySQL)


This architecture allows for scaling a read intensive application.




Page  33
Memcached & MySQL (Sharding, multiple Memcached Servers with a Master and multiple Slave MySQL)



With sharding (application partitioning) we partition data across multiple physical servers to gain read and
write scalability.




 Page  34
Performance Reports


Items           100,000 objects: String name, String surname, int age, String description

Configuration    2 nodes on the same physical machine (localhost)

Hardware         Intel Core2 Duo CPU P8400 @ 2.26 GHz, 3.48 GB RAM




[Bar chart: Put() and Get() results for EHCache 2.2, Infinispan, HazelCast and Xmemcached; vertical axis from 0 to 35, unit not stated]


            (*) Before making a choice, it is really important to run realistic tests in your environment, with different cluster
            sizes, network bandwidth, levels of concurrent access and application types:
            e-commerce-like web applications, data-intensive processing, near-real-time apps…
Page  35
CAP Theorem and Cache coherency


The CAP theorem states that, although it is desirable to have Consistency, High Availability and Partition tolerance in every
system, no system can achieve all three at the same time.

This also holds for distributed cache solutions.

Consistency                 : all nodes see the same data at the same time.
Availability                : node failures do not prevent survivors from continuing to operate.
Partition Tolerance         : the system continues to operate despite arbitrary message loss (risk of data partition).

Read and Write Through / Sync Mode:

                               Cache Mode =>            Replicated             Distributed
                             Consistency                    Yes                    Yes
                             Availability                   Yes                     No
                             Partition Tolerance            No                     Yes


Write Behind / Async Mode:

                               Cache Mode =>            Replicated             Distributed
                             Consistency                    No                     No
                             Availability                   Yes                     No
                             Partition Tolerance            No                     Yes

Page  36
Conclusion




     • Distributed caches are an interesting technology, but they come at a cost

     • There is no "perfect solution"; every choice must be evaluated

     • They work best on data that is mostly read

     • In-memory state is more difficult to monitor than traditional solutions

     • Replication fits small clusters best

     • Large environments need actual data distribution




Page  37
Reference 1/2



 Official documentation
        • EHCache & Terracotta    : http://ehcache.org/documentation/index.html
        • Terracotta             : http://www.terracotta.org/platform/
        • Infinispan             : http://jboss.org/infinispan/docs.html
        • HazelCast              : http://www.hazelcast.com/documentation.jsp
        • Memcached              : http://memcached.org/
        • Xmemcached             : http://code.google.com/p/xmemcached/

Articles & Blogs
       • Intro to Caching,Caching algorithms and caching frameworks part 4 :
               http://javalandscape.blogspot.com/2009/03/intro-to-cachingcaching-algorithms-and.html
       • Comparison of the Grid/Cloud Computing Frameworks (Hadoop, GridGain, Hazelcast, DAC) - Part II
               http://java.dzone.com/articles/comparison-gridcloud-computing-0
       • Brewers CAP Theorem on distributed systems
               http://www.hazelcast.com/documentation.jsp
       • Ehcache - A Java Distributed Cache
               http://highscalability.com/ehcache-java-distributed-cache
       • A Matter of Scale: The CAP Theorem and Memory Models
                http://coverclock.blogspot.com/2010/05/matter-of-scale-cap-theorem-and-memory.html
       • Consistent Hashing
                http://www.lexemetech.com/2007/11/consistent-hashing.html
       • A Couple Minutes With Non-Stop Ehcache
               http://dsoguy.blogspot.com/2010/05/couple-minutes-with-non-stop-ehcache_07.html
       • Designing and Implementing Scalable Applications with Memcached and MySQL
               http://www.mysql.com/why-mysql/white-papers/mysql_wp_memcached.php


Page  38
Reference 2/2



Presentations
        • Shopzilla On Concurrency             :   http://www.slideshare.net/WillGage/shopzilla-on-concurrency-3872625
        • Scaling your cache                   :   http://www.slideshare.net/alexmiller/scaling-your-cache
        • Caching in Distributed Enviroment   :    http://www.slideshare.net/abhigad/7564192
        • Infinispan by Jteam                  :   http://www.jteam.nl/specials/techtalks/011110/attachment/Infinispan.pdf


 Wikipedia
        • Brewer’s theorem : http://en.wikipedia.org/wiki/CAP_theorem
        • Performance tuning : http://en.wikipedia.org/wiki/Performance_tuning




Page  39
Q&A and Thanks




                   Davide Carnevali                        Lorenzo Acerbo

            Email : davide.carnevali at gmail.com   Email : lorenzo.acerbo at gmail.com
            Skype : davide.carnevali                Skype : lorenzo.acerbo


Page  40
