The Top Six Reasons to Use a Distributed Data Grid

                   Webinar
              December, 2011
 Bill Bain (wbain@scaleoutsoftware.com)


         Copyright © 2011 by ScaleOut Software, Inc.
Agenda

• About ScaleOut Software
• Overview of Products
• What is a Distributed Data Grid (DDG)?
• The Top Six Reasons
• What to Look for in a DDG Product




2                                     ScaleOut Software, Inc.
Company
• Founded in September 2003, privately funded
• Offices in Bellevue, WA and Beaverton, OR
• Team:
     – Dr. William Bain, Founder & CEO
        • Career focused on parallel computing – Bell Labs, Intel, Microsoft
        • 3 prior start-ups, last acquired by Microsoft and product now ships
          as Network Load Balancing in Windows Server
     – David Brinker, COO
        • 20 years software business and executive management
          experience
        • Mentor Graphics, Cadence, Webridge
• Develops and markets Linux & Windows DDG products.
• Seven years market experience.
It’s All About Scaling Performance
• Scaling performance:
    [Figure: one server (CPU, Memory, Storage) scaled out to four servers, each with its own CPU, Memory, and Storage]
    Scaling out:
    • Has excellent scalability.
    • But is challenging to implement.


What is a Distributed Data Grid?
(Aka “distributed cache”, “in-memory data grid”)
• A new “vertical” storage tier:
    – Adds missing layer to boost performance.
    – Uses in-memory, out-of-process storage.
    – Avoids repeated trips to backing storage.
• A new “horizontal” storage tier:
    – Allows data sharing among servers.
    – Scales performance & capacity.
    – Adds high availability.
    – Can be used independently of backing storage.
[Figure: storage hierarchy on each of two servers: Processor Cache, L2 Cache, Application Memory (“In-Process”), Distributed Data Grid (“Out-of-Process”), and shared Backing Storage]
Distributed Data Grids: A Closer Look
• Incorporates a client-side, in-process cache (“near cache”):
    – Transparent to the application.
    – Holds recently accessed data.
• Boosts performance:
    – Eliminates repeated network data transfers & deserialization.
    – Reduces access times to near “in-process” latency.
    – Is automatically updated if the distributed grid changes.
    – Supports various coherency models (coherent, polled, event-driven).
[Figure: Application Memory (“In-Process”), Client-side Cache (“In-Process”), Distributed Data Grid (“Out-of-Process”)]
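The near-cache behavior described on this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not ScaleOut's implementation: `FakeGrid` and `NearCache` are hypothetical names, an in-memory map stands in for the out-of-process grid, and coherency is modeled as a version check on every read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the out-of-process grid: a value and a version per key. */
class FakeGrid {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();
    int fullReads = 0; // counts simulated transfers of whole objects

    void put(String key, String value) {
        values.put(key, value);
        versions.merge(key, 1L, Long::sum); // bump the version on every update
    }

    long version(String key) {
        return versions.getOrDefault(key, 0L);
    }

    String read(String key) {
        fullReads++;
        return values.get(key);
    }
}

/** Hypothetical near cache: keeps the last value seen per key, validated by version. */
class NearCache {
    private static final class Entry {
        final String value;
        final long version;
        Entry(String value, long version) { this.value = value; this.version = version; }
    }

    private final FakeGrid grid;
    private final Map<String, Entry> local = new HashMap<>();

    NearCache(FakeGrid grid) { this.grid = grid; }

    String get(String key) {
        long current = grid.version(key);   // cheap check, no object transfer
        Entry cached = local.get(key);
        if (cached != null && cached.version == current) {
            return cached.value;            // hit: served from in-process memory
        }
        String fresh = grid.read(key);      // miss or stale: fetch the full object
        local.put(key, new Entry(fresh, current));
        return fresh;
    }
}

class NearCacheDemo {
    public static void main(String[] args) {
        FakeGrid grid = new FakeGrid();
        grid.put("price", "101.5");
        NearCache cache = new NearCache(grid);
        cache.get("price");
        cache.get("price"); // second read is answered by the near cache
        System.out.println("full reads: " + grid.fullReads);
    }
}
```

The version check stands in for the “coherent” model; a polled or event-driven model would replace that per-read check with a timer or a change notification.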
The Need for Memory-Based Storage
Example: Web server farm:
• Load-balancer directs incoming client requests to Web servers.
• Web and app. server farms build Web pages and run business logic.
• Database server holds all mission-critical, line-of-business (LOB) data.
• Server farms share fast-changing data using a DDG to avoid bottlenecks and maximize scalability.
[Figure: Internet, load-balancer, Web servers sharing a distributed, in-memory data grid, app. servers sharing a distributed, in-memory data grid, and database servers with a RAID disk array marked as the bottleneck, all connected by Ethernet]
The Need for Memory-Based Storage
Example: Cloud application:
• Application runs as multiple, virtual servers (VS).
• Application instances store and retrieve LOB data from cloud-based file system or database.
• Applications need fast, scalable storage for fast-changing data.
• Distributed data grid runs as multiple, virtual servers to provide “elastic,” in-memory storage.
[Figure: cloud application (App VS instances) above a distributed data grid (Grid VS instances) above cloud-based storage]
Scalability Challenges for Applications
•       “Scaled out” server applications repeatedly access two types of data:
         – Repeatedly referenced database data (e.g., stock prices) and
         – Fast changing, business-logic data (e.g., session-state, workflow state)
•       Database servers are not designed to meet this need:

            Characteristics:        Typical DBMS data      Application data
            Volume                  High                   Low
            Lifetime/turnover       Long/slow              Short/fast
            Access patterns         Complex                Simple
            Data preservation       Critical               Less critical
            Fast access/update      Less important         More important


•       Scaled-out applications create additional challenges:
         – How to make shared application data quickly accessible by any server
         – How to maintain fast access and avoid bottlenecks as the server farm grows
         – How to keep application data highly available when a server fails

Wide Range of Applications for DDGs
Financial Services
• Portfolio risk analysis
• VaR calculations
• Monte Carlo simulations
• Algorithmic trading
• Market message caching
• Derivatives trading
• Pricing calculations
E-commerce
• Session-state storage
• Application state storage
• Online banking
• Loan applications
• Wealth management
• Online learning
• Hotel reservations
• News story caching
• Shopping carts
• Social networking
• Service call tracking
• Online surveys
Other Applications
• Edge servers: chat, email
• Online gaming servers
• Scientific computations
• Command and control

Product: ScaleOut StateServer®
Fully distributed data grid designed for storing application data on server farms, compute grids, and the cloud:
• Runs in-memory directly on a farm or grid as a distributed service.
• Automatically:
     – Distributes and shares data across the farm.
     – Reduces access time.
     – Scales when the farm grows.
     – Survives when a server fails.
• Cost-effective.
• Complements & offloads DBMS.
• Portable across Windows and Linux.
[Figure: Internet-facing Web servers, each running a SOSS service, connected by Ethernet to a DBMS server marked as the bottleneck]
Product: ScaleOut Remote Client Option
• Allows hosting ScaleOut StateServer on a separate server farm.
• Ensures highly available connectivity to SOSS store.
• Automatically load-balances access requests to minimize response times.
• Uses multiple connections to maximize throughput.
[Figure: Web or application server farm of client applications with Windows and Linux remote clients, linked by load-balanced connections to a ScaleOut StateServer farm of Windows and Linux SOSS hosts]
Products: Grid Computing Edition
• Extends ScaleOut StateServer for use in high performance computing (HPC) applications.
• Provides advanced capabilities for parallel data analysis.
• Includes optional management tools.
• Complements SSI’s extended support plans.
[Figure: master and compute servers running SOSS services, with database servers marked as the data bottleneck]
Products: ScaleOut GeoServer Option
Global, Multi-Site Data Grids
• Extends SOSS across multiple sites.
• Ensures against site-wide failures.
• Replicates data between SOSS farms.
• Employs scalable, hi-av connections.
• Automatically handles membership changes at remote sites.
• Can support both “push” and “pull” access models.
Reason #1: Faster Access Time

• Eliminates repeated network data transfers.
• Eliminates repeated object deserialization.

[Chart: Average Response Time in microseconds (axis 0 to 3500) for DDG vs. DBMS; 10KB objects, 20:1 read/update ratio]


Example of Faster API Read Access
• Example for direct API access:
     – 10 KB objects, 20:1 read/update ratio
     – 3-host ScaleOut StateServer store with 3 clients
• Results:
     – Distributed cache provided >6X faster read time than database server.




Reason #2: Linearly Scalable Throughput
ScaleOut StateServer automatically scales its performance to match
the size and workload of a server farm or HPC compute grid.

[Chart: Read/Write Throughput for 10KB objects; accesses per second (axis 0 to 80,000) vs. number of nodes (4 to 64), with the object count growing from 16,000 to 256,000]
Tests performed in Microsoft Enterprise Engineering Center
What is Scalable Throughput?
• What it is (a perfect fit for server farms):
     –   Workload W takes time T on 1 server (1 W/T).
     –   Workload 2W takes time T on 2 servers (2 W/T).
     –   Workload nW takes time T on n servers (n W/T).
     –   Total completion time (i.e., response time) stays fixed.
• What it is not (common misperception):
     – Workload W takes time T/2 on 2 servers (2 W/T).
     – Workload W takes time T/n on n servers (n W/T).
• Why increase the workload with more servers?
     –   Adding servers adds overhead (e.g., networking).
     –   Increasing workload hides overheads for linear scaling.
     –   DDG must keep overheads low for linear scaling.
     –   Must not let network saturate! (Its throughput is fixed.)
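The distinction between the two definitions is easy to check with arithmetic. The sketch below uses hypothetical values for W and T (1000 accesses, 2 seconds) and assumes ideal scaling with no overhead; it shows that both cases reach the same throughput of n W/T, and that they differ only in whether the workload or the completion time is held fixed.

```java
class ScalableThroughput {
    /** Throughput in accesses per second for a workload completed in the given time. */
    static double throughput(double workload, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return workload / seconds;
    }

    public static void main(String[] args) {
        double w = 1000.0; // hypothetical workload W, in accesses
        double t = 2.0;    // hypothetical completion time T, in seconds
        for (int n = 1; n <= 4; n *= 2) {
            // Scalable throughput: workload grows to nW, completion time stays T.
            double scaled = throughput(n * w, t);
            // Common misperception: workload stays W, completion time shrinks to T/n.
            double speedup = throughput(w, t / n);
            System.out.printf("n=%d scaled=%.0f/s speedup=%.0f/s%n", n, scaled, speedup);
        }
    }
}
```

In practice the overheads listed above make the second case hard to achieve, which is why the slide defines scalability in terms of a growing workload.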
How SOSS Achieves Scalable Throughput
• Fully peer-to-peer architecture to eliminate bottlenecks.
• Automatically partitioned data storage with dynamic load-balancing.
• Fixed number of replicas per stored object (1 or 2) to avoid order-n overhead (storage and latency).
• Patented technique for scaling quorum updates to stored objects.
• Patented, scalable heart-beating algorithm.
[Figure: ScaleOut StateServer distributed cache: one cache service per Web or application server, exchanging heartbeats over Ethernet and holding an object, its copy, and its replica]
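Why a fixed replica count avoids order-n overhead can be illustrated with a simple placement function. This is a generic hash-and-neighbor scheme chosen for illustration, not the patented SOSS technique; `ReplicaPlacement` and `hostsFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ReplicaPlacement {
    /**
     * Returns the indexes of the hosts that hold an object: one primary chosen by
     * hashing the key, plus up to `replicas` distinct neighboring hosts.
     */
    static List<Integer> hostsFor(String key, int hostCount, int replicas) {
        if (hostCount < 1 || replicas < 0) {
            throw new IllegalArgumentException("need at least one host and no negative replica count");
        }
        int copies = Math.min(replicas, hostCount - 1); // cannot exceed the other hosts
        int primary = Math.floorMod(key.hashCode(), hostCount);
        List<Integer> hosts = new ArrayList<>();
        hosts.add(primary);
        for (int i = 1; i <= copies; i++) {
            hosts.add((primary + i) % hostCount);
        }
        return hosts;
    }

    public static void main(String[] args) {
        // The number of copies per object stays the same as the farm grows.
        for (int n : new int[] {4, 16, 64}) {
            System.out.println(n + " hosts: " + hostsFor("session-42", n, 1).size() + " copies");
        }
    }
}
```

Because each update touches the same small number of hosts regardless of farm size, storage and update latency per object stay constant as servers are added.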
Integrated, Powerful Platform for Scaling
• All product features benefit from the scalable, hi-av architecture:
     – Ex. Parallel object eventing:
         • All hosts handle events.
         • Event delivery is hi-av.
     – Ex. Global replication:
         • All hosts replicate objects.
         • Caches automatically handle membership changes.
[Figure: client applications with client libraries and caches, served by the cache services of the ScaleOut StateServer distributed cache; a local farm replicating to a remote farm]
Impact of Scalable TP on Access Latency
• Scalable, distributed data grid scales throughput and thereby maintains low latency:
     – DDG scales throughput by adding servers.
     – Avoids throughput barrier of a DBMS or file system.
     – Maintains low latency as throughput increases.
     – Network bandwidth is only throughput limit.
     – Also has inherently lower latency due to:
        • Memory-based storage
        • Client-side caching
[Chart: Access Latency vs. Throughput; access latency (msec) against throughput (accesses / sec) for SOSS and DBMS]



Putting it Together: How SOSS Works
• Creating or updating an object:
     – Client connects to a SOSS service instance and makes request.
     – Local SOSS service load-balances request to a selected host.
     – Selected host creates object and one or two remote replicas.




[Figure: a client connected to one of four servers, each running SOSS]


How SOSS Works
• Reading an object:
     –   Client connects to SOSS service and makes request.
     –   Local SOSS service forwards to selected host.
     –   Selected host returns object’s data.
     –   Requesting host caches object for future reads.


[Figure: a client connected to one of four servers, each running SOSS]


How SOSS Works
• Adding a new host:
     – Neighboring hosts detect SOSS on new host.
     – Hosts automatically establish new membership.
     – Neighbor hosts migrate objects to new host to rebalance load.




[Figure: five servers, each running SOSS]



Reason #3: High Availability
• Recovering from a host failure:
     –   Host or NIC fails.
     –   Neighboring hosts detect heartbeat failure.
     –   Hosts establish new membership.
     –   Neighbor host creates new object replica to “self-heal”.




[Figure: four servers running SOSS; the third is marked STOP]


SOSS: Integrated High Availability
• Peer-to-peer architecture for maximum redundancy & scalability
• Fully integrated data replication for data redundancy, scalability, and
  ease of use:
     – Partial replicas ensure scalable storage and throughput.
     – Per-server and per-client caches ensure fast access.
• Self-discovery and self-healing for hi-av and ease of use
• Patented quorum algorithm for reliable updating with scalability
[Figure: a client application retrieves an object through the client library, which holds a cached copy; the cache services of the ScaleOut StateServer distributed cache hold the object, its copy, and its replica]




Reason #4: Sharing Data Across the Farm
The first step for server farms (1998): load-balanced, stateless, Web applications:
• Without the ability to share data, we need “sticky” sessions (no hi av!).
• Or we can overload the database server.
• Or we can share data across the farm in a distributed data grid for both scalability & high av.
[Figure: Internet-facing Web servers, each running a SOSS service, connected by Ethernet to a DBMS server]
The Evolution in DDGs and Data Sharing
Drivers:
• Scaling data access & analysis are critical to competitiveness.
• Server farms & the cloud are now mainstream computing platforms.
• Data access is a key bottleneck.
• Short dev. cycles are mandatory.
• Standard APIs are emerging.
[Chart: market penetration over time (axis marks 2005, 2006, 2007, 2008, 2010, 2011). Stages: early adoption on Web and app. server farms for speed and hi-av; expansion to new verticals (e.g., financial services) for data & compute grids; cloud computing using industry-standard APIs. Uses along the time axis: session-state storage, application caching, grid computing, platform-wide usage, data analysis.]
Data Sharing: a Closer Look
• Advantages of sharing data in a distributed data grid:
     – Boosts application performance and offloads the DBMS.
     – Advances & simplifies the programming model:
          • Allows “stateful” business objects
          • Keeps object/relational mapping at the data access layer
• Examples: session & profile data, business objects,
  workflow state
• Requirements of a distributed data grid:
     –   Coherent storage so all clients see a consistent view
     –   Easy-to-use APIs
     –   Integrated object locking to enable coordinated updating
     –   High availability to avoid data loss if a server fails
     –   Advanced features to enable effective use of the grid (e.g.,
         parallel query, map/reduce analysis)

Basic APIs for Data Access
•    Are easy to use in C#, Java, or C/C++.
•    Store objects in the grid as serialized blobs.
•    Primarily use string or numeric keys to identify objects.
•    Group objects into name spaces (“named caches”).
[Figure: a key mapped to an object]


     // Read and update object:
     MyClass retrievedObj;
     retrievedObj = cache["myObj"] as MyClass;

     retrievedObj.var1 = "Hello, again!";
     cache["myObj"] = retrievedObj;


Example: Named Cache Access (Java)
     // Requires java.util.UUID and the ScaleOut Java client classes.
     public static void main(String[] argv)
     {
         // Initialize string object to be stored:
         String s = "Test string";

         // Create a cache collection:
         SossCache cache = SossCacheFactory.getCache("MyCache");

         // Store object in ScaleOut StateServer (SOSS):
         CachedObjectId id = new CachedObjectId(UUID.randomUUID());
         cache.put(id, s);

         // Read object stored in SOSS:
         String answerJNC = (String)cache.get(id);

         // Remove object from SOSS:
         cache.remove(id);
     }



Example: Named Cache Access (C#)
     static void Main(string[] args)
     {
        // Initialize object to be stored:
        SampleClass sampleObj = new SampleClass();
        sampleObj.var1 = "Hello, SOSS!";

         // Create a cache:
         SossCache cache = CacheFactory.GetCache("myCache");

         // Store object in the distributed cache:
         cache["myObj"] = sampleObj;

         // Read and update object stored in cache:
         SampleClass retrievedObj = null;
         retrievedObj = cache["myObj"] as SampleClass;
         retrievedObj.var1 = "Hello, again!";
         cache["myObj"] = retrievedObj;

         // Remove object from the cache:
         cache["myObj"] = null;
     }
Fully Distributed Locking
• Goal: synchronize access to a stored object by multiple client
  threads.
• Two mechanisms: pessimistic and optimistic locking
• Pessimistic uses read-modify-write semantics:
     –   Can be set as default for all objects within a named cache.
     –   Reads to locked objects are automatically retried.
     –   Locks have timeouts to handle client failures.
     –   Simple reads and updates can bypass locks.
string myObj = cache.Retrieve("key", true); // read and lock
...
cache.Update("key", "new value", true); // update and unlock

• Optimistic uses object’s version number to allow or inhibit an update:
     – User supplies version number from read to a locking update.
     – Benefit: one trip to the server if high probability of success.
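The optimistic mechanism can be sketched as a version-checked update. The class below is a hypothetical, single-process stand-in for the grid, not the SOSS API: `VersionedStore`, `read`, and `updateIfUnchanged` are illustrative names, and a `synchronized` method stands in for the server applying the check atomically.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedStore {
    /** A stored value together with the version assigned by the store. */
    static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    synchronized void create(String key, String value) {
        data.put(key, new Versioned(value, 1));
    }

    synchronized Versioned read(String key) {
        return data.get(key);
    }

    /**
     * Applies the update only if the object still has the version the caller read.
     * No lock is held between the read and the update; a mismatch just returns false.
     */
    synchronized boolean updateIfUnchanged(String key, String newValue, long versionFromRead) {
        Versioned current = data.get(key);
        if (current == null || current.version != versionFromRead) {
            return false; // someone else updated (or removed) the object first
        }
        data.put(key, new Versioned(newValue, current.version + 1));
        return true;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.create("key", "old value");
        boolean done = false;
        while (!done) { // on a conflict, re-read and try again
            Versioned seen = store.read("key");
            done = store.updateIfUnchanged("key", seen.value + " (edited)", seen.version);
        }
        System.out.println(store.read("key").value);
    }
}
```

When conflicts are rare, the update succeeds on the first attempt, which is the single trip to the server that the slide cites as the benefit.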
Advanced API Features
•    Object timeouts
•    Distributed locking for coordinating access
•    Object dependency relationships
•    Asynchronous events on object changes
•    Automatic access to a backing store
•    Object eviction on high memory usage
•    Object metadata
•    Bulk insertion
•    Authentication
•    Custom serialization for compression & encryption
•    Parallel query based on metadata or properties
Parallel Data Analysis
• The goal:
     – Quickly analyze a large set of data for patterns and trends.
     – Take advantage of scalable computing to shorten “time to insight.”
• Applications:
     –   Search
     –   Financial services
     –   Business intelligence
     –   Risk analysis
     –   Weather simulation
     –   Structural modeling
     –   Fluid-flow analysis
     –   Climate modeling
                                 NCAR Community Climate Model
                                  http://www.vets.ucar.edu/vg/IPCC_CCSM3/index.shtml


Reason #5: Parallel Data Analysis
• Rapid analysis of large data sets has become a top
  priority.
• Distributed data grids enable fast parallel analysis:
     – Automatically harness the power of many servers and cores.
     – Offer a simple, easy-to-use development model.
     – Deliver top performance for memory-based datasets.
• Key attributes of DDG-based data analysis:
     – Data is memory-based and data motion is minimized.
     – Programming model is object-oriented; parallelism is automatic.

[Chart: PMI vs. Random Access Throughput Comparison, 2 MB time series objects. Y-axis: Objects per Second (0 to 600). X-axis: Number of Nodes (4 to 32) and Number of Objects (512 to 4096). Series: SOSS PMI, Random Access.]

Parallel Query
• Goal: identify a set of objects with selected properties.
• Uses all grid servers to scale query performance.
• Uses fast, optimized lookup on each grid server.
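
The query-in-parallel, merge-the-keys flow can be sketched in plain Java. This is an illustration under stated assumptions, not the product's implementation: each map stands in for one grid server's partition, and `ParallelQuerySketch.queryKeys` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical scatter/gather query over in-memory partitions; not the SOSS API. */
final class ParallelQuerySketch {
    /** Each map stands in for one grid server's partition: object key -> ticker property. */
    static List<String> queryKeys(List<Map<String, String>> partitions, Predicate<String> filter) {
        // Start one asynchronous lookup per partition so all "servers" work at once.
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (Map<String, String> partition : partitions) {
            pending.add(CompletableFuture.supplyAsync(() ->
                    partition.entrySet().stream()
                            .filter(e -> filter.test(e.getValue()))
                            .map(Map.Entry::getKey)
                            .collect(Collectors.toList())));
        }
        // Merge the per-partition key lists into a single sorted list.
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> f : pending) {
            merged.addAll(f.join());
        }
        Collections.sort(merged);
        return merged;
    }
}
```

The caller then walks the merged key list sequentially and reads each object, which is the step that parallel method invocation later removes.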


[Diagram: Query the DDG in parallel. Merge the keys into a list. Sequentially analyze all queried objects.]



Parallel Query Example (Java)
• Mark class properties as indexes for SOSS query:
public class Stock implements Serializable {
    private String ticker;
    private int    totalShares;
    private double price;

    @SossIndexAttribute
    public String getTicker() {
        return ticker;
    }
    …
}

• Define a query using these properties:
NamedCache cache = CacheFactory.getCache("Stocks",
                                         false);
Set keys = cache.queryKeys(Stock.class,
               or(equal("ticker", "GOOG"),
                  equal("ticker", "ORCL")));

Parallel Query Example (C#)
• Mark class properties as indexes for SOSS query:
class Stock {
    [SossIndex]
    public string Ticker { get; set; }
    public decimal TotalShares { get; set; }
    public decimal Price { get; set; }
}

• Define a query using these properties. Objects are
  automatically read into memory:
NamedCache cache = CacheFactory.GetCache("Stocks");
var q = from s in cache.QueryObjects<Stock>()
        where s.Ticker == "GOOG" || s.Ticker == "ORCL"
        select s;

Console.WriteLine("{0} Stocks found", q.Count());

Parallel Method Invocation (“Map/Reduce”)
• Goal: analyze a set of objects with selected properties.
• Executes user’s code in parallel across the grid.
• Uses a parallel query to select objects for analysis.



[Diagram: Analyze Data (Map), then Combine Results (Reduce). The in-memory distributed data grid runs the map/reduce analysis.]

Example in Financial Services
Analyze trading strategies across stock histories:
Why?
• Back-testing systems help guard against risks in deploying new
  trading strategies.
• Performance is critical for “first to market” advantage.
• Uses a significant amount of market data and computation time.
How?
• Write method E to analyze trading strategies across a single
  stock history.
• Write method M to merge two sets of results.
• Populate the data store with a set of stock histories.
• Run method E in parallel on all stock histories.
• Merge the results with method M to produce a report.
• Refine and repeat…
Stage the Data for Analysis

• Step 1: Populate the distributed data grid with objects, each of which
  represents a price history for a ticker symbol.




Code the Eval and Merge Methods
•    Step 2: Write a method to evaluate a stock history based on parameters:
       Results EvalStockHistory(StockHistory history, Parameters params)
       {
           <analyze trading strategy for this stock history>
           return results;
       }

•    Step 3: Write a method to merge the results of two evaluations:
       Results MergeResults(Results results1, Results results2)
       {
           <merge both results>
           return results;
       }

•    Notes:
      – This code can be run as a sequential calculation on in-memory data.
      – No explicit accesses to the distributed data grid are used.



Run the Analysis
 • Step 4: Invoke parallel evaluation and merging of results:
      Results Invoke(EvalStockHistory, MergeResults, querySpec,
      params);
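
The eval/merge pattern can be tried locally before it is run on a grid. The sketch below is only an approximation under stated assumptions: `PmiSketch.invoke` is a hypothetical stand-in that uses Java parallel streams on one machine instead of grid servers, it takes an already selected list in place of `querySpec`, and `countBigGains` is an invented example eval method.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

/** Hypothetical local stand-in for the slide's Invoke(eval, merge, querySpec, params). */
final class PmiSketch {
    /**
     * Runs eval on every selected object in parallel, then combines the results with merge.
     * The merge method must be associative, because results are combined in no fixed order.
     * Returns null when no objects were selected.
     */
    static <T, P, R> R invoke(List<T> selected, BiFunction<T, P, R> eval,
                              BinaryOperator<R> merge, P params) {
        return selected.parallelStream()
                .map(obj -> eval.apply(obj, params))
                .reduce(merge)
                .orElse(null);
    }

    /** Example eval method: counts days whose gain over the previous day exceeds minGain. */
    static Integer countBigGains(double[] prices, Double minGain) {
        int count = 0;
        for (int i = 1; i < prices.length; i++) {
            if (prices[i] - prices[i - 1] > minGain) {
                count++;
            }
        }
        return count;
    }
}
```

A call such as `PmiSketch.invoke(histories, PmiSketch::countBigGains, Integer::sum, 1.0)` evaluates each price history independently and sums the counts; neither method touches the data store directly, which matches the notes on the previous slide.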


[Diagram labels: EvalStockHistory(), MergeResults()]


[Diagram: the client starts the parallel analysis; .eval() runs on each stock history to produce results; .merge() combines the results in stages until a single set of results is returned to the client.]
Advantages of Using PMI
• Fast:
     – Automatically scales application performance across grid servers.
     – Automatically uses all server cores.
     – Minimizes data motion between servers.
     – API-based invocation delivers very low latency.
• Easy to Use:
     – User writes simple, “in memory” code; all grid accesses are implicit.
     – Matches Java/C# model of object-oriented collections.
     – Requires no tuning.
[Diagram: PMI Engine running on four cores above the Grid Service]
Comparison of DDGs and File-Based M/R
                        DDG                                        File-Based M/R
Data set size           Gigabytes to terabytes                     Terabytes to petabytes
Data repository         In-memory                                  File / database
Data view               Queried object collection                  File-based key/value pairs
Development time        Low                                        High
Automatic scalability   Yes                                        Application dependent
Best use                Quick-turn analysis of memory-based data   Complex analysis of large datasets
I/O overhead            Low                                        High
Cluster mgt.            Simple                                     Complex
High availability       Memory-based                               File-based

DDG Minimizes Data Motion
• File-based map/reduce must move data to memory for analysis:
[Diagram: three M/R servers, each running E in server memory, with the data (D) held separately in the file system / database]



• Memory-based DDG analyzes data in place:
[Diagram: three grid servers, each running E alongside its data (D) inside the distributed data grid]



[Diagram: the same eval/merge flow with file-based storage; file I/O is incurred when .eval() reads the stock histories and again at each .merge() stage before results are returned to the client.]
Performance Impact of Data Motion
     Measured random access to DDG data to simulate file I/O:




PMI Delivers 16X Speedup Over Hadoop

[Chart: Throughput Comparison. Y-axis: Throughput (Obj/Sec), 0 to 800. X-axis: Number of Servers (4, 6, 8). Series: SOSS PMI, Hadoop/SOSS, Hadoop.]



Reason #6: Simplify Data Migration
• DDGs enable seamless data migration across on-premise sites and the cloud:
     – Automatically access remote data as needed.
     – Efficiently manage WAN bandwidth.
     – Enable full data synchronization across sites.

[Diagram: In-Memory Distributed Data Grid]

Example: Web Farm Cloud-Bursting
• DDGs bridge on-premise and cloud-based in-memory storage of
  Web session state.
• DDG automatically migrates session-state objects into the cloud
  on demand.
• This enables seamless access to data across multiple sites.
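
One way to picture on-demand access is a local lookup that falls back to the other site only on a miss. The sketch below is an assumption about the general idea, not a description of how SOSS migrates data: `CloudBurstSketch` is a hypothetical class, two in-memory maps stand in for the cloud and on-premise grids, and ownership and coherency between the sites are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of on-demand access to another site's data; not the SOSS API. */
final class CloudBurstSketch {
    private final Map<String, String> localGrid = new ConcurrentHashMap<>();
    private final Map<String, String> remoteGrid;  // stands in for the on-premise grid across the WAN
    private int remoteFetches = 0;

    CloudBurstSketch(Map<String, String> remoteGrid) {
        this.remoteGrid = remoteGrid;
    }

    /** Returns session state, pulling it from the remote site only on a local miss. */
    synchronized String getSession(String sessionId) {
        String state = localGrid.get(sessionId);
        if (state != null) {
            return state;                      // already local: no WAN traffic
        }
        state = remoteGrid.get(sessionId);
        if (state != null) {
            remoteFetches++;                   // one WAN round trip
            localGrid.put(sessionId, state);   // keep a local copy for later requests
        }
        return state;
    }

    synchronized int remoteFetchCount() {
        return remoteFetches;
    }
}
```

In this sketch, repeated requests for the same session cross the WAN only once, which is the bandwidth saving that pulling data on demand is meant to provide.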

[Diagram: a cloud-hosted distributed data grid (SOSS virtual servers alongside cloud application virtual servers) and an on-premise distributed data grid (SOSS hosts serving the user's on-premise application servers, with a backing store) sit behind a Web load balancer and form one virtual distributed data grid; data is migrated automatically between the two.]
Example: Global Access to Shared Data


[Diagram: mirrored data centers and satellite data centers, each running a distributed data grid of SOSS servers, combined into a global distributed data grid]




What to Look for in a DDG Product
• Ease of Use: SSI's products have an unusually high level of integration and focus on automatic operation. This dramatically simplifies deployment and management of a distributed data grid.
• Performance: In direct comparison tests, SSI demonstrates faster access performance and scalability in key benchmarks.
• Architecture: SSI’s architecture integrates both scalability and high availability and uniformly applies key architectural principles, such as peer-to-peer design.
• Portability: Seamless interoperability across Windows and Unix (Linux, Solaris, etc.) operating systems was designed into SSI’s architecture from the outset.
• Data Analysis: Advanced capabilities for "map/reduce"-style parallel data analysis open up important new applications for distributed data grids.
• Manageability: SSI’s comprehensive tools for managing distributed data grids, such as its object browser and parallel backup and restore utility, are unique in the industry.


SOSS Maximizes Ease of Use
   Grid servers self-aggregate, self-heal, and automatically load-balance.




• Tree list shows:
     – Store status
     – Host list
     – Host status
     – Remote stores
     – Remote client configuration
• Host configuration pane: just need to select the subnet shared by all hosts.




Real-time Performance Charting




SOSS Object Browser
• Simplifies development.
• Provides extremely useful visibility into grid usage.
• Allows grid objects to be analyzed and managed.




SOSS Parallel Backup and Restore
• Enables grid contents (or portions of them) to be backed up or
  restored in parallel to either:
     – Separate file systems on all caching servers, or
     – A single network file share.

• Creates backups or snapshots for later analysis.
• Makes full use of SOSS’s parallel implementation to
  deliver highly scalable performance and high availability.
[Diagram: two groups of four SOSS servers, each group connected by Ethernet]




Recap: Top 6 Reasons to Use a DDG
1. Faster access time for business logic state or database data
2. Scalable throughput to match a growing workload and keep response times low
3. High availability to prevent data loss if a grid server (or network link) fails
4. Shared access to data across the server farm
5. Advanced capabilities for quickly and easily mining data using scalable “map/reduce” analysis
6. Transparent data migration across multiple sites and the cloud

[Chart: Access Latency vs. Throughput. Y-axis: Access Latency (msec). X-axis: Throughput (accesses / sec). Series: Grid, DBMS.]
Thank you for joining us today!




            Distributed Data Grids for
  Server Farms & High Performance Computing

          www.scaleoutsoftware.com

 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
High Performance Databases
High Performance DatabasesHigh Performance Databases
High Performance Databases
 
SQL Azure for ITPros
SQL Azure for ITProsSQL Azure for ITPros
SQL Azure for ITPros
 
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
Database Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big DataDatabase Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big Data
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Was l iberty for java batch and jsr352
Was l iberty for java batch and jsr352Was l iberty for java batch and jsr352
Was l iberty for java batch and jsr352
 
A scalable server environment for your applications
A scalable server environment for your applicationsA scalable server environment for your applications
A scalable server environment for your applications
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
 

Dernier

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 

Dernier (20)

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

Top 6 Reasons to Use a Distributed Data Grid

  • 1. The Top Six Reasons to Use a Distributed Data Grid. Webinar, December 2011. Bill Bain (wbain@scaleoutsoftware.com). Copyright © 2011 by ScaleOut Software, Inc.
  • 2. Agenda • About ScaleOut Software • Overview of Products • What is a Distributed Data Grid (DDG)? • The Top Six Reasons • What to Look for in a DDG Product 2 ScaleOut Software, Inc.
  • 3. Company • Founded in September 2003, privately funded • Offices in Bellevue, WA and Beaverton, OR • Team: – Dr. William Bain, Founder & CEO • Career focused on parallel computing – Bell Labs, Intel, Microsoft • 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server – David Brinker, COO • 20 years software business and executive management experience • Mentor Graphics, Cadence, Webridge • Develops and markets Linux & Windows DDG products. • Seven years market experience.
  • 4. It’s All About Scaling Performance • Scaling out: – Has excellent scalability. – But is challenging to implement. [Diagram: a single server (CPU, memory, storage) scaled out to four servers, each with its own CPU, memory, and storage.]
  • 5. What is a Distributed Data Grid? (Aka “distributed cache”, “in-memory data grid”) • A new “vertical” storage tier: – Adds missing layer to boost performance. – Uses in-memory, out-of-process storage. – Avoids repeated trips to backing storage. • A new “horizontal” storage tier: – Allows data sharing among servers. – Scales performance & capacity. – Adds high availability. – Can be used independently of backing storage. [Diagram: per-server storage tiers (processor cache, L2 cache, “in-process” application memory, “out-of-process” distributed data grid) above shared backing storage.]
  • 6. Distributed Data Grids: A Closer Look • Incorporates a client-side, in-process cache (“near cache”): – Transparent to the application – Holds recently accessed data. • Boosts performance: – Eliminates repeated network data transfers & deserialization. – Reduces access times to near “in-process” latency. – Is automatically updated if the distributed grid changes. – Supports various coherency models (coherent, polled, event-driven) [Diagram: “in-process” application memory and client-side cache above the “out-of-process” distributed data grid.]
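The near-cache behavior on this slide can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the ScaleOut client library: the class `NearCacheSketch`, its `remoteRead` function (standing in for a network read from the grid), and the `invalidate` callback are all hypothetical names, and only an event-driven coherency model is shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical near cache; none of these names come from the ScaleOut API.
class NearCacheSketch {
    private final Map<String, String> local = new HashMap<>();
    // Stands in for a network read plus deserialization from the grid.
    private final Function<String, String> remoteRead;
    private int remoteReads = 0;

    NearCacheSketch(Function<String, String> remoteRead) {
        this.remoteRead = remoteRead;
    }

    // Serve from the in-process copy when present; otherwise read from the grid once.
    String get(String key) {
        String value = local.get(key);
        if (value == null) {
            remoteReads++;
            value = remoteRead.apply(key);
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }

    // Called when the grid reports that the object changed (event-driven coherency).
    void invalidate(String key) {
        local.remove(key);
    }

    int remoteReads() {
        return remoteReads;
    }

    public static void main(String[] args) {
        Map<String, String> grid = new HashMap<>();
        grid.put("session:42", "cart=3 items");
        NearCacheSketch cache = new NearCacheSketch(grid::get);
        cache.get("session:42");
        cache.get("session:42"); // second read is served in-process
        System.out.println("remote reads: " + cache.remoteReads());
    }
}
```

Repeated reads of the same key cost one remote access until the grid invalidates the local copy, which is the source of the near "in-process" latency the slide describes.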
  • 7. The Need for Memory-Based Storage. Example: Web server farm: • Load-balancer directs incoming client requests to Web servers. • Web and app. server farms build Web pages and run business logic. • Database server holds all mission-critical, LOB data. • Server farms share fast-changing data using a DDG to avoid bottlenecks and maximize scalability. [Diagram: Internet and load-balancer in front of Web servers that share a distributed, in-memory data grid; database servers with a RAID disk array, marked as the bottleneck; app. servers that share a second distributed, in-memory data grid; all tiers connected by Ethernet.]
  • 8. The Need for Memory-Based Storage. Example: Cloud Application: • Application runs as multiple, virtual servers (VS). • Application instances store and retrieve LOB data from cloud-based file system or database. • Applications need fast, scalable storage for fast-changing data. • Distributed data grid runs as multiple, virtual servers to provide “elastic,” in-memory storage. [Diagram: cloud application made of app virtual servers, a distributed data grid made of grid virtual servers, and cloud-based storage.]
  • 9. Scalability Challenges for Applications • “Scaled out” server applications repeatedly access two types of data: – Repeatedly referenced database data (e.g., stock prices) and – Fast-changing, business-logic data (e.g., session state, workflow state) • Database servers are not designed to meet this need:
        Characteristics      | Typical DBMS data | Application data
        Volume               | High              | Low
        Lifetime/turnover    | Long/slow         | Short/fast
        Access patterns      | Complex           | Simple
        Data preservation    | Critical          | Less critical
        Fast access/update   | Less important    | More important
    • Scaled-out applications create additional challenges: – How to make shared application data quickly accessible by any server – How to maintain fast access and avoid bottlenecks as the server farm grows – How to keep application data highly available when a server fails
  • 10. Wide Range of Applications for DDGs • Financial Services: – Portfolio risk analysis – VaR calculations – Monte Carlo simulations – Algorithmic trading – Market message caching – Derivatives trading – Pricing calculations • E-commerce: – Session-state storage – Application state storage – Online banking – Loan applications – Wealth management – Online learning – Hotel reservations – News story caching – Shopping carts – Social networking – Service call tracking – Online surveys • Other Applications: – Edge servers: chat, email – Online gaming servers – Scientific computations – Command and control
  • 11. Product: ScaleOut StateServer®. Fully distributed data grid designed for storing application data on server farms, compute grids, and the cloud: • Runs in-memory directly on a farm or grid as a distributed service. • Automatically: – Distributes and shares data across the farm. – Reduces access time. – Scales when the farm grows. – Survives when a server fails. • Cost-effective • Complements & offloads DBMS. • Portable across Windows and Linux. [Diagram: Web servers, each running an SOSS service, connected by Ethernet to the Internet and to a DBMS server marked as the bottleneck.]
  • 12. Product: ScaleOut Remote Client Option • Allows hosting ScaleOut StateServer on a separate server farm. • Ensures highly available connectivity to SOSS store. • Automatically load-balances access requests to minimize response times. • Uses multiple connections to maximize throughput. [Diagram: Web or application server farm of Windows and Linux client applications, each with a remote client, using load-balanced connections to a ScaleOut StateServer farm of Windows and Linux SOSS hosts.]
  • 13. Products: Grid Computing Edition • Extends ScaleOut StateServer for use in high performance computing (HPC) applications. • Provides advanced capabilities for parallel data analysis. • Includes optional management tools. • Complements SSI’s extended support plans. [Diagram: master and compute servers running the SOSS service; database servers marked as the data bottleneck.]
  • 14. Products: ScaleOut GeoServer Option. Global, Multi-Site Data Grids • Extends SOSS across multiple sites. • Ensures against site-wide failures. • Replicates data between data farms. • Employs scalable, hi-av connections. • Automatically handles membership changes at remote sites. • Can support both “push” and “pull” access models. [Diagram: SOSS farms at multiple sites.]
  • 15. Reason #1: Faster Access Time • Eliminates repeated network data transfers. • Eliminates repeated object deserialization. [Chart: average response time in microseconds (axis 0 to 3500) for 10 KB objects at a 20:1 read/update ratio, DDG vs. DBMS.]
  • 16. Example of Faster API Read Access • Example for direct API access: – 10 KB objects, 20:1 read/update ratio – 3-host ScaleOut StateServer store with 3 clients • Results: – Distributed cache provided >6X faster read time than database server.
  • 17. Reason #2: Linearly Scalable Throughput. ScaleOut StateServer automatically scales its performance to match the size and workload of a server farm or HPC compute grid. [Chart: read/write throughput in accesses per second (axis 0 to 80,000) for 10 KB objects, from 4 to 64 nodes and from 16,000 to 256,000 objects. Tests performed in Microsoft Enterprise Engineering Center.]
  • 18. What is Scalable Throughput? • What it is (a perfect fit for server farms): – Workload W takes time T on 1 server (1 W/T). – Workload 2W takes time T on 2 servers (2 W/T). – Workload nW takes time T on n servers (n W/T). – Total completion time (i.e., response time) stays fixed. • What it is not (common misperception): – Workload W takes time T/2 on 2 servers (2 W/T). – Workload W takes time T/n on n servers (n W/T). • Why increase the workload with more servers? – Adding servers adds overhead (e.g., networking). – Increasing workload hides overheads for linear scaling. – DDG must keep overheads low for linear scaling. – Must not let network saturate! (Its throughput is fixed.)
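The two cases on this slide reduce to simple arithmetic, sketched below in Java. The class and method names are illustrative only, and the model deliberately ignores the networking overhead the slide warns about.

```java
// Illustrative arithmetic for the two definitions on the slide; names are hypothetical.
class ScalableThroughputSketch {
    // "What it is": n servers finish workload n*W in the same time T,
    // so throughput is n*W/T while response time stays fixed at T.
    static double scaledThroughput(int servers, double workPerServer, double seconds) {
        return servers * workPerServer / seconds;
    }

    // "What it is not": a fixed workload W split over n servers,
    // which would shrink completion time to T/n if there were no overhead.
    static double fixedWorkloadTime(int servers, double singleServerSeconds) {
        return singleServerSeconds / servers;
    }

    public static void main(String[] args) {
        double w = 1000.0; // accesses handled by one server in time T
        double t = 1.0;    // seconds
        for (int n = 1; n <= 8; n *= 2) {
            System.out.printf("n=%d workload=%.0f time=%.1fs throughput=%.0f/s%n",
                    n, n * w, t, scaledThroughput(n, w, t));
        }
    }
}
```

Both cases report the same throughput figure of n W/T; the difference is that scalable throughput holds response time at T as the workload grows with the farm.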
  • 19. How SOSS Achieves Scalable Throughput • Fully peer-to-peer architecture to eliminate bottlenecks. • Automatically partitioned data storage with dynamic load-balancing. • Fixed number of replicas per stored object (1 or 2) to avoid order-n overhead (storage and latency) • Patented technique for scaling quorum updates to stored objects • Patented, scalable heart-beating algorithm [Diagram: ScaleOut StateServer distributed cache made of cache services on Web or application servers, connected by Ethernet and exchanging heartbeats; an object copy on one host and its replica on another.]
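To show why a fixed replica count avoids order-n overhead, here is a minimal Java sketch of hash partitioning with one or two replicas. It is an assumption-laden illustration: the slide does not describe SOSS's actual placement or load-balancing scheme, and `PartitionSketch`, `primaryHost`, and `replicaHosts` are hypothetical names. Replicas are simply placed on the next hosts in ring order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placement scheme; not SOSS's (unpublished here) algorithm.
class PartitionSketch {
    // Map a key to the host that owns its partition.
    static int primaryHost(String key, int hostCount) {
        return Math.floorMod(key.hashCode(), hostCount);
    }

    // Place a fixed number of replicas on the hosts that follow the primary.
    // The replica count does not grow with hostCount, so per-object storage
    // and update cost stay constant as the farm grows.
    static List<Integer> replicaHosts(String key, int hostCount, int replicas) {
        int primary = primaryHost(key, hostCount);
        List<Integer> hosts = new ArrayList<>();
        for (int i = 1; i <= replicas && i < hostCount; i++) {
            hosts.add((primary + i) % hostCount);
        }
        return hosts;
    }

    public static void main(String[] args) {
        String key = "session:42";
        for (int hostCount : new int[] {4, 16, 64}) {
            System.out.println(hostCount + " hosts: primary=" + primaryHost(key, hostCount)
                    + " replicas=" + replicaHosts(key, hostCount, 2));
        }
    }
}
```

With 4, 16, or 64 hosts, each object still lives on at most three hosts, so an update touches a constant number of servers.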
  • 20. Integrated, Powerful Platform for Scaling • All product features benefit from the scalable, hi-av architecture: – Ex. Parallel object eventing: • All hosts handle events. • Event delivery is hi-av. – Ex. Global replication: • All hosts replicate objects. • Caches automatically handle membership changes. [Diagram: client applications with client libraries attached to cache services that form the ScaleOut StateServer distributed cache; a local farm replicating to a remote farm.]
  • 21. Impact of Scalable TP on Access Latency • Scalable, distributed data grid scales throughput and thereby maintains low latency: – DDG scales throughput by adding servers. – Avoids throughput barrier of a DBMS or file system. – Maintains low latency as throughput increases. – Network bandwidth is only throughput limit. – Also has inherently lower latency due to: • Memory-based storage • Client-side caching [Chart: access latency (msec) vs. throughput (accesses/sec), SOSS vs. DBMS.]
  • 22. Putting it Together: How SOSS Works • Creating or updating an object: – Client connects to a SOSS service instance and makes request. – Local SOSS service load-balances request to a selected host. – Selected host creates object and one or two remote replicas. [Diagram: client and four SOSS servers.]
  • 23. How SOSS Works • Reading an object: – Client connects to SOSS service and makes request. – Local SOSS service forwards to selected host. – Selected host returns object’s data. – Requesting host caches object for future reads. [Diagram: client and four SOSS servers.]
  • 24. How SOSS Works • Adding a new host: – Neighboring hosts detect SOSS on new host. – Hosts automatically establish new membership. – Neighbor hosts migrate objects to new host to rebalance load. [Diagram: five SOSS servers.]
  • 25. Reason #3: High Availability • Recovering from a host failure: – Host or NIC fails. – Neighboring hosts detect heartbeat failure. – Hosts establish new membership. – Neighbor host creates new object replica to “self-heal”. [Diagram: four SOSS servers, one marked as stopped.]
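The first recovery steps (detect a missed heartbeat, then establish a new membership) can be sketched with a generic timeout-based detector. This is not the patented SOSS heart-beating algorithm, which the slides do not describe; `HeartbeatSketch` and its methods are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic timeout-based failure detector; names and design are assumptions.
class HeartbeatSketch {
    private final Map<String, Long> lastSeenMillis = new HashMap<>();
    private final long timeoutMillis;

    HeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a heartbeat arrived from the given host.
    void heartbeat(String host, long nowMillis) {
        lastSeenMillis.put(host, nowMillis);
    }

    // Hosts whose last heartbeat is older than the timeout are considered failed.
    List<String> failedHosts(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeenMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
            }
        }
        Collections.sort(failed);
        return failed;
    }

    // Drop failed hosts and return the surviving membership, sorted by name.
    List<String> establishMembership(long nowMillis) {
        lastSeenMillis.keySet().removeAll(failedHosts(nowMillis));
        List<String> members = new ArrayList<>(lastSeenMillis.keySet());
        Collections.sort(members);
        return members;
    }

    public static void main(String[] args) {
        HeartbeatSketch monitor = new HeartbeatSketch(1000);
        monitor.heartbeat("hostA", 0);
        monitor.heartbeat("hostB", 900);
        System.out.println("failed: " + monitor.failedHosts(1500));
        System.out.println("members: " + monitor.establishMembership(1500));
    }
}
```

After the new membership is established, a surviving host would create a fresh replica of each object that lost one, which is the "self-heal" step on the slide and is outside this sketch.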
  • 26. SOSS: Integrated High Availability • Peer-to-peer architecture for maximum redundancy & scalability • Fully integrated data replication for data redundancy, scalability, and ease of use: – Partial replicas ensure scalable storage and throughput. – Per-server and per-client caches ensure fast access. • Self-discovery and self-healing for hi-av and ease of use • Patented quorum algorithm for reliable updating with scalability [Diagram: client application retrieving through the client library, with a cached copy, an object copy, and a replica held by cache services in the ScaleOut StateServer distributed cache.]
  • 27. Reason #4: Sharing Data Across the Farm. The first step for server farms (1998): load-balanced, stateless, Web applications: • Without the ability to share data, we need “sticky” sessions (no hi av!): • Or we can overload the database server: • Or we can share data across the farm in a distributed data grid for both scalability & high av. [Diagram: Web servers, each running an SOSS service, connected by Ethernet to the Internet and to a DBMS server.]
  • 28. The Evolution in DDGs and Data Sharing. Drivers: • Scaling data access & analysis are critical to competitiveness. • Server farms & the cloud are now mainstream computing platforms. • Data access is a key bottleneck. • Short dev. cycles are mandatory. • Standard APIs are emerging. [Chart: market penetration from 2005 to 2011: early adoption on Web and app. server farms for speed and hi-av; expansion to new verticals (e.g., financial services) for data & compute grids; cloud computing using industry-standard APIs. Usage stages along the time axis: session-state storage, application caching, grid computing, platform-wide usage, data analysis.]
  • 29. Data Sharing: a Closer Look • Advantages of sharing data in a distributed data grid: – Boosts application performance and offloads the DBMS. – Advances & simplifies the programming model: • Allows “stateful” business objects • Keeps object/relational mapping at the data access layer • Examples: session & profile data, business objects, workflow state • Requirements of a distributed data grid: – Coherent storage so all clients see a consistent view – Easy-to-use APIs – Integrated object locking to enable coordinated updating – High availability to avoid data loss if a server fails – Advanced features to enable effective use of the grid (e.g., parallel query, map/reduce analysis)
  • 30. Basic APIs for Data Access • Are easy to use in C#, Java, or C/C++. • Store objects in the grid as serialized blobs. • Primarily use string or numeric keys to identify objects. • Group objects into name spaces (“named caches”).
        // Read and update object:
        MyClass retrievedObj;
        retrievedObj = cache["myObj"] as MyClass;
        retrievedObj.var1 = "Hello, again!";
        cache["myObj"] = retrievedObj;
  • 31. Example: Named Cache Access (Java)
        public static void main(String[] argv) {
            // Initialize string object to be stored:
            String s = "Test string";
            // Create a cache collection:
            SossCache cache = SossCacheFactory.getCache("MyCache");
            // Store object in ScaleOut StateServer (SOSS):
            CachedObjectId id = new CachedObjectId(UUID.randomUUID());
            cache.put(id, s);
            // Read object stored in SOSS:
            String answerJNC = (String) cache.get(id);
            // Remove object from SOSS:
            cache.remove(id);
        }
  • 32. Example: Named Cache Access (C#)
        static void Main(string[] args) {
            // Initialize object to be stored:
            SampleClass sampleObj = new SampleClass();
            sampleObj.var1 = "Hello, SOSS!";
            // Create a cache:
            SossCache cache = CacheFactory.GetCache("myCache");
            // Store object in the distributed cache:
            cache["myObj"] = sampleObj;
            // Read and update object stored in cache:
            SampleClass retrievedObj = null;
            retrievedObj = cache["myObj"] as SampleClass;
            retrievedObj.var1 = "Hello, again!";
            cache["myObj"] = retrievedObj;
            // Remove object from the cache:
            cache["myObj"] = null;
        }
  • 33. Fully Distributed Locking • Goal: synchronize access to a stored object by multiple client threads. • Two mechanisms: pessimistic and optimistic locking • Pessimistic uses read-modify-write semantics: – Can be set as default for all objects within a named cache. – Reads to locked objects are automatically retried. – Locks have timeouts to handle client failures. – Simple reads and updates can bypass locks.
        string myObj = cache.Retrieve("key", true);   // read and lock
        ...
        cache.Update("key", "new value", true);        // update and unlock
    • Optimistic uses object’s version number to allow or inhibit an update: – User supplies version number from read to a locking update. – Benefit: one trip to the server if high probability of success.
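The optimistic mechanism can be illustrated with a small in-memory Java sketch. It assumes a hypothetical `OptimisticStore` class, not the SOSS API: every stored value carries a version number, and an update succeeds only if the caller presents the version it read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-process stand-in for a versioned grid object store.
class OptimisticStore {
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    synchronized void create(String key, String value) {
        data.put(key, new Versioned(value, 1));
    }

    // A read returns the value together with its current version number.
    synchronized Versioned read(String key) {
        return data.get(key);
    }

    // The update is applied only if nobody changed the object since the caller's read.
    synchronized boolean update(String key, String newValue, long expectedVersion) {
        Versioned current = data.get(key);
        if (current == null || current.version != expectedVersion) {
            return false; // stale version: caller must re-read and retry
        }
        data.put(key, new Versioned(newValue, current.version + 1));
        return true;
    }

    public static void main(String[] args) {
        OptimisticStore store = new OptimisticStore();
        store.create("key", "old value");
        Versioned seen = store.read("key");
        System.out.println("first update applied: " + store.update("key", "new value", seen.version));
        System.out.println("stale update applied: " + store.update("key", "other value", seen.version));
    }
}
```

When conflicts are rare, the update is the only call that needs to coordinate with other writers, which matches the "one trip to the server" benefit on the slide; when a conflict does occur, the caller re-reads and retries.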
  • 34. Advanced API Features • Object timeouts • Distributed locking for coordinating access • Object dependency relationships • Asynchronous events on object changes • Automatic access to a backing store • Object eviction on high memory usage • Object metadata • Bulk insertion • Authentication • Custom serialization for compression & encryption • Parallel query based on metadata or properties
  • 35. Parallel Data Analysis • The goal: – Quickly analyze a large set of data for patterns and trends. – Take advantage of scalable computing to shorten “time to insight.” • Applications: – Search – Financial services – Business intelligence – Risk analysis – Weather simulation – Structural modeling – Fluid-flow analysis – Climate modeling [Image: NCAR Community Climate Model, http://www.vets.ucar.edu/vg/IPCC_CCSM3/index.shtml]
  • 36. Reason #5: Parallel Data Analysis • Rapid analysis of large data sets has become a top priority. • Distributed data grids enable fast parallel analysis: – Automatically harness the power of many servers and cores. – Offer a simple, easy-to-use development model. – Deliver top performance for memory-based datasets. • Key attributes of DDG-based data analysis: – Data is memory-based and data motion is minimized. – Programming model is object-oriented; parallelism is automatic. [Chart: PMI vs. random access throughput comparison in objects per second (axis 0 to 600) for 2 MB time series objects, SOSS PMI vs. random access, from 4 to 32 nodes and from 512 to 4096 objects.]
  • 37. Parallel Query
    • Goal: identify a set of objects with selected properties.
    • Uses all grid servers to scale query performance.
    • Uses fast, optimized lookup on each grid server.
    [Diagram labels: Query the DDG in parallel. Sequentially analyze all queried objects. Merge the keys into a list.]
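The query-and-merge pattern on this slide can be sketched in plain Java. This is an illustration under stated assumptions, not the SOSS query engine: each `Map` stands in for the objects held by one grid server, `queryKeys` is a hypothetical helper, and the ticker prices are made-up sample values.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch: query every partition in parallel, then merge the matching keys. */
class ParallelQuerySketch {
    /**
     * Each map stands in for the objects stored on one grid server
     * (key = ticker, value = price). Partitions are scanned in parallel
     * and the matching keys are merged into one sorted list.
     */
    static List<String> queryKeys(List<Map<String, Integer>> partitions,
                                  Predicate<Integer> filter) {
        return partitions.parallelStream()
                .flatMap(p -> p.entrySet().stream()
                        .filter(e -> filter.test(e.getValue()))
                        .map(Map.Entry::getKey))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data only; the prices are invented for the example.
        List<Map<String, Integer>> partitions = List.of(
                Map.of("GOOG", 640, "ORCL", 30),
                Map.of("AAPL", 390, "MSFT", 26));
        List<String> keys = queryKeys(partitions, price -> price > 100);
        System.out.println(keys); // prints [AAPL, GOOG]
    }
}
```

The client then works through the merged key list sequentially, which is the step that Parallel Method Invocation later replaces with in-grid evaluation.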
  • 38. Parallel Query Example (Java)
    • Mark class properties as indexes for SOSS query:
        public class Stock implements Serializable {
            private String ticker;
            private int totalShares;
            private double price;

            @SossIndexAttribute
            public String getTicker() { return ticker; }
            …
        }
    • Define a query using these properties:
        NamedCache cache = CacheFactory.getCache("Stocks", false);
        Set keys = cache.queryKeys(Stock.class,
            or(equal("ticker", "GOOG"), equal("ticker", "ORCL")));
  • 39. Parallel Query Example (C#)
    • Mark class properties as indexes for SOSS query:
        class Stock {
            [SossIndex]
            public string Ticker { get; set; }
            public decimal TotalShares { get; set; }
            public decimal Price { get; set; }
        }
    • Define a query using these properties. Objects are automatically read into memory:
        NamedCache cache = CacheFactory.GetCache("Stocks");
        var q = from s in cache.QueryObjects<Stock>()
                where s.Ticker == "GOOG" || s.Ticker == "ORCL"
                select s;
        Console.WriteLine("{0} Stocks found", q.Count());
  • 40. Parallel Method Invocation (“Map/Reduce”)
    • Goal: analyze a set of objects with selected properties.
    • Executes user’s code in parallel across the grid.
    • Uses a parallel query to select objects for analysis.
    [Diagram: In-Memory Distributed Data Grid Runs Map/Reduce Analysis; stages: Analyze Data (Map), Combine Results (Reduce)]
  • 41. Example in Financial Services
    Analyze trading strategies across stock histories:
    Why?
    • Back-testing systems help guard against risks in deploying new trading strategies.
    • Performance is critical for “first to market” advantage.
    • Uses a significant amount of market data and computation time.
    How?
    • Write method E to analyze trading strategies across a single stock history.
    • Write method M to merge two sets of results.
    • Populate the data store with a set of stock histories.
    • Run method E in parallel on all stock histories.
    • Merge the results with method M to produce a report.
    • Refine and repeat…
  • 42. Stage the Data for Analysis
    • Step 1: Populate the distributed data grid with objects, each of which represents a price history for a ticker symbol.
  • 43. Code the Eval and Merge Methods
    • Step 2: Write a method to evaluate a stock history based on parameters:
        Results EvalStockHistory(StockHistory history, Parameters params)
        {
            <analyze trading strategy for this stock history>
            return results;
        }
    • Step 3: Write a method to merge the results of two evaluations:
        Results MergeResults(Results results1, Results results2)
        {
            <merge both results>
            return results;
        }
    • Notes:
      – This code can be run as a sequential calculation on in-memory data.
      – No explicit accesses to the distributed data grid are used.
  • 44. Run the Analysis
    • Step 4: Invoke parallel evaluation and merging of results:
        Results Invoke(EvalStockHistory, MergeResults, querySpec, params);
    [Diagram labels: EvalStockHistory(), MergeResults()]
  • 45. [Diagram: Start parallel analysis. .eval() runs on each stock history and produces one results object per history; pairs of results are combined by .merge(), level by level, until a single results object is returned to the client.]
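The eval-and-merge flow of Steps 2 through 4 can be sketched in Java. This is a sequential-machine illustration, not the SOSS PMI API: `EvalMergeSketch`, the `Results` fields, and the `invoke` helper are hypothetical, and a parallel stream stands in for the grid's servers and cores. The "trading strategy" is reduced to the gain between the first and last price so the example stays self-contained.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical sketch of the eval/merge pattern (not the SOSS PMI API). */
class EvalMergeSketch {
    /** Partial result: how many histories were evaluated and the best gain seen. */
    static final class Results {
        final int evaluated;
        final double bestGain;
        Results(int evaluated, double bestGain) {
            this.evaluated = evaluated;
            this.bestGain = bestGain;
        }
    }

    /** "Eval": analyze one price history; here, the gain from first to last price. */
    static Results evalStockHistory(double[] prices) {
        return new Results(1, prices[prices.length - 1] - prices[0]);
    }

    /** "Merge": combine two partial results; associative, so any merge tree gives the same answer. */
    static Results mergeResults(Results a, Results b) {
        return new Results(a.evaluated + b.evaluated, Math.max(a.bestGain, b.bestGain));
    }

    /** Stand-in for the grid's Invoke: evaluate every object in parallel, then merge pairwise. */
    static <T, R> R invoke(List<T> objects, Function<T, R> eval, BinaryOperator<R> merge) {
        return objects.parallelStream()
                .map(eval)
                .reduce(merge)
                .orElseThrow(() -> new IllegalArgumentException("no objects selected"));
    }

    public static void main(String[] args) {
        // Sample price histories; the values are invented for the example.
        List<double[]> histories = List.of(
                new double[] {10, 12},
                new double[] {20, 25},
                new double[] {5, 4});
        Results r = invoke(histories,
                EvalMergeSketch::evalStockHistory,
                EvalMergeSketch::mergeResults);
        System.out.println(r.evaluated + " " + r.bestGain); // prints "3 5.0"
    }
}
```

Because the merge method is associative, the runtime is free to combine partial results in whatever tree shape suits the hardware, which is what allows the pairwise merging shown in the diagram.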
  • 46. Advantages of Using PMI
    • Fast:
      – Automatically scales application performance across grid servers.
      – Automatically uses all server cores.
      – Minimizes data motion between servers.
      – API-based invocation delivers very low latency.
    • Easy to Use:
      – User writes simple, “in memory” code; all grid accesses are implicit.
      – Matches Java/C# model of object-oriented collections.
      – Requires no tuning.
    [Diagram: PMI Engine running on four cores alongside the Grid Service]
  • 47. Comparison of DDGs and File-Based M/R
                             DDG                              File-Based M/R
    Data set size            Gigabytes to terabytes           Terabytes to petabytes
    Data repository          In-memory                        File / database
    Data view                Queried object collection        File-based key/value pairs
    Development time         Low                              High
    Automatic scalability    Yes                              Application dependent
    Best use                 Quick-turn analysis of           Complex analysis of
                             memory-based data                large datasets
    I/O overhead             Low                              High
    Cluster mgt.             Simple                           Complex
    High availability        Memory-based                     File-based
  • 48. DDG Minimizes Data Motion
    • File-based map/reduce must move data to memory for analysis:
      [Diagram: three M/R servers, each running E in server memory, above nine D blocks held in the file system / database]
    • Memory-based DDG analyzes data in place:
      [Diagram: three grid servers, each running E directly above the D blocks held in the distributed data grid]
  • 49. [Diagram: Start parallel analysis. The same .eval() and .merge() tree over stock histories as on slide 45, with File I/O labels added alongside the .eval() and .merge() stages before the results are returned to the client.]
  • 50. Performance Impact of Data Motion
    Measured random access to DDG data to simulate file I/O.
  • 51. PMI Delivers 16X Speedup Over Hadoop
    [Chart: Throughput Comparison; series: SOSS PMI, Hadoop/SOSS, Hadoop; y-axis: Throughput (Obj/Sec), 0 to 800; x-axis: Number of Servers (4, 6, 8)]
  • 52. Reason #6: Simplify Data Migration
    • DDGs enable seamless data migration across on-premise sites and the cloud:
      – Automatically access remote data as needed.
      – Efficiently manage WAN bandwidth.
      – Enable full data synchronization across sites.
    [Diagram: In-Memory Distributed Data Grid]
  • 53. Example: Web Farm Cloud-Bursting
    • DDGs bridge on-premise and cloud-based in-memory storage of Web session state.
    • DDG automatically migrates session-state objects into the cloud on demand.
    • This enables seamless access to data across multiple sites.
    [Diagram: a Web load balancer in front of the user’s on-premise application servers and a cloud of virtual servers running the cloud application; SOSS hosts in the on-premise distributed data grid (with a backing store) and in the cloud-hosted distributed cache automatically migrate data, forming one virtual distributed data grid]
  • 54. Example: Global Access to Shared Data
    [Diagram: SOSS servers grouped into distributed data grids at mirrored data centers and satellite data centers, linked to form a global distributed data grid]
  • 55. What to Look for in a DDG Product
    • Ease of Use: SSI's products have an unusually high level of integration and focus on automatic operation. This dramatically simplifies deployment and management of a distributed data grid.
    • Performance: In direct comparison tests, SSI demonstrates faster access performance and scalability in key benchmarks.
    • Architecture: SSI’s architecture integrates both scalability and high availability and uniformly applies key architectural principles, such as peer-to-peer design.
    • Portability: Seamless interoperability across Windows and Unix (Linux, Solaris, etc.) operating systems was designed into SSI’s architecture from the outset.
    • Data Analysis: Advanced capabilities for "map/reduce"-style parallel data analysis open up important new applications for distributed data grids.
    • Manageability: SSI’s comprehensive tools for managing distributed data grids, such as its object browser and parallel backup and restore utility, are unique in the industry.
  • 56. SOSS Maximizes Ease of Use
    Grid servers self-aggregate, self-heal, and automatically load-balance.
    • Tree list shows:
      – Store status
      – Host list
      – Host status
      – Remote stores
      – Remote client configuration
    • Host configuration pane: just need to select the subnet shared by all hosts.
  • 57. Real-time Performance Charting
  • 58. SOSS Object Browser
    • Simplifies development.
    • Provides extremely useful visibility into grid usage.
    • Allows grid objects to be analyzed and managed.
  • 59. SOSS Parallel Backup and Restore
    • Enables grid contents (or portions) to be backed up or restored in parallel to either:
      – Separate file systems on all caching servers, or
      – A single network file share.
    • Creates backups or snapshots for later analysis.
    • Makes full use of SOSS’s parallel implementation to deliver highly scalable performance and high availability.
    [Diagram: SOSS servers connected by Ethernet]
  • 60. Recap: Top 6 Reasons to Use a DDG
    1. Faster access time for business logic state or database data
    2. Scalable throughput to match a growing workload and keep response times low
    3. High availability to prevent data loss if a grid server (or network link) fails
    4. Shared access to data across the server farm
    5. Advanced capabilities for quickly and easily mining data using scalable “map/reduce” analysis
    6. Transparent data migration across multiple sites and the cloud
    [Chart: Access Latency vs. Throughput; series: Grid, DBMS; y-axis: Access Latency (msec); x-axis: Throughput (accesses / sec)]
  • 61. Thank you for joining us today! Distributed Data Grids for Server Farms & High Performance Computing www.scaleoutsoftware.com