SlideShare une entreprise Scribd logo
1  sur  51
Télécharger pour lire hors ligne
A Developer’s Guide to Unleashing Your Data
in High-Performance Applications




                                         Marc Jacobs
                                            Director
                                   marcja@lab49.com
To make sure that you’re in the right room
   From the software developer’s view:
     The basics of distributed caches
      ▪ What are they? What services do they provide? What
        purpose do they serve? How do they function?
     Use cases within financial services
      ▪ What types of applications benefit from distributed
        caches? How can they be integrated in an architecture?
     Performance evaluation and optimization
      ▪ Methodology, performance variables, some results
   Given there is more content than time, we
    will not cover:
     Detailed vendor analysis and feature comparisons
     Detailed inner architecture of distributed caches
     Specifying, deploying, or managing caches
     Reliability and fault tolerance features
     Usage of advanced, product-specific developer
     features
YOU ARE:                               YOU UNDERSTAND:

   A software developer                  Software Development
     …in financial services                C# and .NET
     …writing time-sensitive               Network services
      applications in .NET                  Performance metrics and
     …with little to no exposure to         optimization
      distributed caches                  Distributed computing
     …with little to moderate              Parallelizing algorithms
      exposure to distributed               Partitioning data
      computing
                                            Jobs, tasks
To ensure that we all are speaking the same language
   Technique of computing that:
     Unites many machines into a coordinated group
     Uses the group to execute code
     Balances workload across machines in the group
     Manages the lifetime of tasks and jobs
     Reduces bottlenecks on computation
   Loose synonyms:
     Grid-, distributed-, cloud-computing, HPC
   Server product that:
     Unites many machines into a coordinated group
     Uses the group to store granular data
     Balances data across all machines in group
     Diffuses I/O across all machines in group
     Reduces bottlenecks on data movement
   Loose synonyms:
     Data grid, data fabric
   There isn’t a brief, widely used acronym for
    distributed cache products.
   So, here’s a makeshift one for this talk.
SCALABILITY                          RELIABILITY

   Ensure that an HPC system           Ensure that an HPC system
    can tolerate:                        can tolerate:
     Increases in client count           Hardware failure
     Increases in data sizes             Network failure
     Increases in data movement          Server software failure
     Increases in concurrent apps        Denial of service
   Adds bandwidth and load-            Adds self-healing and
    balancing to data sources            resilience to data sources

   To be discussed                     To be omitted
   Though HPC has so far been adopted faster
    than DCP, they will necessarily rise together
     Distinct best-of-breed products in DCP and HPC
     today are likely to coalesce in the coming years
   The reason :
     Parallel clients deluge central data sources
     Many hands make light work (but heavy load)
     HPC creates a problem that DCP nicely solves
   An infinite line waits to get a pint
    of stout from a single tap
     Process: get glass, pour body, let
      settle, pour head, serve glass, accept
      payment, make change, collect tip
   An extra bartender or two
    speeds service as they can serve
    multiple customers at once
     But more creates contention at tap
   Adding more taps permit more
    bartenders with less contention
    and faster service
     Parallelism only scales when both
      bartenders and taps scale
To present motivating examples within financial services
   Financial services is replete with “delightfully
    parallel” apps, for example:
     Portfolio Management
     Fixed Income Pricing
     Reporting
     Simulation and Back-testing
   When run in parallel, these apps generate a
    lot of I/O, esp. from central data sources
Portfolio Management


A hedge fund manages a
corpus of investment portfolios   foreach account:
for its client accounts, each
comprised of many asset types.
                                    foreach market:
Throughout each trading day,           calculate ideal exposure
as markets open and close and
as new market information is           allocate ideal exposure
collected, portfolios are
rebalanced, re-priced, and re-
                                       calculate ideal trades
hedged.                                adjust trades (cost/risk)
                                    adjust trades (cost/risk)
                                    hedge trades
Fixed Income Pricing


An investment bank furnishes
pricing information to traders      foreach bond:
and counterparties for fixed
income instruments.
                                      foreach path:
Throughout the trading day,              foreach month:
market information is revealed
that affects the pricing of fixed          calculate dependencies
income instruments and
structured products.
                                           calculate price
                                         aggregate price
                                      aggregate paths
Reporting


Financial services firms
generate customized reports       foreach account:
for both internal and external
use. Whether daily, monthly, or
                                    foreach report:
quarterly, reports create              foreach report element:
significant demand for data.
                                         calculate element
                                       calculate report
                                       format report
                                       export report
                                    assemble report package
Simulation and
Back-Testing


Any financial services firm that
does any level of quantitative     foreach configuration:
analysis and modeling will
require the ability to simulate
                                     foreach instrument:
models and back-test them               foreach period:
against historical data. This
often involves executing a                calculate model value
model over many years’ worth
of data.
                                          calculate model error
                                        aggregate model value
                                        aggregate model error
                                     aggregate model error
                                   aggregate model error
Many Units of   Large Volume of       Input Data Is
  Parallelism       Input Data         Overlapping
• Accounts       • Reference         • Reference
• Instruments      data                data and
• Samples        • Indicative data     indicative
                 • Parameters          data, for
• Paths
                                       example,
• Shocks                               required
• Periods                              across many
• Parameters                           apps, not just
                                       one
SHARED DATA                       SPECIFIC DATA

   Examples:                        Examples:
     Market quotes                    Account details
     Yield curves                     Portfolio allocations
     Exchange rates                   Bond structure
     Reference data                   Prepayment data
     Model parameters                 Intermediate state
   Characteristics:                 Characteristics:
     Often bandwidth hot spots        Often server hot spot
     Cache-friendly                   Cache-insensitive
   As we exploit parallelism to relieve workload
    bottlenecks on CPU, additional parallelism
    creates bottlenecks on central data sources
     Non-replicated file systems, web servers, and
      databases get deluged in HPC scenarios
     Clusters or load-balanced replicas of the above
      reach early scalability limits and create complexity
      in both IT management and software design
     Read-Write patterns are particularly challenging
Scaling Up                 Scaling Away               Scaling Out

• Means getting bigger,     • Means implementing       • Means enlisting more
  faster hardware             opportunistic              machines working
  better able to tolerate     multilevel caching to      together to share in
  the workload.               protect critical           the workload.
• Offers limited scalable     resources from           • Offers unlimited
  processing power and        repeated reads.            scalable processing
  memory, but no            • When clients               power and limited
  scalable bandwidth.         repeatedly ask for the     scalable memory and
                              same pieces of data,       bandwidth
                              information is
                              returned from nearest
                              cache instead of
                              critical resources.
   Represents a combination of scaling up,
    away, and out techniques
     Leverages the additive bandwidth, memory, and
      processing power of multiple machines
     Offers a simplified programming model over
      replicated server or local copy models
To get up and running
   A varied number of products at different price
    points and with different feature sets:
      Product                Price   Size   Complex   Feature
      Alachisoft NCache       $       &        !        *
      GemStone GemFire       $$$     &&        !!       **
      GigaSpaces XAP         $$$     &&&       !!      ***
      IBM ObjectGrid         $$$     &&&      !!!       **
      Oracle Coherence       $$$     &&        !!       **
      ScaleOut StateServer    $$      &        !        *
   Introducing ScaleOut StateServer (SOSS)
     Written in C/C++ for Windows, offers .NET API
     Straightforward install (either server or client)
   Comparatively:
     Offers similar interfaces and features as others
     Offers more narrow focus on fast/simple DCP
     Offers advanced features and fault tolerance
      support but that’s out of scope for this session
   SOSS runs on each server node
     Server nodes discover each other by multicast
     Once discovered, nodes talk P2P for heartbeat
     and object balancing
   Cache exposed as aggregate local memory
     SOSS assigns objects to particular nodes
     SOSS creates replicas of objects
     SOSS routes requests to nodes owning objects
     SOSS caches objects at multiple points
   If you leverage dictionaries and hash tables,
    you’ll find DCP APIs simple to use
     Objects in the cache have keys and values
   System has basic CRUD semantics:
     Create (aka Add, Insert, Put, Store)
     Retrieve (aka Read, Select, Get, Peek)
     Update
     Delete (aka Remove, Erase)
static void Main(string[] args) {
   // Initialize object to be stored:
   SampleClass sampleObj = new SampleClass();
   sampleObj.var1 = quot;Hello, SOSS!quot;;

    // Create a cache:
    SossCache cache = CacheFactory.GetCache(quot;myCachequot;);

    // Store object in the distributed cache:
    cache[quot;myObjquot;] = sampleObj;

    // Read and update object stored in cache:
    SampleClass retrievedObj = null;
    retrievedObj = cache[quot;myObjquot;] as SampleClass;
    retrievedObj.var1 = quot;Hello, again!quot;;
    cache[quot;myObjquot;] = retrievedObj;

    // Remove object from the cache:
    cache.Remove(quot;myObjquot;);
}
static void Main(string[] args){
   // Initialize object to be stored:
   SampleClass sampleObj = new SampleClass();
   sampleObj.var1 = quot;Hello, SOSS!quot;;

    // Create a data accessor:
    CachedDataAccessor cda = new CachedDataAccessor(“mykey”);

    // Store object in ScaleOut StateServer (SOSS):
    cda.Add(sampleObj, 0, false);

    // Read and update object stored in SOSS:
    SampleClass retrievedObj = null;
    retrievedObj = (SampleClass) cda.Retrieve(true);
    retrievedObj.var1 = quot;Hello, again!quot;;
    cda.Update(retrievedObj);

    // Remove object from SOSS:
    cda.Remove();
}
KEYS                               VALUES

   Uniquely identify an object       The data being stored and
    within the cache                   identified by a key
   SOSS natively uses 256-bit        SOSS stores values as
    binary keys, but:                  opaque BLOBs
     GUID = key
                                        Can be of arbitrary length (but
     SHA-256(string) = key
                                         performance varies)
     SHA-256(*) = key
                                        Most values are created via
   256-bit is an astronomically
                                         serialization
    large keyspace
     Unique value for every ~85
      atoms in the universe
KEYS                             VALUES

   Choose meaningful keys          Choose an appropriate
   Use a naming convention          granularity for objects
    such as URN or URL              Make read-only as many
   Use namespaces where             objects as possible
    appropriate                     Avoid designs that create
   Use namespaces to                hot objects
    differentiate usage policy
To measure distributed cache performance
 Several variables affect                 Network
  cache performance                       Bandwidth


 They are interactive
                             Read/Write
                                                       Object Size
 That said, a lot more       Pattern


  bandwidth and nodes
  will go a long way
                              Serializ-                  Object
                               ation                     Count



                                           Client to
                                          Node Ratio
   This simulation reads many distinct objects
    from many readers simultaneously
     Similar to many pricing workers reading
     information about distinct instruments
   With read caching turned off, it should
    behave similar to database
     Without SQL and multirow results, of course
     With hit on clustered index seek, rowcount = 1
Uniform Read (n obj)
Cache Off
Time to Read


There is some super-linear                10,000
performance on smaller
objects, but it flattens out as
objects get larger.                                  1,000




                                  Elapsed/Iteration (msec)
                                                             100




                                                             10




                                                              1




                                                              0
                                                               1,000   10,000   100,000        1,000,000   10,000,000   100,000,000
                                                                                     Item Size (bytes)

                                                                                      2x1           2x2
                                                                                      2x3           2x4
Uniform Read (n obj)
Cache Off
Throughput


Viewed as throughput, we can        10,000
see that network bandwidth
becomes a limiting factor.



                                            1,000
                               Throughput (Mbps)




                                                   100




                                                   10
                                                     1,000   10,000   100,000        1,000,000   10,000,000   100,000,000
                                                                           Item Size (bytes)

                                                                            2x1           2x2
                                                                            2x3           2x4
   This simulation reads a single object from
    many readers simultaneously
     Similar to many pricing workers reading the same
     common object, such as yield curve
   With read caching turned off, it should
    behave similar to database
     Note that with caching off, requests are being
     routed to host that masters the requested object
Uniform Read (1 obj)
Cache Off
Time to Read


These results are very similar to          10,000
the database model, but with
slightly worse performance due
to contention at node that                             1,000
owns the master replica of the
object being requested.
                                    Elapsed/Iteration (msec)
                                                               100




                                                               10




                                                                1




                                                                0
                                                                 1,000   10,000   100,000        1,000,000   10,000,000   100,000,000
                                                                                       Item Size (bytes)

                                                                                        2x1           2x2
                                                                                        2x3           2x4
Uniform Read (1 obj)
Cache Off
Throughput


Again, this is performance             10,000
consistent, though slower, than
the previous model.



                                               1,000
                                  Throughput (Mbps)




                                                      100




                                                      10
                                                        1,000   10,000   100,000        1,000,000   10,000,000   100,000,000
                                                                              Item Size (bytes)

                                                                               2x1           2x2
                                                                               2x3           2x4
   This simulation reads a single object from
    many readers simultaneously
     Similar to many pricing workers reading the same
     common object, such as yield curve
   With read caching turned on, we should see
    much better performance
     Note that with caching on, requests can be
      handled by any node that has it in cache
     Some overhead attributed to version checking
Uniform Read (1 obj)
Cache On
Time to Read


As expected, read times are flat            1,000
up to a certain threshold as
objects are returned from
cache. Performance suddenly
gets worse when the object                                    100
sizes get large enough to
saturate the network               Elapsed/Iteration (msec)
bandwidth and cause read
queuing.                                                      10




                                                               1




                                                               0
                                                                1,000   10,000   100,000            1,000,000     10,000,000   100,000,000
                                                                                      Item Size (bytes)

                                                                                    1x1       1x2           1x3
                                                                                    1x4       2x1           2x2
Uniform Read/Write
Cache On
Time to Read


This shows the affect of          10,000
periodic updates on the
hotspot object. The
performance threshold is                     1,000
reduced due to cache
invalidation.
                           Elapsed/Iteration (msec)
                                                      100




                                                      10




                                                       1




                                                       0
                                                        1,000   10,000   100,000        1,000,000   10,000,000   100,000,000
                                                                              Item Size (bytes)

                                                                               2x1           2x2
                                                                               2x3           2x4
To squeeze further performance and functionality from caches
   The performance advantage of compression
    is highly contingent on object size,
    compression ratio, and client CPU overhead
     For fast clients, large compressible data streams,
      compression can show significant gains
     For small objects, effect can be counterproductive
   Beyond the object size threshold (usually
    between 100k and 200k), object
    segmentation can be very advantageous
     When serialized stream is greater than threshold,
      object is split, pieces stored with nonce keys
     A directory of the nonce keys is stored under the
      original object key
     Safety features such as hashes/checksums push
      up the segmentation threshold
   As DCP evolve a greater role at financial
    services institutions, a question usually arises:
     Should applications own their own cache or
     should all share a common cache?
   Answer depends on:
     Technology policy
     Security requirements
     Network and product limitations
   Several strategies for using string-based keys
     URN-based locators
      ▪ Ex: “urn:lab49-com:equity:us:msft:close;2008-03-12”
      ▪ Ex: “urn:uuid:2af640cb-098f-4acd-9be3-9a9bd4673983”
     Fully qualified assembly names
      ▪ Ex: “System, Version=1.0.5000.0, Culture=neutral,
        PublicKeyToken=b77a5c561934e089”
   Namespaces can insulate key collisions
    between applications or object types
To find out more information about distributed caches
   ScaleOut Software
     http://www.scaleoutsoftware.com/
   Industry Team Blog
     http://blogs.msdn.com/fsdpe
   Microsoft in Financial Services
     http://www.microsoft.com/financialservices
   Technology Community for Financial Services
     http://www.financialdevelopers.com
     Sign up to receive the free quarterly FS Developer
      Newsletter on top left hand side of site
   Marc Jacobs, Director
     mail: marcja@lab49.com
     blog: http://marcja.wordpress.com
   Lab49, Inc.
     mail: info@lab49.com
     site: http://www.lab49.com
     blog: http://blog.lab49.com

Contenu connexe

Tendances

Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event ProcessingSybase Türkiye
 
Enhancing the Security of Data at Rest with SAP ASE 16
Enhancing the Security of Data at Rest with SAP ASE 16Enhancing the Security of Data at Rest with SAP ASE 16
Enhancing the Security of Data at Rest with SAP ASE 16SAP Technology
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Michael Hiskey
 
The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization SAP Technology
 
Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014EDB
 
SAP IQ 16 Product Annoucement
SAP IQ 16 Product AnnoucementSAP IQ 16 Product Annoucement
SAP IQ 16 Product AnnoucementDobler Consulting
 
ASE Performance and Tuning Parameters Beyond the cfg File
ASE Performance and Tuning Parameters Beyond the cfg FileASE Performance and Tuning Parameters Beyond the cfg File
ASE Performance and Tuning Parameters Beyond the cfg FileSAP Technology
 
Sizing SAP on x86 IBM PureFlex with Reference Architecture
Sizing SAP on x86 IBM PureFlex with Reference ArchitectureSizing SAP on x86 IBM PureFlex with Reference Architecture
Sizing SAP on x86 IBM PureFlex with Reference ArchitectureDoddi Priyambodo
 

Tendances (9)

Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event Processing
 
Enhancing the Security of Data at Rest with SAP ASE 16
Enhancing the Security of Data at Rest with SAP ASE 16Enhancing the Security of Data at Rest with SAP ASE 16
Enhancing the Security of Data at Rest with SAP ASE 16
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization The Science of DBMS: Query Optimization
The Science of DBMS: Query Optimization
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014
 
SAP IQ 16 Product Annoucement
SAP IQ 16 Product AnnoucementSAP IQ 16 Product Annoucement
SAP IQ 16 Product Annoucement
 
ASE Performance and Tuning Parameters Beyond the cfg File
ASE Performance and Tuning Parameters Beyond the cfg FileASE Performance and Tuning Parameters Beyond the cfg File
ASE Performance and Tuning Parameters Beyond the cfg File
 
Sizing SAP on x86 IBM PureFlex with Reference Architecture
Sizing SAP on x86 IBM PureFlex with Reference ArchitectureSizing SAP on x86 IBM PureFlex with Reference Architecture
Sizing SAP on x86 IBM PureFlex with Reference Architecture
 

En vedette

Unerrorenelcielo
UnerrorenelcieloUnerrorenelcielo
Unerrorenelcieloguest7539aa
 
Get Into Stanford
Get  Into  StanfordGet  Into  Stanford
Get Into StanfordBen Munoz
 
Aswath Rao VON.x Spring 2008 Talk
Aswath Rao VON.x Spring 2008 TalkAswath Rao VON.x Spring 2008 Talk
Aswath Rao VON.x Spring 2008 TalkAswath Rao
 
Meeting 16 Mar08 Team D2008 Bw
Meeting 16 Mar08 Team D2008 BwMeeting 16 Mar08 Team D2008 Bw
Meeting 16 Mar08 Team D2008 BwJoel Butts
 
Mexico Missions 2007
Mexico Missions 2007Mexico Missions 2007
Mexico Missions 2007Jason Brown
 

En vedette (6)

Unerrorenelcielo
UnerrorenelcieloUnerrorenelcielo
Unerrorenelcielo
 
Tufoto 1
Tufoto 1Tufoto 1
Tufoto 1
 
Get Into Stanford
Get  Into  StanfordGet  Into  Stanford
Get Into Stanford
 
Aswath Rao VON.x Spring 2008 Talk
Aswath Rao VON.x Spring 2008 TalkAswath Rao VON.x Spring 2008 Talk
Aswath Rao VON.x Spring 2008 Talk
 
Meeting 16 Mar08 Team D2008 Bw
Meeting 16 Mar08 Team D2008 BwMeeting 16 Mar08 Team D2008 Bw
Meeting 16 Mar08 Team D2008 Bw
 
Mexico Missions 2007
Mexico Missions 2007Mexico Missions 2007
Mexico Missions 2007
 

Similaire à Distributed Caches: A Developer’s Guide to Unleashing Your Data in High-Performance Applications

The Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your DeploymentThe Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your DeploymentAtlassian
 
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...SAP Cloud Platform
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio
 
How to build a good Supply Chain Management - Panchami G
How to build a good Supply Chain Management - Panchami GHow to build a good Supply Chain Management - Panchami G
How to build a good Supply Chain Management - Panchami GPanchamiG
 
Inventing the future Business Programming Language
Inventing the future  Business Programming LanguageInventing the future  Business Programming Language
Inventing the future Business Programming LanguageESUG
 
Cascading concurrent yahoo lunch_nlearn
Cascading concurrent   yahoo lunch_nlearnCascading concurrent   yahoo lunch_nlearn
Cascading concurrent yahoo lunch_nlearnCascading
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2
 
SaaS - Software as a Service - Charles University - Prague - March 2013
SaaS - Software as a Service - Charles University - Prague - March 2013SaaS - Software as a Service - Charles University - Prague - March 2013
SaaS - Software as a Service - Charles University - Prague - March 2013Jaroslav Gergic
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
Architecting with power vm
Architecting with power vmArchitecting with power vm
Architecting with power vmCharlie Cler
 
Joe Honan Virtualization Trends
Joe Honan   Virtualization TrendsJoe Honan   Virtualization Trends
Joe Honan Virtualization Trends1velocity
 
Measure() or die()
Measure() or die() Measure() or die()
Measure() or die() LivePerson
 
C8 Whats New In Versions 3 And 4
C8   Whats New In Versions 3 And 4C8   Whats New In Versions 3 And 4
C8 Whats New In Versions 3 And 4dfwcug
 
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_SparkSunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_SparkMopuru Babu
 
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_SparkSunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_SparkMopuru Babu
 

Similaire à Distributed Caches: A Developer’s Guide to Unleashing Your Data in High-Performance Applications (20)

Seminar - JBoss Migration
Seminar - JBoss MigrationSeminar - JBoss Migration
Seminar - JBoss Migration
 
The Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your DeploymentThe Anchor Store: Four Confluence Examples to Root Your Deployment
The Anchor Store: Four Confluence Examples to Root Your Deployment
 
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
 
XS 2008 Boston Capacity Planning
XS 2008 Boston Capacity PlanningXS 2008 Boston Capacity Planning
XS 2008 Boston Capacity Planning
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
eDocument Sciences SaaS 101
eDocument Sciences SaaS 101eDocument Sciences SaaS 101
eDocument Sciences SaaS 101
 
Resume
ResumeResume
Resume
 
How to build a good Supply Chain Management - Panchami G
How to build a good Supply Chain Management - Panchami GHow to build a good Supply Chain Management - Panchami G
How to build a good Supply Chain Management - Panchami G
 
Inventing the future Business Programming Language
Inventing the future  Business Programming LanguageInventing the future  Business Programming Language
Inventing the future Business Programming Language
 
Cascading concurrent yahoo lunch_nlearn
Cascading concurrent   yahoo lunch_nlearnCascading concurrent   yahoo lunch_nlearn
Cascading concurrent yahoo lunch_nlearn
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
 
SaaS - Software as a Service - Charles University - Prague - March 2013
SaaS - Software as a Service - Charles University - Prague - March 2013SaaS - Software as a Service - Charles University - Prague - March 2013
SaaS - Software as a Service - Charles University - Prague - March 2013
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Architecting with power vm
Architecting with power vmArchitecting with power vm
Architecting with power vm
 
Joe Honan Virtualization Trends
Joe Honan   Virtualization TrendsJoe Honan   Virtualization Trends
Joe Honan Virtualization Trends
 
Measure() or die()
Measure() or die()Measure() or die()
Measure() or die()
 
Measure() or die()
Measure() or die() Measure() or die()
Measure() or die()
 
C8 Whats New In Versions 3 And 4
C8   Whats New In Versions 3 And 4C8   Whats New In Versions 3 And 4
C8 Whats New In Versions 3 And 4
 
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_SparkSunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
 
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_SparkSunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
 

Dernier

Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024Guillaume Ⓥ Sarlat
 
Buy and Sell Urban Tots unlisted shares.pptx
Buy and Sell Urban Tots unlisted shares.pptxBuy and Sell Urban Tots unlisted shares.pptx
Buy and Sell Urban Tots unlisted shares.pptxPrecize Formely Leadoff
 
Work and Pensions report into UK corporate DB funding
Work and Pensions report into UK corporate DB fundingWork and Pensions report into UK corporate DB funding
Work and Pensions report into UK corporate DB fundingHenry Tapper
 
The unequal battle of inflation and the appropriate sustainable solution | Eu...
The unequal battle of inflation and the appropriate sustainable solution | Eu...The unequal battle of inflation and the appropriate sustainable solution | Eu...
The unequal battle of inflation and the appropriate sustainable solution | Eu...Antonis Zairis
 
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdfLundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdfAdnet Communications
 
Mphasis - Schwab Newsletter PDF - Sample 8707
Mphasis - Schwab Newsletter PDF - Sample 8707Mphasis - Schwab Newsletter PDF - Sample 8707
Mphasis - Schwab Newsletter PDF - Sample 8707harshan90
 
Introduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an EntrepreneurIntroduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an Entrepreneurabcisahunter
 
Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]Commonwealth
 
What Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AIWhat Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AI360factors
 
LIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptxLIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptxsonamyadav7097
 
2024.03 Strategic Resources Presentation
2024.03 Strategic Resources Presentation2024.03 Strategic Resources Presentation
2024.03 Strategic Resources PresentationAdnet Communications
 
20240315 _E-Invoicing Digiteal. .pptx
20240315 _E-Invoicing Digiteal.    .pptx20240315 _E-Invoicing Digiteal.    .pptx
20240315 _E-Invoicing Digiteal. .pptxFinTech Belgium
 
Slideshare - ONS Economic Forum Slidepack - 18 March 2024.pptx
Slideshare - ONS Economic Forum Slidepack - 18 March 2024.pptxSlideshare - ONS Economic Forum Slidepack - 18 March 2024.pptx
Slideshare - ONS Economic Forum Slidepack - 18 March 2024.pptxOffice for National Statistics
 
Hungarys economy made by Robert Miklos
Hungarys economy   made by Robert MiklosHungarys economy   made by Robert Miklos
Hungarys economy made by Robert Miklosbeduinpower135
 
Contracts with Interdependent Preferences
Contracts with Interdependent PreferencesContracts with Interdependent Preferences
Contracts with Interdependent PreferencesGRAPE
 
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdfAdnet Communications
 
Stock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdfStock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdfMichael Silva
 
ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.
ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.
ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.KumarJayaraman3
 

Dernier (20)

Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024Sarlat Advisory - Corporate Brochure - 2024
Sarlat Advisory - Corporate Brochure - 2024
 
Buy and Sell Urban Tots unlisted shares.pptx
Buy and Sell Urban Tots unlisted shares.pptxBuy and Sell Urban Tots unlisted shares.pptx
Buy and Sell Urban Tots unlisted shares.pptx
 
Work and Pensions report into UK corporate DB funding
Work and Pensions report into UK corporate DB fundingWork and Pensions report into UK corporate DB funding
Work and Pensions report into UK corporate DB funding
 
The unequal battle of inflation and the appropriate sustainable solution | Eu...
The unequal battle of inflation and the appropriate sustainable solution | Eu...The unequal battle of inflation and the appropriate sustainable solution | Eu...
The unequal battle of inflation and the appropriate sustainable solution | Eu...
 
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdfLundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
Lundin Gold March 2024 Corporate Presentation - PDAC v1.pdf
 
Mphasis - Schwab Newsletter PDF - Sample 8707
Mphasis - Schwab Newsletter PDF - Sample 8707Mphasis - Schwab Newsletter PDF - Sample 8707
Mphasis - Schwab Newsletter PDF - Sample 8707
 
Introduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an EntrepreneurIntroduction to Entrepreneurship and Characteristics of an Entrepreneur
Introduction to Entrepreneurship and Characteristics of an Entrepreneur
 
Effects & Policies Of Bank Consolidation
Effects & Policies Of Bank ConsolidationEffects & Policies Of Bank Consolidation
Effects & Policies Of Bank Consolidation
 
Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]Monthly Market Risk Update: March 2024 [SlideShare]
Monthly Market Risk Update: March 2024 [SlideShare]
 
What Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AIWhat Key Factors Should Risk Officers Consider When Using Generative AI
What Key Factors Should Risk Officers Consider When Using Generative AI
 
LIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptxLIC PRIVATISATION its a bane or boon.pptx
LIC PRIVATISATION its a bane or boon.pptx
 
2024.03 Strategic Resources Presentation
2024.03 Strategic Resources Presentation2024.03 Strategic Resources Presentation
2024.03 Strategic Resources Presentation
 
20240315 _E-Invoicing Digiteal. .pptx
20240315 _E-Invoicing Digiteal.    .pptx20240315 _E-Invoicing Digiteal.    .pptx
20240315 _E-Invoicing Digiteal. .pptx
 
Monthly Economic Monitoring of Ukraine No.230, March 2024
Monthly Economic Monitoring of Ukraine No.230, March 2024Monthly Economic Monitoring of Ukraine No.230, March 2024
Monthly Economic Monitoring of Ukraine No.230, March 2024
 
Slideshare - ONS Economic Forum Slidepack - 18 March 2024.pptx
Slideshare - ONS Economic Forum Slidepack - 18 March 2024.pptxSlideshare - ONS Economic Forum Slidepack - 18 March 2024.pptx
Slideshare - ONS Economic Forum Slidepack - 18 March 2024.pptx
 
Hungarys economy made by Robert Miklos
Hungarys economy   made by Robert MiklosHungarys economy   made by Robert Miklos
Hungarys economy made by Robert Miklos
 
Contracts with Interdependent Preferences
Contracts with Interdependent PreferencesContracts with Interdependent Preferences
Contracts with Interdependent Preferences
 
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
20240314 Calibre March 2024 Investor Presentation (FINAL).pdf
 
Stock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdfStock Market Brief Deck for 3/22/2024.pdf
Stock Market Brief Deck for 3/22/2024.pdf
 
ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.
ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.
ACCOUNTING FOR BUSINESS.II DEPARTMENTAL ACCOUNTS.
 

Distributed Caches: A Developer’s Guide to Unleashing Your Data in High-Performance Applications

  • 1. A Developer’s Guide to Unleashing Your Data in High-Performance Applications Marc Jacobs Director marcja@lab49.com
  • 2. To make sure that you’re in the right room
  • 3. From the software developer’s view:  The basics of distributed caches ▪ What are they? What services do they provide? What purpose do they serve? How do they function?  Use cases within financial services ▪ What types of applications benefit from distributed caches? How can they be integrated in an architecture?  Performance evaluation and optimization ▪ Methodology, performance variables, some results
  • 4. Given there is more content than time, we will not cover:  Detailed vendor analysis and feature comparisons  Detailed inner architecture of distributed caches  Specifying, deploying, or managing caches  Reliability and fault tolerance features  Usage of advanced, product-specific developer features
  • 5. YOU ARE: YOU UNDERSTAND:  A software developer  Software Development  …in financial services  C# and .NET  …writing time-sensitive  Network services applications in .NET  Performance metrics and  …with little to no exposure to optimization distributed caches  Distributed computing  …with little to moderate  Parallelizing algorithms exposure to distributed  Partitioning data computing  Jobs, tasks
  • 6. To ensure that we all are speaking the same language
  • 7. Technique of computing that:  Unites many machines into a coordinated group  Uses the group to execute code  Balances workload across machines in the group  Manages the lifetime of tasks and jobs  Reduces bottlenecks on computation  Loose synonyms:  Grid-, distributed-, cloud-computing, HPC
  • 8. Server product that:  Unites many machines into a coordinated group  Uses the group to store granular data  Balances data across all machines in group  Diffuses I/O across all machines in group  Reduces bottlenecks on data movement  Loose synonyms:  Data grid, data fabric
  • 9. There isn’t a brief, widely used acronym for distributed cache products.  So, here’s a makeshift one for this talk.
  • 10. SCALABILITY RELIABILITY  Ensure that an HPC system  Ensure that an HPC system can tolerate: can tolerate:  Increases in client count  Hardware failure  Increases in data sizes  Network failure  Increases in data movement  Server software failure  Increases in concurrent apps  Denial of service  Adds bandwidth and load-  Adds self-healing and balancing to data sources resilience to data sources  To be discussed  To be omitted
  • 11. Though HPC has so far been adopted faster than DCP, they will necessarily rise together  Distinct best-of-breed products in DCP and HPC today are likely to coalesce in the coming years  The reason :  Parallel clients deluge central data sources  Many hands make light work (but heavy load)  HPC creates a problem that DCP nicely solves
  • 12. An infinite line waits to get a pint of stout from a single tap  Process: get glass, pour body, let settle, pour head, serve glass, accept payment, make change, collect tip  An extra bartender or two speeds service as they can serve multiple customers at once  But more creates contention at tap  Adding more taps permit more bartenders with less contention and faster service  Parallelism only scales when both bartenders and taps scale
  • 13. To present motivating examples within financial services
  • 14. Financial services is replete with “delightfully parallel” apps, for example:  Portfolio Management  Fixed Income Pricing  Reporting  Simulation and Back-testing  When run in parallel, these apps generate a lot of I/O, esp. from central data sources
  • 15. Portfolio Management A hedge fund manages a corpus of investment portfolios foreach account: for its client accounts, each comprised of many asset types. foreach market: Throughout each trading day, calculate ideal exposure as markets open and close and as new market information is allocate ideal exposure collected, portfolios are rebalanced, re-priced, and re- calculate ideal trades hedged. adjust trades (cost/risk) adjust trades (cost/risk) hedge trades
  • 16. Fixed Income Pricing An investment bank furnishes pricing information to traders foreach bond: and counterparties for fixed income instruments. foreach path: Throughout the trading day, foreach month: market information is revealed that affects the pricing of fixed calculate dependencies income instruments and structured products. calculate price aggregate price aggregate paths
  • 17. Reporting Financial services firms generate customized reports foreach account: for both internal and external use. Whether daily, monthly, or foreach report: quarterly, reports create foreach report element: significant demand for data. calculate element calculate report format report export report assemble report package
  • 18. Simulation and Back-Testing Any financial services firm that does any level of quantitative foreach configuration: analysis and modeling will require the ability to simulate foreach instrument: models and back-test them foreach period: against historical data. This often involves executing a calculate model value model over many years’ worth of data. calculate model error aggregate model value aggregate model error aggregate model error aggregate model error
  • 19. Many Units of Large Volume of Input Data Is Parallelism Input Data Overlapping • Accounts • Reference • Reference • Instruments data data and • Samples • Indicative data indicative • Parameters data, for • Paths example, • Shocks required • Periods across many • Parameters apps, not just one
  • 20. SHARED DATA SPECIFIC DATA  Examples:  Examples:  Market quotes  Account details  Yield curves  Portfolio allocations  Exchange rates  Bond structure  Reference data  Prepayment data  Model parameters  Intermediate state  Characteristics:  Characteristics:  Often bandwidth hot spots  Often server hot spot  Cache-friendly  Cache-insensitive
  • 21. As we exploit parallelism to relieve workload bottlenecks on CPU, additional parallelism creates bottlenecks on central data sources  Non-replicated file systems, web servers, and databases get deluged in HPC scenarios  Clusters or load-balanced replicas of the above reach early scalability limits and create complexity in both IT management and software design  Read-Write patterns are particularly challenging
  • 22. Scaling Up Scaling Away Scaling Out • Means getting bigger, • Means implementing • Means enlisting more faster hardware opportunistic machines working better able to tolerate multilevel caching to together to share in the workload. protect critical the workload. • Offers limited scalable resources from • Offers unlimited processing power and repeated reads. scalable processing memory, but no • When clients power and limited scalable bandwidth. repeatedly ask for the scalable memory and same pieces of data, bandwidth information is returned from nearest cache instead of critical resources.
  • 23. Represents a combination of scaling up, away, and out techniques  Leverages the additive bandwidth, memory, and processing power of multiple machines  Offers a simplified programming model over replicated server or local copy models
  • 24. To get up and running
  • 25. A varied number of products at different price points and with different feature sets: Product Price Size Complex Feature Alachisoft NCache $ & ! * GemStone GemFire $$$ && !! ** GigaSpaces XAP $$$ &&& !! *** IBM ObjectGrid $$$ &&& !!! ** Oracle Coherence $$$ && !! ** ScaleOut StateServer $$ & ! *
  • 26. Introducing ScaleOut StateServer (SOSS)  Written in C/C++ for Windows, offers .NET API  Straightforward install (either server or client)  Comparatively:  Offers similar interfaces and features as others  Offers more narrow focus on fast/simple DCP  Offers advanced features and fault tolerance support but that’s out of scope for this session
  • 27. SOSS runs on each server node  Server nodes discover each other by multicast  Once discovered, nodes talk P2P for heartbeat and object balancing  Cache exposed as aggregate local memory  SOSS assigns objects to particular nodes  SOSS creates replicas of objects  SOSS routes requests to nodes owning objects  SOSS caches objects at multiple points
  • 28. If you leverage dictionaries and hash tables, you’ll find DCP APIs simple to use  Objects in the cache have keys and values  System has basic CRUD semantics:  Create (aka Add, Insert, Put, Store)  Retrieve (aka Read, Select, Get, Peek)  Update  Delete (aka Remove, Erase)
  • 29. static void Main(string[] args) { // Initialize object to be stored: SampleClass sampleObj = new SampleClass(); sampleObj.var1 = quot;Hello, SOSS!quot;; // Create a cache: SossCache cache = CacheFactory.GetCache(quot;myCachequot;); // Store object in the distributed cache: cache[quot;myObjquot;] = sampleObj; // Read and update object stored in cache: SampleClass retrievedObj = null; retrievedObj = cache[quot;myObjquot;] as SampleClass; retrievedObj.var1 = quot;Hello, again!quot;; cache[quot;myObjquot;] = retrievedObj; // Remove object from the cache: cache.Remove(quot;myObjquot;); }
  • 30. static void Main(string[] args){ // Initialize object to be stored: SampleClass sampleObj = new SampleClass(); sampleObj.var1 = quot;Hello, SOSS!quot;; // Create a data accessor: CachedDataAccessor cda = new CachedDataAccessor(“mykey”); // Store object in ScaleOut StateServer (SOSS): cda.Add(sampleObj, 0, false); // Read and update object stored in SOSS: SampleClass retrievedObj = null; retrievedObj = (SampleClass) cda.Retrieve(true); retrievedObj.var1 = quot;Hello, again!quot;; cda.Update(retrievedObj); // Remove object from SOSS: cda.Remove(); }
  • 31. KEYS VALUES  Uniquely identify an object  The data being stored and within the cache identified by a key  SOSS natively uses 256-bit  SOSS stores values as binary keys, but: opaque BLOBs  GUID = key  Can be of arbitrary length (but  SHA-256(string) = key performance varies)  SHA-256(*) = key  Most values are created via  256-bit is an astronomically serialization large keyspace  Unique value for every ~85 atoms in the universe
  • 32. KEYS VALUES  Choose meaningful keys  Choose an appropriate  Use a naming convention granularity for objects such as URN or URL  Make read-only as many  Use namespaces where objects as possible appropriate  Avoid designs that create  Use namespaces to hot objects differentiate usage policy
  • 33. To measure distributed cache performance
  • 34.  Several variables affect Network cache performance Bandwidth  They are interactive Read/Write Object Size  That said, a lot more Pattern bandwidth and nodes will go a long way Serializ- Object ation Count Client to Node Ratio
  • 35. This simulation reads many distinct objects from many readers simultaneously  Similar to many pricing workers reading information about distinct instruments  With read caching turned off, it should behave similar to database  Without SQL and multirow results, of course  With hit on clustered index seek, rowcount = 1
  • 36. Uniform Read (n obj) Cache Off Time to Read There is some super-linear 10,000 performance on smaller objects, but it flattens out as objects get larger. 1,000 Elapsed/Iteration (msec) 100 10 1 0 1,000 10,000 100,000 1,000,000 10,000,000 100,000,000 Item Size (bytes) 2x1 2x2 2x3 2x4
  • 37. Uniform Read (n obj) Cache Off Throughput Viewed as throughput, we can 10,000 see that network bandwidth becomes a limiting factor. 1,000 Throughput (Mbps) 100 10 1,000 10,000 100,000 1,000,000 10,000,000 100,000,000 Item Size (bytes) 2x1 2x2 2x3 2x4
  • 38. This simulation reads a single object from many readers simultaneously  Similar to many pricing workers reading the same common object, such as yield curve  With read caching turned off, it should behave similar to database  Note that with caching off, requests are being routed to host that masters the requested object
  • 39. Uniform Read (1 obj) Cache Off Time to Read These results are very similar to 10,000 the database model, but with slightly worse performance due to contention at node that 1,000 owns the master replica of the object being requested. Elapsed/Iteration (msec) 100 10 1 0 1,000 10,000 100,000 1,000,000 10,000,000 100,000,000 Item Size (bytes) 2x1 2x2 2x3 2x4
  • 40. Uniform Read (1 obj) Cache Off Throughput Again, this is performance 10,000 consistent, though slower, than the previous model. 1,000 Throughput (Mbps) 100 10 1,000 10,000 100,000 1,000,000 10,000,000 100,000,000 Item Size (bytes) 2x1 2x2 2x3 2x4
  • 41. This simulation reads a single object from many readers simultaneously  Similar to many pricing workers reading the same common object, such as yield curve  With read caching turned on, we should see much better performance  Note that with caching on, requests can be handled by any node that has it in cache  Some overhead attributed to version checking
  • 42. Uniform Read (1 obj) Cache On Time to Read As expected, read times are flat 1,000 up to a certain threshold as objects are returned from cache. Performance suddenly gets worse when the object 100 sizes get large enough to saturate the network Elapsed/Iteration (msec) bandwidth and cause read queuing. 10 1 0 1,000 10,000 100,000 1,000,000 10,000,000 100,000,000 Item Size (bytes) 1x1 1x2 1x3 1x4 2x1 2x2
  • 43. Uniform Read/Write Cache On Time to Read This shows the affect of 10,000 periodic updates on the hotspot object. The performance threshold is 1,000 reduced due to cache invalidation. Elapsed/Iteration (msec) 100 10 1 0 1,000 10,000 100,000 1,000,000 10,000,000 100,000,000 Item Size (bytes) 2x1 2x2 2x3 2x4
  • 44. To squeeze further performance and functionality from caches
  • 45. The performance advantage of compression is highly contingent on object size, compression ratio, and client CPU overhead  For fast clients, large compressible data streams, compression can show significant gains  For small objects, effect can be counterproductive
  • 46. Beyond the object size threshold (usually between 100k and 200k), object segmentation can be very advantageous  When serialized stream is greater than threshold, object is split, pieces stored with nonce keys  A directory of the nonce keys is stored under the original object key  Safety features such as hashes/checksums push up the segmentation threshold
  • 47. As DCP evolve a greater role at financial services institutions, a question usually arises:  Should applications own their own cache or should all share a common cache?  Answer depends on:  Technology policy  Security requirements  Network and product limitations
  • 48. Several strategies for using string-based keys  URN-based locators ▪ Ex: “urn:lab49-com:equity:us:msft:close;2008-03-12” ▪ Ex: “urn:uuid:2af640cb-098f-4acd-9be3-9a9bd4673983”  Fully qualified assembly names ▪ Ex: “System, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089”  Namespaces can insulate key collisions between applications or object types
  • 49. To find out more information about distributed caches
  • 50. ScaleOut Software  http://www.scaleoutsoftware.com/  Industry Team Blog  http://blogs.msdn.com/fsdpe  Microsoft in Financial Services  http://www.microsoft.com/financialservices  Technology Community for Financial Services  http://www.financialdevelopers.com  Sign up to receive the free quarterly FS Developer Newsletter on top left hand side of site
  • 51. Marc Jacobs, Director  mail: marcja@lab49.com  blog: http://marcja.wordpress.com  Lab49, Inc.  mail: info@lab49.com  site: http://www.lab49.com  blog: http://blog.lab49.com