Traditional disk-oriented database architecture is showing its age.

In-memory databases offer significant gains in performance as all data is freely available. There is no need to page to and from disk.

But three issues have stunted their uptake: address spaces only being large enough for a subset of a typical user’s data, the ‘one more bit’ problem, and durability.

Distributed in-memory databases solve these three problems, but at the price of losing the single address space.

This makes joins a problem. When data must be joined across multiple machines, performance degradation is inevitable.

Snowflake Schemas allow us to mix Partitioning and Replication so joins never hit the wire.

But this model only goes so far. “Connected Replication” takes us a step further, allowing us to make the best possible use of replication.
The lay of the land: The main architectural constructs in the database industry
Shared Disk
[Figure: latency scale running from ms through μs and ns to ps. Points marked: Cross Continental Round Trip, Cross Network Round Trip, 1MB Disk/Network, 1MB Main Memory, Main Memory Ref, L2 Cache Ref, L1 Cache Ref]
* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
Distributed Cache
Taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al
[Diagram: the main database architectures, linked by arrows labelled Drop Disk and Distribute]
Regular Database: Oracle, Sybase, MySql
Shared Nothing: Teradata, Vertica, Greenplum…
In-Memory Database: Times Ten, HSQL, KDB
SN In-Memory: Exasol, VoltDB, Hana
Distributed Caching: Coherence, Gemfire, Gigaspaces
ODC
Distributed Architecture
Simplify the Contract.
Stick to RAM
450 processes
2TB of RAM
Oracle Coherence




Messaging (Topic Based) as a system of record (persistence)
[Diagram: layered architecture]
Access Layer: Java client API
Query Layer
Data Layer: Transactions, Mtms, Cashflows
Persistence Layer
Indexing




Partitioning
               Replication
But your storage is limited by the memory on a node
[Diagram: keys spread across nodes, e.g. Keys Fs-Fz, Keys Xa-Yd]
Scalable storage, bandwidth and processing
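The key ranges on the slide (Fs-Fz, Xa-Yd) suggest each node owning a slice of the key space. A minimal sketch of that idea, assuming range partitioning over string keys. The class name `RangePartitioner`, the node names and the range boundaries are hypothetical and are not taken from the talk or from any product:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: route each key to the node that owns its key range.
class RangePartitioner {
    // Maps the inclusive lower bound of each range to the owning node.
    private final TreeMap<String, String> lowerBoundToNode = new TreeMap<>();

    void addRange(String lowerBound, String node) {
        lowerBoundToNode.put(lowerBound, node);
    }

    // The owner is the node whose lower bound is the greatest one <= key.
    String nodeFor(String key) {
        Map.Entry<String, String> owner = lowerBoundToNode.floorEntry(key);
        if (owner == null) {
            throw new IllegalArgumentException("No range covers key: " + key);
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        RangePartitioner p = new RangePartitioner();
        p.addRange("Aa", "node-1");
        p.addRange("Fs", "node-2");
        p.addRange("Xa", "node-3");
        System.out.println(p.nodeFor("Fz"));
        System.out.println(p.nodeFor("Xb"));
    }
}
```

Adding a node means adding a range, so storage, bandwidth and processing all grow with the cluster.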
[Diagram: Trader, Party and Trade, repeated for Version 1 through Version 4]
…and you need versioning to do MVCC
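To make the cost of versioning concrete, here is a minimal sketch of a multi-versioned value, assuming versions are plain increasing numbers. `VersionedValue` and its methods are hypothetical names for illustration only. Every write keeps the older versions around, which is why replicating versioned data to every node multiplies memory use:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: keep every version of a value so a reader can see
// a consistent snapshot "as of" a version while writers add newer ones.
class VersionedValue<V> {
    private final TreeMap<Long, V> versions = new TreeMap<>();

    // Writers never overwrite; each write is stored under its own version.
    void write(long version, V value) {
        versions.put(version, value);
    }

    // Readers get the latest value written at or before their snapshot version.
    V readAsOf(long snapshotVersion) {
        Map.Entry<Long, V> e = versions.floorEntry(snapshotVersion);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        VersionedValue<String> trade = new VersionedValue<>();
        trade.write(1, "Trade v1");
        trade.write(3, "Trade v3");
        System.out.println(trade.readAsOf(2));
        System.out.println(trade.readAsOf(4));
    }
}
```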
[Diagram: copies of Trade, Trader and Party]
So better to use partitioning, spreading data around the cluster.
[Diagram: Trader, Party and Trade]
This is what using Snowflake Schemas and the Connected Replication pattern is all about!
Crosscutting Keys, Common Keys
Replicated: Trader, Party
Partitioned: Trade
[Chart: bar per entity, axis from 0 to 150,000,000. Entities listed: Valuation Legs, Valuations, art Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books]
Facts: => Big, common keys
Dimensions: => Small, crosscutting keys
Coherence’s KeyAssociation gives us this
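In Coherence, a key class can implement `KeyAssociation` and return, from `getAssociatedKey()`, the key it should be stored alongside. The sketch below does not use the Coherence library. It models the idea with a local stand-in interface and a toy hash partitioner, so `Associated`, `MtmKey` and `ToyPartitioner` are all hypothetical names:

```java
import java.util.Objects;

// Hypothetical toy partitioner: not how Coherence is implemented,
// just a self-contained model of key association.
class ToyPartitioner {
    private final int partitionCount;

    ToyPartitioner(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // Hash the associated key when there is one, otherwise the key itself.
    int partitionFor(Object key) {
        Object routingKey = (key instanceof Associated)
                ? ((Associated) key).getAssociatedKey()
                : key;
        return Math.floorMod(Objects.hashCode(routingKey), partitionCount);
    }

    public static void main(String[] args) {
        ToyPartitioner p = new ToyPartitioner(257);
        long tradeId = 42L;
        int tradePartition = p.partitionFor(tradeId);
        int mtmPartition = p.partitionFor(new MtmKey(7L, tradeId));
        System.out.println(tradePartition == mtmPartition);
    }
}

// Local stand-in for the idea behind KeyAssociation (an assumption, not the real API).
interface Associated {
    Object getAssociatedKey();
}

// Key for an MTM fact. It routes by its parent trade id, so the MTM
// lands in the same partition as the Trade it belongs to.
final class MtmKey implements Associated {
    final long mtmId;
    final long tradeId;

    MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }
}
```

Because Trades and MTMs share a common key, a join between them can run entirely inside one partition.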
[Diagram: Trades and MTMs share a Common Key]
[Diagram: two layers]
Query Layer: Trader, Party, Trade
Data Layer, Fact Storage (Partitioned): Transactions, Mtms, Cashflows
Dimensions (replicate)
Facts (distribute/partition): Transactions, Mtms, Cashflows
Fact Storage (Partitioned)
[Chart: bar per entity, axis from 0 to 150,000,000. Entities listed: Valuation Legs, Valuations, art Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books]
Facts: => Big => Distribute
Dimensions: => Small => Replicate
We use a variant on a Snowflake Schema to partition big stuff that has the same key, and replicate small stuff that has crosscutting keys.
Replicate




Distribute
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

1. LBs[]=getLedgerBooksFor(CC1)
   SBs[]=getSourceBooksFor(LBs[])
   So we have all the bottom level dimensions needed to query facts.
2. Get all Transactions and MTMs (cluster side join) for the passed Source Books. The facts (Transactions, Mtms, Cashflows) are Partitioned.
3. Populate raw facts (Transactions) with dimension data before returning to client.
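The three query steps can be sketched as follows. This is a simplified in-process model, not the talk's actual code: the class `SnowflakeQuery`, the maps standing in for replicated dimension caches, and the `Partition` class standing in for a storage node are all hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the three query steps.
class SnowflakeQuery {
    // Replicated dimensions: assumed to be fully available in the query layer.
    static final Map<String, List<String>> LEDGER_BOOKS_BY_COST_CENTRE = new HashMap<>();
    static final Map<String, List<String>> SOURCE_BOOKS_BY_LEDGER_BOOK = new HashMap<>();

    // One partition of fact storage. Transactions and MTMs share the
    // transaction id as their key, so matching rows sit in the same partition.
    static class Partition {
        final Map<Long, String> sourceBookByTransaction = new HashMap<>();
        final Map<Long, Double> mtmByTransaction = new HashMap<>();

        // Step 2: the join runs inside the partition, so it never hits the wire.
        List<String> joinFor(Set<String> sourceBooks) {
            List<String> rows = new ArrayList<>();
            for (Map.Entry<Long, String> tx : sourceBookByTransaction.entrySet()) {
                if (sourceBooks.contains(tx.getValue())) {
                    rows.add("tx=" + tx.getKey() + " mtm=" + mtmByTransaction.get(tx.getKey())
                            + " sourceBook=" + tx.getValue());
                }
            }
            return rows;
        }
    }

    // Step 1: resolve the bottom-level dimensions from replicated data.
    static Set<String> sourceBooksFor(String costCentre) {
        Set<String> sourceBooks = new HashSet<>();
        for (String lb : LEDGER_BOOKS_BY_COST_CENTRE.getOrDefault(costCentre, List.of())) {
            sourceBooks.addAll(SOURCE_BOOKS_BY_LEDGER_BOOK.getOrDefault(lb, List.of()));
        }
        return sourceBooks;
    }

    static List<String> query(String costCentre, List<Partition> partitions) {
        Set<String> sourceBooks = sourceBooksFor(costCentre);
        List<String> result = new ArrayList<>();
        for (Partition p : partitions) {
            for (String row : p.joinFor(sourceBooks)) {
                // Step 3: populate the raw fact with dimension data before returning.
                result.add(row + " costCentre=" + costCentre);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LEDGER_BOOKS_BY_COST_CENTRE.put("CC1", List.of("LB1"));
        SOURCE_BOOKS_BY_LEDGER_BOOK.put("LB1", List.of("SB1", "SB2"));

        Partition p1 = new Partition();
        p1.sourceBookByTransaction.put(1L, "SB1");
        p1.mtmByTransaction.put(1L, 10.5);
        Partition p2 = new Partition();
        p2.sourceBookByTransaction.put(2L, "SB9"); // not reachable from CC1
        p2.mtmByTransaction.put(2L, 99.0);

        query("CC1", List.of(p1, p2)).forEach(System.out::println);
    }
}
```

Only the small set of source book keys is sent to the partitions; no intermediate join results are shipped between nodes.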
[Diagram: Java client API in front of Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join!
So all the big stuff is held partitioned.

And we can join without shipping keys around and having intermediate results.
[Diagrams repeated: Trader, Party and Trade; Version 1 through Version 4]
[Chart: entities split into Facts and Dimensions, axis from 0 to 125,000,000. Entities listed: Valuation Legs, Valuations, rt Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books]
This is a dimension
•  It has a different key to the Facts.
•  And it’s BIG
[Chart: bar per dimension, axis from 0 to 5,000,000. Dimensions listed: Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books]
[Chart: the same dimensions, axis ticks 20, 1,250,015, 2,500,010, 3,750,005, 5,000,000]
So we only replicate ‘Connected’ or ‘Used’ dimensions
[Diagram: Processing Layer with Dimension Caches (Replicated); Data Layer with Fact Storage (Partitioned): Transactions, Mtms, Cashflows]
As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches.
[Diagram sequence: Save Trade]
Query Layer (With connected dimension Caches)
Data Layer (All Normalised, Partitioned): Trade, with Cache Store and Trigger
Dimensions referenced by the Trade: Party Alias, Source Book, Ccy
Dimensions referenced in turn: Party, Ledger Book
‘Connected Replication’
A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated
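A minimal sketch of the recursion, assuming every domain object can list the keys it references. `ConnectedReplication`, `Entity`, `onFactSaved` and the key strings are hypothetical names chosen for illustration, and the replicated cache is modelled as a plain set:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Connected Replication.
class ConnectedReplication {
    // Any domain object: exposes its own key and the keys it references.
    static class Entity {
        final String key;
        final List<String> foreignKeys;

        Entity(String key, String... foreignKeys) {
            this.key = key;
            this.foreignKeys = List.of(foreignKeys);
        }
    }

    // All dimensions, as held in the data layer.
    final Map<String, Entity> dimensionStore = new HashMap<>();
    // Keys of dimensions replicated to the query layer because a fact uses them.
    final Set<String> replicated = new HashSet<>();

    void addDimension(Entity dimension) {
        dimensionStore.put(dimension.key, dimension);
    }

    // Called when a fact is saved (the role the trigger plays on the slides).
    void onFactSaved(Entity fact) {
        for (String fk : fact.foreignKeys) {
            replicateConnected(fk);
        }
    }

    // Recurse through foreign keys; stop at dimensions already replicated.
    private void replicateConnected(String dimensionKey) {
        Entity dimension = dimensionStore.get(dimensionKey);
        if (dimension == null || !replicated.add(dimensionKey)) {
            return;
        }
        for (String fk : dimension.foreignKeys) {
            replicateConnected(fk);
        }
    }

    public static void main(String[] args) {
        ConnectedReplication cr = new ConnectedReplication();
        cr.addDimension(new Entity("PartyAlias:PA1", "Party:P1"));
        cr.addDimension(new Entity("Party:P1"));
        cr.addDimension(new Entity("Party:P2")); // never referenced by a fact
        cr.addDimension(new Entity("SourceBook:SB1", "LedgerBook:LB1"));
        cr.addDimension(new Entity("LedgerBook:LB1"));
        cr.addDimension(new Entity("Ccy:USD"));

        cr.onFactSaved(new Entity("Trade:T1", "PartyAlias:PA1", "SourceBook:SB1", "Ccy:USD"));
        System.out.println(cr.replicated);
    }
}
```

In this example `Party:P2` stays out of the replicated set because no saved fact reaches it, which is the memory saving the pattern is after.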
[Diagram: Java client API; Java schema; Java ‘Stored Procedures’ and ‘Triggers’; Partitioned Storage]
Balancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java Database

  • 1.
  • 2. Traditional disk-oriented database architecture is showing its age. In-memory databases offer significant gains in performance as all data is freely available. There is no need to page to and from disk. But three issues have stunted their uptake: address spaces only being large enough for a subset of a typical user’s data, the ‘one more bit’ problem, and durability. Distributed in-memory databases solve these three problems but at the price of losing the single address space. This makes joins a problem. When data must be joined across multiple machines, performance degradation is inevitable. Snowflake Schemas allow us to mix Partitioning and Replication so joins never hit the wire. But this model only goes so far. “Connected Replication” takes us a step further, allowing us to make the best possible use of replication.
  • 3.
  • 4.
  • 5. The lay of the land: The main architectural constructs in the database industry
  • 6.
  • 8.
  • 9.
  • 10. [Figure: latency scale in ms, μs, ns and ps, placing Cross Continental Round Trip, Cross Network Round Trip, 1MB Disk/Network, 1MB Main Memory, Main Memory Ref, L2 Cache Ref and L1 Cache Ref.] * L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 20.
  • 21. Taken from “OLTP Through the Looking Glass, and What We Found There” Harizopoulos et al
  • 22.
  • 23. [Diagram] Regular Database (Oracle, Sybase, MySQL); Shared Nothing (Teradata, Vertica, Greenplum…); In-Memory Database (Times Ten, HSQL, KDB); SN In-Memory Database (Exasol, VoltDB, Hana); Distributed Caching (Coherence, Gemfire, Gigaspaces); ODC. Arrows: Drop Disk, Distribute.
  • 24. Distributed Architecture Simplify the Contract. Stick to RAM
  • 25. 450 processes; 2TB of RAM; Oracle Coherence; Messaging (Topic Based) as a system of record (persistence)
  • 26. Access Layer: Java client API (two clients). Query Layer. Data Layer: Transactions, Mtms, Cashflows. Persistence Layer.
  • 27. Indexing, Partitioning, Replication
  • 28. But your storage is limited by the memory on a node
  • 29. Keys Fs-Fz; Keys Xa-Yd. Scalable storage, bandwidth and processing.
  • 30.
  • 31.
  • 32.
  • 33. Version 1: Trader, Party, Trade. Version 2: Trader, Party, Trade. Version 3: Trader, Party, Trade. Version 4: Trader, Party, Trade. …and you need versioning to do MVCC
  • 34. Trade Trader Party Party Trader Trade Party Trader Trade Party
  • 35. So better to use partitioning, spreading data around the cluster.
  • 36. Trader Party Trade Trade Trader Party
  • 37. Trader Party Trade Trade Trader Party
  • 38.
  • 39.
  • 40.
  • 41. This is what using Snowflake Schemas and the Connected Replication pattern is all about!
  • 42.
  • 43.
  • 44. Crosscutting Keys Common Keys
  • 45. Replicated Trader Party Trade Partitioned
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52. [Bar chart: size of each table, axis 0 to 150,000,000.] Facts (=> Big, common keys): Valuation Legs, Valuations, …art Transaction Mapping, Cashflow Mapping, Transaction, Cashflows, Legs. Dimensions (=> Small, crosscutting keys): Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.
  • 53.
  • 54. Coherence’s KeyAssociation gives us this: Trades and MTMs share a Common Key.
  • 55. Replicated: Trader, Party. Partitioned: Trade.
  • 56. Query Layer Trader Party Trade Transactions Data Layer Mtms Cashflows Fact Storage (Partitioned)
  • 57. Dimensions (replicate). Facts (distribute/partition): Transactions, Mtms, Cashflows. Fact Storage (Partitioned).
  • 58. [Bar chart: size of each table, axis 0 to 150,000,000.] Facts (=> Big => Distribute): Valuation Legs, Valuations, …art Transaction Mapping, Cashflow Mapping, Transaction, Cashflows, Legs. Dimensions (=> Small => Replicate): Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.
  • 59. We use a variant on a Snowflake Schema to partition big stuff that has the same key, and to replicate small stuff that has crosscutting keys.
  • 61. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
  • 62.
  • 63. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’. LBs[]=getLedgerBooksFor(CC1); SBs[]=getSourceBooksFor(LBs[]). So we have all the bottom level dimensions needed to query facts. Partitioned: Transactions, Mtms, Cashflows.
  • 64. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’. LBs[]=getLedgerBooksFor(CC1); SBs[]=getSourceBooksFor(LBs[]). So we have all the bottom level dimensions needed to query facts. Get all Transactions and MTMs (cluster side join) for the passed Source Books. Partitioned: Transactions, Mtms, Cashflows.
  • 65.
  • 66. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’. LBs[]=getLedgerBooksFor(CC1); SBs[]=getSourceBooksFor(LBs[]). So we have all the bottom level dimensions needed to query facts. Get all Transactions and MTMs (cluster side join) for the passed Source Books. Populate raw facts (Transactions) with dimension data before returning to client. Partitioned: Transactions, Mtms, Cashflows.
  • 67.
  • 68. Java client API. Replicated: Dimensions. Partitioned: Facts. We never have to do a distributed join!
  • 69. So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.
  • 70. Trader Party Trade Trade Trader Party
  • 71. Version 1: Trader, Party, Trade. Version 2: Trader, Party, Trade. Version 3: Trader, Party, Trade. Version 4: Trader, Party, Trade.
  • 72. Trade Trader Party Party Trader Trade Party Trader Trade Party
  • 73.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78. [Bar chart: size of each table, axis 0 to 125,000,000.] Facts: Valuation Legs, Valuations, …rt Transaction Mapping, Cashflow Mapping, Transaction, Cashflows, Legs. Dimensions: Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books. Callout: This is a dimension. It has a different key to the Facts. And it’s BIG.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83. [Bar chart: size of each dimension, axis 0 to 5,000,000.] Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.
  • 84. [Bar chart: size of each dimension, axis 20 to 5,000,000.] Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.
  • 85.
  • 86. So we only replicate ‘Connected’ or ‘Used’ dimensions
  • 87. Processing Layer: Dimension Caches (Replicated). Data Layer: Transactions, Mtms, Cashflows; Fact Storage (Partitioned). As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches.
  • 88.
  • 89. Query Layer (With connected dimension Caches). Save Trade. Data Layer (All Normalised): Cache Store, Trigger, Partitioned Cache; Trade, Party Alias, Source Book, Ccy.
  • 90. Query Layer (With connected dimension Caches). Data Layer (All Normalised): Trade, Party Alias, Source Book, Ccy.
  • 91. Query Layer (With connected dimension Caches). Data Layer (All Normalised): Trade, Party Alias, Source Book, Ccy, Party, Ledger Book.
  • 92. ‘Connected Replication’: a simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated.
  • 93.
  • 94. Java client API; Java schema; Java ‘Stored Procedures’ and ‘Triggers’
  • 95.
  • 96.
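
Slides 61 to 66 walk through a query that never needs a distributed join. The flow could be sketched roughly as below, in plain Java rather than against Coherence. Every name here (`SnowflakeQuerySketch`, `Partition`, `Row`, `partitionFor`, `childrenOf`, the sample keys) is a hypothetical stand-in, and partitions are modelled as in-process maps instead of cluster nodes. The sketch assumes, as slide 54 suggests, that Transactions and MTMs share a common key and are therefore collocated.

```java
import java.util.*;

class SnowflakeQuerySketch {

    /** A fact row: a Transaction joined to its MTM, later enriched with dimension data. */
    static final class Row {
        final String transactionId;
        final String sourceBook;
        final double mtm;
        String ledgerBook;   // filled in from the replicated dimensions (stage 3)
        String costCentre;

        Row(String transactionId, String sourceBook, double mtm) {
            this.transactionId = transactionId;
            this.sourceBook = sourceBook;
            this.mtm = mtm;
        }
    }

    /** One node's share of the facts. Transactions and MTMs share a key, so they sit together. */
    static final class Partition {
        final Map<String, String> sourceBookByTransaction = new HashMap<>();
        final Map<String, Double> mtmByTransaction = new HashMap<>();

        /** Stage 2: local ("cluster side") join of Transactions to MTMs for the given Source Books. */
        List<Row> joinFor(Set<String> sourceBooks) {
            List<Row> rows = new ArrayList<>();
            for (Map.Entry<String, String> e : sourceBookByTransaction.entrySet()) {
                Double mtm = mtmByTransaction.get(e.getKey());
                if (mtm != null && sourceBooks.contains(e.getValue())) {
                    rows.add(new Row(e.getKey(), e.getValue(), mtm));
                }
            }
            return rows;
        }
    }

    // Replicated dimensions: small, so every query node can hold a full copy.
    final Map<String, String> costCentreByLedgerBook = new HashMap<>();
    final Map<String, String> ledgerBookBySourceBook = new HashMap<>();
    final List<Partition> partitions = new ArrayList<>();

    SnowflakeQuerySketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) partitions.add(new Partition());
    }

    /** Both fact types are routed by transaction id, which keeps related facts on one partition. */
    Partition partitionFor(String transactionId) {
        return partitions.get(Math.floorMod(transactionId.hashCode(), partitions.size()));
    }

    void addFact(String transactionId, String sourceBook, double mtm) {
        Partition p = partitionFor(transactionId);
        p.sourceBookByTransaction.put(transactionId, sourceBook);
        p.mtmByTransaction.put(transactionId, mtm);
    }

    /** Returns the keys of childToParent whose parent is one of the given parents. */
    static Set<String> childrenOf(Map<String, String> childToParent, Collection<String> parents) {
        Set<String> out = new HashSet<>();
        for (Map.Entry<String, String> e : childToParent.entrySet()) {
            if (parents.contains(e.getValue())) out.add(e.getKey());
        }
        return out;
    }

    List<Row> queryByCostCentre(String costCentre) {
        // Stage 1: walk the replicated dimensions in process, down to the bottom level.
        Set<String> ledgerBooks = childrenOf(costCentreByLedgerBook, Collections.singleton(costCentre));
        Set<String> sourceBooks = childrenOf(ledgerBookBySourceBook, ledgerBooks);
        // Stage 2: each partition joins its own facts; only the results come back.
        List<Row> rows = new ArrayList<>();
        for (Partition p : partitions) rows.addAll(p.joinFor(sourceBooks));
        // Stage 3: populate raw facts with dimension data before returning to the client.
        for (Row r : rows) {
            r.ledgerBook = ledgerBookBySourceBook.get(r.sourceBook);
            r.costCentre = costCentreByLedgerBook.get(r.ledgerBook);
        }
        return rows;
    }

    public static void main(String[] args) {
        SnowflakeQuerySketch db = new SnowflakeQuerySketch(4);
        db.costCentreByLedgerBook.put("LB1", "CC1");
        db.ledgerBookBySourceBook.put("SB1", "LB1");
        db.addFact("T1", "SB1", 10.0);
        for (Row r : db.queryByCostCentre("CC1")) {
            System.out.println(r.transactionId + " " + r.costCentre + " " + r.mtm);
        }
    }
}
```

The only data that crosses a partition boundary in this sketch is the set of Source Book keys going in and the joined rows coming out, which is the point slide 69 makes about not shipping keys around or holding intermediate results.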

Editor's notes

  1. Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held)
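
The note that only ‘active’ data is held in the replicated caches corresponds to the ‘Connected Replication’ pattern of slide 92. A minimal sketch of that recursion through foreign keys follows, assuming a save-time trigger as slide 89 appears to show. This is plain Java with hypothetical names (`ConnectedReplicationSketch`, `Dimension`, `onFactSaved`, the `Type:id` key strings); it is not Coherence's API, and both layers are modelled as simple maps.

```java
import java.util.*;

class ConnectedReplicationSketch {

    /** A normalised dimension record: its own key plus the keys of the dimensions it references. */
    static final class Dimension {
        final String key;
        final List<String> foreignKeys;

        Dimension(String key, String... foreignKeys) {
            this.key = key;
            this.foreignKeys = Arrays.asList(foreignKeys);
        }
    }

    // Data layer: every dimension, fully normalised.
    final Map<String, Dimension> dataLayer = new HashMap<>();
    // Query layer: replicated cache that holds only the 'connected' dimensions.
    final Map<String, Dimension> replicated = new HashMap<>();

    void storeDimension(Dimension d) {
        dataLayer.put(d.key, d);
    }

    /** Called when a fact is saved: replicate every dimension the fact references. */
    void onFactSaved(Collection<String> factForeignKeys) {
        for (String fk : factForeignKeys) replicateConnected(fk);
    }

    /** Recurse through foreign keys, stopping at dimensions that are already replicated. */
    private void replicateConnected(String key) {
        if (replicated.containsKey(key)) return;   // already connected; also guards against cycles
        Dimension d = dataLayer.get(key);
        if (d == null) return;                     // dangling reference: nothing to replicate
        replicated.put(key, d);
        for (String fk : d.foreignKeys) replicateConnected(fk);
    }

    public static void main(String[] args) {
        ConnectedReplicationSketch s = new ConnectedReplicationSketch();
        s.storeDimension(new Dimension("PartyAlias:PA1", "Party:P1"));
        s.storeDimension(new Dimension("Party:P1"));
        s.storeDimension(new Dimension("Party:P2"));               // never referenced by a fact
        s.storeDimension(new Dimension("SourceBook:SB1", "LedgerBook:LB1"));
        s.storeDimension(new Dimension("LedgerBook:LB1"));
        s.storeDimension(new Dimension("Ccy:USD"));

        // Saving a Trade that references a Party Alias, a Source Book and a Ccy.
        s.onFactSaved(Arrays.asList("PartyAlias:PA1", "SourceBook:SB1", "Ccy:USD"));
        System.out.println(new TreeSet<>(s.replicated.keySet()));
    }
}
```

In this sketch `Party:P2` stays in the data layer only, because no saved fact reaches it through a chain of foreign keys. Eviction of dimensions that stop being referenced is left out.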