SlideShare une entreprise Scribd logo
1  sur  55
Achieving 100,000
Transactions Per Second
 with a NoSQL Database



         Eric David Bloch
           @eedeebee

           19 jun 2012
A bit about me

• I’ve written software used by millions of people.

    Apps, libraries, compilers, device drivers, operating systems

• This is my second QCon and my first talk

• I’m the Community Director at MarkLogic, last 2 years.

• Born here in NY in 1965; now in CA

• I survived having 3 kids in less than 2 years.
Me, 1967
Me, today
Musical Form for the Talk

A: Why?


B: How?
  Whirl-wind database architecture tour
  Melody from part A again
  Techniques to get to 100K
 It’s about money.




 Top 5 bank needed to manage trades
 Trades look more like documents than tables
 Schemas for trades change all the time
 Transactions
 Scale and velocity (“Big Data”)
Trades and Positions


 1 million trades per day
 Followed by 1 million position reports at end of day
    Roll up trades of current date for each “book, instrument” pair
    Group-by, with key = “date, book, instrument”

                  1M positions
         1M trades




               Day         Day   Day    Day    Day     ...
                1           2     3      4      5
Trades and Positions

<trade>
   <quantity>8540882</quantity>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>


<position>
  <instrument>EAAFX</instrument>
  <book>679</book>
  <quantity>3</quantity>
  <business-date>2011-03-25Z</business-date>
  <position-date>2011-03-24Z</position-date>
</position>
Now show us

• 15K inserts per second
• Linear scalability
Requirements

   NoSQL flexibility,
 performance & scale

    Enterprise-grade
transactional guarantees
What NoSQL, OldSQL, or NewSQL
database out there can we use?
Your database has a large
effect on how you see the world
What is MarkLogic?


 Non-relational, document-oriented, distributed
  database
            Shared nothing clustering, linear scale out
            Multi-Version Concurrency Control (MVCC)
            Transactions (ACID)


 Search Engine
               Web scale (Big Data)
               Inverted indexes (term lists)
               Real-time updates
               Compose-able queries

Slide 18       Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Architecture


                                                                                    Question
 Data model
                                                                             What does data model
 Indexing                                                                 have to do with scalability?

 Clustering
 Query execution
 Transactions



Slide 19   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Key (URI)                    Value (Document)

/trade/153748994             <trade>
                              <id>8</id>
                              <time>2012-02-20T14:00:00</time>
                              <instrument>BYME AAA</instrument>
                              <price cur=“usd”>600.27</price>
                             </trade>



/user/eedeebee               {
                                 “name” : “Eric Bloch”,
                                 “age” : 47,
                                 “hair” : “gray”,
                                 “kids” : [ “Grace”, “Ryan”, “Owen” ]
                             }

/book5293                    It was the best of times, it was the worst of times, it
                             was the age of wisdom, ...


/2012-02-20T14:47:53/01445   .mp3
                             .avi
                             [your favorite binary format]
Inverted Index



“which”                                          123, 127, 129, 152, 344, 791 . . .

“uniquely”                                       122, 125, 126, 129, 130, 167 . . .

“identify”                                       123, 126, 130, 142, 143, 167 . . .

“each”                                           123, 130, 131, 135, 162, 177 . . .   Document
“uniquely identify”
                                                 126, 130, 167, 212, 219, 377 . . .   References
<article>                                        ...
                                                                                      126, 130, 167, 212, 219, 377 . . .
article/abstract/@author                         ...

<product>IMS</product>




  Slide 21   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Range Index

                  <trade>
                   <trader_id>8</trader_id>                                       Value     Docid
                   <time>2012-02-20T14:00:00</time>
                   <instrument>IBM</instrument>                                   0         287
                   …
                  </trade>                                                        8         1129
                                                                                  13        531    Docid Value
                  <trade>
                   <trader_id>13</trader_id>                                      …         …      287    0
                   <time>2012-02-20T14:30:00</time>
Rows               <instrument>AAPL</instrument>                                  …         …      531    13
                   …                                                                               1129   8
                  </trade>
                                                                                                   …      …
                  <trade>
                   <trader_id>0</trader_id>                                                        …      …
                   <time>2012-02-20T15:30:00</time>
                   <instrument>GOOG</instrument>                                      • Column Oriented
                   …                                                                  • Memory Mapped
                  </trade>


       Slide 22   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Shared-Nothing Clustering


                                                                           Host E1




     Host D1                                        Host D2                          Host D3   …   Host Dj




Slide 23   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Query Evaluation – “Map”


                                         Q
      Host E1                                         Host E2                      Host E3   …       Host Ei




         Host D1                                        Host D2                    Host D3            Host Dj
                                                                                             …

Q                                        Q                                     Q                 Q
      F1 … Fn                                       F1 … Fn                        F1 … Fn           F1 … Fn
    Slide 24   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Query Evaluation – “Reduce”

                                           Top
                                           10
        Host E1                                         Host E2                    Host E3   …     Host Ei




           Host D1                                        Host D2                  Host D3             Host Dj
                                                                                             …
Top                                             Top                              Top             Top
10                                              10                               10              10
        F1 … Fn                                       F1 … Fn                      F1 … Fn             F1 … Fn
      Slide 25   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Queries/Updates with MVCC


 Every query has a timestamp
 Documents do not change

    Reads are lock-free
    Inserts – see next slide
    Deletes – mark as deleted
    Edits –
            copy
            edit
            insert the copy
            mark the original as deleted

Slide 26    Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Insert Mechanics


1)         New URI+Document arrive at E-node
2)         URI Probe – determine whether URI exists in any forest
3)         URI Lock – write locks taken on D node(s)
4)         Forest Assignment – URI is deterministically placed in Forest
5)         Indexing
6)         Journaling
7)         Commit – transaction complete
8)         Release URI Locks – D node(s) are notified to release lock




Slide 27   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Save-and-merge
       (Log Structured Tree Merge)

                                                                                  Database
                          Insert/
                          Update



                                                            Forest 1                              Forest 2
                                                 S0
Memory                         Save

Disk
       Journaled
                                                 S1                 S2                       S1     S2
                                                                                    S3                       S3

                                          Merge
                  J

       Slide 28   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Back to
the money
Trades and Positions

 1 million trades per day
 followed by 1 million position reports at end of day
        Roll up trades of the current “date” for each “book:instrument” pair
        Group-by, with key = “book:date:instrument”


                           1M positions
                  1M trades




                                Day                    Day                 Day   Day   Day   ...
                                 1                      2                   3     4     5


Slide 30   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Naive Query Pseudocode



for each book
for each instrument in that book
  position = position(yesterday, book, instrument)
  for each trade of that instrument in this book
   position += trade(today, book, instrument).quantity
  insert(today, book, instrument, position)




    Slide 32   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Initial results




Single node – 19,000 inserts per second




Slide 33   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Initial results with cluster


           70000


           60000


           50000


           40000
 doc/sec




           30000                                                                                 Report Query 2


           20000


           10000


               0
                                      1DE                                        2DE       3DE
                                                                              # of nodes




Slide 34      Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Techniques to get to 100K


 Insert Query for Computing New Positions
            Materialized compound key, Co-Occurrence Query and Aggregation
 Insert of New Positions
            Batching
            Optimized insert mechanics




Slide 35    Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Materializing a compound key

<trade>

   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>




  Slide 36   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Materializing a compound key

<trade>
   <roll-up book-date-instrument=“151333445566782303”/>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>




 Slide 37   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Co-Occurrence and
       Distributed Aggregation
<trade>
 <roll-up
    book-date-instrument=“151373445566703”/>                                       V        D
                                                                                       D         V
 <quantity>8540882</quantity>                                                      …        …
 <instrument>WASAX</instrument>                                                        …        …
                                                                                   …        1129
 <book>679</book>                                                                      …        …
 <trade-date>2011-03-13-07:00</trade-date>                                         …         …
                                                                                       15137…    …
 <settle-date>2011-03-17-07:00</settle-date>
                                                                                   …        …
</trade>                                                                               …         …
                                                                                   …        …
                                                                                       …         …

• Co-occurrences:
                                                                                                 V         D
     Find pairings of range indexed values                                                            D        V
                                                                                                 …         …
• Aggregate on the D nodes (Map/Reduce):
                                                                                                 8540882
                                                                                                       …   1129 …
     Sum up the quantities above
                                                                                                      …        …
                                                                                                 …         …
                                                                                                      …        8540882
• Similar to a Group-by,                                                                         …         …
     • in a column-oriented, in-memory                                                           …    …    …   …
     database                                                                                         …        …


        Slide 38   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Initial results with new query




      > 30K inserts/second on a single node




Slide 39   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Co-Occurrence + Aggregate versus
Naïve approach

           70000


           60000


           50000


           40000
 doc/sec




                                                                                                 Report Query 3
           30000
                                                                                                 Report Query 2

           20000


           10000


               0
                                      1DE                                        2DE       3DE
                                                                              # of nodes




Slide 40      Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Techniques


 Computing Positions
            Materialized compound key, Co-Occurrence Query and Aggregation
 Updates
            Batching
            Optimized insert mechanics




Slide 41    Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Transaction Size and
Throughput

                  35000


                  30000


                  25000
 #docs per sec.




                  20000


                  15000                                                                                                   insert throughput


                  10000


                   5000


                      0
                                1           2            4            8              16   32   100   500   1000   10000
                                                                   # doc inserts per transaction




Slide 42             Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Techniques


 Computing Positions
            Materialized compound key, Co-Occurrence Query and Aggregation
 Updates (Transaction)
            Batching
            Optimized insert mechanics




Slide 43    Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Insert Mechanics, Again


1)         New URI+Document arrive at E-node
2)         *URI Probe – determine whether URI exists in any forest
3)         *URI Lock – write locks are created
4)         Forest Assignment – URI is deterministically placed in Forest
5)         Indexing
6)         Journaling
7)         Commit – transaction complete
8)         *Release URI Locks – D nodes are notified to release lock



* Overhead of these operations increases with cluster size


Slide 44   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Deterministic Placement


4) Forest Assignment – URI is deterministically placed in Forest

                             Hash                                          64-bit   Bucketed        Fi
  URI
                             function                                      number   into a Forest




  • Done in C++ within server
  • But…
       • Can also be done in the client
       • Server allows queries to be evaluated against only one
       forest…




Slide 45   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Optimized Insert


                                                        Q
                                                                           E




                                        D1                                              D2


                                                                               Q
                         F1                                F2                      F3        F4



Slide 46   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Optimized Insert Mechanics


1) New URI+Document arrive at E-node
           a)      Compute Fi using
           b)      Ask server to evaluated the insert query only against Fi
2)         URI Probe – Fi Only
3)         URI Lock – Fi Only
4)         Forest Assignment – Fi Only
5)         Indexing
6)         Journaling
7)         Commit – transaction complete
8)         Lock Release - Fi Only



Slide 47    Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Regular Insert Vs. Optimized Insert


               70000


               60000


               50000
 docs/second




               40000


                                                                                                     regular insert
               30000
                                                                                                     in-forest-eval
                                                                                                     In-Forest Eval

               20000


               10000


                   0
                                           1DE                                       2DE       3DE
                                                                                  # of nodes




Slide 48          Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Tada! Achieving 100K Updates




                                                                           In-Forest Eval




Slide 49   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
We weren’t content with 15K


So we showed them…




                     A quote from the bank
“We threw everything we had at MarkLogic and it didn’t break a sweat”


Slide 50   Copyright © 2012 MarkLogic® Corporation. All rights reserved.
Enterprise-grade NoSQL document-oriented database

   with real-time full-text search and transactions



        Doing 100K transactions per second
Thank You!

@eedeebee
http://community.marklogic.com
eric.bloch@marklogic.com




Slide 55   Copyright © 2012 MarkLogic® Corporation. All rights reserved.

Contenu connexe

Similaire à Edb 100k-trans-qcon-rev1

O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07
Dalia Saeed
 
Internal training - Eda
Internal training - EdaInternal training - Eda
Internal training - Eda
Tony Vo
 
Bending The Curve
Bending The CurveBending The Curve
Bending The Curve
finteligent
 
Model Driven Architecture (MDA): Motivations, Status & Future
Model Driven Architecture (MDA): Motivations, Status & FutureModel Driven Architecture (MDA): Motivations, Status & Future
Model Driven Architecture (MDA): Motivations, Status & Future
elliando dias
 
Mock Objects from Concept to Code
Mock Objects from Concept to CodeMock Objects from Concept to Code
Mock Objects from Concept to Code
Rob Myers
 
DIADEM WWW 2012
DIADEM WWW 2012DIADEM WWW 2012
DIADEM WWW 2012
Giorgio Orsi
 

Similaire à Edb 100k-trans-qcon-rev1 (20)

O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07
 
Internal training - Eda
Internal training - EdaInternal training - Eda
Internal training - Eda
 
Community based harversting for USDL
Community based harversting for USDLCommunity based harversting for USDL
Community based harversting for USDL
 
Domain Driven Design Development Spring Portfolio
Domain Driven Design Development Spring PortfolioDomain Driven Design Development Spring Portfolio
Domain Driven Design Development Spring Portfolio
 
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
 
Seed capital
Seed capitalSeed capital
Seed capital
 
Bending The Curve
Bending The CurveBending The Curve
Bending The Curve
 
Model Driven Architecture (MDA): Motivations, Status & Future
Model Driven Architecture (MDA): Motivations, Status & FutureModel Driven Architecture (MDA): Motivations, Status & Future
Model Driven Architecture (MDA): Motivations, Status & Future
 
UN/EDIFACT Interchange Processing with Smooks v1.4
UN/EDIFACT Interchange Processing  with Smooks v1.4UN/EDIFACT Interchange Processing  with Smooks v1.4
UN/EDIFACT Interchange Processing with Smooks v1.4
 
Project tracking and metrics on share point
Project tracking and metrics on share pointProject tracking and metrics on share point
Project tracking and metrics on share point
 
Elveego circuits
Elveego circuitsElveego circuits
Elveego circuits
 
Database Security Overview
Database Security OverviewDatabase Security Overview
Database Security Overview
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 
Building software defined clouds - Boyan Ivanov
Building software defined clouds - Boyan Ivanov  Building software defined clouds - Boyan Ivanov
Building software defined clouds - Boyan Ivanov
 
Salesforce & SAP Integration
Salesforce & SAP IntegrationSalesforce & SAP Integration
Salesforce & SAP Integration
 
Penerapan TIK Bagi Dunia Pendidikan
Penerapan TIK Bagi Dunia PendidikanPenerapan TIK Bagi Dunia Pendidikan
Penerapan TIK Bagi Dunia Pendidikan
 
Sap edi idoc
Sap edi idocSap edi idoc
Sap edi idoc
 
Mock Objects from Concept to Code
Mock Objects from Concept to CodeMock Objects from Concept to Code
Mock Objects from Concept to Code
 
DIADEM WWW 2012
DIADEM WWW 2012DIADEM WWW 2012
DIADEM WWW 2012
 
Fast track to the 9s via the cloud
Fast track to the 9s via the cloudFast track to the 9s via the cloud
Fast track to the 9s via the cloud
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 

Edb 100k-trans-qcon-rev1

  • 1. Achieving 100,000 Transactions Per Second with a NoSQL Database Eric David Bloch @eedeebee 19 jun 2012
  • 2. A bit about me • I’ve written software used by millions of people. Apps, libraries, compilers, device drivers, operating systems • This is my second QCon and my first talk • I’m the Community Director at MarkLogic, last 2 years. • Born here in NY in 1965; now in CA • I survived having 3 kids in less than 2 years.
  • 5. Musical Form for the Talk A: Why? B: How? Whirl-wind database architecture tour Melody from part A again Techniques to get to 100K
  • 6.
  • 7.  It’s about money.  Top 5 bank needed to manage trades  Trades look more like documents than tables  Schemas for trades change all the time  Transactions  Scale and velocity (“Big Data”)
  • 8. Trades and Positions  1 million trades per day  Followed by 1 million position reports at end of day  Roll up trades of current date for each “book, instrument” pair  Group-by, with key = “date, book, instrument” 1M positions 1M trades Day Day Day Day Day ... 1 2 3 4 5
  • 9. Trades and Positions <trade> <quantity>8540882</quantity> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> <position> <instrument>EAAFX</instrument> <book>679</book> <quantity>3</quantity> <business-date>2011-03-25Z</business-date> <position-date>2011-03-24Z</position-date> </position>
  • 10. Now show us • 15K inserts per second • Linear scalability
  • 11. Requirements NoSQL flexibility, performance & scale Enterprise-grade transactional guarantees
  • 12. What NoSQL, OldSQL, or NewSQL database out there can we use?
  • 13.
  • 14.
  • 15.
  • 16. Your database has a large effect on how you see the world
  • 17.
  • 18. What is MarkLogic?  Non-relational, document-oriented, distributed database  Shared nothing clustering, linear scale out  Multi-Version Concurrency Control (MVCC)  Transactions (ACID)  Search Engine  Web scale (Big Data)  Inverted indexes (term lists)  Real-time updates  Compose-able queries Slide 18 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 19. Architecture Question  Data model What does data model  Indexing have to do with scalability?  Clustering  Query execution  Transactions Slide 19 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 20. Key (URI) Value (Document) /trade/153748994 <trade> <id>8</id> <time>2012-02-20T14:00:00</time> <instrument>BYME AAA</instrument> <price cur=“usd”>600.27</price> </trade> /user/eedeebee { “name” : “Eric Bloch”, “age” : 47, “hair” : “gray”, “kids” : [ “Grace”, “Ryan”, “Owen” ] } /book5293 It was the best of times, it was the worst of times, it was the age of wisdom, ... /2012-02-20T14:47:53/01445 .mp3 .avi [your favorite binary format]
  • 21. Inverted Index “which” 123, 127, 129, 152, 344, 791 . . . “uniquely” 122, 125, 126, 129, 130, 167 . . . “identify” 123, 126, 130, 142, 143, 167 . . . “each” 123, 130, 131, 135, 162, 177 . . . Document “uniquely identify” 126, 130, 167, 212, 219, 377 . . . References <article> ... 126, 130, 167, 212, 219, 377 . . . article/abstract/@author ... <product>IMS</product> Slide 21 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 22. Range Index <trade> <trader_id>8</trader_id> Value Docid <time>2012-02-20T14:00:00</time> <instrument>IBM</instrument> 0 287 … </trade> 8 1129 13 531 Docid Value <trade> <trader_id>13</trader_id> … … 287 0 <time>2012-02-20T14:30:00</time> Rows <instrument>AAPL</instrument> … … 531 13 … 1129 8 </trade> … … <trade> <trader_id>0</trader_id> … … <time>2012-02-20T15:30:00</time> <instrument>GOOG</instrument> • Column Oriented … • Memory Mapped </trade> Slide 22 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 23. Shared-Nothing Clustering Host E1 Host D1 Host D2 Host D3 … Host Dj Slide 23 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 24. Query Evaluation – “Map” Q Host E1 Host E2 Host E3 … Host Ei Host D1 Host D2 Host D3 Host Dj … Q Q Q Q F1 … Fn F1 … Fn F1 … Fn F1 … Fn Slide 24 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 25. Query Evaluation – “Reduce” Top 10 Host E1 Host E2 Host E3 … Host Ei Host D1 Host D2 Host D3 Host Dj … Top Top Top Top 10 10 10 10 F1 … Fn F1 … Fn F1 … Fn F1 … Fn Slide 25 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 26. Queries/Updates with MVCC  Every query has a timestamp  Documents do not change  Reads are lock-free  Inserts – see next slide  Deletes – mark as deleted  Edits –  copy  edit  insert the copy  mark the original as deleted Slide 26 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 27. Insert Mechanics 1) New URI+Document arrive at E-node 2) URI Probe – determine whether URI exists in any forest 3) URI Lock – write locks taken on D node(s) 4) Forest Assignment – URI is deterministically placed in Forest 5) Indexing 6) Journaling 7) Commit – transaction complete 8) Release URI Locks – D node(s) are notified to release lock Slide 27 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 28. Save-and-merge (Log Structured Tree Merge) Database Insert/ Update Forest 1 Forest 2 S0 Memory Save Disk Journaled S1 S2 S1 S2 S3 S3 Merge J Slide 28 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 30. Trades and Positions  1 million trades per day  followed by 1 million position reports at end of day  Roll up trades of the current “date” for each “book:instrument” pair  Group-by, with key = “book:date:instrument” 1M positions 1M trades Day Day Day Day Day ... 1 2 3 4 5 Slide 30 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 31.
  • 32. Naive Query Pseudocode for each book for each instrument in that book position = position(yesterday, book, instrument) for each trade of that instrument in this book position += trade(today, book, instrument).quantity insert(today, book, instrument, position) Slide 32 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 33. Initial results Single node – 19,000 inserts per second Slide 33 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 34. Initial results with cluster 70000 60000 50000 40000 doc/sec 30000 Report Query 2 20000 10000 0 1DE 2DE 3DE # of nodes Slide 34 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 35. Techniques to get to 100K  Insert Query for Computing New Positions  Materialized compound key, Co-Occurrence Query and Aggregation  Insert of New Positions  Batching  Optimized insert mechanics Slide 35 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 36. Materializing a compound key <trade> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> Slide 36 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 37. Materializing a compound key <trade> <roll-up book-date-instrument=“151333445566782303”/> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> Slide 37 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 38. Co-Occurrence and Distributed Aggregation <trade> <roll-up book-date-instrument=“151373445566703”/> V D D V <quantity>8540882</quantity> … … <instrument>WASAX</instrument> … … … 1129 <book>679</book> … … <trade-date>2011-03-13-07:00</trade-date> … … 15137… … <settle-date>2011-03-17-07:00</settle-date> … … </trade> … … … … … … • Co-occurrences: V D Find pairings of range indexed values D V … … • Aggregate on the D nodes (Map/Reduce): 8540882 … 1129 … Sum up the quantities above … … … … … 8540882 • Similar to a Group-by, … … • in a column-oriented, in-memory … … … … database … … Slide 38 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 39. Initial results with new query > 30K inserts/second on a single node Slide 39 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 40. Co-Occurrence + Aggregate versus Naïve approach 70000 60000 50000 40000 doc/sec Report Query 3 30000 Report Query 2 20000 10000 0 1DE 2DE 3DE # of nodes Slide 40 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 41. Techniques  Computing Positions  Materialized compound key, Co-Occurrence Query and Aggregation  Updates  Batching  Optimized insert mechanics Slide 41 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 42. Transaction Size and Throughput 35000 30000 25000 #docs per sec. 20000 15000 insert throughput 10000 5000 0 1 2 4 8 16 32 100 500 1000 10000 # doc inserts per transaction Slide 42 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 43. Techniques  Computing Positions  Materialized compound key, Co-Occurrence Query and Aggregation  Updates (Transaction)  Batching  Optimized insert mechanics Slide 43 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 44. Insert Mechanics, Again 1) New URI+Document arrive at E-node 2) *URI Probe – determine whether URI exists in any forest 3) *URI Lock – write locks are created 4) Forest Assignment – URI is deterministically placed in Forest 5) Indexing 6) Journaling 7) Commit – transaction complete 8) *Release URI Locks – D nodes are notified to release lock * Overhead of these operations increases with cluster size Slide 44 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 45. Deterministic Placement 4) Forest Assignment – URI is deterministically placed in Forest Hash 64-bit Bucketed Fi URI function number into a Forest • Done in C++ within server • But… • Can also be done in the client • Server allows queries to be evaluated against only one forest… Slide 45 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 46. Optimized Insert Q E D1 D2 Q F1 F2 F3 F4 Slide 46 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 47. Optimized Insert Mechanics 1) New URI+Document arrive at E-node a) Compute Fi using b) Ask server to evaluated the insert query only against Fi 2) URI Probe – Fi Only 3) URI Lock – Fi Only 4) Forest Assignment – Fi Only 5) Indexing 6) Journaling 7) Commit – transaction complete 8) Lock Release - Fi Only Slide 47 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 48. Regular Insert Vs. Optimized Insert 70000 60000 50000 docs/second 40000 regular insert 30000 in-forest-eval In-Forest Eval 20000 10000 0 1DE 2DE 3DE # of nodes Slide 48 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 49. Tada! Achieving 100K Updates In-Forest Eval Slide 49 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 50. We weren’t content with 15K So we showed them… A quote from the bank “We threw everything we had at MarkLogic and it didn’t break a sweat” Slide 50 Copyright © 2012 MarkLogic® Corporation. All rights reserved.
  • 51. Enterprise-grade NoSQL document-oriented database with real-time full-text search and transactions Doing 100K transactions per second
  • 52.
  • 53.
  • 54.
  • 55. Thank You! @eedeebee http://community.marklogic.com eric.bloch@marklogic.com Slide 55 Copyright © 2012 MarkLogic® Corporation. All rights reserved.

Notes de l'éditeur

  1. Pain pointsThree tiered architectureHouse keeping &amp; AvailabilityStraw in a swimming pool
  2. Pain pointsThree tiered architectureHouse keeping &amp; AvailabilityStraw in a swimming pool
  3. Pain pointsThree tiered architectureHouse keeping &amp; AvailabilityStraw in a swimming pool
  4. word and phrase search, booleansearch, proximity, wildcarding, stemming, tokenization, decompounding, case-sensitivity options, punctuation-sensitivity options, diacritic-sensitivity options, document quality settings, numerous relevance algorithms, individual term weighting, topic clustering, faceted navigation, custom-indexed fields, and more.