13. Key Features
•Indexes on any attribute
•Dynamic Query Language
•Aggregation Framework
•Dynamic & Flexible Schemas
•Atomic Updates to documents
•Impedance Mismatch removed
•High Availability & Failover
•Strong consistency of data
•Horizontal Scale Out
•Hadoop Integration
14. Use Cases
•Content Management
•Operational Intelligence
•E-Commerce
•User Data Management
•High Volume Data Feeds
•Mobile
15. Financial Services Use Cases
•High Volume Data Feeds
•Tick Data capture
•Risk Analytics & Reporting
•Product Catalogs & Trade Capture
•P&L Reporting
•Reference Data Management
•Portfolio Management
•Quantitative Analysis
•Automated Trading
16. High Volume Data Feeds
Use Case:
•Ingesting data from different feeds sources - Internal / External
•Risk data, Market data, Order data etc
•Any format: FIX / FpML / SWIFT or own
•Formats vary over time
• e.g. FIX 4.2 to 5.0
•Tick Data
•Aggregate data from feeds
Example: FIX to JSON
"NewOrder-Single" : {
  "Header" : {
    "BeginString" : "FIX.4.2",
    "BodyLength" : 190,
    "MsgType" : "D",
    "HeaderFields" : {
      "SenderCompID" : "Client",
      "TargetCompID" : "TradingGateway",
      "MsgSeqNum" : 4,
      "SendingTime" : {
        "UTCFormat_2051-100" :
          "Fri Jun 01 09:36:26 BST 2012"
      }.....
    }
  }
Why MongoDB?
•High ingestion rates of data
•Flexible schema - no db change when messages change
•e.g. a single collection can hold multiple FIX formats
•Query:
•Query Language / Aggregation Framework
•Or use internal MapReduce (MR) or Hadoop for batch processing, e.g. quantitative analysis
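The FIX to JSON example on this slide could be produced by a small converter along these lines. This is only a sketch in plain JavaScript: the function name `fixToDocument`, the tag table (a small subset of the standard FIX header tags) and the flat `Header` / `Body` layout are assumptions for illustration, and the layout is simpler than the nested document shown on the slide.

```javascript
// Illustrative subset of standard FIX header tags; a real feed handler
// would load the full data dictionary for its FIX version.
const HEADER_TAGS = {
  "8": "BeginString",
  "9": "BodyLength",
  "35": "MsgType",
  "49": "SenderCompID",
  "56": "TargetCompID",
  "34": "MsgSeqNum",
  "52": "SendingTime",
};
const NUMERIC_TAGS = new Set(["9", "34"]);
const SOH = "\x01"; // FIX field delimiter

// Turn a raw tag=value FIX message into a document ready for insertion.
function fixToDocument(raw) {
  const header = {};
  const body = {};
  for (const pair of raw.split(SOH)) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue; // skip empty or malformed fields
    const tag = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    const name = HEADER_TAGS[tag];
    if (name !== undefined) {
      header[name] = NUMERIC_TAGS.has(tag) ? Number(value) : value;
    } else {
      body[tag] = value; // unknown tags are kept under their tag number
    }
  }
  return { Header: header, Body: body };
}

// Sample message assembled from the header values shown on the slide.
const sampleFix = ["8=FIX.4.2", "9=190", "35=D", "49=Client",
                   "56=TradingGateway", "34=4", "55=EURUSD"].join(SOH) + SOH;
console.log(JSON.stringify(fixToDocument(sampleFix), null, 2));
```

Because unknown tags are stored rather than rejected, a FIX 5.0 message with extra fields would land in the same collection without any schema change.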
17. Risk Analytics & Reporting
Use Case:
•Collect and aggregate risk data
•Calculate risk / exposures
•Potentially real time
Why MongoDB?
•Collect data from a single or multiple sources
•Different formats
•Documents used to create ‘pre-aggregated’ reports
•Real Time
•Aggregation Framework for reporting
•e.g. exposure for a counterparty
•Internal MR or Hadoop connector
•Batch process risk data
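One common way to keep a 'pre-aggregated' report document up to date is an upsert with `$inc`, so each incoming trade adjusts a running total in place. The sketch below only builds the arguments for such an update; the trade shape (`counterparty`, `day`, `notional`), the field names `totalExposure` and `tradeCount`, and the collection name `exposures` are all hypothetical.

```javascript
// Build the filter/update/options for one incoming trade.
// One report document per counterparty per day (assumed granularity).
function exposureUpsert(trade) {
  return {
    filter: { _id: `${trade.counterparty}:${trade.day}` },
    update: {
      // Atomically add this trade to the running totals
      $inc: { totalExposure: trade.notional, tradeCount: 1 },
      // Written only when the report document is first created
      $setOnInsert: { counterparty: trade.counterparty, day: trade.day },
    },
    options: { upsert: true },
  };
}

// With a connected Node.js driver collection (assumed name `exposures`):
//   const { filter, update, options } = exposureUpsert(trade);
//   await exposures.updateOne(filter, update, options);
console.log(JSON.stringify(
  exposureUpsert({ counterparty: "ACME", day: "2012-06-01", notional: 250 })));
```

Reading the report is then a single lookup by `_id` rather than a scan over the raw trades.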
18. Product Catalogs and Trade Capture
Use Case:
•Catalogs of complex financial products
•‘Exotics’ difficult to model in relational db.
•‘On-boarding’ new products in hours.
•RDBMS less flexible for complex products that may require >~50 tables.
What's the impact of technology?
•Once a product is created, how do we capture the details of a new trade?
Why MongoDB?
•Flexible schema means we don't need to change the database when we have a new product to sell.
•Single collection for all products... even if they vary greatly
•Trades potentially exist for long periods
•Newer trades can have different data with no impact on the db.
19. Portfolio / Position reporting
Use Case:
•Store positions or portfolio information
•Query to find current positions/portfolios
•Query by client or trader
Why MongoDB?
•A customer/client may have many different products
•Aggregation Framework to calculate values and views
•Work on extremely large data sets
•Current and historic data
20. Reference Data Management
Use Case:
•Globally distribute reference data across the organisation
•Manage replication across data centres
•Provide fast read access
Why MongoDB?
•Sharding and replication to distribute data
•Access locally for high performance reads
•Fast replication of data
•Unstructured reference data easily replicated.
•New items/formats replicated without schema migrations
NYC        LON        HK
Primary    Secondary  Secondary
Secondary  Primary    Secondary
21. Quantitative Analysis / Automated Trading
Use Case:
•Real time and historical tick data
•Strategy testing
•Automated signals
Example: Bid/Offer to Candlesticks
{
  "_id" : ObjectId("4f4b8916fb1c80e141ea6201"),
  "ask" : 1.30028,
  "bid" : 1.3002,
  "ts" : ISODate("2012-02-16T12:48:00Z")
}
Why MongoDB?
•Aggregation Framework to shape data
•Bid/Offer -> Candlesticks
•MR for batch processing data
•Internal MR or Hadoop
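To make the bid/offer to candlestick step concrete, here is the grouping logic written client-side in plain JavaScript. It is a sketch, not the Aggregation Framework itself: the function name `toCandlesticks`, the one-minute bucket and the choice to build candles from the bid price are assumptions. In the database, the same shape would presumably come from a `$group` stage using accumulators such as `$first`, `$max`, `$min` and `$last`.

```javascript
// Group ticks ({ bid, ts }) into fixed-width time buckets and compute
// open/high/low/close on the bid price for each bucket.
function toCandlesticks(ticks, bucketMs = 60000) {
  const sorted = [...ticks].sort((a, b) => a.ts - b.ts); // oldest first
  const candles = new Map(); // bucket start (ms) -> candle
  for (const { bid, ts } of sorted) {
    const start = Math.floor(ts.getTime() / bucketMs) * bucketMs;
    const candle = candles.get(start);
    if (candle === undefined) {
      candles.set(start, { start: new Date(start),
                           open: bid, high: bid, low: bid, close: bid });
    } else {
      candle.high = Math.max(candle.high, bid);
      candle.low = Math.min(candle.low, bid);
      candle.close = bid; // ticks are sorted, so the last one seen closes
    }
  }
  return [...candles.values()];
}

// The first tick is the one on the slide; the others are made-up values.
const sampleTicks = [
  { bid: 1.3002, ts: new Date("2012-02-16T12:48:00Z") },
  { bid: 1.3005, ts: new Date("2012-02-16T12:48:30Z") },
  { bid: 1.3001, ts: new Date("2012-02-16T12:48:45Z") },
  { bid: 1.3003, ts: new Date("2012-02-16T12:49:10Z") },
];
console.log(toCandlesticks(sampleTicks));
```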
23. Aggregation Framework
•Much simpler and faster than MongoDB map reduce
•Replaces common MR use cases in MongoDB
•Native operators in the MongoDB core
db.portfolio.aggregate( [
  // Keep only this user's documents
  { $match : { userid : "roberts123" } },
  // Sum "val" per distinct "position"
  { $group : { _id : "$position",
               total : { $sum : "$val" } } }
] )
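For readers new to pipelines, the result of the query above can be described in plain JavaScript. This sketch only mirrors what the two stages compute over an in-memory array; it says nothing about how MongoDB executes them. The function name `matchAndGroup` and the sample documents are made up for illustration.

```javascript
// Plain-JS equivalent of: $match on userid, then $group by position
// with a $sum over val.
function matchAndGroup(docs, userid) {
  const totals = new Map();
  for (const d of docs) {
    if (d.userid !== userid) continue; // $match stage
    totals.set(d.position, (totals.get(d.position) || 0) + d.val); // $group/$sum
  }
  // Same output shape as the pipeline: one { _id, total } per position
  return [...totals].map(([position, total]) => ({ _id: position, total }));
}

// Made-up sample documents
const samplePortfolio = [
  { userid: "roberts123", position: "EURUSD", val: 100 },
  { userid: "roberts123", position: "EURUSD", val: 50 },
  { userid: "roberts123", position: "GBPUSD", val: 20 },
  { userid: "someoneelse", position: "EURUSD", val: 999 },
];
console.log(matchAndGroup(samplePortfolio, "roberts123"));
```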
24. Sharded MongoDB + Hadoop
[Diagram: five shards (Shard 1 to Shard 5) holding data chunks labelled a-h and s-z, with Hadoop nodes attached to the shards]
25. Summary
•Document-Oriented
•Dynamic schema
•Agile
•Flexible
•High Performance
•Highly Available
•Horizontal Scale Out

•High Volume Data Feeds
•Tick Data capture
•Risk Analytics
•Product Catalogs & Trade Capture
•P&L Reporting
•Reference Data Management
•Portfolio Management
•Quantitative Analysis
•Automated Trading