MongoDB
Use Cases
Healthcare, CMS, Analytics

Thomas O'Rourke
Upstream Innovations Ltd.
Oulu / Seattle
www.dashwire.com
Dashwire Dashconfig
• Users configure their mobile phones on a PC.
   o Email accounts, wallpapers, ringtones, bookmarks, contacts, etc.
   o This generates a lot of data!


• Wanted: Google Analytics + Splunk + BI.
   o Sensitive data:
       • Can't send it out => no Google Analytics.
   o Many sources
       • (Server log files, SQS, web analytics, etc.)
   o Internal error reports &
       • UI issues (a powerful paradigm)
   o Real time vs. reports/enterprise
• ~500,000 events a day
   o Stored for a year
Solution
• Ecosystem in Mongo
   o Evolved over time


• Layered architecture
   o L1. Store, with de-duplication.
       • Streaming live (syslog)
       • Playback of log files
   o L2. Parsing into key/value pairs.
   o L3. Processing.
   o L4. Reports.


• Trade-offs for real time
   o Reconciler
   o Trade-offs between real-time and offline processing
Tools
•   MongoDB 
•   Ruby
•   Sinatra
•   Ruby driver
    o (Connection pooling, multithreaded, replica set support)
•   EventMachine + em-mongo
•   ZeroMQ
•   Sinatra/Rack/Thin
•   Mixpanel
•   Server Density
•   Excel
•   Highcharts
•   SoftLayer
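Ecosystem
[Diagram: live syslog streaming and log-file playback feed a store of strings with timestamps (no duplicates), checked once a day for integrity. Stored lines are processed to key/value pairs, then sanitized into intermediate collections that feed app-specific reports (Excel, etc., daily/weekly) and an external interface for real-time charts.]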
Parsing logs
2012-08-17 13:08:11 app02 Passngr[20167]: I script(www-data) -- {"analytics":{"scenario":"three","initial scenario":"three","phone":"Cool Phone","name":"Facebook","time":"2012-08-17 18:08:11.399 UTC","event":"Bookmark Added","browser_tracking_id":"857b307a4d1xxxxx08ebca70f6","browser_time":"2012-08-17 18:08:14.794 UTC","browser_event":1,"session_id":"68528379d5xxxxxxxcda27fd625fe"}}

                              => JSON.parse( )

Collection = Event_Bookmark_Added
{
    scenario: "three",
    phone: "Cool Phone",
    event: "Bookmark Added",
    session_id: ...,
    ...
}
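A minimal Ruby sketch of this parsing step. It assumes every line looks like the example above: a syslog prefix, the separator ` -- `, then a JSON object with a single `analytics` key. The regular expression, the `Event_` naming rule, and the helper names `parse_line` and `collection_name` are illustrative assumptions, not the deck's actual code.

```ruby
require 'json'

# Assumed line shape, taken from the single example log line:
# "<date> <time> <host> <program[pid]>: ... -- <JSON payload>"
LINE_RE = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+ .*? -- (?<json>\{.*\})\z/

# "Bookmark Added" -> "Event_Bookmark_Added" (hypothetical naming rule).
def collection_name(event)
  "Event_" + event.strip.gsub(/\s+/, "_")
end

# Returns [collection_name, document], or nil when the line does not match
# the assumed shape, has invalid JSON, or has no "analytics" event.
def parse_line(line)
  m = LINE_RE.match(line.strip)
  return nil unless m
  doc = JSON.parse(m[:json])["analytics"]
  return nil unless doc.is_a?(Hash) && doc["event"]
  [collection_name(doc["event"]), doc]
rescue JSON::ParserError
  nil
end

line = '2012-08-17 13:08:11 app02 Passngr[20167]: I script(www-data) -- ' \
       '{"analytics":{"scenario":"three","phone":"Cool Phone","event":"Bookmark Added"}}'
collection, doc = parse_line(line)
puts collection   # prints Event_Bookmark_Added
puts doc["phone"] # prints Cool Phone
```

Returning `nil` for lines that do not fit lets an import count or log them instead of stopping on the first malformed line.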
De-duplication
• Compound unique index
    o Integers perform well
        • MD5 of the entire log line as a string (only half of the digest is used)
        • Unix timestamp (seconds)
        • Fraction of a second (if one is present)
            • Milliseconds would be better, but are not required

# Legacy (1.x) Ruby driver syntax; :drop_dups was removed in MongoDB 3.0
@collections[collection].create_index(
         [ [:ts, Mongo::ASCENDING],
           [:ts_frac, Mongo::ASCENDING],
           [:dhash, Mongo::ASCENDING] ],
         { :unique => true, :drop_dups => true } )
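A sketch of how the three integer fields for that index might be computed in Ruby. Assumptions: the leading timestamp is UTC, the fraction is stored as milliseconds (0 when absent), and "half of the MD5" means the first 8 bytes of the digest, read as a signed 64-bit integer so the value fits BSON's int64 type. The helper name `dedup_key` is hypothetical.

```ruby
require 'digest/md5'

TS_RE = /\A(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?/

# Hypothetical helper: the three integer fields (ts, ts_frac, dhash)
# covered by the compound unique index, for one raw log line.
def dedup_key(line)
  m = TS_RE.match(line)
  raise ArgumentError, "line has no leading timestamp" unless m
  # Assumption: the leading timestamp is UTC.
  ts = Time.utc(*m.captures.first(6).map(&:to_i)).to_i
  # Fraction of a second in milliseconds ("399" -> 399, "5" -> 500), 0 if absent.
  ts_frac = m[7] ? m[7].ljust(3, "0")[0, 3].to_i : 0
  # First 8 bytes of the 128-bit MD5 digest as a signed big-endian
  # 64-bit integer, so it fits BSON's int64.
  dhash = Digest::MD5.digest(line).byteslice(0, 8).unpack1("q>")
  { ts: ts, ts_frac: ts_frac, dhash: dhash }
end

p dedup_key("2012-08-17 18:08:11.399 UTC sample")[:ts_frac] # prints 399
```

A line replayed from a log file yields the same key as the copy that arrived over syslog, so the unique index rejects the second insert.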
Process pattern
• Pre-allocate "processed: 0" at insert time (creation):
     @collections[collection].insert( doc )   # doc already contains processed: 0
• Insert => index (no duplicates) => process
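A minimal sketch of this pattern in plain Ruby, with an in-memory array standing in for the collection (`EventCollection`, `process_pending`, and the sample event names are hypothetical). The likely reason for pre-allocating is that in MongoDB's original MMAPv1 storage engine an update that grows a document could force it to move on disk, whereas changing `processed` from 0 to 1 keeps its size unchanged. Against a real collection, the pass would be a query on `{ processed: 0 }` followed by a `$set` of `processed: 1`.

```ruby
# Hypothetical in-memory stand-in for one collection (an array of hashes).
class EventCollection
  def initialize
    @docs = []
  end

  # Every document is created with the "processed" flag already present,
  # so marking it later changes a value instead of adding a field.
  def insert(doc)
    @docs << doc.merge("processed" => 0)
  end

  def unprocessed
    @docs.select { |d| d["processed"] == 0 }
  end
end

# One processing pass: count pending events, then flip their flag to 1.
def process_pending(collection)
  counts = Hash.new(0)
  collection.unprocessed.each do |doc|
    counts[doc["event"]] += 1
    doc["processed"] = 1
  end
  counts
end
```

Running the pass twice processes each document only once, because the first pass flips every flag it touched.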
Reports
• We needed both real-time graphs and enterprise (Excel) reports
   o We use MongoDB for both, and for all intermediate tables
• Reports
   o Map/Reduce for reports and graphs
   o Considered MySQL but rejected it as unnecessary
   o Write Excel (*.xlsx) directly from Ruby, reading from MongoDB.
      • https://github.com/randym/axlsx
• Real time
   o Incremental Map/Reduce makes real-time graphs fast enough.
       • http://www.highcharts.com
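A plain-Ruby sketch of the incremental idea: map/reduce runs only over documents that arrived since the last run, then the partial results are reduced together with the stored totals. In MongoDB this corresponds to running mapReduce with a query that selects only new documents and `out: { reduce: <collection> }`, which re-reduces new results into the existing output collection. The functions below only imitate that behavior in memory; their names and the sample events are illustrative.

```ruby
# Map: one [key, value] pair per event document.
def map_events(docs)
  docs.map { |doc| [doc["event"], 1] }
end

# Reduce: sum the values for each key. Because the output has the same
# shape as the input values, it can be reduced again (re-reduce).
def reduce_pairs(pairs)
  pairs.each_with_object(Hash.new(0)) { |(key, value), acc| acc[key] += value }
end

# Incremental step: reduce only the new documents, then reduce those
# partial results together with the totals stored from earlier runs.
def incremental_update(totals, new_docs)
  partial = reduce_pairs(map_events(new_docs))
  reduce_pairs(totals.to_a + partial.to_a)
end

totals = incremental_update({}, [{ "event" => "Bookmark Added" }, { "event" => "Ringtone Set" }])
totals = incremental_update(totals, [{ "event" => "Bookmark Added" }])
p totals
```

The key requirement is that reduce can be re-applied to its own output; a summing reduce satisfies that, while something like an average would need to carry a count alongside the sum.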
Server Density
PART 2
       Technical Discussion
•   Performance
•   Durability
•   Replica sets
•   Maintenance
•   Transactions
•   Drivers and Languages
•   Demos
Performance
• ~3000 inserts a second in unsafe mode.
• < 1000 inserts a second in safe mode.
• Indexes = memory (they should fit in RAM).
• Use slaves for reads when possible (note: reads from slaves may be stale).
• Your driver makes a HUGE difference.
• Pre-allocate for updates!
• Safe mode is much slower
    o Not everything needs to be 100% safe.
    o Not everything can be unsafe.
    o Think! ARCHITECT your durability where you need it!
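Putting the quoted insert rates next to the Dashwire load (~500,000 events a day, kept for a year) gives a rough feel for the headroom. The rates are the author's measurements; the sketch below is only arithmetic on them, and a daily average hides peaks.

```ruby
EVENTS_PER_DAY       = 500_000
SECONDS_PER_DAY      = 24 * 60 * 60 # 86,400
SAFE_INSERTS_PER_S   = 1_000.0      # upper bound quoted for safe mode
UNSAFE_INSERTS_PER_S = 3_000.0      # approximate rate quoted for unsafe mode

avg_events_per_s = EVENTS_PER_DAY / SECONDS_PER_DAY.to_f
safe_replay_s    = EVENTS_PER_DAY / SAFE_INSERTS_PER_S   # at least this long
unsafe_replay_s  = EVENTS_PER_DAY / UNSAFE_INSERTS_PER_S
events_per_year  = EVENTS_PER_DAY * 365

puts format("average load: %.2f events/s", avg_events_per_s)       # 5.79
puts format("replay one day, safe mode:   %.0f s", safe_replay_s)   # 500
puts format("replay one day, unsafe mode: %.0f s", unsafe_replay_s) # 167
puts format("events kept for a year: %d", events_per_year)          # 182500000
```

On average, live traffic is far below even the safe-mode rate; the difference between modes matters most when replaying log files in bulk.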
Durability
[Diagram: write modes ranked from FAST/UNSAFE to SAFE/SLOWER. Safe modes on one axis: unsafe, safe, journal, n writes (with journal); deployments on the other: single, cluster, replica set, majority.]
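One way to read the diagram's write modes is as MongoDB write concern documents, shown here in a Ruby hash: `w` is how many members must acknowledge the write and `j: true` asks for the journal write. The level names, the choice of n = 2, and the `write_concern_for` helper are illustrative assumptions; in the 2012-era Ruby driver these options were passed through `:safe`, as in the replica set slide's `:safe => { :w => "majority" }`.

```ruby
# Illustrative mapping from the diagram's write modes to write concern
# documents. Level names and n = 2 are assumptions for this sketch.
WRITE_CONCERNS = {
  unsafe:   { w: 0 },                  # no acknowledgement: fastest, least safe
  safe:     { w: 1 },                  # acknowledged by the primary
  journal:  { w: 1, j: true },         # acknowledged after the journal write
  n_writes: { w: 2, j: true },         # acknowledged by 2 members (n is arbitrary)
  majority: { w: "majority", j: true } # acknowledged by a majority of members
}.freeze

# Raises KeyError for an unknown level instead of silently falling back
# to a weaker write concern.
def write_concern_for(level)
  WRITE_CONCERNS.fetch(level)
end

# Ordered from FAST/UNSAFE to SAFE/SLOWER, as in the diagram.
puts WRITE_CONCERNS.keys.join(" < ") # prints unsafe < safe < journal < n_writes < majority
```

Keeping the levels in one table makes "architect your durability" concrete: each collection or code path picks a named level instead of hard-coding options.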
Replica set uses
• Redundancy
   o Data is on multiple nodes
   o A delayed slave (n seconds behind) is an 'ass' saver (it's very easy to accidentally drop a
     collection!)
• Failover
   o Sleep at night
• Maintenance
   o Back up from slaves
   o Build indexes on slaves and promote them
• Load balancing
   o Reads on slaves



   @collection.insert(doc, :safe => { :w => "majority" })
   Journal + replicate: the journal only applies to the primary, but it guarantees the rollback
   will be available if the primary fails before replication.
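The "n seconds behind" member is what MongoDB calls a delayed member. Below is a sketch of a replica set configuration with one, written as the Ruby hash equivalent of the document passed to `rs.initiate()` in the mongo shell. The set name, hosts, one-hour delay, and the `delayed_members` check are assumptions for illustration.

```ruby
DELAY_SECONDS = 3600 # one hour: long enough to notice an accidental drop

# Hypothetical replica set configuration; set name and hosts are placeholders.
RS_CONFIG = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    # Delayed member: priority 0 so it can never become primary, hidden so
    # clients do not read from it. "slaveDelay" is the pre-5.0 field name;
    # MongoDB 5.0 renamed it "secondaryDelaySecs".
    { _id: 2, host: "db3.example.com:27017", priority: 0, hidden: true, slaveDelay: DELAY_SECONDS }
  ]
}.freeze

# Returns the delayed members, raising if one could be elected primary.
def delayed_members(config)
  delayed = config[:members].select { |m| m.fetch(:slaveDelay, 0) > 0 }
  delayed.each do |m|
    raise ArgumentError, "delayed member #{m[:_id]} must have priority 0" unless m[:priority] == 0
  end
  delayed
end

p delayed_members(RS_CONFIG).map { |m| m[:host] }
```

If a collection is dropped on the primary, the drop reaches the delayed member only after the delay, which leaves a window to copy the data back.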
Maintenance
• Backup/maintenance
  o Back up by stopping a slave, copying its files, and restarting the slave
      • /data/*
      • The files can be copied, backed up, and compressed
      • Compression is high (can be 70%!) because field names are stored uncompressed
        in every document
  o BSON export and import (mongodump/mongorestore) can run while the database is running
  o Server Density
      • Node health
      • Slave lag (time behind)
      • Index size
      • Etc.
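A quick way to see why repeated field names compress so well. This sketch uses JSON as a stand-in for BSON (both store every field name inside every document) and Ruby's bundled Zlib in place of whatever compressor is used on the backup. The documents are made-up events shaped like the parsed log example; the exact ratio depends on the data, so the sketch only prints it.

```ruby
require 'json'
require 'zlib'
require 'digest/md5'

# Made-up event documents that all repeat the same field names.
docs = (1..1000).map do |i|
  {
    "scenario"   => "three",
    "phone"      => "Cool Phone",
    "event"      => "Bookmark Added",
    "session_id" => Digest::MD5.hexdigest(i.to_s) # varies per document
  }
end

raw     = docs.map(&:to_json).join("\n")
packed  = Zlib::Deflate.deflate(raw)
savings = 1.0 - packed.bytesize.to_f / raw.bytesize

puts format("raw %d bytes, compressed %d bytes, %.0f%% smaller",
            raw.bytesize, packed.bytesize, savings * 100)
```

Most of each document is field names and repeated values, which the compressor replaces with short back-references; only the varying `session_id` costs real space.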
Transactions
• findAndModify()
   o Atomically updates a document and returns it in the same operation

• Upserts and indexes.
• Plan for failure; don't assume transactions.
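The usual use of findAndModify is to claim a piece of work so that two workers never take the same document. In the mongo shell this is `db.collection.findAndModify({ query: ..., update: ..., new: true })`, and the current Ruby driver offers `find_one_and_update`. The sketch below imitates the pattern in memory, using a Mutex to stand in for the server's per-document atomicity; `WorkQueue` and its method are hypothetical names.

```ruby
# In-memory stand-in for findAndModify: find one matching document, change
# it, and return it as a single atomic step (guarded by a Mutex here).
class WorkQueue
  def initialize(docs)
    @docs = docs
    @lock = Mutex.new
  end

  # Claims the first unprocessed document, or returns nil when none is left.
  def find_and_modify
    @lock.synchronize do
      doc = @docs.find { |d| d["processed"] == 0 }
      doc["processed"] = 1 if doc
      doc
    end
  end
end

docs = (1..100).map { |i| { "id" => i, "processed" => 0 } }
queue = WorkQueue.new(docs)
claimed = []
claimed_lock = Mutex.new

# Four workers compete for the same documents.
workers = Array.new(4) do
  Thread.new do
    while (doc = queue.find_and_modify)
      claimed_lock.synchronize { claimed << doc["id"] }
    end
  end
end
workers.each(&:join)
```

Because the find and the update happen as one step, every document is claimed exactly once, without a multi-document transaction.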
Driver and language
• Driver and language
  o Use a dynamic language! Ruby, Python, etc.
  o Prefer a driver with replica set and connection pool support.
  o A simple ORM/mapper works great.
      • Mongoid
      • MongoMapper
      • Or even just the plain driver (Mongo Ruby driver)
  o Learn JavaScript!
      • Shell JavaScript commands and Ruby driver methods are very similar
            o findOne vs find_one
      • Map/Reduce is always JavaScript
      • Everything is a Map/Reduce; get used to it.
      • (It's not difficult for these purposes!)
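To illustrate "Map/Reduce is always JavaScript": even when the job is started from Ruby, the map and reduce functions are JavaScript source that the server evaluates. The sketch builds a `mapReduce` command document; the helper name, the `event_counts` target, and the `processed: 0` filter are hypothetical. With the current Ruby driver such a document can be sent with `Mongo::Database#command`, and depending on the driver the strings may need to be wrapped as BSON JavaScript code values.

```ruby
# Map and reduce are JavaScript source strings; the server runs them.
MAP_JS = <<~JS
  function () { emit(this.event, 1); }
JS

REDUCE_JS = <<~JS
  function (key, values) {
    var total = 0;
    values.forEach(function (v) { total += v; });
    return total;
  }
JS

# Hypothetical helper building a mapReduce command document. The command
# name must be the first key; Ruby hashes keep insertion order.
# out: { reduce: target } re-reduces new results into the existing target.
def map_reduce_command(source, target, query = {})
  {
    "mapReduce" => source,
    "map"       => MAP_JS,
    "reduce"    => REDUCE_JS,
    "query"     => query,
    "out"       => { "reduce" => target }
  }
end

cmd = map_reduce_command("Event_Bookmark_Added", "event_counts", { "processed" => 0 })
puts cmd.keys.first # prints mapReduce
```

The Ruby side only assembles the command; the reduce logic has to be written, tested, and debugged as JavaScript.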
Demos
• https://github.com/tomjoro/mongo_browser
  o jQuery tree view
  o Sinatra
  o Mongo

• Cool
  o Integrating R with MongoDB
  o Highcharts

• Contact information:
  o http://www.linkedin.com/in/tomjor
  o thomas.orourke@solvitron.com

MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

MongoDB Use Cases for Healthcare Analytics and CMS

  • 1. MongoDB Use Cases: Healthcare, CMS, Analytics
      Thomas O’Rourke, Upstream Innovations Ltd., Oulu / Seattle
  • 4. Dashwire Dashconfig
      • Users configure their mobile phones on a PC.
          o Email accounts, wallpapers, ringtones, bookmarks, contacts, etc.
          o Generates a lot of data!
      • Wanted: Google Analytics + Splunk + BI.
          o Sensitive data:
              • Can’t send it out => no Google Analytics.
          o Many sources
              • (Server log files, SQS, web analytics, etc.)
          o Internal error reports and
              • UI issues (a powerful paradigm)
          o Real time vs. reports/enterprise
      • ~500,000 events a day
          o Store for a year
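As a rough capacity check on the figures above, here is a back-of-the-envelope sketch. It uses the slide's ~500,000 events a day and assumes a 365-day retention window with events spread evenly over the day (both assumptions, not numbers from the deck):

```ruby
# Back-of-the-envelope volume from the slide's figures.
# Assumptions: 365-day retention and events spread evenly across the day.
EVENTS_PER_DAY  = 500_000
DAYS_RETAINED   = 365
SECONDS_PER_DAY = 86_400

events_retained = EVENTS_PER_DAY * DAYS_RETAINED        # 182_500_000 events
average_rate    = EVENTS_PER_DAY.fdiv(SECONDS_PER_DAY)  # about 5.8 events/second

puts "#{events_retained} events retained, #{average_rate.round(1)} events/s on average"
# prints "182500000 events retained, 5.8 events/s on average"
```

Real traffic is burstier than this average, and playing back old log files can push a whole file through at once, so peak insert throughput matters more than the daily mean suggests.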
  • 5. Solution
      • Ecosystem in Mongo
          o Evolved
      • Layered architecture
          o L1. Store (“de-duplication”)
              • Streaming live (syslog)
              • Playback of log files
          o L2. Parsing into key/value pairs
          o L3. Processing
          o L4. Reports
      • Trade-offs for real time
          o Reconciler
          o Trade-offs between real time and offline
  • 6. Tools
      • MongoDB
      • Ruby
      • Sinatra
      • Ruby driver
          o (Connection pooling, multithreaded, replica set support)
      • EventMachine + em-mongo
      • ZeroMQ
      • Sinatra/Rack/Thin
      • Mixpanel
      • Server Density
      • Excel
      • Highcharts
      • SoftLayer
  • 7. Ecosystem [diagram; recoverable labels: Syslog; Playback; Store strings with timestamps, no duplicates; Integrity checks (once a day); Process to key/value pairs; Sanitize/intermediate; Real-time charts; External interface; App-specific reports; Excel, etc. (daily/weekly)]
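The diagram pairs log playback with once-a-day integrity checks. Below is a minimal in-memory sketch of that idea. It rests on several assumptions: `replay_missing` is a hypothetical helper, a plain `Set` stands in for the store, and the full MD5 of each line serves as its identity, which simplifies the half-MD5 plus timestamp key on the de-duplication slide:

```ruby
require 'digest/md5'
require 'set'

# Hypothetical daily integrity check: hash every line of a log file and
# "insert" (here: record in the Set) only the lines the store has not seen.
# Assumption: the full MD5 of the line is its identity.
def replay_missing(log_lines, stored_hashes)
  log_lines.select do |line|
    # Set#add? returns nil when the hash is already present, so only new lines are kept.
    stored_hashes.add?(Digest::MD5.hexdigest(line))
  end
end

log   = ['line a', 'line b', 'line c']
store = Set[Digest::MD5.hexdigest('line a'), Digest::MD5.hexdigest('line c')]

replay_missing(log, store)  # => ["line b"]
replay_missing(log, store)  # => [] (nothing left to replay)
```

Because stored lines are de-duplicated, replaying an entire file is harmless. Only the gaps get filled, which is what makes "playback everything, then check" a workable integrity strategy.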
  • 8. Parsing logs
      Log line:
      2012-08-17 13:08:11 app02 Passngr[20167]: I script(www-data) -- {"analytics":{"scenario":"three","initial scenario":"three","phone":"Cool Phone","name":"Facebook","time":"2012-08-17 18:08:11.399 UTC","event":"Bookmark Added","browser_tracking_id":"857b307a4d1xxxxx08ebca70f6","browser_time":"2012-08-17 18:08:14.794 UTC","browser_event":1,"session_id":"68528379d5xxxxxxxcda27fd625fe"}}
      JSON.parse( )
      Collection = Event_Bookmark_Added { scenario: "three", phone: "Cool Phone", event: "Bookmark Added", session_id: ... }
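The parsing step above amounts to splitting off the JSON payload and choosing a collection named after the event. Here is a sketch of that step using Ruby's standard `json` library. The " -- " separator comes from the sample line, the IDs are shortened, and `parse_event` is a hypothetical name:

```ruby
require 'json'

# Sample line modeled on the slide (IDs shortened for illustration).
LINE = '2012-08-17 13:08:11 app02 Passngr[20167]: I script(www-data) -- ' \
       '{"analytics":{"scenario":"three","phone":"Cool Phone",' \
       '"event":"Bookmark Added","session_id":"68528379d5"}}'

# Hypothetical helper: split off the JSON payload after " -- " and parse it.
# Returns [collection_name, document], or nil if the line carries no payload.
def parse_event(line)
  _prefix, payload = line.split(' -- ', 2)
  return nil if payload.nil?

  doc = JSON.parse(payload).fetch('analytics')
  # One collection per event type: "Bookmark Added" -> "Event_Bookmark_Added".
  collection = "Event_#{doc['event'].tr(' ', '_')}"
  [collection, doc]
end

collection, doc = parse_event(LINE)
puts collection       # prints "Event_Bookmark_Added"
puts doc['scenario']  # prints "three"
```

Creating collections dynamically from the event name is what the speaker notes recommend ("BE DYNAMIC"). The cost is the occasional cleanup of stray collection names.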
  • 9. De-duplication
      • Compound index
          o Integers perform well
      • MD5 of the entire log line as a string (only use half of the result)
      • Unix timestamp (seconds)
      • Fraction of a second (if one is present)
          o Better to use milliseconds, but not required
      @collections[collection].create_index(
        [ [:ts, Mongo::ASCENDING], [:ts_frac, Mongo::ASCENDING], [:dhash, Mongo::ASCENDING] ],
        { :unique => true, :drop_dups => true } )
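Here is an in-memory sketch of the de-duplication key described above, built on several assumptions. "Half" of the MD5 is taken to mean the first 16 hex characters, masked to 63 bits so the value fits a signed 64-bit BSON integer. The fraction is stored in milliseconds. `dedup_key` and `DedupStore` are hypothetical names, and the `Set` stands in for the unique compound index:

```ruby
require 'digest/md5'
require 'set'
require 'time'

# Hypothetical helper: the (ts, ts_frac, dhash) triple from the slide.
def dedup_key(line, time)
  # First half of the MD5 hex digest (16 hex chars = 64 bits) as an integer.
  # Assumption: mask to 63 bits so it fits a signed 64-bit BSON integer.
  dhash = Digest::MD5.hexdigest(line)[0, 16].to_i(16) & 0x7FFF_FFFF_FFFF_FFFF
  ts = time.to_i              # Unix timestamp, whole seconds
  ts_frac = time.usec / 1000  # fraction of the second in milliseconds (0 if absent)
  [ts, ts_frac, dhash]
end

# In-memory stand-in for the unique index on [:ts, :ts_frac, :dhash].
class DedupStore
  def initialize
    @seen = Set.new
    @lines = []
  end

  # Returns true if stored, false if the key was already present (a duplicate).
  def insert(line, time)
    return false unless @seen.add?(dedup_key(line, time))

    @lines << line
    true
  end

  def size
    @lines.size
  end
end

store = DedupStore.new
t = Time.parse('2012-08-17 18:08:11.399 UTC')
line = 'app02 Passngr[20167]: Bookmark Added'
store.insert(line, t)  # => true  (first time seen)
store.insert(line, t)  # => false (replayed duplicate)
store.size             # => 1
```

Keeping every key component numeric follows the slide's "integers perform well" advice. Including the timestamp means two identical lines logged at different times are still both kept.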
  • 10. Process pattern
      • Pre-allocate “processed: 0” at insert time (creation):
          @collections[collection].insert( doc )
      • Index (no duplicates)
      • Process
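The speaker note for this slide spells out the pattern. A `processed` flag exists from creation, derived documents are keyed by `source_id` under a no-duplicates index, and the flag is set only after the derived insert succeeds. The following is a minimal in-memory sketch under those assumptions. Plain Ruby arrays and a `Set` stand in for the MongoDB collections and the unique index, and all names are hypothetical:

```ruby
require 'set'

# In-memory sketch of the process pattern. All names are hypothetical.
# Source documents carry processed: 0 from the moment they are created.
source = [
  { '_id' => 1, 'event' => 'Bookmark Added', 'processed' => 0 },
  { '_id' => 2, 'event' => 'Bookmark Added', 'processed' => 0 }
]
target     = []       # the derived ("processed") collection
target_ids = Set.new  # stands in for a unique index on source_id

def process(source, target, target_ids)
  source.each do |doc|
    next unless doc['processed'].zero?

    # Insert the derived document; a repeated source_id is rejected,
    # as a unique index would reject it.
    if target_ids.add?(doc['_id'])
      target << { 'source_id' => doc['_id'], 'event' => doc['event'] }
    end
    # Only now flip the flag. If we crash before this line, the next run
    # retries the insert, which the unique source_id turns into a no-op.
    doc['processed'] = 1
  end
end

# Simulate a crash after doc 1's result was inserted but before its flag was set.
target_ids.add(1)
target << { 'source_id' => 1, 'event' => 'Bookmark Added' }

process(source, target, target_ids)
target.size                        # => 2 (doc 1 was not processed twice)
source.map { |d| d['processed'] }  # => [1, 1]
```

Because the unique key absorbs retries, the flag update itself does not need a safe write. This matches the note's remark that the processed flag "does not have to be safe updated".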
  • 12. Reports
      • Needed both real-time and enterprise (Excel) reports
          o We use MongoDB for both, and for all intermediate tables
      • Reports
          o Map/Reduce for reports and graphs
          o Considered MySQL but rejected it as unnecessary
          o Write Excel (*.xlsx) directly using Ruby, reading from MongoDB
              • https://github.com/randym/axlsx
      • Real time
          o Incremental Map/Reduce gives enough performance for real-time graphs
              • http://www.highcharts.com
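Incremental map/reduce keeps the previous results and maps only the documents added since the last run. The sketch below shows the idea in plain Ruby by counting events per day. The document shape, the increasing `_id` used as a checkpoint, and all names are assumptions. In MongoDB itself the map and reduce functions are JavaScript, and folding new results into old totals corresponds to mapReduce's `out: { reduce: ... }` output action. mapReduce was later deprecated, in MongoDB 5.0, in favor of the aggregation pipeline.

```ruby
# Plain-Ruby sketch of incremental map/reduce (not MongoDB's JavaScript mapReduce).
# Hypothetical document shape: { '_id' => Integer, 'time' => 'YYYY-MM-DD ...' }.

# map: emit [day, 1] for each event.
def map_events(docs)
  docs.map { |doc| [doc['time'][0, 10], 1] }
end

# reduce: sum the emitted values per key into an existing totals hash.
def reduce_counts(pairs, totals)
  pairs.each { |day, count| totals[day] += count }
  totals
end

# Incremental run: map only documents newer than the last checkpoint, fold
# them into the previous totals, and return the new checkpoint.
def incremental_run(docs, totals, last_id)
  fresh = docs.select { |doc| doc['_id'] > last_id }
  reduce_counts(map_events(fresh), totals)
  [totals, fresh.map { |doc| doc['_id'] }.max || last_id]
end

docs = [
  { '_id' => 1, 'time' => '2012-08-17 18:08:11.399 UTC' },
  { '_id' => 2, 'time' => '2012-08-17 19:00:00.000 UTC' }
]
totals, last_id = incremental_run(docs, Hash.new(0), 0)

docs << { '_id' => 3, 'time' => '2012-08-18 09:15:00.000 UTC' }
totals, last_id = incremental_run(docs, totals, last_id)  # maps only document 3
totals['2012-08-17']  # => 2
totals['2012-08-18']  # => 1
```

Each refresh of a real-time chart then costs work proportional to the new events, not to the whole history.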
  • 14. PART 2: Technical Discussion
      • Performance
      • Durability
      • Replica sets
      • Maintenance
      • Transactions
      • Drivers and Languages
      • Demos
  • 15. Performance
      • ~3000 inserts a second in unsafe mode
      • < 1000 in safe mode
      • Indexes = memory
      • Use slaves when possible for reads (note: consistency)
      • Your driver makes a HUGE difference
      • Pre-allocate for updates!
      • Safe mode is much slower
          o Not everything is required to be 100% safe
          o Not everything is unsafe
          o Think! ARCHITECT your durability where you need it!
  • 16. Durability: safe modes [diagram; recoverable labels: a scale from FAST/UNSAFE to SAFE/SLOWER; Single; Replica set; Cluster; Unsafe; Safe; Journal (safe with journal); n-writes; Majority]
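In current MongoDB terms, the levels on this slide correspond roughly to write concern documents: `w: 0` (unacknowledged), `w: 1` (acknowledged), `j: true` (wait for the journal), a number `w: n`, and `w: "majority"`. The sketch below shows what "architecting durability" per kind of data might look like. The policy table and its data classes are hypothetical, loosely following the speaker notes: the processed flag need not be safe, and integrity checks stand in for n-writes.

```ruby
# Durability levels from the slide as MongoDB write concern documents.
WRITE_CONCERNS = {
  unsafe:   { w: 0 },                   # fire and forget: fastest
  safe:     { w: 1 },                   # acknowledged by the primary
  journal:  { w: 1, j: true },          # also written to the on-disk journal
  n_writes: { w: 2 },                   # replicated to n (here 2) members
  majority: { w: 'majority', j: true }  # replicated to a majority: slowest
}.freeze

# Hypothetical policy: which level each kind of write gets.
POLICY = {
  processed_flag: :unsafe,  # a lost update only causes a harmless re-process
  raw_log_line:   :safe,    # playback and integrity checks repair gaps
  processed_doc:  :journal  # should survive a crash before the flag is set
}.freeze

# Unknown kinds of data default to the strongest guarantee.
def write_concern_for(kind)
  WRITE_CONCERNS.fetch(POLICY.fetch(kind, :majority))
end

write_concern_for(:raw_log_line)[:w]  # => 1
write_concern_for(:audit_entry)[:w]   # => "majority"
```

The point of the table is that durability is chosen per write, not once per database. Only data that cannot be regenerated pays for the slowest setting.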
  • 17. Replica set uses
      • Redundancy
          o Data is on multiple nodes
          o A slave running n seconds behind is an ‘ass’ saver (it’s very easy to accidentally drop a collection!)
      • Failover
          o Sleep at night
      • Maintenance
          o Back up from slaves
          o Build indexes on slaves and promote them
      • Load balancing
          o Reads on slaves
      @collection.insert(doc, :safe => { :w => "majority" })
      Journal + replicate (the journal only applies to the primary), but this guarantees the rollback will be available if the write failed before replication.
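The "n seconds behind" safety net corresponds to a delayed replica set member. The sketch below builds such a member entry as a Ruby hash, as it would appear in the `members` array of a replica set configuration (for example, when applied with `rs.reconfig()` in the mongo shell). The host name and delay are placeholders. The delay field is `slaveDelay` before MongoDB 5.0 and `secondaryDelaySecs` from 5.0 on. A delayed member must have priority 0, and hiding it is recommended so that clients do not read stale data.

```ruby
# Hypothetical helper: member entry for a delayed secondary.
# The delay field name depends on the server's major version (assumption: integer major).
def delayed_member(id, host, delay_secs, server_major: 4)
  delay_field = server_major >= 5 ? 'secondaryDelaySecs' : 'slaveDelay'
  {
    '_id'       => id,
    'host'      => host,
    'priority'  => 0,     # must never be elected primary
    'hidden'    => true,  # invisible to client applications, so no stale reads
    delay_field => delay_secs
  }
end

member = delayed_member(2, 'db3.example.com:27017', 3600)
member['slaveDelay']  # => 3600 (one hour to notice an accidental drop)
```

The delay only buys time. An accidental drop still has to be noticed, and the data copied back from the delayed member, before the drop replicates there too.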
  • 18. Maintenance
      • Backup/Maintenance
          o Back up by stopping a slave, copying its files, and restarting the slave
              • /data/*
              • Can be copied, backed up, and compressed
              • Compression is high (can be 70%!) because field names are stored uncompressed in every document
          o Mongo export and import (BSON) can be run while the database is running
          o Server Density
              • Node health
              • Slave lag (time behind)
              • Index size
              • Etc.
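Here is a rough illustration of why such files compress well: the same field names repeat in every document. It uses JSON text and Ruby's built-in zlib as stand-ins for BSON data files and whatever compressor a backup job uses, so the ratio it prints is an illustration, not a prediction for real data files. The document fields are borrowed from the parsing slide.

```ruby
require 'json'
require 'zlib'

# 1000 documents that repeat the same field names, as stored documents do.
docs = (1..1000).map do |i|
  { 'scenario' => 'three', 'phone' => 'Cool Phone', 'event' => 'Bookmark Added',
    'session_id' => format('%08d', i) }
end

raw        = docs.map(&:to_json).join("\n")
compressed = Zlib::Deflate.deflate(raw)
saving     = 1.0 - compressed.bytesize.to_f / raw.bytesize

puts format('raw %d bytes, compressed %d bytes, %.0f%% smaller',
            raw.bytesize, compressed.bytesize, saving * 100)
```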
  • 19. Transactions
      • findAndModify()
          o Atomically updates a document and returns it
      • Upserts and indexes
      • Plan for failure; don’t assume transactions
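`findAndModify` updates a single document and returns it in one atomic step. Combined with an upsert, that is enough for patterns such as a shared counter. The sketch below imitates those semantics in memory, with a `Mutex` standing in for the server's per-document atomicity. The class and method names are hypothetical and are not the driver's API.

```ruby
# In-memory imitation of findAndModify with upsert: true and new: true.
# Hypothetical names; a Mutex stands in for per-document atomicity on the server.
class Counters
  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  # Atomically increment 'seq' on the document with this _id, creating it if
  # missing (upsert), and return a copy of the updated document (new: true).
  def find_and_modify_inc(id)
    @lock.synchronize do
      doc = (@docs[id] ||= { '_id' => id, 'seq' => 0 })
      doc['seq'] += 1
      doc.dup
    end
  end
end

counters = Counters.new
threads = 4.times.map do
  Thread.new { 250.times { counters.find_and_modify_inc('event_id') } }
end
threads.each(&:join)
counters.find_and_modify_inc('event_id')['seq']  # => 1001
```

No increment is lost even with concurrent callers, because each read-modify-return happens as one step. Anything spanning several documents still needs the "plan for failure" approach from the slide.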
  • 20. Driver and language
      • Driver and language
          o Use a dynamic language! Ruby, Python, etc.
          o Driver support for replica sets and connection pooling preferred.
          o A simple ORM/mapper, etc. works great.
              • Mongoid
              • MongoMapper
              • Or even just the plain driver (Mongo Ruby driver)
          o Learn JavaScript!
              • Shell JavaScript commands and Ruby driver methods are very similar
                  o findOne vs. find_one
              • Map/Reduce is always JavaScript
              • Everything is a Map/Reduce; get used to it.
              • (It’s not difficult for these purposes!)
  • 22. Demos
      • https://github.com/tomjoro/mongo_browser
          o jQuery tree view
          o Sinatra
          o Mongo
      • Cool
          o Integrating R with MongoDB
          o Highcharts
      • Contact information:
          o http://www.linkedin.com/in/tomjor
          o thomas.orourke@solvitron.com

Speaker notes

  1. Thomas O’Rourke, Oulu / Seattle. Seattle’s a great place: Amazon, Microsoft, Facebook, Google. Big Data is here. Hearing some of the world’s architects tell you “It’s all a big hash table” or “You can’t do global relations anyway, de-normalize” gave me the confidence to try MongoDB. Cassandra, Hadoop, Riak, Redis, CouchDB: all are good. MongoDB is the EASIEST to work with and get started with, and has the BROADEST use cases because of its document architecture and indexing. It’s fun to hear horror stories; I’m afraid I don’t have any. Or maybe a few. Stand on shoulders. Visa cards: just reconcile at the end of the day.
  2. One year of data, to know “Has this ID ever been seen before in the entire year?” The data structure needs to be flexible.
  3. Timestamps might be to the second, or to the millisecond where there are two. Do it! MongoDB was easy to get started with. De-duplication: an index built on a partial MD5 string hash and a timestamp (2 numbers in a compound index). L1. “De-duplication”: log lines must be unique. Indexes that hold only recent values in RAM. Playback of log files in case of problems. L2. Parsing into key/value pairs: JSON.parse(). L3. Processing. L4. Reports can work from slaves.
  4. No duplicates: integer indexes are fast. Pre-allocate scheme. 100% Mongo. Collections make more collections, which refine collections, etc. (see example). Use the dynamic nature of creating collections! It’s not a relational DB. BE DYNAMIC, FOR GOODNESS’ SAKE! Like “Event_<NAME>”: create a collection with the event name. You might need to do some cleanup. So what. Playback is SUPER IMPORTANT. Verify everything is there. AUTO INTEGRITY CHECKS.
  5. We actually write JSON to our log files for the events we want to capture. These can be parsed with one line of code. Dynamic creation of collections. Then it can go directly into MongoDB.
  6. Say you have a collection (red) that you want to “process”. Pre-allocate a processed flag (there may be many). In the processed collection, store the source_id and create an index with no duplicates. This way you can have many target collections, but you will never process anything twice. ONLY UPDATE the processed flag after you are 100% sure the processed document has been inserted. BUT you might want to update several. The processed flag does not have to be a safe update. GUARANTEED. Also, because of playback, high/low water marks didn’t work.
  7. Almost everything was map/reduce. MySQL was considered for reports, but 100% Mongo was easier!
  8. You can read from a replica (use tags), but it might be 50 ms behind. Only the primary is written to and guaranteed to be consistent (comparable to MySQL). People who run benchmarks need to consider this. Connection pool! Yes. Multithreading or non-blocking I/O (EventMachine/Tornado)! Yes!
  9. Journaling is on by default. The oplog and journal writes are done in an atomic transaction (how is that possible?). After n operations or a single operation. If you are using a MULTITHREADED driver, your write and read might not be consistent: getLastError() is per thread! So driver… N-writes = majority. For us, we don’t use n-writes because we have the integrity checks. Journal means written to disk, and you can combine it with write concern.
  10. A write is not committed until it reaches a majority of nodes (even with journaling; JOURNAL is the default). Any writes that were never replicated will trigger a rollback, and the rolled-back changes are stored in a “rollback” subdirectory. USE write concern to wait until a write is replicated to a majority, either after every write or after a series of writes.
  11. There is nothing better than waking up to see it failed over without intervention.
  12. A “.” is an event in a 200-second window; a “0” is no event in 200 seconds. An entire month of data: 5 million events. 30 seconds to map-reduce this.
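The chart in the last note amounts to bucketing event timestamps into 200-second windows and printing one character per window. The author did this with map/reduce over a month of data. The following is a minimal plain-Ruby version instead, with hypothetical names and timestamps assumed to be Unix seconds:

```ruby
require 'set'

WINDOW = 200 # seconds per bucket, as in the note
# A 30-day month is 30 * 86_400 / 200 = 12_960 windows.

# Hypothetical helper: "." for each window in [start, stop) containing at
# least one event, "0" for an empty window.
def activity_strip(timestamps, start, stop)
  hit = timestamps.map { |ts| (ts - start) / WINDOW }.to_set  # "map" each event to its window
  buckets = (stop - start + WINDOW - 1) / WINDOW              # round up to whole windows
  (0...buckets).map { |b| hit.include?(b) ? '.' : '0' }.join
end

activity_strip([0, 150, 450, 1000], 0, 1200)  # => ".0.00."
```

Events before `start` map to negative windows and events at or after `stop` map past the last window, so both are ignored rather than miscounted.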