This document discusses using MongoDB to store and analyze log and event data from a mobile device configuration application. Key points:
- MongoDB is used to store 500,000 events per day from server logs and other sources in a layered architecture for real-time processing and reporting.
- A de-duplication process is used to store unique events based on timestamps and hashes to prevent duplicate entries.
- Events are parsed and stored in collections with relevant fields for analysis.
- Real-time reporting is enabled through incremental map-reduce, while daily/weekly reports are generated by scripts that access MongoDB.
4. Dashwire Dashconfig
• Users configure their mobile phones on a PC.
o Email accounts, wallpapers, ringtones, bookmarks, contacts, etc.
o Generates a lot of data!
• Wanted: Google Analytics + Splunk + BI.
o Sensitive data:
• Can't send out => No Google Analytics.
o Many sources
• (Server log files, SQS, Web analytics, etc.)
o Internal error reports &
• UI issues (a powerful paradigm)
o Real time vs. Reports/Enterprise
• ~500,000 events a day
o Store for a year
5. Solution
• Eco-system in Mongo
o Evolved
• Layered architecture
o L1. Store - “De-duplication”.
• Streaming live (syslog)
• Playback of log files
o L2. Parsing into key/value pairs.
o L3. Processing.
o L4. Reports.
• Trade-offs for real time
o Reconciler
o Trade-offs between real time and offline
6. Tools
• MongoDB
• Ruby
• Sinatra
• Ruby driver
o (Connection pooling, multithreaded, replica set support)
• Event machine + em-mongo
• ZeroMQ
• Sinatra/Rack/Thin
• Mixpanel
• Server Density
• Excel
• Highcharts
• SoftLayer
7. Eco system
[Architecture diagram: syslog streaming and log-file playback feed the store (integrity checks, strings stored with timestamps, no duplicates); once a day, events are processed into key/value pairs and sanitized into intermediate collections; from there, a real-time external charts interface and app-specific daily/weekly reports (Excel, etc.) are generated.]
9. De-duplication
• Multikey index
o Integers perform well
• MD5 of entire log line as string (only use half of result)
• Unix time stamp (seconds)
• Fraction of second (if one is present)
• Better to use milliseconds, but not required
@collections[collection].create_index(
[ [:ts, Mongo::ASCENDING],
[:ts_frac, Mongo::ASCENDING],
[:dhash, Mongo::ASCENDING ] ],
{ :unique => true, :drop_dups => true} )
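A minimal sketch of how such a compound key could be computed for one raw log line: half of the MD5 hex digest, plus the unix timestamp in seconds and the fractional part. The method name `dedup_key` is illustrative, not from the original code.

```ruby
require 'digest/md5'

# Build the de-duplication key for one raw log line.
def dedup_key(line, time)
  {
    :ts      => time.to_i,                          # unix timestamp (seconds)
    :ts_frac => ((time.to_f % 1) * 1000).round,     # fraction of second, in ms
    :dhash   => Digest::MD5.hexdigest(line)[0, 16]  # only half of the 32-char digest
  }
end
```

The three fields line up with the `[:ts, :ts_frac, :dhash]` unique index above, so inserting the same log line twice raises a duplicate-key error instead of creating a second document.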
10. Process pattern
• Pre-allocate "processed" => 0 at insert time (creation):
@collections[collection].insert( doc )
• A unique index (no duplicates) guards the store; a separate step processes each document later.
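A minimal sketch of this pre-allocate pattern in Ruby, under the assumption that the unique index is already in place; the collection object and the rescued error class are stand-ins for the real driver's, and `store_event` is an illustrative name.

```ruby
# Every event is inserted with "processed" => 0 at creation time; the
# unique index rejects duplicate log lines, and a later step flips the
# flag once processing succeeds.
def store_event(collection, doc)
  doc['processed'] = 0    # pre-allocate the flag at insert time
  collection.insert(doc)  # unique (no-dup) index drops replayed lines
  true
rescue StandardError      # duplicate-key error from the driver
  false
end
```

During playback the same lines can be re-streamed safely: the duplicate insert fails, `store_event` returns false, and nothing is stored or processed twice.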
12. Reports
• Needed both Real time and Enterprise (Excel Reports)
o We use MongoDB for both and all intermediate tables
• Reports
o Map/Reduce for Reports and Graphs
o Considered MySQL but rejected as unnecessary
o Write Excel (*.xlsx) directly using Ruby and accessing MongoDB.
• https://github.com/randym/axlsx
• Real-time
o Incremental Map/Reduce gives performance to do real time graphs.
• http://www.highcharts.com
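A sketch of the incremental variant: the map and reduce steps are Javascript strings, and the `:out => { :reduce => ... }` option merges each run's partial results into the existing output collection, so only documents newer than the last checkpoint need scanning. The collection and field names here are assumptions, not the original schema.

```ruby
# Count events per minute; each incremental run only scans new documents.
MAP_JS    = "function() { emit(this.ts - (this.ts % 60), 1); }"
REDUCE_JS = "function(key, values) { return Array.sum(values); }"

# Options for an incremental run covering events after `last_ts`.
def incremental_mr_options(last_ts)
  {
    :query => { :ts => { '$gt' => last_ts } },    # only events since the last run
    :out   => { :reduce => 'events_per_minute' }  # re-reduce into existing output
  }
end
# On a live connection:
# events.map_reduce(MAP_JS, REDUCE_JS, incremental_mr_options(checkpoint))
```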
14. PART 2
Technical Discussion
• Performance
• Durability
• Replica sets
• Maintenance
• Transactions
• Drivers and Languages
• Demos
15. Performance
• ~3000 inserts a second for unsafe mode.
• < 1000 for safe mode.
• Indexes = memory.
• Use slaves when possible for reads (note: consistency)
• Your driver makes a HUGE difference.
• Pre-allocate for updates!
• Safe mode is much slower
o Not everything is required to be 100% safe
o Not everything is unsafe.
o Think! ARCHITECT your durability where you need it!
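One way to express "architect your durability" with the Ruby driver is to pick the write concern per call rather than globally; the helper below is illustrative, not the original code.

```ruby
# Raw events can be fire-and-forget (playback plus integrity checks catch
# any gaps), while report rows wait for server acknowledgement.
def write_options(critical)
  critical ? { :safe => true } : { :safe => false }
end
# events.insert(doc,  write_options(false))  # unsafe: ~3000 inserts/s
# reports.insert(row, write_options(true))   # safe:  < 1000 inserts/s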
16. Durability
[Diagram: a spectrum of safe modes from FAST/UNSAFE to SAFE/SLOWER: unsafe, safe, safe with journal, n-writes, and "majority", across single-server, replica-set, and cluster deployments.]
17. Replica set uses
• Redundancy
o Data is at multiple nodes
o n-seconds-behind mode is an 'ass' saver (it's very easy to accidentally drop a collection!)
• Failover
o Sleep at night
• Maintenance
o Backup slaves
o Build indexes on slaves and promote them
• Load balancing
o Reads on slaves
@collection.insert(doc, :safe => { :w => "majority" } )
Journal + replication (the journal only applies to the primary), but this guarantees the rollback
will be available if the node fails before replicating.
18. Maintenance
• Backup/Maintenance
o Backup by stopping slave, copy files, start slave
• /data/*
• Can be copied and backed up and compressed
• Compression is high! (Can be 70%!) because field names are not compressed
o Mongo export and import BSON can be run while database is running
o Server Density
• Nodes health
• Slave lag - time behind
• Index size
• Etc.
19. Transactions
• findAndModify()
o Atomic update that returns the document in the same operation
• Upserts and indexes.
• Plan for failure rather than assuming transactions.
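A sketch of an atomic claim using findAndModify (`find_and_modify` in the Ruby driver): the flag flip and the read happen in one server-side step, so two workers can never claim the same event. The `processed` field follows the process pattern earlier; the names are illustrative.

```ruby
# Atomically claim one unprocessed event and get it back in the same step.
CLAIM = {
  :query  => { :processed => 0 },               # only unclaimed events
  :update => { '$set' => { :processed => 1 } }, # mark it claimed
  :new    => true                               # return the post-update document
}
# On a live connection:
# doc = events.find_and_modify(CLAIM)  # nil once nothing is left to claim
```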
20. Driver and language
• Driver and Language
o Use a dynamic language! Ruby, Python, etc.
o Driver support for replica set, and connection pool preferred.
o A Simple ORM/Mapper, etc. works great.
• Mongoid
• MongoMapper
• Or even just plain driver (Mongo Ruby driver)
o Learn Javascript!
• Shell Javascript commands and Ruby driver methods are very similar
o findOne vs find_one
• Map/Reduce is always Javascript
• Everything is a Map/Reduce – get used to it.
• (It's not difficult for these purposes!)
22. Demos
• https://github.com/tomjoro/mongo_browser
o JQuery tree view
o Sinatra
o Mongo
• Cool
o Integrating R with MongoDB
o Highcharts
• Contact information:
o http://www.linkedin.com/in/tomjor
o thomas.orourke@solvitron.com
Editor's notes
Thomas O’Rourke. Oulu. Seattle – Seattle’s a great place: Amazon, Microsoft, Facebook, Google. Big Data is here. It gave me confidence to try MongoDB to hear some of the world’s architects tell you “It’s all a big hash table”, or “You can’t do global relations anyway – de-normalize.” Cassandra, Hadoop, Riak, Redis, CouchDB: all are good. MongoDB is EASIEST to work with and get started with, and has the BROADEST use cases because of its document architecture and indexing. Fun to hear horror stories – I’m afraid I don’t have any. Or maybe a few. Stand on shoulders. Visa cards: just reconcile at the end of the day.
One year to know: “Has this ID ever been seen before for the entire year?” The data structure needs to be flexible.
Time stamps might be 1 second, or ms where there are two. Do it! MongoDB was easy to get started with. Deduplication: an index built on a partial MD5 string hash and a timestamp (2 numbers into a compound index). L1. “De-duplication” – log lines must be unique. Indexes that hold only recent values in RAM. Playback of log files in case of problems. L2. Parsing into key/value pairs: JSON.parse(). L3. Processing. L4. Reports can work from slaves.
No duplicates: integer indexes are fast. Preallocate scheme. 100% Mongo. Collections make more collections which refine collections, etc. (see example). Use the dynamic nature of creating collections! It’s not a relational DB – BE DYNAMIC FOR GOODNESS SAKE! Like “Event_<NAME>”: create a collection with the event name. Might need to do some cleanup. So what. Playback is SUPER IMPORTANT. Verify everything is there. AUTO INTEGRITY CHECKS.
We actually write JSON to our log files for events we want to capture. These can be parsed with one line of code. Dynamic creation of collections. Then it can go directly into MongoDB.
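Since the events are written to the log files as JSON, the parse step really is one line; a sketch, using the "Event_<NAME>" collection naming from the note above (the field names are illustrative):

```ruby
require 'json'

# Parse one JSON log line and derive the dynamic collection name
# ("Event_<NAME>") it should be inserted into.
def parse_event(line)
  doc = JSON.parse(line)
  ["Event_#{doc['event']}", doc]
end
# name, doc = parse_event(line); @collections[name].insert(doc)
```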
Say you have a collection (red) that you want to “process”. Preallocate a processed collection (there may be many). In the processed collection, store the source_id and create an index with no duplicates. This way you can have many target collections, but you will never process twice. ONLY UPDATE the processed flag after you are 100% sure the processed document has been inserted. But you might want to update several. The processed flag does not have to be safe-updated. GUARANTEED. Also, because of playback, high/low water marks didn’t work.
Almost everything was map/reduce. MySQL was considered for reports – but 100% Mongo was easier!
You can read from a replica (use tags), but they might be 50 ms behind. Only the primary is writing and guaranteed to be consistent (comparable to MySQL). People who run benchmarks need to consider this. Connection pool! Yes. Multithreading or non-blocking I/O (EventMachine/Tornado)! Yes!
Journaling is on by default. The oplog and journal writes are done in an atomic transaction (how is that possible?), after n operations or a single operation. If you are using a MULTITHREADED driver, your write and read might not be consistent: getLastError() is per thread! So it depends on the driver. n-writes = majority. For us, we don’t use n-writes because we have the integrity checks. Journal means written to disk, and you can combine it with a write concern.
A write is not committed until it hits a majority of nodes (even with journaling; journal is the default). All writes that were never replicated will trigger a rollback; the changes are stored in a “rollback” sub-directory. USE write concern to wait until a write is replicated to a majority, after every write or after a series of writes.
Nothing greater than to wake up and see it failed over without intervention.
A “.” is an event in a 200-second window. A “0” is no event in 200 seconds. An entire month of data, 5 million events: 30 seconds to map-reduce this.