MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
1. MongoDB at Medtronic
Jeff Lemmerman
Matt Chimento
Medtronic Energy and Component Center
*Disclaimer: We are not currently managing any Device Patient Data in MongoDB
21. Our experience with MongoDB
• Consulting/Training has been excellent
• Support agreement has been under-utilized
– Emails for security updates etc. are prompt
– Release cycle is frequent
• MongoDB Monitoring Service
– Potential concerns storing db stats externally
– MMS can now be hosted locally
• MongoDB Certification now available
– Udacity course, “Data Wrangling with MongoDB”
22. Gaps
• Enterprise acceptance of “new” approach
• Integration with off-the-shelf reporting and analytics tools
• User interface for managing MongoDB cluster
• Developer familiarity with JSON and MongoDB
• 21 CFR Part 11 Compliance – Audit Tracing
25. Key Features
• Data stored as documents (JSON-like BSON)
– Flexible-schema
– In schema design, think about optimizing for read vs. storage
• Full CRUD support (Create, Read, Update, Delete)
– Atomic in-place updates
– Ad-hoc queries: Equality, RegEx, Ranges, Geospatial
• Secondary indexes
• Replication – redundancy, failover
• Sharding – partitioning for read/write scalability
• Terminology
– Collection = Table
– Index = Index
– Document = Row
– Field = Column
– Embedding & Linking = Joining
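The last terminology pair, embedding and linking in place of joins, can be illustrated with plain Python dictionaries. This is only a sketch: the serial number, field names, and the `resolve_links` helper are hypothetical and are not taken from the Medtronic schema or from any MongoDB driver.

```python
# Embedding: related records live inside the parent document,
# so one read returns the whole component.
embedded_component = {
    "_id": "CAP-1001",  # hypothetical serial number
    "type": "capacitor",
    "steps": [
        {"name": "weld", "result": "pass"},
        {"name": "leak_test", "result": "pass"},
    ],
}

# Linking: the parent stores only references; the related records
# live in a separate collection (modelled here as a dict keyed by _id).
linked_component = {
    "_id": "CAP-1001",
    "type": "capacitor",
    "step_ids": ["S1", "S2"],
}
steps_collection = {
    "S1": {"_id": "S1", "name": "weld", "result": "pass"},
    "S2": {"_id": "S2", "name": "leak_test", "result": "pass"},
}


def resolve_links(component, steps):
    """Follow step_ids with extra lookups: the application-side 'join'."""
    resolved = {k: v for k, v in component.items() if k != "step_ids"}
    resolved["steps"] = [
        {k: v for k, v in steps[step_id].items() if k != "_id"}
        for step_id in component["step_ids"]
    ]
    return resolved


joined = resolve_links(linked_component, steps_collection)
```

Embedding favors read speed at the cost of larger documents; linking keeps documents small but needs the extra lookup shown in `resolve_links`.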
26. Creating Components
• .insert() always tries to create a new document and fails if the _id already exists
• .save() updates the existing document if its _id already exists; otherwise it inserts
• If a document doesn’t have an _id field, one is added
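The difference between the two calls can be sketched with a small in-memory model. `ToyCollection` below is a hypothetical stand-in written for illustration, not a MongoDB driver class, and it generates a UUID string where a real server would add an ObjectId.

```python
import uuid


class DuplicateKeyError(Exception):
    """Raised when insert() receives an _id that is already stored."""


class ToyCollection:
    """In-memory stand-in for a collection; NOT a MongoDB driver API."""

    def __init__(self):
        self._docs = {}  # maps _id -> stored document

    def _ensure_id(self, doc):
        # A document without an _id field gets one added.
        if "_id" not in doc:
            doc["_id"] = uuid.uuid4().hex
        return doc["_id"]

    def insert(self, doc):
        # insert() always tries to create a new document.
        key = self._ensure_id(doc)
        if key in self._docs:
            raise DuplicateKeyError(key)
        self._docs[key] = dict(doc)
        return key

    def save(self, doc):
        # save() replaces the stored document if the _id exists, else inserts.
        key = self._ensure_id(doc)
        self._docs[key] = dict(doc)
        return key

    def find_one(self, key):
        return self._docs.get(key)

    def count(self):
        return len(self._docs)


components = ToyCollection()
components.insert({"_id": "BAT-001", "type": "battery", "status": "built"})
components.save({"_id": "BAT-001", "type": "battery", "status": "tested"})
generated_id = components.insert({"type": "capacitor"})  # _id is added
```

Calling `components.insert({"_id": "BAT-001"})` again would raise `DuplicateKeyError` in this model, which mirrors the duplicate-key failure of a second insert with the same `_id`.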
Our roles and how long we’ve been at the company, how long we’ve faced the problem
How the project came to be
The project team
Last year, more than 9 million people around the world benefitted from Medtronic products or therapies.
It's our job to demonstrate the quality and reliability of the components we manufacture
Approx 2.75M Batteries, 19M Feedthroughs, 2M Capacitors, 2.7M Headers
1.1M SPC Samples/Day
30M Lifetest Samples/Day FY15+
Current State --
Systems generating discrete data sources
“Raw” data in one source, metadata in a different source, event and process data in yet another source
Each system has a different format
Different types of components with different models
1.1M SPC Samples/Day
30M Lifetest Samples/Day FY15+, and we NEVER delete the data
Enterprise acceptance of a new technology or solution layered on top of existing systems is even more challenging
80% Wrangling, 20% Analyzing
Transformation of Data to Knowledge
Leveraging enterprise tools and enabling new tools
SPACE
Spotfire
Excel
JMP
Business Objects
Each row is a component and the columns are the things we know about each component
Merge history of component at report time
Report could be user driven or off-line reporting, both still slow
Ad-hoc reporting requires complex queries and is still slow
Hours to get part of the component history
Days to get a more complete history
Pre-Load Data From Discrete Sources Into Central Repo
Repo could be enterprise data warehouse (EDW), RDBMS, or MongoDB
Difficult to store uncontrolled data formats
Scaling via big iron or custom data marts/partitioning schemes
Schema must be known at design time
Impedance mismatch with agile development and deployment techniques
Doesn’t map well to native language constructs
Data is optimized for storage
Data stored is very compact
Rigid schemas have led to powerful query capabilities (very complex queries, consequences of left/right/inner joins)
Generic data types make queries less effective
Robust ecosystem of tools, libraries, integrations
40+ years old!
Ideal Future State
5+ tables in a single Mongo document
20 Production Steps
30 Subcomponents
150 Facts
Data storage is optimized for read/write and not space on disk
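A minimal sketch of folding several relational-style tables into one document per component follows. The table names, field names, and sample values are hypothetical placeholders chosen for illustration; the real documents described above hold on the order of 20 production steps, 30 subcomponents, and 150 facts.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from separate source tables.
component_rows = [{"serial": "BAT-001", "type": "battery"}]
step_rows = [
    {"serial": "BAT-001", "step": "fill", "operator": "op7"},
    {"serial": "BAT-001", "step": "seal", "operator": "op3"},
]
subcomponent_rows = [{"serial": "BAT-001", "part": "feedthrough", "lot": "F-22"}]
fact_rows = [{"serial": "BAT-001", "name": "open_circuit_voltage", "value": 3.2}]


def group_by_serial(rows):
    """Index child rows by component serial, dropping the join key."""
    grouped = defaultdict(list)
    for row in rows:
        payload = {k: v for k, v in row.items() if k != "serial"}
        grouped[row["serial"]].append(payload)
    return grouped


def build_documents(components, steps, subcomponents, facts):
    """Pre-join the child tables into one read-optimized document each."""
    steps_by = group_by_serial(steps)
    subs_by = group_by_serial(subcomponents)
    facts_by = group_by_serial(facts)
    documents = []
    for comp in components:
        serial = comp["serial"]
        documents.append({
            "_id": serial,
            "type": comp["type"],
            "production_steps": steps_by.get(serial, []),
            "subcomponents": subs_by.get(serial, []),
            "facts": {f["name"]: f["value"] for f in facts_by.get(serial, [])},
        })
    return documents


documents = build_documents(
    component_rows, step_rows, subcomponent_rows, fact_rows
)
```

The join work happens once at load time, so a report reads a single document instead of merging the component history at query time. The trade-off is the one noted above: the stored form is larger than the normalized tables.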
Udacity course is part of Data Science track
Download and use third party tools (MongoVUE, JSON Studio, etc.)
Still learning about more advanced analytics possibilities
.explain() returns execution statistics for a query
Do not convert query results to a List() here or in the calling method; just iterate through the enumerable
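The note above concerns lazy enumeration in the driver. The same idea can be sketched in Python with a generator; `fake_cursor` is a hypothetical stand-in for a driver cursor and the `log` argument exists only to show how many documents were actually pulled.

```python
def fake_cursor(n, log):
    """Hypothetical stand-in for a driver cursor: yields documents lazily."""
    for i in range(n):
        log.append(i)  # record that document i was actually fetched
        yield {"_id": i, "value": i * 2}


def first_over(cursor, threshold):
    """Iterate the cursor directly and stop at the first match."""
    for doc in cursor:
        if doc["value"] > threshold:
            return doc
    return None


# Lazy: only the documents up to the first match are fetched.
lazy_log = []
hit = first_over(fake_cursor(1_000_000, lazy_log), 10)

# Eager: converting to a list pulls every document before any work starts.
eager_log = []
materialized = list(fake_cursor(1000, eager_log))
```

In this sketch the lazy path touches a handful of documents out of a million, while the eager path fetches all of its input up front. With large result sets, the eager form also holds everything in memory at once.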