Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria. However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. In addition, there are no means to run complex queries on models and related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g., spark.ml, scikit-learn). A common set of abstractions enables ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk by Manasi Vartak
1. ModelDB: A system to manage machine learning models
Manasi Vartak
PhD Student, MIT DB Group
2. People
Manasi Vartak (PhD student, MIT)
Srinidhi Viswanathan (MEng, MIT)
Samuel Madden (Faculty, MIT)
Matei Zaharia (Faculty, Stanford)
Harihar Subramanyam (MEng, MIT)
Wei-En Lee (MEng student, MIT)
3. Building a default prediction algorithm
Person          Profession         Credit History                   Risk of Default
Barack Obama    Politician         Reasonable                       0.3
Lindsay Lohan   Struggling artist  Poor                             0.7
Warren Buffett  Investor           Has more money than our company  0.0
…               …                  …                                …
7. df.withColumn("timesDelayed", udf1)
  .withColumn("percentPaid", udf2)
  .withColumn("creditUsed", udf3)
…
val lrGrid = new ParamGridBuilder()
  .addGrid(lr.elasticNetParam, Array(0.01, 0.1, 0.5, 0.7))
val scaler = new StandardScaler()
  .setInputCol("features")
…
val labelIndexer1 = new LabelIndexer()
val labelIndexer2 = new LabelIndexer()
…
Model 50
val udf1: (Int => Int) = (delayed..)
val udf2: (String, Int) = …
credit-default-clean.csv
8. I’m willing to bet… no one in here tracks (all of) their models
…and this is not unusual
9. Why is this a problem?
• No record of experiments
• Insights lost along the way
• Difficult to reproduce results
• Cannot search for or query models
• Difficult to collaborate
Did my colleague do that already?
How did normalization affect my ROC?
How does someone review your model?
Where’s the LR model I tried last week with featureX?
What params did I use?
10. Model Management
track, store and index modeling artifacts so that they may subsequently be reproduced, shared, queried, and analyzed
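The definition above can be illustrated with a small sketch. This is a minimal in-memory example in Python, not ModelDB's actual API: the `ModelRecord` and `ModelStore` classes, the `log` and `find` methods, and all parameter and metric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclass
class ModelRecord:
    """One tracked modeling run (hypothetical schema, not ModelDB's)."""
    model_id: int
    model_type: str
    params: Dict[str, Any]
    metrics: Dict[str, float]
    features: List[str] = field(default_factory=list)


class ModelStore:
    """Tracks, stores, and indexes model records in memory."""

    def __init__(self) -> None:
        self._records: List[ModelRecord] = []
        self._by_type: Dict[str, List[int]] = {}  # index: model type -> record ids

    def log(self, model_type: str, params: Dict[str, Any],
            metrics: Dict[str, float], features: Iterable[str] = ()) -> int:
        """Store one run and return its id."""
        record = ModelRecord(len(self._records), model_type,
                             dict(params), dict(metrics), list(features))
        self._records.append(record)
        self._by_type.setdefault(model_type, []).append(record.model_id)
        return record.model_id

    def find(self, model_type: Optional[str] = None,
             where: Callable[[ModelRecord], bool] = lambda r: True) -> List[ModelRecord]:
        """Return records of the given type (or all types) that satisfy `where`."""
        if model_type is None:
            ids = range(len(self._records))
        else:
            ids = self._by_type.get(model_type, [])
        return [self._records[i] for i in ids if where(self._records[i])]


store = ModelStore()
# Illustrative values only; these are not results from the talk.
store.log("LogisticRegression", {"elasticNetParam": 0.1}, {"auc": 0.81},
          ["timesDelayed", "featureX"])
store.log("LogisticRegression", {"elasticNetParam": 0.5}, {"auc": 0.78},
          ["timesDelayed"])
store.log("RandomForest", {"numTrees": 20}, {"auc": 0.84}, ["featureX"])

# "Where's the LR model I tried with featureX?"
hits = store.find("LogisticRegression", where=lambda r: "featureX" in r.features)
print([(r.model_id, r.params) for r in hits])
```

Keeping a per-type index alongside the record list is one simple way to make the "index" part of the definition concrete; a real system would persist both rather than hold them in memory.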
11. ModelDB: a system to manage machine learning models
http://modeldb.csail.mit.edu
12. ModelDB: an end-to-end model management system
Ingest models, metadata (track)
Model artifact storage & versioning (store & index)
Query, collaboration, reproducibility (query, reproduce++)
15. ModelDB Architecture & Design Decisions
1. Support for diverse languages and environments
2. Minimal changes to existing workflows
3. Rich visual interface
4. Support for complex queries
Architecture diagram: native clients (spark.ml in Scala, scikit-learn in Python, …) send events over thrift to the ModelDB backend and storage; the ModelDB frontend provides vis + query.
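The idea of a native client that emits events with minimal changes to an existing workflow can be sketched as follows. This is only an illustration under stated assumptions: the `tracked` decorator, the `emit` function, the `EVENT_SINK` list, the event fields, and the toy `MeanPredictor` estimator are all hypothetical, and JSON appended to a list stands in for the thrift transport and backend named on the slide.

```python
import functools
import json
from typing import Any, Dict, List

# Stand-in for the backend: in this sketch, events are appended to a list.
EVENT_SINK: List[str] = []


def emit(event: Dict[str, Any]) -> None:
    """Serialize an event and hand it to the (hypothetical) backend sink."""
    EVENT_SINK.append(json.dumps(event, sort_keys=True))


def tracked(method):
    """Wrap an estimator's fit method so each call also emits a FitEvent."""
    @functools.wraps(method)
    def wrapper(self, X, y):
        # Capture hyperparameters before fitting so learned state is excluded.
        params = dict(vars(self))
        result = method(self, X, y)
        emit({
            "event": "FitEvent",
            "estimator": type(self).__name__,
            "params": params,
            "num_rows": len(X),
        })
        return result
    return wrapper


class MeanPredictor:
    """Toy estimator: learns a shrunken mean of the training labels."""

    def __init__(self, shrinkage: float = 1.0) -> None:
        self.shrinkage = shrinkage

    @tracked  # the only change to the existing modeling code
    def fit(self, X, y):
        self.mean_ = self.shrinkage * sum(y) / len(y)
        return self


model = MeanPredictor(shrinkage=0.5).fit([[1], [2], [3]], [0.0, 1.0, 1.0])
print(EVENT_SINK[0])
```

The modeling code itself is unchanged apart from the decorator, which is the property the second design decision asks for.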
16. ModelDB Features
• Experiment tracking: log models, params, pipelines, etc. via the ModelDB API
• Versioning: every modeling run = version
• Reproducibility: all pipeline details, params logged
• Comparisons, queries, search: model search, query, comparison via frontend
• Collaboration: central repository of models; review models, annotate
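Treating every modeling run as a version makes comparisons straightforward: two versions can be diffed field by field. The sketch below assumes a hypothetical run layout (a dict with `params` and `metrics`) and hypothetical helper names `diff_section` and `compare_runs`; the values are illustrative and the code does not reflect how ModelDB implements comparison.

```python
from typing import Any, Dict, Tuple

Run = Dict[str, Dict[str, Any]]  # assumed layout: {"params": {...}, "metrics": {...}}


def diff_section(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


def compare_runs(old: Run, new: Run) -> Dict[str, Dict[str, Tuple[Any, Any]]]:
    """Diff the params and metrics of two versions of a pipeline."""
    return {
        "params": diff_section(old["params"], new["params"]),
        "metrics": diff_section(old["metrics"], new["metrics"]),
    }


# Two versions of the same pipeline; values are illustrative only.
v1 = {"params": {"normalize": False, "elasticNetParam": 0.1},
      "metrics": {"roc_auc": 0.74}}
v2 = {"params": {"normalize": True, "elasticNetParam": 0.1},
      "metrics": {"roc_auc": 0.79}}

# "How did normalization affect my ROC?"
changes = compare_runs(v1, v2)
print(changes)
```

Unchanged parameters are left out of the result, so the diff shows only what differs between the two versions.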
17. Ongoing Work
• Unified querying of modeling artifacts
• Mining data in ModelDB
• Model monitoring and retraining