This talk starts by outlining the history of conventional version control, then introduces QoDs (Quantitative Oriented Developers) and the unique problems their ML systems pose from an operations perspective (MLOps). The only status quo solutions are proprietary in-house pipelines (exclusive to Uber, Google, and Facebook) and manual tracking or fragile "glue" code for everyone else.
Datmo works to solve this issue by empowering QoDs in two ways: making MLOps manageable and simple (rather than completely abstracted away), and reducing the amount of glue code to ensure more robust end-to-end pipelines.
The talk then goes through a simple example of using Datmo with the Iris classification dataset. Later workshops will expand to show how Datmo can work with other data pipelining tools.
5. @anandsampat
What is Version Control?
“The management of changes to
documents, computer programs, large
web sites, and other collections of
information.”
*AKA `Source Control`
8. @anandsampat
You’ve probably heard of Git.
Git is a version control system for tracking
changes in computer files and
coordinating work on those files among
multiple people. It is primarily used
for source code management in software
development, but it can be used to keep
track of changes in any set of files.
12. @anandsampat
For developers: For enterprises:
• Self-managed SCM servers
became a thing of the past
• Developers could leverage
industry best practices for their
own personal work
• Community of knowledge
built around a known standard
• Collaboration on Open Source
Software
• Advent of continuous
integration / deployment
• Removed need for external
code issue tracking tool
• Consolidation of code storage
and versioning tool
• Pull Requests, code review,
documentation through
ReadMe
17. @anandsampat
It’s time to talk about MLOps
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
18. @anandsampat
MLOps: The Elephant in the Room
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
19. @anandsampat
“ML systems have a special capacity for incurring
technical debt, because they have all of the
maintenance problems of traditional code plus an
additional set of ML-specific issues. This debt may be
difficult to detect because it exists at the system level.”
- Google (Sculley et al., 2015)
20. @anandsampat
“Typical methods for paying down code level
technical debt are not sufficient to address
ML-specific technical debt at the system level.”
- Google (Sculley et al., 2015)
27. @anandsampat
What is Datmo?
Datmo is a workflow tool for ML, AI,
and Data Science developers. It helps
with managing model version control,
easy environment handling, and
reproducing results through the
power of snapshots.
29. @anandsampat
Why are they important?
[Diagram: Datmo Snapshots vs. Git Commits. Labels: Environment, Configuration, Metrics, Code, Files*]
30. @anandsampat
How will it help?
Datmo leverages containers to quickly
spin up perfectly reproducible
developer environments. It tracks this
environment, along with model
metadata inside of snapshots.
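To make the idea of a snapshot concrete, here is a minimal Python sketch of a record that bundles code, environment, configuration, and metrics under one identifier. It is not Datmo's implementation: `hash_files`, `make_snapshot`, the record fields, and the sample values are all hypothetical names chosen for illustration, and the environment here is only the Python version and platform rather than a container.

```python
import hashlib
import json
import platform
import sys
import tempfile
from pathlib import Path


def hash_files(paths):
    """Return one SHA-256 digest covering the given files' names and contents."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def make_snapshot(code_paths, config, metrics):
    """Bundle code, environment, configuration, and metrics into one record.

    Hypothetical structure for illustration; not Datmo's snapshot format.
    """
    record = {
        "code_hash": hash_files(code_paths),
        "environment": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
        "config": config,
        "metrics": metrics,
    }
    # Derive the id from the record itself, so identical inputs on the
    # same machine always produce the same id.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["id"] = hashlib.sha256(payload).hexdigest()[:12]
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "train.py"
        script.write_text("print('training')\n")
        # The config and metric values below are placeholders, not real results.
        snapshot = make_snapshot([script], {"max_depth": 3}, {"accuracy": 0.95})
        print(json.dumps(snapshot, indent=2))
```

Because the id is derived from the contents, changing the code, the configuration, or the metrics yields a different snapshot, which is the property that makes a result traceable back to what produced it.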
31. @anandsampat
From a broad perspective:
Make ML Ops and workflows
manageable and simple, not
completely abstracted away.
Reduce the amount of glue code
so that people can have more
robust pipelines.
33. @anandsampat
GitHub = SCM + Hosting + More
Datmo = Model Versioning +
Environments + Deployment + More
35. @anandsampat
Datmo in today’s example
We’re going to use Datmo to show how we can
quickly iterate on our model and streamline our
workflow.
We’ll go through using snapshots for A/B testing,
saving our tasks, and enabling you all to reproduce
my results/make your own changes to the model.
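As a rough illustration of what A/B testing with snapshots could look like, the Python sketch below compares two recorded runs by a chosen metric and reports which configuration values differ. The `Snapshot` class, `best_snapshot`, and `config_diff` are hypothetical helpers, not part of Datmo, and the metric values are placeholders rather than measured results.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical stand-in for one recorded experiment."""
    snapshot_id: str
    config: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def best_snapshot(snapshots, metric, higher_is_better=True):
    """Pick the snapshot with the best value for `metric`."""
    candidates = [s for s in snapshots if metric in s.metrics]
    if not candidates:
        raise ValueError(f"no snapshot reports metric {metric!r}")
    chooser = max if higher_is_better else min
    return chooser(candidates, key=lambda s: s.metrics[metric])


def config_diff(a, b):
    """Return {key: (a_value, b_value)} for config keys that differ."""
    keys = set(a.config) | set(b.config)
    return {
        k: (a.config.get(k), b.config.get(k))
        for k in sorted(keys)
        if a.config.get(k) != b.config.get(k)
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    variant_a = Snapshot("a", {"model": "logistic_regression"}, {"accuracy": 0.90})
    variant_b = Snapshot("b", {"model": "decision_tree"}, {"accuracy": 0.93})
    winner = best_snapshot([variant_a, variant_b], "accuracy")
    print("winner:", winner.snapshot_id)
    print("changed:", config_diff(variant_a, variant_b))
```

Keeping the configuration next to the metrics is what lets the comparison answer both "which run did better?" and "what was different about it?".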
43. @anandsampat
Fork the model
Fork from Web Platform GUI (top right corner):
https://datmo.com/anands/workshop-iris-classification
44. @anandsampat
Fetch your model from Datmo
Clone the Datmo Model:
$ datmo clone [YOUR-USERNAME]/workshop-iris-classification
Jump into this directory:
$ cd workshop-iris-classification
53. @anandsampat
What just happened?
• Datmo cloned the model from the platform,
bringing all of the necessary resources to local.
• Datmo set your current code to the state of the
desired snapshot.
• Datmo built the environment inside of a container.
• Datmo executed the task inside of the container,
and logged the results.
• Datmo combined the task output files,
environment, code, configs, and metrics into a
snapshot.
[Table: Command / Result. Commands listed: datmo clone; datmo snapshot checkout; datmo task run; datmo snapshot task]
54. @anandsampat
Key Takeaways
1. Traditional Source Control isn’t enough for QoDs
(Data Science, ML, and AI)
2. Think about ML Ops before you’re “in too deep”
3. In the same way GitHub revolutionized Software
Engineering, Datmo aims to do the same for QoDs
58. @anandsampat
Next Steps
1. Check out example workflows in our docs to
create your own Datmo project here
2. Learn more about ML and browse more content
at our blog: https://blog.datmo.com
3. Interested in updates? You’ll be signed up for our
weekly newsletter if you signed up today.
4. Stay tuned for our open source library this
month. It’ll be at https://github.com/datmo/datmo