Model Registry with Open-Source tools: Git, GitHub and CI/CD
The presentation will help building ML model registry based on Git and the best engineering practices.
Model Registry with Open-Source tools: Git, GitHub and CI/CD
1. Model Registry with
Open-Source tools: Git,
GItHub and CI/CD
Dmitry Petrov, Ph.D.
dmitry@iterative.ai
Co-Founder & CEO at Iterative.ai
iterative.ai
2. About me
Before
Ex-Data Scientist at Microsoft
Now
Creator of Data Version Control
Co-founder, CEO at Iterative.ai
(San Francisco, CA)
Dmitry Petrov
@FullStackML
3. What will you learn?
Git as the single source of truth for ML models
4. What will you learn?
Git as the single source of truth for ML models
6. Centralized model store to collaboratively
manage the full lifecycle of ML models.
⬥ Model lineage & versioning
⬥ Stage transitions: from staging to production
⬥ Model Annotations & Discovery: timestamp,
description, labels
What is Model Registry (1)
7. What is Model Registry (2)
Model Registry
ML Train ML Production
Git, GitHub/Lab
Save
API
Pull
API
com
m
it
8. What is Model Registry (3)
Name Version Status Code Meta-info: labels
churn 3.1 prod 43ba862 customer, xgboost
segment 0.4 - d19ce1f cv, car
cv-class 1.13 staging afd8e35 cv, car, doors, glass
… … … … …
10. Challenge 1: model & code lineage (1)
Model Registry
ML Train ML Production
Git, GitHub/Lab
Save
API
Pull
API
com
m
it
d19ce1f
d19ce1f
11. Name Version Status Code Meta-info: labels
churn 3.1 prod 43ba862 customer, xgboost
segment 0.4 - d19ce1f cv, car
cv-class 1.13 staging afd8e35 cv, car, doors, glass
… … … … …
Challenge 1: model & code lineage (2)
12. Challenge 2: app & model deployment diff (1)
Model Registry
ML Train
ML Production
Git, GitHub/Lab
Save
API
Pull
API
Com
m
it
App Production
CI/CD
Deploy
Push
13. Challenge 2: app & model deployment diff (2)
Model Registry
ML Train
ML Production
Git, GitHub/Lab
Save
API
Pull
API
Com
m
it
App Production
CI/CD
Deploy
Push
ML
App
14. Challenge 3: additional service
Model Registry
ML Train
ML Production
Git, GitHub/Lab
Save
API
Pull
API
Com
m
it
App Production
CI/CD
Deploy
Push
?
15. Git as the source of truth
ML Train
App & ML
Production
Git, GitHub/Lab
Commit
CI/CD
Deploy
Push
16. Git as the source of truth: Git only API
ML Train
App & ML
Production
Git, GitHub/Lab
Commit
CI/CD
Deploy
Push
18. How to save the table in Git?
Name Version Status Code Meta-info: labels
churn 3.1 prod 43ba862 customer, xgboost
segment 0.4 - d19ce1f cv, car
cv-class 1.13 staging afd8e35 cv, car, doors, glass
… … … … …
19. Basic:
⬥ Human-friendly versioning: d19ce → v1.0.1
⬥ Status: prod, staging, dev
Integration to production:
⬥ Push to production
Model registry functionality (1)
20. Basic:
⬥ Human-friendly versioning: d19ce → v1.0.1
⬥ Status: prod, staging, dev
Integration to production:
⬥ Push to production
Advanced:
⬥ Large ML models
⬥ Mono-repo or multiple models
⬥ Annotations
Model registry functionality (2)
21. Basic:
⬥ Human-friendly versioning
⬥ Status
Integration to production:
⬥ Push to production
Advanced:
⬥ Large ML models
⬥ Mono-repo or multiple models
⬥ Annotations
Model registry functionality (3)
Git tags
CI/CD
Meta-file
artifacts.yaml
22. Infrastructure as Code (IaC)
We are creating Model Registry as Code (MRaC)
Advanced:
⬥ Large ML models
⬥ Mono-repo or multiple models
⬥ Annotations
Solution: Git as the source of truth
28. GTO - Git Tag Ops
https://github.com/iterative/gto
Turn your Git Repo into Artifact Registry:
● Register new versions of artifacts marking significant
changes to them
● Promote versions to signal downstream systems to act
● Attach additional info about your artifact with
Enrichments
● Act on new versions and promotions in CI
43. Git: summary
Versioning Git tag `model-v1.0.0`
Status Git tag `model-prod`
Pull & Push to production CI/CD
Large ML models URL in `artifacts.yaml`
Multiple models Sections in `artifacts.yaml`
Annotations Labels in `artifacts.yaml`
46. Model visibility - User Interface
Git good for “inside repository” model registry,
not cross-repository.
Solution: Build an UI for collect models from
all repositories
49. ⬥ Git as the single source of truth for ML models
⬥ No need in external services: only Git, GitHub/Lab
⬥ Limitation - only in repository level
Summary: Model Registry with Git