For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/08/data-versioning-towards-reproducibility-in-machine-learning-a-presentation-from-tryolabs/

Nicolás Eiris, Machine Learning Engineer at Tryolabs, presents the “Data Versioning: Towards Reproducibility in Machine Learning” tutorial at the May 2022 Embedded Vision Summit.

Surprisingly, in 2022 reproducibility is still a major pain point in most data science workflows. A critical element required for reproducibility is version control. Unfortunately, machine learning notoriously lacks standards for version control, so developers typically resort to crafting ad hoc workflows, and they frequently reinvent the wheel because they are unaware of existing solutions.

In this talk, Eiris introduces DVC, short for “Data Version Control,” an open-source tool that Tryolabs has found can significantly alleviate the pain of reproducibility in data science workflows. He covers the motivation for such a tool, digs into its main features, and aims to convince you that your life will be much better if you integrate it into your next project. Everything is illustrated through a real-world example of an end-to-end ML pipeline.