Reproducible machine learning
Steph Locke

Reproducible if Data=Same + Analysis=Same
Replicable if Data=Different + Analysis=Same
Robust if Data=Same + Analysis=Different
Generalisable if Data=Different + Analysis=Different

It’s reproducible if…
With the same ✨ environment
With the raw ✨ data
With the unmodified ✨ code
= Produces exactly the same results

Benefits for

You/Team
- Fewer headaches around environments and data
- Less rework of code
- Clear standards
- Easier operationalisation

Stakeholders
- Stable results
- Auditable
- Maintainable / correctable
- Easier operationalisation

Recommendations

FAIR data

Findable
- Unique name / ID
- Documented

Accessible
- Common access methods
- Open protocol
- Metadata stored even after data may be removed

Interoperable
- Common standards
- Terms defined

Reusable
- License and use rights specified
- Provenance documented

What to use

Use
- Logging
- Version control
- Fixed seeds*
- Dependency tracking

Framework examples
- MLFlow
- {drake}
- Azure ML
- FairML
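
As a rough illustration of the "same environment + raw data + unmodified code = same results" idea, here is a minimal sketch in plain Python. It is not tied to any of the frameworks listed: it fingerprints the raw data, fixes a seed, records the Python version, and checks that two runs agree. The function names (`fingerprint`, `run_analysis`, `run_record`), the toy data, and the seed value are all hypothetical choices made for this example.

```python
import hashlib
import json
import platform
import random


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest so changes to the raw data can be detected."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(raw: bytes, seed: int) -> dict:
    """Toy 'analysis' (an assumption for illustration): seeded shuffle, then a mean."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    values = [float(x) for x in raw.decode().split(",")]
    rng.shuffle(values)
    train = values[: len(values) // 2]  # first half acts as the "training" split
    return {"train_mean": sum(train) / len(train)}


def run_record(raw: bytes, seed: int) -> dict:
    """Bundle the result with the details needed to reproduce it."""
    return {
        "data_sha256": fingerprint(raw),        # which data was used
        "seed": seed,                           # which seed was fixed
        "python": platform.python_version(),    # part of the environment
        "result": run_analysis(raw, seed),
    }


RAW = b"1,2,3,4,5,6,7,8"  # hypothetical raw data
first = run_record(RAW, seed=42)
second = run_record(RAW, seed=42)

print(json.dumps(first, indent=2))     # a simple form of logging
print("reproducible:", first == second)
```

In practice the record would be written to a log or tracking tool and committed alongside the code, and the environment entry would list package versions as well as the interpreter version.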