This presentation was made on June 18, 2020.
Video recording of the session can be viewed here: https://youtu.be/YEtDwYSXXJo
For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance.
Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity, and improves model governance.
Speaker's Bio:
Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University of Buffalo majoring in Artificial Intelligence and is interested in developing scalable machine learning algorithms.
3. Confidential3
ML AutoDoc Overview
• Automatically generates editable Word doc to document model creation (algos, techniques,
data, etc)
• Save Data Science Resources
– automatically build required documentation
• Customize to your business needs
• Simple to use
9. Confidential9 Confidential9
• Steam exposes a service to generate model report (ML AutoDoc) for OSS
H2O-3 models (e.g., GBM, AutoML)
– The same service is used in Driverless AI to generate documentation for
DAI models
Steam: ML AutoDoc
AutoDoc for OSS H2O-
3
AutoML model
11. Confidential11
Scikit-learn ML AutoDocs
• ML AutoDoc has initial support of 3rd party models: Scikit Learn
• AutoDocs for Supervised Learning Models
• Scikit-Learn Linear Models:
– LogisticRegression
• Scikit Learn Ensemble Methods:
– RandomForestClassifier
– GradientBoostingClassifier
– GradientBoostingRegressor
AutoDoc for OSS
Scikit
Gradient Boosting
Model
14. Confidential14
AutoDocs in Driverless AI
Algorithms Supported
• XGBoost, LightGBM, Tensorflow,
Additional Features
• Included in Driverless AI
• Explainability
• Customized Reports
33. Confidential34
Customization
Model Diagnostics Model Interpretability
• Additional Performance Metrics
• Population Stability Index
• Prediction Statistics per Quantile
• Actual vs Predicted Plots
• GINI Plot
• Diagnostics on New Datasets
• Perform Model Diagnostics on a list of new datasets
• Partial Dependence Plots
• Generate partial dependence plots on the n most
important features
• Includes histogram with frequency of each feature
• Individual Conditional Expectation Plot
• Add partial dependence plot for specific records
only
• Variable Importance
• Calculate variable importance on original features
using Permutation Importance
• Filter variables to top relative importance or top n
features
35. Confidential36
Prediction Statistics per Quantile
# Enable the Prediction Statistics for each dataset split
config_overrides += "nautodoc_prediction_stats=true"