The day an ML application is deployed to production and begins facing the real world is both the best and the worst day in the life of the model builder. The joy of seeing accurate predictions is quickly overshadowed by a myriad of operational challenges. Debugging, troubleshooting, and monitoring take over the majority of their day, leaving little time for model building. In DevOps, software operations have been elevated to an art: sophisticated tools enable engineers to quickly identify and resolve issues, continuously improving software stability and robustness. In the ML world, operations are still largely a manual process involving Jupyter notebooks and shell scripts. One of the cornerstones of the DevOps toolchain is logging. Traces and metrics are built on top of logs, enabling monitoring and feedback loops. What does logging look like in an ML system?
In this talk we will demonstrate how to enable data logging for an AI application using whylogs in a matter of minutes. We will discuss how something so simple enables testing, monitoring, and debugging in an AI application that handles terabytes of data and runs in real time. Attendees will leave the talk equipped with tools and best practices to supercharge MLOps in their teams.
14.
Feature name  count  max     min    stddev  nunique  null_count  quantile_0.0000  …  quantile_1.0000
chlorides     1199   0.611   0.012  0.044   134      0           0.012            …  0.611
quality       1199   8.000   3.000  0.785   6        0           3.000            …  8.000
alcohol       1199   14.900  8.400  1.060   65       0           8.400            …  14.900
density       1199   1.004   0.997  0.001   390      0           0.990            …  1.004
pH            1199   4.010   2.890  0.153   82       0           2.890            …  4.010
Log rich statistics for each feature
Each data log captures summary statistics, counters, distributions, metadata and custom metrics
Sample of a flattened data log captured by whylogs on the Wine Quality dataset
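As a rough illustration of how such per-feature summaries can be computed, here is a minimal stdlib-only sketch (not the whylogs implementation) covering a few of the metrics shown above:

```python
import math

def profile_feature(values):
    """Summarize one feature, whylogs-style: count, min/max, stddev,
    unique count, and null count. Sketch only, not the real library."""
    non_null = [v for v in values if v is not None]
    n = len(non_null)
    mean = sum(non_null) / n
    # Sample standard deviation (n - 1 in the denominator)
    var = sum((v - mean) ** 2 for v in non_null) / (n - 1)
    return {
        "count": n,
        "max": max(non_null),
        "min": min(non_null),
        "stddev": math.sqrt(var),
        "nunique": len(set(non_null)),
        "null_count": len(values) - n,
    }

stats = profile_feature([3.0, 4.0, 5.0, 5.0, None])
print(stats)
```

A real data logger would also track the quantile columns, which require sketch data structures rather than exact computation at scale.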
15. Track data statistics across batches
Distribution plot for one of the columns in the model input, collected at inference time
Distribution of “free sulfur dioxide” feature over 20 inference batches of the Wine Quality model
16.
Dataset       Size  # of entries  # of features  Memory consumption  Output size
Lending Club  1.6G  2.2M          151            14MB                7.4MB
NYC Tickets   1.9G  10.8M         43             14MB                2.3MB
Pain pills    75GB  178M          42             15MB                2MB
Run data logging without overhead
By using streaming algorithms to capture data statistics, whylogs keeps a memory footprint that is constant in the number of rows and grows only with the number of features in the dataframe, and it outputs lightweight log files (JSON, protobuf, etc.).
Sample of whylogs benchmarks on public datasets
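The constant-memory claim rests on one-pass streaming algorithms. As an illustrative sketch (a classic technique, not a description of whylogs internals), Welford's online update tracks mean and variance without retaining any rows:

```python
class StreamingStats:
    """One-pass mean/variance via Welford's algorithm: O(1) memory per feature."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined only once two values have been seen
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

s = StreamingStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    s.update(x)
print(s.mean, s.variance)
```

The same one-pass idea extends to cardinality and quantile sketches, which is what keeps the output files in the table above so small.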
17. whylogs profiles 100% of the data to accurately capture distributions. Calculating distributions from randomly sampled data is significantly less accurate. The chart presents the median error for distributions estimated with whylogs versus random sampling techniques.
[Chart: median estimation error, profiling vs. random sampling, for Normal, Normal discrete, Normal outlier, Uniform discrete, Uniform, and Pareto distributions; y-axis 0–0.4]
Capture accurate data distributions
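The profiling-vs-sampling gap is easy to reproduce on a toy scale. This seeded sketch (not the benchmark behind the chart) estimates the 99th percentile of a normal distribution from all 100,000 rows versus a 1% random sample:

```python
import random

random.seed(0)
data = [random.gauss(0, 1) for _ in range(100_000)]

def quantile(values, q):
    """Naive empirical quantile: sort and index. Fine for a demo,
    too expensive at scale, which is why profilers use sketches."""
    vals = sorted(values)
    return vals[int(q * (len(vals) - 1))]

true_p99 = quantile(data, 0.99)                       # "profiling": every row seen
sample_p99 = quantile(random.sample(data, 1000), 0.99)  # 1% random sample
print(true_p99, sample_p99, abs(true_p99 - sample_p99))
```

Tail quantiles are where sampling hurts most: a 1% sample contains only a handful of points beyond the 99th percentile, so its estimate is noisy.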
19. whylogs captures mergeable histograms for each feature. To catch distribution drift, continuously compare the training distribution of a feature to its serving distribution.
[Chart: overlaid training vs. serving histograms for one feature; y-axis 0–1000]
Use case: training-serving distribution drift
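A minimal sketch of this use case, assuming simple fixed-width bins and a naive drift score (total variation distance; not the metric whylogs itself uses): histograms from separate batches merge by addition, and the merged serving histogram is then compared to the training one.

```python
from collections import Counter

def histogram(values, bin_width=1.0):
    """Fixed-width-bin histogram; Counter makes it mergeable via +."""
    return Counter(int(v // bin_width) for v in values)

def tv_distance(h1, h2):
    """Total variation distance between normalized histograms:
    0 = identical distributions, 1 = completely disjoint."""
    n1, n2 = sum(h1.values()), sum(h2.values())
    bins = set(h1) | set(h2)
    return 0.5 * sum(abs(h1[b] / n1 - h2[b] / n2) for b in bins)

train = histogram([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
# Mergeable: histograms logged per serving batch combine by addition
serve = histogram([5.0, 5.5, 6.0]) + histogram([6.5, 7.0, 7.5])
score = tv_distance(train, serve)
print(score)
```

In practice the serving side would be re-merged and re-scored on every batch, and an alert raised when the score crosses a threshold.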
20. Logging enables all key MLOps activities
Once data is logged systematically, whylogs outputs can be used to test, monitor, and debug data.
Use whylogs at any point in the ML stack and throughout the lifecycle of the ML application.