This talk includes the following items:
1) discussion of the various stages of the ML application life cycle: problem formulation, data definitions, modeling, production system design & implementation, testing, deployment & maintenance, online evaluation & evolution
2) getting the ML problem formulation right
3) key tenets for different stages of the application life cycle.
Audio for the talk:
https://youtu.be/oBR8flk2TjQ?t=19207
12. System Objectives
• Effectiveness w.r.t. business metrics
• Ethical compliance
• Fidelity w.r.t. distributional assumptions
• Reproducibility
• Auditability
• Reusability
• Security
• Graceful failure
• ….
Can achieve these only with a formal approach
with checklists, templates & tests for each stage!
14. ML & Data Science Learning Programs
[Diagram: life-cycle stages covered by learning programs: Problem Formulation, Data, Learning Algorithms, ML Pipelines, Modeling Process, Deployment Issues]
A lot of emphasis on algorithms, ML tools & modeling!
15. Factors for Success of ML Systems
[Diagram: the same life-cycle stages: Problem Formulation, Data, Learning Algorithms, ML Pipelines, Modeling Process, Deployment Issues]
Problem formulation & data become more critical!
16. Problem Formulation
Business Problem: Optimize a decision process to improve business metrics
• Sub-optimal decisions due to missing information
• Solution strategy: predict missing information from available data using ML
[Diagram: the Decision Process produces Decisions, which trigger an External Response that drives Business Metrics; ML Models supply predictions to the Decision Process]
Ask “why?” to arrive at the right ML problem(s) !
17. Reseller Fraud Example
• Bulk buys during sale days on e-commerce websites
• Later resale at higher prices or returns
18. Reseller Fraud Example
Objective: Automation of reseller fraud detection
Option 1: Learn a binary classifier using historical orders & human auditor labels
19. Reseller Fraud Example
Objective: Automation of reseller fraud detection
Option 1: Learn a binary classifier using historical orders & human auditor labels
Limitations:
● Reverse-engineers human auditors’ decisions along with their biases and
shortcomings
● Can’t adapt to changes in fraudster tactics or data drifts
● No connection to “actual business metrics” that we want to optimize
20. Reseller Fraud Example
Objective: Reduce return shipping expenses; increase #users served (esp. sale time)
Decision process:
• Partner with the reseller if there is potential to expand the user base
• Block fraudulent orders or introduce friction (e.g., disable COD/free returns)
Missing information relevant to the decision (see the sketch after this list):
• Likelihood of the buyer reselling the products
• Likely return shipping costs
• Unserved demand for the product (during sale and overall)
• Likelihood of reseller serving an untapped customer base
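A minimal sketch (not from the talk) of how such a decision process could consume the predicted "missing information" listed above. The model names, thresholds, and actions are hypothetical placeholders, intended only to show how the ML predictions plug into the business decision rather than replacing it.

```python
# Hypothetical decision logic driven by predicted "missing information".
# Names, thresholds, and actions are illustrative assumptions.

def decide_order_action(order, predict, thresholds):
    """predict: dict of callables mapping an order to a predicted quantity."""
    p_resell = predict["resell_likelihood"](order)          # likelihood the buyer resells
    return_cost = predict["return_cost"](order)             # likely return shipping cost
    demand_gap = predict["unserved_demand"](order)           # demand we currently cannot serve
    p_new_base = predict["untapped_customer_base"](order)    # does the reseller reach new customers?

    if p_resell < thresholds["resell"]:
        return "fulfil_normally"
    if p_new_base > thresholds["partner"] and demand_gap > thresholds["demand"]:
        return "partner_with_reseller"
    if return_cost > thresholds["return_cost"]:
        return "block_order"
    return "add_friction"  # e.g., disable COD / free returns
```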
21. Key elements of an ML Prediction Problem
• Instance definition
• Target variable to be predicted
• Input features
• Modeling metrics
• Ethical & fairness constraints
• Deployment constraints
• Sources of data
(Grouped on the slide as: Representation, Objectives, Observations)
22. Instance Definition
• Is it the right granularity for the decision making process?
• Is it feasible from the data collection perspective?
Multiple options (reseller fraud example)
• a customer
• a purchase order spanning multiple products
• a single product order (i.e., customer-product pair)
23. Target Variable to be Predicted
• Can we express the business metrics (approximately) in terms of the
prediction quality of the target variable(s)?
• Will accurate predictions improve the business metrics substantially?
– estimate biz. metrics for different cases (ideal, current-baseline, likely); see the sketch below
• What is the data collection effort?
– manual labeling costs, joins with external data
• Is it possible to get high quality observations?
– uncertainty in the definition, noise or bias in labeling process
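One way to do the "ideal / current-baseline / likely" estimate is a back-of-the-envelope calculation of the business metric as a function of prediction quality. The sketch below uses made-up volumes, costs, and recall/precision numbers purely for illustration; none of these figures come from the talk.

```python
# Back-of-envelope impact estimate for the reseller-fraud example.
# All numbers are illustrative assumptions.

orders_per_sale = 1_000_000
fraud_rate = 0.02            # assumed fraction of orders that are reseller fraud
avg_return_cost = 8.0        # assumed average return-shipping cost per fraudulent order

def expected_savings(recall, precision, friction_cost_per_false_positive=0.5):
    """Savings from caught fraud minus the cost of friction on legitimate orders."""
    caught = orders_per_sale * fraud_rate * recall
    flagged = caught / precision if precision > 0 else 0.0
    false_positives = flagged - caught
    return caught * avg_return_cost - false_positives * friction_cost_per_false_positive

for name, (recall, precision) in {
    "ideal":            (1.0, 1.0),
    "current baseline": (0.3, 0.5),
    "likely model":     (0.6, 0.7),
}.items():
    print(f"{name:>16}: ~${expected_savings(recall, precision):,.0f} per sale event")
```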
24. Input features
• Is the feature predictive of the target?
• Are the features going to be available in the production setting?
– define exact time windows for features based on aggregates (see the sketch after this list)
– watch out for time lags in data availability
– be wary of target leakage (esp. conditional expectations of the target)
• How costly is it to compute or acquire the feature?
– monetary and computational costs
– might be different in training and deployment settings
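A minimal sketch of a point-in-time aggregate feature with an exact time window, computed only from events strictly before each prediction timestamp so that training matches what production will actually see. The column names (customer_id, event_time, prediction_time) are assumptions for illustration.

```python
import pandas as pd

# Point-in-time feature: number of orders in the 30 days before the prediction time.
# Column names are assumed; the key point is the strict "< prediction_time" cutoff,
# which avoids peeking into the future (a common source of target leakage).

def orders_last_30d(events: pd.DataFrame, instances: pd.DataFrame) -> pd.Series:
    """events: customer_id, event_time; instances: customer_id, prediction_time."""
    values = []
    for _, row in instances.iterrows():
        window_start = row["prediction_time"] - pd.Timedelta(days=30)
        mask = (
            (events["customer_id"] == row["customer_id"])
            & (events["event_time"] >= window_start)
            & (events["event_time"] < row["prediction_time"])  # never use future events
        )
        values.append(int(mask.sum()))
    return pd.Series(values, index=instances.index, name="orders_last_30d")
```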
25. Sources of Data
• Is the distribution of training data similar to production data?
– at least conditional distribution of target given input signals?
– are there fairness issues that require sampling adjustments?
– can we re-train with “new data” in case production data evolves over time?
• Are there systemic biases in training data due to collection process?
– fixed training filters?
• adjust the prediction scope to match with the filter
– collection limited by existing model?
• explore-exploit strategies & statistical bias correction approaches
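One simple way to check whether the training data still resembles production data is a per-feature distribution comparison. The sketch below uses a two-sample Kolmogorov-Smirnov test; the threshold and feature names are illustrative assumptions, not a prescription from the talk.

```python
from scipy.stats import ks_2samp

# Flag features whose training and production distributions differ noticeably.
# p_threshold is an illustrative choice; in practice alerting rules vary.

def drift_report(train_df, prod_df, features, p_threshold=0.01):
    drifted = []
    for col in features:
        stat, p_value = ks_2samp(train_df[col].dropna(), prod_df[col].dropna())
        if p_value < p_threshold:
            drifted.append((col, stat, p_value))
    return drifted
```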
26. Modeling Metrics - Classification
• Online metrics are meant to be computed on a live system
– can be defined directly in terms of the key business metrics (e.g., net revenue)
– typically measured via A/B tests & influenced by a lot of factors
• Offline metrics are meant to be computed on retrospective “labeled” data
– more closely tied to prediction quality (e.g., area under ROC curve)
– typically measured during offline experimentation
27. Modeling Metrics - Classification
• Primary metrics are ones that we are actively trying to optimize
– e.g., losses due to fraud
• Secondary metrics are ones that can serve as constraints or guardrails
– e.g., customer base size
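A minimal sketch of an offline evaluation report that pairs a primary prediction-quality metric (ROC AUC) with a secondary guardrail metric. The guardrail shown (share of customers whose orders would be blocked) is an assumed example in the spirit of "customer base size", not a metric defined in the talk.

```python
from sklearn.metrics import roc_auc_score

# Offline report: primary prediction-quality proxy plus a guardrail check.
# The guardrail definition below is an illustrative assumption.

def offline_report(y_true, y_score, y_pred_action, customers):
    primary = roc_auc_score(y_true, y_score)  # e.g., fraud vs. non-fraud discrimination
    blocked_customers = {c for c, a in zip(customers, y_pred_action) if a == "block"}
    guardrail = len(blocked_customers) / len(set(customers))
    return {"auc": primary, "blocked_customer_share": guardrail}
```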
28. Modeling Metrics
• What are the key online metrics (primary/secondary)?
– a deep question related to system goals!
• Are the offline modeling metrics aligned with online metrics?
– relative goodness of models should reflect online metric performance (see the sketch below)
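One hedged way to check this alignment is to look at past experiments: models that ranked better on the offline metric should also have ranked better on the online metric. The sketch below uses a rank correlation over made-up historical numbers; the metric names and values are purely illustrative.

```python
from scipy.stats import spearmanr

# Do offline and online metrics order past models the same way?
# The numbers below are illustrative, one entry per previously shipped model.

offline_auc   = [0.71, 0.74, 0.78, 0.80]
online_uplift = [0.2,  0.5,  0.6,  1.1]   # e.g., % reduction in return-shipping cost in A/B tests

rho, p_value = spearmanr(offline_auc, online_uplift)
print(f"rank correlation between offline and online metrics: {rho:.2f} (p={p_value:.2f})")
```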
29. Ethical and Fairness Constraints
• What are the long term secondary effects of the ML system?
• Is the system fair to different user segments?
Need to be incorporated in the modeling metrics!
30. Deployment Constraints
• What are the application constraints?
– user interface based restrictions (interaction mode, form factor)
– connectivity issues
• What are the hardware constraints?
– client side or server side computation
– memory, compute power
• What are the scalability requirements?
– size of data, frequency of processing (training/batch prediction)
– rate of arrival of prediction instances & latency bounds (online predictions)
32. Data Definitions
• Precisely record all sources & definitions for all data elements
– (ids, features, targets, metric-factors) for (training, evaluation, production)
• Establish parity across training/evaluation/production
– definitions, level sets, units, time windows, missing value handling, correct snapshots
• Review for common data leakages
– peeking into the future, target leakage
• Pro-actively collect information on data quality issues & resolve
– missing/invalid value causes, data corruptions
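A minimal sketch of a training/production parity check built on such recorded definitions. It assumes each data element has a small summary record (dtype, unit, missing-value rate); the field names and the 5% tolerance are illustrative assumptions.

```python
# Compare recorded definitions/summaries of each data element across environments.
# Summary structure and tolerance are illustrative assumptions.

def parity_issues(train_summary: dict, prod_summary: dict) -> list:
    """Each summary maps column -> {'dtype', 'unit', 'missing_rate'}."""
    issues = []
    for col in set(train_summary) | set(prod_summary):
        t, p = train_summary.get(col), prod_summary.get(col)
        if t is None or p is None:
            issues.append(f"{col}: present in only one environment")
            continue
        for key in ("dtype", "unit"):
            if t[key] != p[key]:
                issues.append(f"{col}: {key} differs ({t[key]} vs {p[key]})")
        if abs(t["missing_rate"] - p["missing_rate"]) > 0.05:
            issues.append(f"{col}: missing-value rate differs substantially")
    return issues
```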
33. Offline Modeling
• Ensure data is of high quality
– Fix missing values, outliers, systemic bias
• Narrow down modeling options based on data characteristics
– Learn about the relative effectiveness of various preprocessing, feature engineering,
and learning algorithms for different types of data.
• Be smart on the trade-off between feature engg. effort & model complexity
– Sweet spot depends on the problem complexity, available data, domain knowledge,
and computational requirements
• Ensure offline evaluation is a good “proxy” for the “real unseen” data
evaluation
– Generate data splits similar to how it would be during deployment
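For many production settings, "splits similar to deployment" means splitting by time rather than at random: train on the past, evaluate on the future. A minimal sketch, assuming a timestamp column on each instance:

```python
import pandas as pd

# Temporal split: evaluation mimics deployment by holding out the most recent data.
# time_col and the cutoff date are assumptions supplied by the caller.

def temporal_split(df: pd.DataFrame, time_col: str, cutoff: str):
    cutoff_ts = pd.Timestamp(cutoff)
    train = df[df[time_col] < cutoff_ts]
    test = df[df[time_col] >= cutoff_ts]
    return train, test
```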
34. Engineering
• Work backwards from the application use case
– Data/compute/ML framework choices based on deployment constraints
• Clear decoupling of modeling and production system responsibilities
– self contained models (config, parameters, libs) from data scientists
– application-agnostic pipelines for scoring, evaluation, re-training, data-collection
• Maintain versioned repositories for data, models, experiments
– logs, feature factories
• Plan for ecosystems of connected ML models
– easy composition of ML workflows
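A minimal sketch of what a "self-contained" model hand-off could look like: the serialized parameters plus the config, runtime version, and a fingerprint of the training data needed for reproducibility and auditing. The manifest fields and file layout are illustrative assumptions, not a prescribed format.

```python
import json, hashlib, pickle, sys
from pathlib import Path

# Package a model with the metadata needed to reproduce or audit it later.
# File names and manifest fields are illustrative assumptions.

def package_model(model, config: dict, train_data_path: str, out_dir: str):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    manifest = {
        "config": config,
        "python": sys.version,
        "train_data_sha256": hashlib.sha256(Path(train_data_path).read_bytes()).hexdigest(),
    }
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
```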
35. Deployment
• Establish offline modeling vs. production parity
– Checks on every possible component that could change
• Establish improvement in business metrics before scaling up
– A/B testing over random buckets of instances
• Trust the models, but always audit
– Insert safe-guards (automated monitoring) and manual audits
• View model building as a continuous process not a one-time effort
– Retrain periodically to handle data drifts & design for this need
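A minimal sketch of an automated safeguard in this spirit: compare recent performance and input drift against baselines and flag the model for audit, retraining, or rollback. The metric names and thresholds are illustrative assumptions.

```python
# Automated monitoring safeguard: flag the model when its offline-proxy metric
# degrades or input drift exceeds a threshold. Names and limits are illustrative.

def needs_attention(recent_metrics: dict, baseline_metrics: dict,
                    max_metric_drop=0.05, max_drift_score=0.2) -> list:
    alerts = []
    drop = baseline_metrics["auc"] - recent_metrics["auc"]
    if drop > max_metric_drop:
        alerts.append(f"proxy metric dropped by {drop:.3f}: consider retraining or rollback")
    if recent_metrics["feature_drift_score"] > max_drift_score:
        alerts.append("input distribution drift detected: audit features & retrain")
    return alerts
```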
36. Main Takeaways
• Map out your org-specific ML application life cycle
• Introduce checklists, templates, and tests for each stage
• Invest effort in getting the problem formulation right (ask “why?”)
• Be proactive about data issues