Overcoming DataOps hurdles for ML in Production

  1. Overcoming DataOps hurdles for ML in Production. August 2020. Sandeep Uttamchandani, Chief Data Officer and VP of Engineering, sandeep@unraveldata.com
  2. Behind the scenes of an ML Model in Production
  3. DataOps: from data to an ML model in production, through the stages Discover, Prep, Build, Operationalize.
  4. Top 10 DataOps Battlescars
  5. (1) “I thought the attribute means something else.” Battlescar: Incorrect assumptions about what an attribute means, whether the dataset is the source of truth, who its owner and common users are, how it is versioned, and whether it is trustworthy. Metric: Time to Interpret. Building a Self-Service Metadata Catalog. Levels of Automation: gather technical metadata; gather operational metadata; aggregate tribal knowledge.
  6. (1) “I thought the attribute means something else?” Battlescar: Incorrect assumptions about what an attribute means, whether the dataset is the source of truth, who its owner and common users are, how it is versioned, and whether it is trustworthy. Metric: Time to Interpret. Building a Self-Service Metadata Catalog (Intuit).
  7. (2) “Where is the dataset I need for my model?” Battlescar: Building a customer support forecasting model. Data was siloed across business units. It took 4+ months of connecting with data stewards to locate the data attributes required for building the model. Metric: Time to Find. Building a Self-Service Search Service. Levels of Automation: indexing of datasets and artifacts; search relevance ranking; access control of search results.
  8. (2) “Where is the dataset I need for my model?” Battlescar: Building a customer support forecasting model. Data was siloed across business units. It took 4+ months of connecting with data stewards to locate the data attributes required for building the model. Metric: Time to Find. Building a Self-Service Search Service.
  9. (3) “1000 rows in source database, so why only 50 rows in data lake?” Battlescar: Issues with correctness, completeness, and timeliness when moving data daily or hourly from transactional datastores to a centralized data lake. Metric: Time to Move. Building a Self-Service Data Movement Service. Levels of Automation: data ingestion configuration; data transformation; change management.
  10. (4) “Job completed but dashboard graphs have data missing?” Battlescar: Jobs are orchestrated using schedulers (such as Airflow, Oozie). Job dependencies are often incorrect, causing reporting or model training jobs to be triggered prematurely. Metric: Time to Orchestrate. Building a Self-Service Orchestration Service. Levels of Automation: defining job dependencies; robust job execution; production monitoring.
  11. (5) “Data processing was supposed to complete at 8 am. It's 4 pm and my model retraining job is still waiting?” Battlescar: Writing efficient Big Data processing applications is non-trivial. With a plethora of technologies, gaining broad expertise is difficult even for expert data engineers. Metric: Time to Optimize. Building a Self-Service Query Optimization Service. Levels of Automation: aggregating query, cluster, and resource stats; analyzing and correlating stats; tuning jobs.
  12. (6) “Customer changed preference to no marketing emails. Why are we still including them in email campaigns?” Battlescar: Without a consistent primary key to identify the customer across data silos, recurring issues arise. Emerging data rights regulations such as GDPR and CCPA require complying with customer preferences on what data is collected, how it is used, and its deletion on request. Metric: Time to Comply. Building a Self-Service Data Rights Governance Service. Levels of Automation: tracking customer data lifecycle and preferences; executing customers' data rights requests; use-case based access control.
  13. (7) “Job pipeline ran for 15 hours and now we detect a data quality issue upon completion. Could we be proactive?” Battlescar: Data issues in a long-running, business-critical job lead to missing insights. Only when results don't look correct do we realize there is an issue. Metric: Time to Insights Quality. Building a Self-Service Data Observability Service. Levels of Automation: verify accuracy of data; detect anomalies; avoid data quality issues.
  14. (8) “Using the best polyglot datastores, how do I now write queries effectively across this data?” Battlescar: Significant time spent in planning, designing, and writing queries that process data across datastores. Metric: Time to Virtualize Datastores. Building a Self-Service Data Virtualization Service. Levels of Automation: automatic query routing; managing datastore-specific queries; joining across transactional sources.
  15. (9) “I ran an A/B experiment and now need to build time-consuming data pipelines to analyze the data.” Battlescar: Analyzing experimental results in a consistent fashion is a nightmare. There are no consistent definitions between the metrics used for experimental analysis and those used for business reporting. Metric: Time to A/B Test. Building a Self-Service A/B Testing Service. Levels of Automation.
  16. (10) “Data processing jobs last week cost us 30% more. Why?” Battlescar: Especially in the cloud, dollar cost scales linearly with usage. Tracking budgets and spend well enough to optimize them requires non-trivial effort. Metric: Time to Cost Governance. Building a Self-Service Cost Governance Service. Levels of Automation: expenditure observability; matching supply and demand; continuous cost optimization.
  17. Wrap up: Advice on Managing your DataOps
  18. DataOps hurdles vary and depend on: People, Process, Technology.
  19. Self-Service has levels (not binary)
  20. Measuring Current DataOps: Time-to-Insight Metric (from data through Discover, Prep, Build, Operationalize).
  21. Time-to-Insight Scorecard: Discover, Prep, Build, Operationalize.
  22. Creating Your Time-to-Insight Scorecard: Discover, Prep, Build, Operationalize. Legend: Weeks, Days, Hours.
  23. Call for Action: Making DataOps Self-Service. Three steps toward Self-Service DataOps: (1) Measure: create your Time-to-Insight Scorecard. (2) Learn: shortlist 1-2 scorecard metrics to improve level of automation. (3) Build: implement well-known design patterns in your data platform to make the metrics self-service.
  24. Upcoming Book: The Self-Service Data Roadmap. Available Sept ’20. Early Release available on O’Reilly: https://www.oreilly.com/library/view/the-self-service-data/9781492075240/
  25. Contact us to schedule a Data Operations Health Check today: hello@unraveldata.com
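
The data movement battlescar on slide 9 (1000 rows in the source database but only 50 in the data lake) suggests a completeness check after each ingestion run. The following is a minimal sketch in Python, assuming the row counts have already been fetched from both systems. The names `check_completeness` and `CompletenessResult` and the `tolerance` parameter are hypothetical illustrations, not something shown in the talk.

```python
from dataclasses import dataclass


@dataclass
class CompletenessResult:
    source_rows: int
    lake_rows: int
    missing_fraction: float
    ok: bool


def check_completeness(source_rows: int, lake_rows: int,
                       tolerance: float = 0.0) -> CompletenessResult:
    """Compare row counts between a source table and its data lake copy.

    `tolerance` is the largest acceptable fraction of missing rows
    (hypothetical parameter; 0.0 means every source row must arrive).
    """
    if source_rows < 0 or lake_rows < 0:
        raise ValueError("row counts must be non-negative")
    if source_rows == 0:
        # Nothing to move, so nothing can be missing.
        missing = 0.0
    else:
        # Extra rows in the lake are not counted as missing.
        missing = max(source_rows - lake_rows, 0) / source_rows
    return CompletenessResult(source_rows, lake_rows, missing, missing <= tolerance)


# The scenario from slide 9: most rows never reached the lake.
result = check_completeness(source_rows=1000, lake_rows=50)
print(result.ok, result.missing_fraction)
```

A count comparison only covers completeness; correctness and timeliness, the other two issues named on the slide, would need their own checks.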

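The orchestration battlescar on slide 10 comes down to running each job only after everything it depends on has finished. As a rough illustration, the sketch below derives a safe execution order from declared dependencies using Python's standard-library `graphlib` (Python 3.9+). It does not use the Airflow or Oozie APIs, and the job names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
dependencies = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "build_features": {"ingest_orders", "ingest_customers"},
    "retrain_model": {"build_features"},
    "refresh_dashboard": {"build_features"},
}


def execution_order(deps):
    """Return an order in which every job runs after all of its upstream jobs."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of jobs that form the cycle.
        raise ValueError(f"circular job dependency: {exc.args[1]}") from exc


order = execution_order(dependencies)
print(order)
```

If a dependency is missing from the declaration, the ordering is still valid for the graph as written but the downstream job may run too early, which is the failure mode the slide describes.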