
From science to engineering, the process to build a machine learning product

Datacon 2019

Published in: Engineering

  1. From Science to Engineering, Process of a Machine Learning Product. Bruce Kuo, bruce3557@gmail.com
  2. Who Am I?
     • Bruce Kuo
     • Experience:
       • Yahoo software engineer, Data team and Global Search (2014-2017)
       • Codementor data scientist (2017-2019)
  3. Target Audience
     • Anyone interested in machine learning product development
     • Junior / mid-level machine learning engineers
     • Data scientists / engineers
  4. Goals of This Talk
     • Share an overview of a machine learning project
     • Share the links between business problems and machine learning problems
     • Share the engineering work in a machine learning product
  5. Machine Learning Project Overview
     • A machine learning project has two sides: Science and Engineering
     • Science: Science Levels, Research Steps, Requirements, Define Problem & Objectives, Offline Evaluation, Solution Research
     • Engineering: Model Serialization, ML Data Pipeline, Model Serving, Performance Tracking, CI & Monitoring
  6. Science
  7. Two Different Science Levels
     • Unknown business problem
       • Example: do we need fast recommendations after a user views a product page?
       • This is another topic
     • Known business problem, unknown solutions
       • Example: our matching algorithm has a supply problem; how can we improve conversion through recommendation?
       • More ML steps here
     • This talk focuses on known business problems. The unknown-problem part is another story.
  8. Where ML Requirements Come From
     • Data analysis: "We need a recommendation module!" (PM, analysts)
  9. Where ML Requirements Come From
     • Qualitative analysis (user feedback): "We need to improve our tag suggestion module!" (PM, designer, sales, marketing)
  10. ML Science on Business
     • The ML science of business problems is like "experimental science"
     • Different datasets call for different algorithms to solve / learn
     • Designing experiments is important
  11. ML Problem Steps
     • Goal: enhance a specific business metric
     • Steps shown: Define Problem & Objectives, Solution Research & Experiments, Define Evaluation Metrics
  12. Define the Problem
     • What is the business problem?
       • News triggering
       • Mentor matching
     • Which type of ML problem can be used to solve the business problem?
       • Classification?
       • Recommendation?
       • …
  13. Define Objectives
     • In algorithms, we focus on loss
       • 0/1 loss
       • Mean Squared Error (MSE)
       • Mean Absolute Error (MAE)
       • Cross entropy
       • …
       • Reference: https://cloud.tencent.com/developer/article/1092365
     • In business, we focus on business goals
       • Interest rate
       • Conversion
       • CTR
       • …
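To make the two kinds of objective on slide 13 concrete, here is a minimal sketch of the listed losses next to one business metric. The function names are illustrative rather than taken from the talk, cross entropy is shown only in its binary form, and CTR is assumed to mean clicks divided by impressions.

```python
import math


def zero_one_loss(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood for labels in {0, 1}."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def click_through_rate(clicks, impressions):
    """Business metric (assumed definition): clicks per impression."""
    return clicks / impressions if impressions else 0.0
```

The losses are what the algorithm optimizes on individual predictions, while the business metric is measured on user behavior; a model can improve one without moving the other, which is why both are defined up front.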
  14. Design Offline Evaluation
     • After defining the problem and objectives, we need to design the offline evaluation
     • Offline evaluation metrics are usually business goals (CTR, interest rate, …)
     • It produces the first version of the data pipeline design and the online evaluation design
     • It provides confidence before we start integrating the algorithm into an online service
     • Supervised offline evaluation is easy; unsupervised is hard
  15. Solution Research
     • Paper, paper, paper
       • Learn how similar problems are solved and what ideas we can take from those solutions
     • Research areas of machine learning
       • For different purposes: classification / regression / clustering …
       • Algorithm optimization: which kind of gradient descent method works better
  16. Solution Research (Cont.)
     • In a startup, we usually focus on the high-level parts because of:
       • Tuning speed
       • Integration: we need to choose a mature implementation for better production usage, e.g., scikit-learn or Keras
     • Feature engineering is pretty important when we only select algorithms
     • Small goals in solution engineering: easy to retrain
  17. Example: Product Recommendation
     • Problem: given a user, we want to recommend products to that user
       • Ranking problem or recommendation problem
     • Objectives:
       • Business metrics: top-k interest rate
       • Loss function: depends on our solution
       • Offline metrics: we evaluate top-k interest rate as the performance metric after optimization
     • Solution research: matrix factorization, factorization machines, deep learning, learning to rank, …
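Slide 17 names top-k interest rate as the offline metric without defining it. The sketch below assumes one plausible definition: the share of users for whom at least one of the top-k recommended products appears in their held-out set of products they showed interest in. The function names and the toy data are hypothetical.

```python
def top_k(scores, k):
    """Return the k product ids with the highest predicted scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [product_id for product_id, _ in ranked[:k]]


def top_k_interest_rate(predicted_scores, interested, k):
    """Share of users with at least one held-out interest in their top-k.

    predicted_scores: {user_id: {product_id: score}}
    interested:       {user_id: set of product ids the user showed interest in}
    """
    users = [u for u in predicted_scores if u in interested]
    if not users:
        return 0.0
    hits = sum(
        1 for u in users if set(top_k(predicted_scores[u], k)) & interested[u]
    )
    return hits / len(users)


# Toy data, invented for illustration only.
predicted_scores = {
    "u1": {"a": 0.9, "b": 0.2, "c": 0.5},
    "u2": {"a": 0.1, "b": 0.8, "c": 0.3},
}
interested = {"u1": {"c"}, "u2": {"a"}}
rate_at_2 = top_k_interest_rate(predicted_scores, interested, k=2)
```

Because the metric only needs predicted scores and logged interest, it can be computed for any of the candidate solutions on the slide, whatever loss each one optimizes internally.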
  18. Engineering
  19. Why Is Engineering Needed?
     • Model results should be used in your products
     • How?
  20. From Science to Engineering
     • First step: export the model
     • [Diagram labels: Model Training, Data Pipeline, Serialization, API Serving, spanning Science and Engineering; "Need CI / Monitoring"]
  21. Serialization - Export Model
     • Goal: serialize your model into a binary file or a general format so that everyone can use it for prediction
     • Different algorithms need different serialization methods, but machine learning packages wrap them behind a common interface
       • scikit-learn: https://scikit-learn.org/stable/modules/model_persistence.html
       • Keras: https://jovianlin.io/saving-loading-keras-models/
       • …
     • Lower-level design: http://dmg.org/pmml/products.html
  22. Serialization - Export Model
     • Example: how to serialize a logistic regression model?
       • scikit-learn: joblib.dump(model, path)
       • From scratch: we need to implement the model equation
         • Equation: $\Pr(Y_i = y \mid X_i) = \frac{e^{\beta \cdot X_i \cdot y}}{1 + e^{\beta \cdot X_i}}$ for $y \in \{0, 1\}$
         • Only save $\beta$, the linear weight vector, and we can compute the prediction function
     • PMML tries to define a serialization interface for each algorithm
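The "from scratch" route on slide 22 can be sketched as follows: store only the weight vector, then rebuild the prediction function from the logistic equation. JSON as the storage format, the file name, and the weight values are assumptions chosen for illustration; the talk does not prescribe a format.

```python
import json
import math
import os
import tempfile


def save_weights(beta, path):
    """Persist only the linear weight vector."""
    with open(path, "w") as f:
        json.dump({"beta": list(beta)}, f)


def load_weights(path):
    """Read the weight vector back."""
    with open(path) as f:
        return json.load(f)["beta"]


def predict_proba(beta, x):
    """Pr(Y = 1 | x) = e^(beta.x) / (1 + e^(beta.x)), written as a sigmoid."""
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights standing in for a trained model.
beta = [0.5, -1.0, 2.0]
path = os.path.join(tempfile.mkdtemp(), "logreg.json")
save_weights(beta, path)
restored = load_weights(path)
```

Any service that can read the file and evaluate a dot product can now predict, without depending on the training library. If the model has an intercept, one option is to store it as the weight of a constant feature equal to 1.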
  23. Serialization - Export Latent Features
     • We extract hidden vectors to represent users / items
       • Extract photo features with an auto-encoder
       • Extract user features with matrix factorization
       • …
     • Two ways to export latent features
       • Save the model, e.g., an auto-encoder
       • Save the feature vectors, e.g., matrix factorization vectors
  24. Example - Matrix Factorization
     • Picture: https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/
     • Save by user
     • Save by product
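One way to read "save by user" and "save by product" on slide 24 is to store the two factor matrices as separate lookups keyed by id, then score a user-product pair with a dot product at serving time. The sketch below assumes that reading; the vectors, file names, and JSON format are invented for illustration.

```python
import json
import os
import tempfile

# Hypothetical latent vectors, as a matrix factorization might produce.
user_vectors = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
product_vectors = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.5, 0.5]}


def save_vectors(vectors, path):
    """Write an id -> vector mapping to disk."""
    with open(path, "w") as f:
        json.dump(vectors, f)


def load_vectors(path):
    """Read an id -> vector mapping back."""
    with open(path) as f:
        return json.load(f)


def score(user_vec, product_vec):
    """Predicted affinity is the dot product of the two latent vectors."""
    return sum(u * p for u, p in zip(user_vec, product_vec))


def recommend(user_id, users, products, k):
    """Rank all products for one user and keep the best k."""
    ranked = sorted(
        products,
        key=lambda pid: score(users[user_id], products[pid]),
        reverse=True,
    )
    return ranked[:k]


out_dir = tempfile.mkdtemp()
save_vectors(user_vectors, os.path.join(out_dir, "by_user.json"))
save_vectors(product_vectors, os.path.join(out_dir, "by_product.json"))

users = load_vectors(os.path.join(out_dir, "by_user.json"))
products = load_vectors(os.path.join(out_dir, "by_product.json"))
top_for_u1 = recommend("u1", users, products, k=2)
```

Keeping the two files separate means the product side can be refreshed when the catalog changes without rewriting the user side, and the reverse.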
  25. How to Use Model Results?
     • The model predicts in the data API
     • The model predicts in the data pipeline
  26. Predict in Data API
     • The model predicts inside the data API
       • Prepare data in the data pipeline
       • Data in the request payload
     • Can provide real-time prediction
     • Latency is a challenge
     • [Diagram labels: Data Warehouse, user data, Data Pipeline, Serving Database, user features, Data API, Model]
  27. Predict in Data Pipeline
     • The model predicts inside the data pipeline
       • Predict results in the pipeline and save them to the database
       • The backend implements its logic on its side
     • Better API speed
     • Lower flexibility
     • [Diagram labels: Data Warehouse, user data, Data Pipeline, Model, predict, Serving Database, extract result, Data API, Components]
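The trade-off between slides 26 and 27 can be sketched side by side. In the pipeline variant the model runs in a batch job and the API only looks up stored results; in the API variant the model runs on each request. The weights, table name, and function names are hypothetical, and an in-memory SQLite database stands in for the serving database.

```python
import math
import sqlite3

BETA = [0.5, -1.0]  # hypothetical weights of an already exported model


def predict(features):
    """Logistic prediction from the exported weights."""
    z = sum(b * x for b, x in zip(BETA, features))
    return 1.0 / (1.0 + math.exp(-z))


def run_pipeline(warehouse_rows, serving_db):
    """Predict in data pipeline: score every user in batch and store the result."""
    serving_db.execute(
        "CREATE TABLE IF NOT EXISTS predictions "
        "(user_id TEXT PRIMARY KEY, score REAL)"
    )
    serving_db.executemany(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?)",
        [(user_id, predict(features)) for user_id, features in warehouse_rows],
    )
    serving_db.commit()


def api_get_prediction(serving_db, user_id):
    """API side of the pipeline variant: a lookup only, no model call."""
    row = serving_db.execute(
        "SELECT score FROM predictions WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def api_predict_realtime(payload):
    """Predict in data API: the model runs on features from the request payload."""
    return predict(payload["features"])


db = sqlite3.connect(":memory:")
run_pipeline([("u1", [1.0, 0.0]), ("u2", [0.0, 1.0])], db)
```

The lookup path is fast but only knows users scored by the last batch run, so `api_get_prediction(db, "u3")` returns `None`. The real-time path handles any payload but pays the model's latency on every request.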
  28. Other Concerns in Engineering
     • How soon do we need to provide model results to users?
     • How do we handle data changes?
     • Online performance tracking
     • Monitoring
     • CI / CD
     • These are the factors that shape your pipeline design
  29. Conclusion
     • The overview of a machine learning project
     • Points between business problems and machine learning problems
     • Engineering details in a machine learning project
  30. Q & A: Thanks for Listening!