Digital Origin - Pipelines for model deployment


  1. Slide 1: Pipelines for model deployment (2017-04-25)
  2. Slide 2: Agenda. (1) Digital Origin introduction; (2) A recurrent problem moving to production; (3) H2O; (4) Digital Origin pipeline; (5) Sometimes it is harder than usual to automate: Rejection Inference.
  3. Slide 3: Agenda (repeated from slide 2).
  4. Slide 4: Digital Origin – Introduction. Digital Origin is a leading Spanish fintech company focused on technology-enabled consumer finance. Founded in 2011. €15 million A-round in 2015. 80 employees with offices in Barcelona and Madrid. Uniquely positioned to address the mainstream consumer finance market with a wide portfolio of instant, real-time products and a fully online process. Over €150 million lent to date. ¡QuéBueno! was released in 2011: consumer finance microlending. Paga+Tarde was released in 2015: consumer finance for eCommerce and in-store.
  5. Slide 5: [Diagram: CREDIT RISK ENGINE (CRE)] Labels, in extraction order: Fraud; Risk; Business; Monitoring; Massive Fraud; Identity Fraud; Not Willing to Pay; Default Risk; Product, UX, AR vs DR tradeoff; Evaluation Control & Alerts; Marketing; Credit Cards / Returnings QB; Device - Fingerprinting; User request; Graph relationships model; DNI Images models; Geo fraud model; Basket Model; Behavioural Model; Configuration & Parameter Tuning; Reporting; Uplift models; CLTV models; Identity fraud model; Alerts; Design & Models params; Risk Model.
  6. Slide 6: Agenda (repeated from slide 2).
  7. Slide 7: A recurrent problem moving to production. The development/design phase and production have different requirements. R&D environment (data scientist profile): interactive mode; friendly for discovery; fast development language; easy to save state and continue later; access to mathematical libraries; … Production environment (engineer profile): scalable architecture; error handling; high availability; load balancing; reliable and stable; …
  8. Slide 8: A recurrent problem moving to production (same R&D vs. production comparison as slide 7). Different languages imply twice the work or more.
  9. Slide 9: A recurrent problem moving to production (same comparison as slide 7). Solution A: Python is well suited to both needs.
  10. Slide 10: A recurrent problem moving to production (same comparison as slide 7). Solution B: an API approach to give some flexibility. [Diagram labels: / / / ... ...]
  11. Slide 11: A recurrent problem moving to production (same comparison as slide 7). ...
  12. Slide 12: Agenda (repeated from slide 2).
  13. Slide 13: H2O - Architectures. Open source API for machine learning; massively scalable big data analysis; easy-to-use web UI (Jupyter, Python notebook); familiar interfaces: R, Python, Scala, Java, API, …; real-time data scoring; rapid deployment of models to production via POJO or model-optimized Java objects (MOJO). Algorithms: GLM; Random Forest; GBM; “Deep Learning”; Deep Water: TensorFlow, MXNet, Caffe, … (not yet); … https://www.h2o.ai/h2o/
  14. Slide 14: H2O - Architectures. [Diagram labels: Local Cluster + HDFS Cluster]
  15. Slide 15: H2O - Architectures. [Diagram: Cluster + Spark, Node 1 … Node N]
  16. Slide 16: H2O - Performance. [Plots: GLM; RF; GBM (setup A); GBM (setup B)] Reproducible benchmark: https://github.com/szilard/benchm-ml
  17. Slide 17: Agenda (repeated from slide 2).
  18. Slide 18: [Diagram: CREDIT RISK ENGINE (CRE)] Same labels as slide 5, without the "Basket Model" label.
  19. Slide 19: Digital Origin – Introduction. [Diagram] Labels, in extraction order: Development; Production; Node 1 … Node N; Hadoop ecosystem; Extract; Transform; Train models; Transform; Scoring; Export POJO.
  20. Slide 20: Digital Origin – Actual Pipeline. [Diagram] Labels, in extraction order: Data Analytics activity; Production; Credit Risk Engine (CRE); Reporting; Replica Databases; {{mustache}}; streaming; Query template; Tools; Corporate Libraries; batch; Data Science; Daily activity and recurrent processes; Analytics and Reporting databases; Production Databases; Alerts System; Services to other dep.; CRE development; CRE param. tuning; Front End; New Model; Back End; New Config.
  21. Slide 21: THANKS! Questions? ralabern@digitalorigin.com, markus@digitalorigin.com
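
The development-to-production diagram (Extract, Transform, Train models on one side; Transform, Scoring on the other, linked by an exported model) can be illustrated with a minimal sketch. This is not the authors' code and does not use H2O: the feature names, the tiny logistic regression, and the JSON artifact are assumptions standing in for the real transforms and the exported POJO. The point it shows is that one shared `transform` function feeds both training and scoring, and that production only needs the exported artifact.

```python
import json
import math

# Hypothetical feature names, chosen only for illustration.
FEATURES = ["amount", "basket_items"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def transform(raw):
    """Shared transform step: turn a raw request dict into a feature vector."""
    return [math.log1p(float(raw["amount"])), float(raw["basket_items"])]


def train(rows, labels, epochs=200, lr=0.1):
    """Development side: fit a tiny logistic regression by gradient descent."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    for _ in range(epochs):
        for raw, y in zip(rows, labels):
            x = transform(raw)
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return {"features": FEATURES, "weights": weights, "bias": bias}


def export_model(model):
    """Export step: serialize fitted parameters (a stand-in for a POJO export)."""
    return json.dumps(model)


def score(artifact, raw):
    """Production side: load the exported artifact and score one request."""
    model = json.loads(artifact)
    x = transform(raw)
    return sigmoid(model["bias"] + sum(w * v for w, v in zip(model["weights"], x)))
```

A typical use would be to call `train` on historical rows, pass the result through `export_model`, and hand the resulting string to the production service, which calls `score(artifact, request)` for each incoming request.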

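"Solution B" on slide 10 proposes an API approach between the R&D and production environments. The sketch below shows one way such a boundary could look, using only the Python standard library. The scoring rule, the `amount` field, and the function names are all hypothetical; the slides do not describe the actual service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_request(payload):
    """Placeholder model: a made-up linear rule, not a real risk model."""
    amount = float(payload["amount"])
    return {"score": max(0.0, min(1.0, amount / 1000.0))}


def handle_request(body):
    """Parse a JSON request body, score it, and return (status, JSON text)."""
    try:
        payload = json.loads(body)
        return 200, json.dumps(score_request(payload))
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or a wrong type: report a client error.
        return 400, json.dumps({"error": str(exc)})


class ScoringHandler(BaseHTTPRequestHandler):
    """HTTP wrapper: production code talks to the model only through POST."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_request(self.rfile.read(length))
        data = response.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8080):
    """Start the scoring service (blocks until interrupted)."""
    HTTPServer((host, port), ScoringHandler).serve_forever()
```

Keeping `handle_request` separate from the HTTP handler means the model side can be swapped or tested without starting a server, which is the flexibility the slide alludes to.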