© 2018 CloudMade. Proprietary and Confidential. 2
Meet the Team
CloudMade has an R&D office in Kyiv with a 130-person engineering team, its own car fleet, and a design studio in London.
Nazar Sheremeta
Senior Data Science Engineer
Elena Kasianenko
Data Scientist
Self driving car
Golf wheel
One Driver Profile, Many Use Cases
● Smart Onboarding
● Personalized Autonomy
● Predictive Navigation
● Personalized Search
● Predictive Call List
● Personalized Coaching
● Intelligent Cabin
● Intelligent Climate
● Refueling & Recharging
● Personalized Parking Options
● Predictive Drive Mode
● Predictive Media
● Predictive Occupant ID
Agenda
1. Sudden big data
2. Personalized learning
3. A lot of events and features, but not a lot of observations (use complicated models to build features for the simple one)
4. Only 2 weeks to learn
5. 10 tips on how to build an ML model
Personalized learning
● Small number of observations
● Strong user patterns
● Computationally friendly
Fleet learning
● Tons of observations
● No user patterns
● Computationally complex
1. Problem Definition
Where does small data come from?
● Time series
● Rare phenomena
● Enterprise solutions
● Aggregate modeling
Small Data problems
● Overfitting becomes much harder to avoid.
● Outliers become much more dangerous.
So what to do in this situation?
№1. Stick to simple models
№2. Pool data when possible
● Train a personalised model on top of a universal model trained on all users.
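One simple way to read "a personalised model on top of a universal model" is shrinkage: blend each user's own average with the average over all users, trusting the user more as their observations accumulate. The sketch below is illustrative only; the function name, the `prior_strength` constant, and the toy temperature data are assumptions, not part of the talk.

```python
from statistics import mean

def pooled_estimate(user_values, global_mean, prior_strength=5.0):
    """Blend a user's own average with the all-users average.

    prior_strength is a hypothetical tuning constant: the number of
    observations at which the user's data and the pooled data get
    equal weight.
    """
    n = len(user_values)
    if n == 0:
        # No personal data yet: fall back to the universal model.
        return global_mean
    weight = n / (n + prior_strength)
    return weight * mean(user_values) + (1.0 - weight) * global_mean

# Hypothetical preferred cabin temperatures per user.
fleet = {"a": [21.0, 22.0, 21.5, 22.5], "b": [18.0], "c": []}
all_values = [v for values in fleet.values() for v in values]
global_mean = mean(all_values)  # universal model: one number for everyone

estimates = {user: pooled_estimate(values, global_mean)
             for user, values in fleet.items()}
for user, value in estimates.items():
    print(user, round(value, 2))
```

User "b" has a single observation of 18.0, so the estimate stays close to the pooled mean of 21.0 (about 20.5) instead of jumping to 18.0; user "c" has no data and simply receives the pooled mean.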
№3. Limit Experimentation
● If you try too many different techniques, you’ll overfit on your validation set.
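The effect can be reproduced with a small simulation. In this sketch (all sizes and the random linear "models" are assumptions chosen for illustration), the labels are coin flips, so no model can truly beat 50% accuracy. Picking the best of many candidates on a tiny validation set still yields a flattering validation score that does not carry over to fresh data.

```python
import random

rng = random.Random(0)
N_FEATURES, N_VAL, N_TEST, N_MODELS = 10, 20, 2000, 200

def make_data(n):
    """Features are Gaussian noise; labels are coin flips unrelated to them."""
    xs = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys

def accuracy(weights, xs, ys):
    """Accuracy of the rule 'predict 1 when the weighted sum is positive'."""
    hits = 0
    for x, y in zip(xs, ys):
        score = sum(w * v for w, v in zip(weights, x))
        hits += int((score > 0) == (y == 1))
    return hits / len(ys)

val_x, val_y = make_data(N_VAL)
test_x, test_y = make_data(N_TEST)

# Each "technique" is just a random weight vector.
candidates = [[rng.gauss(0, 1) for _ in range(N_FEATURES)]
              for _ in range(N_MODELS)]

# Model selection on the small validation set.
best = max(candidates, key=lambda w: accuracy(w, val_x, val_y))
best_val_acc = accuracy(best, val_x, val_y)
best_test_acc = accuracy(best, test_x, test_y)

print(f"validation accuracy of the selected model: {best_val_acc:.2f}")
print(f"test accuracy of the same model:           {best_test_acc:.2f}")
```

The gap between the two printed numbers is pure selection bias; it grows with the number of candidates tried and shrinks with the size of the validation set.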
№4. How much training data do you need?
● The rule of 10: as a rule of thumb, a well-performing model needs about 10x as many training examples as it has parameters.
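The rule of 10 is plain arithmetic, and it is often more useful read backwards: given the data you have, how many parameters can you afford? A minimal sketch follows; the function names and the example of 28 observations (say, two trips a day over two weeks) are hypothetical.

```python
def min_training_rows(n_parameters, factor=10):
    """Training examples suggested by the rule of 10 for a given model size."""
    return factor * n_parameters

def max_parameters(n_rows, factor=10):
    """Largest model size the rule of 10 allows for the data available."""
    return n_rows // factor

print(min_training_rows(5))   # a 5-parameter model wants about 50 examples
print(max_parameters(28))     # 28 observations support about 2 parameters
```

This is a heuristic, not a guarantee: noisy targets or strongly correlated features can require considerably more data per parameter.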
№5. Do clean up your data
№6. Do perform feature selection
● If the data is truly limiting, explicit feature selection is sometimes essential.
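A very simple form of explicit feature selection is a filter: score each feature by the absolute Pearson correlation with the target and keep the top k. The sketch below is one possible approach, not the method used in the talk; the feature names and values are made up. To avoid leaking information, the scores should be computed on training data only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_top_k(columns, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: six observations, four candidate features.
target = [1, 2, 3, 4, 5, 6]
columns = {
    "speed": [2, 4, 6, 8, 10, 12],     # perfectly correlated with target
    "hour": [3, 1, 4, 1, 5, 9],        # weakly related
    "constant": [7, 7, 7, 7, 7, 7],    # carries no information
    "temp": [6, 5, 4, 3, 2, 1.5],      # strongly negatively correlated
}
selected = select_top_k(columns, target, k=2)
print(selected)
```

Using the absolute value keeps strongly negative relationships such as "temp", which are just as informative as positive ones.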
№7. Do use Regularization
● Regularization reduces the effective degrees of freedom without reducing the actual number of parameters in the model.
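For ridge regression specifically (an assumption here, since the slide does not name a regularizer), the effective degrees of freedom have a closed form: the sum of e / (e + lambda) over the eigenvalues e of X^T X, where lambda is the penalty strength. The sketch below evaluates that formula for hypothetical eigenvalues.

```python
def effective_dof(eigenvalues, penalty):
    """Effective degrees of freedom of ridge regression.

    eigenvalues: eigenvalues of X^T X (squared singular values of X).
    penalty: ridge penalty strength (lambda >= 0).
    """
    return sum(e / (e + penalty) for e in eigenvalues)

# Hypothetical spectrum for a model with three parameters.
eigenvalues = [4.0, 1.0, 0.25]

for penalty in (0.0, 1.0, 10.0):
    dof = effective_dof(eigenvalues, penalty)
    print(f"lambda={penalty:5.1f}  parameters=3  effective dof={dof:.2f}")
```

With no penalty the model uses all 3 degrees of freedom; with lambda = 1 it behaves like a model with 1.5, even though it still has three coefficients. Directions with small eigenvalues, which the data barely constrains, are shrunk the most.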
№8. Do use Model Averaging
● Each of the red curves is a model fitted on a few data points.
● But averaging all these high-variance models gives a smooth output that is remarkably close to the original.
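The idea can be sketched without the original figure. Below, each small model is a 1-nearest-neighbour predictor fitted on five noisy samples of a sine curve; the curve, the model type, and all sizes are illustrative assumptions. For squared error, the error of the averaged prediction can never exceed the average error of the individual models, and in practice it is usually far lower.

```python
import math
import random

rng = random.Random(42)

def true_curve(x):
    """The underlying function that every small model tries to recover."""
    return math.sin(x)

def fit_small_model(n_points=5, noise=0.3):
    """Return a 1-nearest-neighbour predictor trained on a few noisy points."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n_points)]
    ys = [true_curve(x) + rng.gauss(0, noise) for x in xs]

    def predict(x):
        nearest = min(range(n_points), key=lambda i: abs(xs[i] - x))
        return ys[nearest]

    return predict

models = [fit_small_model() for _ in range(200)]
grid = [i * 2 * math.pi / 50 for i in range(51)]

def mse(predict):
    """Mean squared error against the true curve on an evaluation grid."""
    return sum((predict(x) - true_curve(x)) ** 2 for x in grid) / len(grid)

def averaged(x):
    """Ensemble prediction: the plain average of all small models."""
    return sum(model(x) for model in models) / len(models)

mean_single_mse = sum(mse(model) for model in models) / len(models)
ensemble_mse = mse(averaged)

print(f"average error of a single model: {mean_single_mse:.3f}")
print(f"error of the averaged model:     {ensemble_mse:.3f}")
```

Averaging removes variance but not bias, so the individual models should be flexible enough to follow the curve on average.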
№9. Try Bayesian Modeling
● Bayesian inference may be well suited to smaller data sets, especially if you can use domain expertise to construct sensible priors.
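A Beta-Binomial model is about the smallest example of this. Suppose domain knowledge says a feature is used on roughly 20% of trips, and a new driver has used it on all three of their first trips. The prior values and the scenario below are hypothetical; the update rule itself is the standard conjugate one.

```python
def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior from domain expertise: about 20% usage, worth 10 trips.
prior = (2.0, 8.0)

# Three observed trips, feature used every time.
successes, failures = 3, 0
posterior = beta_posterior(*prior, successes, failures)

naive_rate = successes / (successes + failures)   # 1.0 from three trips
bayes_rate = beta_mean(*posterior)                # 5 / 13, about 0.38

print(f"naive estimate:    {naive_rate:.2f}")
print(f"Bayesian estimate: {bayes_rate:.2f}")
```

The naive estimate claims certainty after three trips, while the posterior moves from 0.20 to about 0.38 and keeps moving as evidence accumulates.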
№10. Prefer Confidence Intervals
● Parts of the feature space are likely to be less covered by your data, and prediction confidence within these regions should reflect that.
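As one concrete illustration, take a binary prediction estimated separately in two regions of the feature space, one sparsely and one densely covered. The sketch below uses the Wilson score interval, which is one standard choice among several; the counts are hypothetical. Both regions show the same 80% rate, but the intervals differ sharply.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n == 0:
        # No data at all: every rate between 0 and 1 remains plausible.
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts: same observed rate, very different coverage.
sparse = wilson_interval(4, 5)       # region with 5 observations
dense = wilson_interval(400, 500)    # region with 500 observations

print(f"sparse region: {sparse[0]:.2f} to {sparse[1]:.2f}")
print(f"dense region:  {dense[0]:.2f} to {dense[1]:.2f}")
```

A point estimate of 0.8 hides the difference; the interval for the sparse region spans most of the range and signals that the prediction there should be treated with caution.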
Please ask your questions!
Thanks for your attention!
nsheremeta@cloudmade.com
olena.kasianenko@cloudmade.com

Nazar Sheremeta and Olena Kasianenko "Building Machine Learning Models using real data from the vehicles"
