Nazar Sheremeta and Olena Kasanenko "Building Machine Learning Models using real data from the vehicles"
- 2. © 2018 CloudMade. Proprietary and Confidential. 2
Meet the Team
CloudMade has an R&D office in Kyiv with a 130-person engineering
team, its own car fleet, and a design studio in London.
Nazar Sheremeta
Senior Data Science Engineer
Elena Kasianenko
Data Scientist
- 5. © 2017 CloudMade. All Rights Reserved. Proprietary and Confidential. Page 5
Golf wheel
- 7. Agenda
1. Sudden big data
2. Personalized learning
3. Many events and features, but few observations (use
complex models to build features for a simple one)
4. Only 2 weeks to learn
5. 10 tips on how to build ML model
- 8. Personalized learning
Small number of observations
Strong User Patterns
Computationally Friendly
- 9. Fleet learning
Ton of Observations
No User Patterns
Computationally Complex
- 11. Where do small data come from?
Time Series
Rare phenomena
Enterprise Solutions
Aggregate modeling
- 14. №1. Stick to simple models
- 16. №3. Limit Experimentation
If you try too many different
techniques, you’ll overfit on
your validation set.
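A small simulation can make this concrete. The sketch below is an illustration added here, not something from the slides: every candidate "technique" is a coin-flip classifier with no real skill, and the function name `best_of_k_random_models` and all sizes are assumptions chosen for the example. Picking the winner on a small validation set still produces a flattering score.

```python
import random

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def best_of_k_random_models(k, n_val, n_test, seed=0):
    """Pick the best of k skill-free 'models' on a small validation set.

    Every candidate predicts by coin flip, so its true accuracy is 0.5.
    Returns (validation accuracy of the winner, its accuracy on test data).
    """
    rng = random.Random(seed)
    val_labels = [rng.randint(0, 1) for _ in range(n_val)]
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]

    best_val, best_test = -1.0, 0.0
    for _ in range(k):
        val_acc = accuracy([rng.randint(0, 1) for _ in range(n_val)], val_labels)
        test_acc = accuracy([rng.randint(0, 1) for _ in range(n_test)], test_labels)
        if val_acc > best_val:
            best_val, best_test = val_acc, test_acc
    return best_val, best_test

# 200 hypothetical experiments, scored on only 50 validation examples.
val_acc, test_acc = best_of_k_random_models(k=200, n_val=50, n_test=10_000)
print(f"winner on validation: {val_acc:.2f}, same model on test: {test_acc:.2f}")
```

The winner's validation score sits well above chance while its test score stays near 0.5, which is the selection effect the tip warns about.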
- 17. №4. How much training data do you need?
- 18. №4. How much training data do you need?
The rule of 10: for a well-performing
model, you need roughly 10x as many
training examples as the model has
parameters.
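As a quick worked example of the rule, here is a minimal sketch. The helper names `min_training_examples` and `linear_model_parameters` are hypothetical, and the rule itself is a heuristic rather than a guarantee.

```python
def min_training_examples(n_parameters, factor=10):
    """Rule-of-thumb lower bound on training-set size (heuristic only)."""
    return factor * n_parameters

def linear_model_parameters(n_features, intercept=True):
    """A linear or logistic regression has one weight per feature, plus a bias."""
    return n_features + (1 if intercept else 0)

# A linear model on 20 features has 21 parameters.
n_params = linear_model_parameters(n_features=20)
print(min_training_examples(n_params))  # 210
```

Read the other way round, two weeks of data from a single driver caps how many parameters a personalized model can reasonably carry.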
- 19. №5. Do clean up your data
- 20. №6. Do perform feature selection
If the data is truly limiting,
sometimes explicit feature
selection is essential.
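The slide does not say which selection method was used. One simple option, sketched here as an assumption, is a univariate filter that keeps the features most correlated with the target. The data is synthetic and the names `pearson` and `select_top_k` are illustrative.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def select_top_k(columns, target, k):
    """Return indices of the k columns with the largest |correlation| to target."""
    scores = [abs(pearson(col, target)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Synthetic data: 10 features, only columns 0 and 4 drive the target.
rng = random.Random(1)
n_rows = 60
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(10)]
target = [3.0 * a - 3.0 * b + rng.gauss(0, 0.3)
          for a, b in zip(columns[0], columns[4])]

selected = select_top_k(columns, target, k=2)
print(sorted(selected))
```

To avoid an optimistic estimate, the selection step should be fitted on the training split only and then applied unchanged to validation data.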
- 21. №7. Do use Regularization
Reduces the effective
degrees of freedom without
reducing the actual number
of parameters in the model.
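For ridge (L2) regression, which is picked here only as an illustration since the slide does not name a regularizer, the effective degrees of freedom have a closed form: the sum of d_j^2 / (d_j^2 + lam) over the singular values d_j of the design matrix. The sketch below assumes the singular values are already known; the values used are hypothetical.

```python
def ridge_effective_dof(singular_values, lam):
    """Effective degrees of freedom of ridge regression.

    df(lam) = sum_j d_j^2 / (d_j^2 + lam), where d_j are the singular
    values of the design matrix and lam >= 0 is the L2 penalty.
    """
    return sum(d * d / (d * d + lam) for d in singular_values)

singular_values = [2.0, 1.0, 0.5]  # hypothetical spectrum, three parameters
for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_effective_dof(singular_values, lam), 3))
```

With no penalty the model uses all three degrees of freedom. As `lam` grows the count shrinks toward zero, even though the model still has three coefficients.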
- 22. №8. Do use Model Averaging
Each of the red curves is a model fitted on a few data points.
Averaging all these high-variance models gives a smooth
output that is remarkably close to the original.
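The figure itself is not recoverable from the transcript, so the following is a rough re-creation under assumptions: the "original" is taken to be a sine curve, each high-variance model is a 1-nearest-neighbour fit on five noisy points, and 200 such models are averaged. None of these choices come from the slides.

```python
import math
import random

def true_curve(x):
    """Assumed ground truth for the illustration."""
    return math.sin(2 * math.pi * x)

def sample_points(rng, n_points, noise):
    """Draw a few noisy observations of the true curve on [0, 1)."""
    return [(x, true_curve(x) + rng.gauss(0, noise))
            for x in (rng.random() for _ in range(n_points))]

def fit_nearest_neighbor(points):
    """High-variance model: predict the y of the closest training x."""
    def predict(x):
        return min(points, key=lambda p: abs(p[0] - x))[1]
    return predict

def mse(predict, grid):
    """Mean squared error against the true curve over a grid of x values."""
    return sum((predict(x) - true_curve(x)) ** 2 for x in grid) / len(grid)

rng = random.Random(0)
grid = [i / 100 for i in range(101)]
models = [fit_nearest_neighbor(sample_points(rng, 5, 0.3)) for _ in range(200)]

def averaged(x):
    """Ensemble prediction: plain average over all individual models."""
    return sum(m(x) for m in models) / len(models)

mean_single_mse = sum(mse(m, grid) for m in models) / len(models)
ensemble_mse = mse(averaged, grid)
print(f"typical single model MSE: {mean_single_mse:.3f}")
print(f"averaged model MSE:       {ensemble_mse:.3f}")
```

For squared error, the averaged model can never do worse than the average of the individual errors, so the comparison holds regardless of the random seed.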
- 23. №9. Try Bayesian Modeling
Bayesian inference may
be well suited for dealing
with smaller data sets,
especially if you can use
domain expertise to
construct sensible priors.
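A minimal sketch of how a prior steadies an estimate from few observations, using the conjugate Beta-Binomial update. The scenario and numbers are hypothetical: domain knowledge suggests an event happens about 20% of the time, encoded as a Beta(2, 8) prior, and then 3 occurrences are observed in 5 trips.

```python
def beta_binomial_posterior(prior_a, prior_b, successes, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

prior_a, prior_b = 2, 8      # hypothetical prior, mean 0.2
successes, trials = 3, 5     # hypothetical small sample

post_a, post_b = beta_binomial_posterior(prior_a, prior_b, successes, trials)
mle = successes / trials
posterior_mean = beta_mean(post_a, post_b)

print(f"raw frequency:  {mle:.3f}")             # 0.600
print(f"posterior mean: {posterior_mean:.3f}")  # 0.333
```

The posterior mean lands between the prior belief and the raw frequency, and it moves toward the data as more observations arrive.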
- 24. №10. Prefer Confidence Intervals
● Parts of the feature space
are likely to be less covered
by your data, and prediction
confidence within these
regions should reflect that.
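One way to see this effect is the textbook confidence interval for the mean response of a simple linear regression, whose width grows with distance from the centre of the training data. The sketch below uses made-up data, and the function name `mean_response_interval` is an assumption. The `critical` value is passed in by the caller (about 2.306 for a 95% interval with 8 degrees of freedom).

```python
import math

def mean_response_interval(xs, ys, x0, critical=1.96):
    """Confidence interval for the fitted mean at x0 (simple linear regression).

    `critical` is the t (or normal) quantile chosen by the caller.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(residual_ss / (n - 2))
    # The standard error grows with the distance of x0 from the data centre.
    std_err = sigma * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    center = intercept + slope * x0
    return center - critical * std_err, center + critical * std_err

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.7]

low_in, high_in = mean_response_interval(xs, ys, x0=5.5, critical=2.306)
low_out, high_out = mean_response_interval(xs, ys, x0=20.0, critical=2.306)
print(f"width inside the data range:  {high_in - low_in:.2f}")
print(f"width far outside the range:  {high_out - low_out:.2f}")
```

The interval at x0 = 20, far from any observation, is wider than the one at the centre of the data, which is the behaviour the slide asks predictions to reflect.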
- 25. №10. Prefer Confidence Intervals