Applied Data Science Course Part 1: Concepts & your first ML model

In this first course of our Applied Data Science online course series, you'll learn about the mindset shift of going from small to big data, basic definitions and concepts, and an overview of the data science workflow.

  1. ©2018 dataiku, Inc. Applied Data Science Online Course, 1st class: Learning the basics, concepts, and your first ML model
  2. Curriculum: Go from Small to Big Data in 4 weeks
     ● September 20th at 12 PM ET: Learning the basics, concepts, and your first ML model
     ● September 27th at 12 PM ET: The data science workflow, building a predictive model flow
     ● October 4th at 12 PM ET: Getting dirty: data preparation and feature creation
     ● October 11th at 12 PM ET: Understanding your model, and communicating about it
  3. The plan for today: Learning the basics, concepts, and your first ML model
     ● Intro (that's now)
     ● Background: going from small to big data
     ● Machine Learning definitions and basic concepts
     ● The data science workflow
     ● Questions
     ● Hands-on exercise: Titanic prediction data
  4. Going from Small to Big Data
  5. Going from Small to Big Data: Local processing vs. database processing
     Local processing:
     ● Limited power (100k lines)
     ● Downloading and opening CSV or XLS files on a local machine
     ● Not distributed
     Database processing:
     ● Can process billions of lines
     ● Files are not stored and processed on each co-worker's own machine
     ● Distributed analysis
  6. Going from Small to Big Data: Cell-to-cell modifications vs. mass actions. The basic element you work on when you modify data in Excel is the cell. When you work with data from a database, your basic element is a column. Whether you're cleaning your data or enriching it with new variables, you'll be creating new columns in new datasets, never changing one line of a file at a time.
  7. Going from Small to Big Data: Potential pain points for analysts making the transition
     ● Interacting with databases: How to connect to Amazon Web Services? Hadoop? How to extract and transform the data?
     ● Collaboration with other profiles: How to benefit from and interact with the work of a Data Scientist or Data Engineer?
     ● Working with Big Data: Extract and transform my data on very large files
     ● Advanced Analytics: How to create Machine Learning models without coding skills? And how to handle geography, time series…?
  8. Concepts and Definitions
  9. Definitions: What are we talking about? Data Science: an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured (Wikipedia).
  10. Definitions: What are we talking about? Machine Learning: a field of study focused on constructing systems that learn from large amounts of data to make predictions or find relations.
  11. Definitions: Different types of Machine Learning
      ● Supervised: data is labeled; the algorithm predicts an output from the input data. Example: predicting the genre of a song based on a label.
      ● Unsupervised: data isn't labeled; the algorithm learns the inherent structure of the data and makes a prediction. Example: predicting the genre of a song without a label.
  12. Definitions: Different types of Machine Learning
      ● Prediction. Goal: create a model that can predict a target variable. Examples: predict the sales price of an apartment; forecast the winner of an election; diagnose a disease.
      ● Clustering. Goal: separate data into clusters based on similarity (no specific target). Examples: find groups of similar apartments; segment voters into demographic groups; group diseases based on symptoms.
  13. Definition: Different types of prediction
      ● If the target is continuous, then regression. Example: predicting the price of airline tickets.
      ● If the target is discrete, then classification. Example: predicting fraud.
  14. Definition: Different types of prediction
  15. Examples: Different types of Machine Learning
      ● Predicting mortgage defaults
      ● Forecasting lifetime spending of a customer
      ● Grouping songs into genres
      ● Predicting the amount of snowfall
      ● Segmenting website visitors
      ● Recommending movies to Netflix users
      ● Detecting unusual financial transactions
      Categories shown on the slide: Prediction (classification or regression) and Clustering.
  16. Definitions: What's in a dataset
      ● Feature
      ● Observation
      ● Types of data
  17. Train, test, validate
      ● Training set: used to create your model
      ● Validation set: used to measure performance
      ● Testing set: used to check model performance after deployment
      If performance on the test set starts to decline, think about retraining your model.
  18. Train, test, validate: random or time-based split
      ● Training set: 70%
      ● Validation set: 20%
      ● Test set: 10%
  19. The Data Science Workflow
  20. The Data Science Workflow: 7 steps of a data project
  21. The Data Science Workflow: Advanced version of the workflow
      [Diagram. Stage labels: Business Understanding, Data Acquisition & Understanding, Data Preparation, Model Creation, Evaluation, Deployment. Other labels: Dataset 1, Dataset 2, Dataset n; Iteration 1, Iteration 2, Iteration n; Scored dataset.]
  22. Questions?
  23. Hands-on
  24. Titanic Use Case: Kaggle Titanic Challenge. Predicting who survived the tragedy.
  25. About Dataiku - Your Path to Enterprise AI
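
Slide 6 contrasts editing single cells with applying one rule to a whole column. A minimal sketch of that idea in plain Python follows. The sample rows, the column names, and the `add_column` helper are all hypothetical and are not part of the course material; a real project would more likely use a database or a dataframe library.

```python
# Hypothetical sample rows; column names and values are illustrative only.
apartments = [
    {"apartment_id": 1, "surface_m2": 40, "price": 200_000},
    {"apartment_id": 2, "surface_m2": 75, "price": 300_000},
]


def add_column(dataset, new_column, compute):
    """Return a new dataset in which every row gains `new_column`.

    The input dataset is not modified: each output row is a fresh dict,
    mirroring the idea of creating new columns in new datasets.
    """
    return [{**row, new_column: compute(row)} for row in dataset]


# One rule applied to the whole column at once (a "mass action"),
# instead of typing a value into each cell.
enriched = add_column(
    apartments,
    "price_per_m2",
    lambda row: row["price"] / row["surface_m2"],
)
```

Because `add_column` returns a new list, the original `apartments` rows stay unchanged, which makes each preparation step easy to rerun or undo.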

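
The 70% / 20% / 10% split from slide 18, in its random and time-based variants, can be sketched as below. The function name `split_dataset`, its parameters, and the `day` field are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import random


def split_dataset(rows, fractions=(0.7, 0.2, 0.1), time_key=None, seed=0):
    """Split rows into (train, validation, test) lists.

    If `time_key` is given, rows are sorted by that field, so the oldest
    rows go to training and the most recent ones to the test set
    (time-based split). Otherwise rows are shuffled with a fixed seed
    (random split).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ordered = list(rows)  # copy, so the caller's list is left untouched
    if time_key is not None:
        ordered.sort(key=lambda row: row[time_key])
    else:
        random.Random(seed).shuffle(ordered)
    train_end = round(len(ordered) * fractions[0])
    validation_end = train_end + round(len(ordered) * fractions[1])
    return (
        ordered[:train_end],
        ordered[train_end:validation_end],
        ordered[validation_end:],
    )


# Hypothetical data: 100 observations indexed by day.
days = [{"day": d} for d in range(100)]
train, validation, test = split_dataset(days, time_key="day")
random_train, random_validation, random_test = split_dataset(days)
```

A time-based split is usually the safer choice when observations are ordered in time, since a random split would let the model train on rows that come after the ones it is evaluated on.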