SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Getting Started with Data Science
Using Python
https://www.linkedin.com/in/andreilyskov/
https://www.quora.com/profile/Andrei-Lyskov-1
About The Speaker
What is Data Science?
What is Data Science?
What is Data Science?
Why the Hype Around Data Science?
● IBM predicts that demand for data scientists will soar by 28% by 2020
● Data scientist roles have grown over 650% since 2012, but currently, 35,000 people in
the US have data science skills, while hundreds of companies are hiring for those roles.
● Software engineering is a common starting point for professionals who are in the
top five fasting growing jobs today. The career path to Machine Learning Engineer
and Big Data Developer begins with a solid software engineering background.
● Data Science gives you career flexibility
Who Are Data Scientists?
https://www.datascience.com/blog/data-scientist-skills
Who Are Data Scientists?
https://www.datascience.com/blog/data-scientist-skills
Who Are Data Scientists?
https://www.datascience.com/blog/data-scientist-skills
Application - Security
Application – Sports
Application - Finance
Application – Microsoft (Skype Product)
● The first is with a product feature called Skype Translator. As its implied,
Skype uses machine learning to translate a conversation between two people
speaking different languages through the use of a third party bot that joins
your call.
Skype Translator – How it Works | Skype Blogs
● The second is to detect fraudulent Skype Users, examples range from
users who send spammy messages, to credit card and online payment fraud.
This is an important application of machine learning as you can imagine, a
platform that’s riddled with spammers and fraudsters is not one that will likely
retain many users.
Detecting Fraudulent Skype Users via Machine Learning
Learning Data Science With Python - Libraries
NumPy is a library for the Python
programming language, adding
support for large, multi-
dimensional arrays and matrices,
along with a large collection of
high-level mathematical functions
to operate on these arrays.
Pandas is a software library
written for the Python
programming language for
data manipulation and
analysis. In particular, it offers
data structures and operations
for manipulating numerical
tables and time series.
A free software machine learning
library that features various
classification, regression and
clustering algorithms including
support vector machines,
random forests, gradient
boosting, and k-means and is
designed to interoperate with the
Python numerical and scientific
libraries NumPy and SciPy.
Learning Data Science With Python - Libraries
A plotting library for the Python
programming language and its
numerical mathematics extension
NumPy
Keras is an open source
neural network library written
in Python. It is capable of
running on top of TensorFlow,
Microsoft Cognitive Toolkit,
Theano, or MXNet. It was
developed with a focus on
enabling fast experimentation
TensorFlow is an open-source
software library for dataflow
programming across a range of
tasks. It is a symbolic math
library, and is also used for
machine learning applications
such as neural networks.
Learning Data Science With Python - Tools
Open-source web application that
allows you to create and share
documents that contain live code,
equations, visualizations and
narrative text
http://jupyter.org/
Crestle is your GPU-enabled
Jupyter environment in the
cloud.
https://www.crestle.com/
Similar to Jupyter Notebook, but
with the added benefit of “google
doc” type sharing and
collaboration
https://colab.research.google.com
Data Science Steps
● Data Gathering
Unless you’re at a company with great data governance you’re likely going to have some trouble
accessing the data you want. Whether that's because your company has neglected to put the
necessary systems in place to gather data, or the data that they are collecting is fragmented and
scattered across the organization, you’ll have to first spend some time gathering whatever data you’ll
need to do your job. That means having discussions with relevant stakeholders, and getting the
necessary credentials to access databases within your organization.
● Data Preparation
Once you have access to data, you’ll need to spend some time cleaning and formatting it. This is
where Data Science can often become more of an art, then a science. Unlike datasets you’ll find in
competitions, the real world has very messy data sets. Missing values, error in data collection, data
formatting, normalization, outliers - these are all issues that you’ll have to learn to deal with.
Data Science Steps
● Exploration
Before diving into building any models, you’ll want to explore the data to try to glean
some insights. Clustering algorithms, scatterplots, bar graphs, Chernoff faces are all
interesting ways of visualizing data that will lead to a better understanding of the
structure of your data and aid you in your model building step.
● Model Building
With your data cleaned and formatted, you’ll have an opportunity to explore a variety of
models to see which one works best. Random Forests, SVM’s, Bayesian Predictors
Neural Networks, Deep Learning, K-Nearest Neighbours - all models you should
familiarize yourself with. There is no one model fits all, and so you again will need to
develop intuition on which model suits your particular problem.
Data Science Steps
● Model Validation
Prediction accuracy is a common benchmark for whether your model is performing well, however
often times there are other evaluation metrics to consider. False positives and false negatives are
important to think about from the perspective of the problem you’re working on. If you’re predicting
disease, you’ll care more about minimizing false negative, since it may result in a persons death -
whereas a false positive will only lead to additional testing.
● Model Deployment
Finally you’ll deploy your model into the wild, as you gather more data and feedback on how its
doing you’ll be able to tweak and improve it as time goes on.
*This is by no means a comprehensive list of steps, and there are certainly other things you’ll need
to do to be able to do well in your job - however this is a good high level overview of the steps
involved in tackling data science problems.
Case Study
Building a regression model to
predict housing prices
Building A Portfolio
1. 2.
3.
Building A Portfolio
4. 5.
Questions?
Resources
Podcasts Websites/Blogs Communities
Data Skeptic Dataquest.io
experfylabs.slack.com/m
essages/C0L736X36/
Data Stories Kaggle.com
opendatacommunity.slac
k.com
Learning Machines 101 Quora.com dcommunity.slack.com
Linear Digressions analyticsvidhya.com kagglenoobs.slack.com
O’Reilly Data Show Coursera.org pythondev.slack.com
Talking Machine
https://developers.google.com/
machine-learning/crash-
course/
This week in Machine Learning and AI https://portal.azure.com/
Siraj Raval (Youtube) https://www.luis.ai/
Resources
Books Tv Shows/documentaries
Hands-On Machine Learning with Scikit-Learn and TensorFlow Humans (2015-)
Python Machine Learning, 1st Edition Persons of interest
Everybody Lies: Big Data, New Data Intelligence
Weapons of Math Destruction Minority report
Big Data: A Revolution That Will Transform How We Live, Work, and Think Almost human
Turing: Pioneer of the Information Age Robot and frank
Avogadro Corp Her
Code: The Hidden Language of Computer Hardware and Software Black Mirror
Superintelligence: Paths, Dangers, Strategies iRobot
Visual Explanations: Images and Quantities, Evidence and Narrative Ex Machina
Pattern Recognition and Machine Learning (Information Science and Statistics)
The Secret Rules of Modern Living:
Algorithms
Storytelling with Data: A Data Visualization Guide for Business Professionals
An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani

Contenu connexe

Tendances

Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceSrishti44
 
Ppt on data science
Ppt on data science Ppt on data science
Ppt on data science Ansh Budania
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI dayMohammed Barakat
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...Simplilearn
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Edureka!
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data ScienceActonRoy
 
Data science & data scientist
Data science & data scientistData science & data scientist
Data science & data scientistVijayMohan Vasu
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data ScienceEdureka!
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Simplilearn
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 

Tendances (20)

Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Data science
Data scienceData science
Data science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data science
Data science Data science
Data science
 
Ppt on data science
Ppt on data science Ppt on data science
Ppt on data science
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
Data science
Data scienceData science
Data science
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data Science
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Data science & data scientist
Data science & data scientistData science & data scientist
Data science & data scientist
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
 
Data science
Data scienceData science
Data science
 
Data science
Data scienceData science
Data science
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Data analytics
Data analyticsData analytics
Data analytics
 

Similaire à Data science presentation

A Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptxA Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptxRajSingh512965
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data ScienceSanghamitra Deb
 
JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceMark West
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceMark West
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceJuuso Parkkinen
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceMark West
 
Building Data Scientists
Building Data ScientistsBuilding Data Scientists
Building Data ScientistsMitch Sanders
 
Ch1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxCh1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxAbderrahmanABID2
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadVamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabadsaitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training HyderabadNithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabadVamsiNihal
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
data science training and placement
data science training and placementdata science training and placement
data science training and placementSaiprasadVella
 
online data science training
online data science trainingonline data science training
online data science trainingDIGITALSAI1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabadVamsiNihal
 

Similaire à Data science presentation (20)

A Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptxA Comprehensive Learning Path to Become a Data Science 2021.pptx
A Comprehensive Learning Path to Become a Data Science 2021.pptx
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data Science
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data Science
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
Building Data Scientists
Building Data ScientistsBuilding Data Scientists
Building Data Scientists
 
Ch1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxCh1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptx
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
 
online data science training
online data science trainingonline data science training
online data science training
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 

Plus de MSDEVMTL

Intro grpc.net
Intro  grpc.netIntro  grpc.net
Intro grpc.netMSDEVMTL
 
Grpc and asp.net partie 2
Grpc and asp.net partie 2Grpc and asp.net partie 2
Grpc and asp.net partie 2MSDEVMTL
 
Property based testing
Property based testingProperty based testing
Property based testingMSDEVMTL
 
Improve cloud visibility and cost in Microsoft Azure
Improve cloud visibility and cost in Microsoft AzureImprove cloud visibility and cost in Microsoft Azure
Improve cloud visibility and cost in Microsoft AzureMSDEVMTL
 
Return on Ignite 2019: Azure, .NET, A.I. & Data
Return on Ignite 2019: Azure, .NET, A.I. & DataReturn on Ignite 2019: Azure, .NET, A.I. & Data
Return on Ignite 2019: Azure, .NET, A.I. & DataMSDEVMTL
 
C sharp 8.0 new features
C sharp 8.0 new featuresC sharp 8.0 new features
C sharp 8.0 new featuresMSDEVMTL
 
Asp.net core 3
Asp.net core 3Asp.net core 3
Asp.net core 3MSDEVMTL
 
MSDEVMTL Informations 2019
MSDEVMTL Informations 2019MSDEVMTL Informations 2019
MSDEVMTL Informations 2019MSDEVMTL
 
Common features in webapi aspnetcore
Common features in webapi aspnetcoreCommon features in webapi aspnetcore
Common features in webapi aspnetcoreMSDEVMTL
 
Groupe Excel et Power BI - Rencontre du 25 septembre 2018
Groupe Excel et Power BI  - Rencontre du 25 septembre 2018Groupe Excel et Power BI  - Rencontre du 25 septembre 2018
Groupe Excel et Power BI - Rencontre du 25 septembre 2018MSDEVMTL
 
Api gateway
Api gatewayApi gateway
Api gatewayMSDEVMTL
 
Common features in webapi aspnetcore
Common features in webapi aspnetcoreCommon features in webapi aspnetcore
Common features in webapi aspnetcoreMSDEVMTL
 
Stephane Lapointe: Governance in Azure, keep control of your environments
Stephane Lapointe: Governance in Azure, keep control of your environmentsStephane Lapointe: Governance in Azure, keep control of your environments
Stephane Lapointe: Governance in Azure, keep control of your environmentsMSDEVMTL
 
Eric Routhier: Garder le contrôle sur vos coûts Azure
Eric Routhier: Garder le contrôle sur vos coûts AzureEric Routhier: Garder le contrôle sur vos coûts Azure
Eric Routhier: Garder le contrôle sur vos coûts AzureMSDEVMTL
 
Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...
Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...
Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...MSDEVMTL
 
Open id connect, azure ad, angular 5, web api core
Open id connect, azure ad, angular 5, web api coreOpen id connect, azure ad, angular 5, web api core
Open id connect, azure ad, angular 5, web api coreMSDEVMTL
 
Yoann Clombe : Fail fast, iterate quickly with power bi and google analytics
Yoann Clombe : Fail fast, iterate quickly with power bi and google analyticsYoann Clombe : Fail fast, iterate quickly with power bi and google analytics
Yoann Clombe : Fail fast, iterate quickly with power bi and google analyticsMSDEVMTL
 
CAE: etude de cas - Rolling Average
CAE: etude de cas - Rolling AverageCAE: etude de cas - Rolling Average
CAE: etude de cas - Rolling AverageMSDEVMTL
 
CAE: etude de cas
CAE: etude de casCAE: etude de cas
CAE: etude de casMSDEVMTL
 
Dan Edwards : Data visualization best practices with Power BI
Dan Edwards : Data visualization best practices with Power BIDan Edwards : Data visualization best practices with Power BI
Dan Edwards : Data visualization best practices with Power BIMSDEVMTL
 

Plus de MSDEVMTL (20)

Intro grpc.net
Intro  grpc.netIntro  grpc.net
Intro grpc.net
 
Grpc and asp.net partie 2
Grpc and asp.net partie 2Grpc and asp.net partie 2
Grpc and asp.net partie 2
 
Property based testing
Property based testingProperty based testing
Property based testing
 
Improve cloud visibility and cost in Microsoft Azure
Improve cloud visibility and cost in Microsoft AzureImprove cloud visibility and cost in Microsoft Azure
Improve cloud visibility and cost in Microsoft Azure
 
Return on Ignite 2019: Azure, .NET, A.I. & Data
Return on Ignite 2019: Azure, .NET, A.I. & DataReturn on Ignite 2019: Azure, .NET, A.I. & Data
Return on Ignite 2019: Azure, .NET, A.I. & Data
 
C sharp 8.0 new features
C sharp 8.0 new featuresC sharp 8.0 new features
C sharp 8.0 new features
 
Asp.net core 3
Asp.net core 3Asp.net core 3
Asp.net core 3
 
MSDEVMTL Informations 2019
MSDEVMTL Informations 2019MSDEVMTL Informations 2019
MSDEVMTL Informations 2019
 
Common features in webapi aspnetcore
Common features in webapi aspnetcoreCommon features in webapi aspnetcore
Common features in webapi aspnetcore
 
Groupe Excel et Power BI - Rencontre du 25 septembre 2018
Groupe Excel et Power BI  - Rencontre du 25 septembre 2018Groupe Excel et Power BI  - Rencontre du 25 septembre 2018
Groupe Excel et Power BI - Rencontre du 25 septembre 2018
 
Api gateway
Api gatewayApi gateway
Api gateway
 
Common features in webapi aspnetcore
Common features in webapi aspnetcoreCommon features in webapi aspnetcore
Common features in webapi aspnetcore
 
Stephane Lapointe: Governance in Azure, keep control of your environments
Stephane Lapointe: Governance in Azure, keep control of your environmentsStephane Lapointe: Governance in Azure, keep control of your environments
Stephane Lapointe: Governance in Azure, keep control of your environments
 
Eric Routhier: Garder le contrôle sur vos coûts Azure
Eric Routhier: Garder le contrôle sur vos coûts AzureEric Routhier: Garder le contrôle sur vos coûts Azure
Eric Routhier: Garder le contrôle sur vos coûts Azure
 
Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...
Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...
Michel Ouellette + Gabriel Lainesse: Process Automation & Data Analytics at S...
 
Open id connect, azure ad, angular 5, web api core
Open id connect, azure ad, angular 5, web api coreOpen id connect, azure ad, angular 5, web api core
Open id connect, azure ad, angular 5, web api core
 
Yoann Clombe : Fail fast, iterate quickly with power bi and google analytics
Yoann Clombe : Fail fast, iterate quickly with power bi and google analyticsYoann Clombe : Fail fast, iterate quickly with power bi and google analytics
Yoann Clombe : Fail fast, iterate quickly with power bi and google analytics
 
CAE: etude de cas - Rolling Average
CAE: etude de cas - Rolling AverageCAE: etude de cas - Rolling Average
CAE: etude de cas - Rolling Average
 
CAE: etude de cas
CAE: etude de casCAE: etude de cas
CAE: etude de cas
 
Dan Edwards : Data visualization best practices with Power BI
Dan Edwards : Data visualization best practices with Power BIDan Edwards : Data visualization best practices with Power BI
Dan Edwards : Data visualization best practices with Power BI
 

Dernier

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Data science presentation

  • 1. Getting Started with Data Science Using Python
  • 3. What is Data Science?
  • 4. What is Data Science?
  • 5. What is Data Science?
  • 6. Why the Hype Around Data Science? ● IBM predicts that demand for data scientists will soar by 28% by 2020 ● Data scientist roles have grown over 650% since 2012, but currently, 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles. ● Software engineering is a common starting point for professionals who are in the top five fasting growing jobs today. The career path to Machine Learning Engineer and Big Data Developer begins with a solid software engineering background. ● Data Science gives you career flexibility
  • 7. Who Are Data Scientists? https://www.datascience.com/blog/data-scientist-skills
  • 8. Who Are Data Scientists? https://www.datascience.com/blog/data-scientist-skills
  • 9. Who Are Data Scientists? https://www.datascience.com/blog/data-scientist-skills
  • 13. Application – Microsoft (Skype Product) ● The first is with a product feature called Skype Translator. As its implied, Skype uses machine learning to translate a conversation between two people speaking different languages through the use of a third party bot that joins your call. Skype Translator – How it Works | Skype Blogs ● The second is to detect fraudulent Skype Users, examples range from users who send spammy messages, to credit card and online payment fraud. This is an important application of machine learning as you can imagine, a platform that’s riddled with spammers and fraudsters is not one that will likely retain many users. Detecting Fraudulent Skype Users via Machine Learning
  • 14. Learning Data Science With Python - Libraries NumPy is a library for the Python programming language, adding support for large, multi- dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. A free software machine learning library that features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, and k-means and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
  • 15. Learning Data Science With Python - Libraries A plotting library for the Python programming language and its numerical mathematics extension NumPy Keras is an open source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or MXNet. It was developed with a focus on enabling fast experimentation TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.
  • 16. Learning Data Science With Python - Tools Open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text http://jupyter.org/ Crestle is your GPU-enabled Jupyter environment in the cloud. https://www.crestle.com/ Similar to Jupyter Notebook, but with the added benefit of “google doc” type sharing and collaboration https://colab.research.google.com
  • 17. Data Science Steps ● Data Gathering Unless you’re at a company with great data governance you’re likely going to have some trouble accessing the data you want. Whether that's because your company has neglected to put the necessary systems in place to gather data, or the data that they are collecting is fragmented and scattered across the organization, you’ll have to first spend some time gathering whatever data you’ll need to do your job. That means having discussions with relevant stakeholders, and getting the necessary credentials to access databases within your organization. ● Data Preparation Once you have access to data, you’ll need to spend some time cleaning and formatting it. This is where Data Science can often become more of an art, then a science. Unlike datasets you’ll find in competitions, the real world has very messy data sets. Missing values, error in data collection, data formatting, normalization, outliers - these are all issues that you’ll have to learn to deal with.
  • 18. Data Science Steps ● Exploration Before diving into building any models, you’ll want to explore the data to try to glean some insights. Clustering algorithms, scatterplots, bar graphs, Chernoff faces are all interesting ways of visualizing data that will lead to a better understanding of the structure of your data and aid you in your model building step. ● Model Building With your data cleaned and formatted, you’ll have an opportunity to explore a variety of models to see which one works best. Random Forests, SVM’s, Bayesian Predictors Neural Networks, Deep Learning, K-Nearest Neighbours - all models you should familiarize yourself with. There is no one model fits all, and so you again will need to develop intuition on which model suits your particular problem.
  • 19. Data Science Steps ● Model Validation Prediction accuracy is a common benchmark for whether your model is performing well, however often times there are other evaluation metrics to consider. False positives and false negatives are important to think about from the perspective of the problem you’re working on. If you’re predicting disease, you’ll care more about minimizing false negative, since it may result in a persons death - whereas a false positive will only lead to additional testing. ● Model Deployment Finally you’ll deploy your model into the wild, as you gather more data and feedback on how its doing you’ll be able to tweak and improve it as time goes on. *This is by no means a comprehensive list of steps, and there are certainly other things you’ll need to do to be able to do well in your job - however this is a good high level overview of the steps involved in tackling data science problems.
  • 20. Case Study Building a regression model to predict housing prices
  • 24. Resources Podcasts Websites/Blogs Communities Data Skeptic Dataquest.io experfylabs.slack.com/m essages/C0L736X36/ Data Stories Kaggle.com opendatacommunity.slac k.com Learning Machines 101 Quora.com dcommunity.slack.com Linear Digressions analyticsvidhya.com kagglenoobs.slack.com O’Reilly Data Show Coursera.org pythondev.slack.com Talking Machine https://developers.google.com/ machine-learning/crash- course/ This week in Machine Learning and AI https://portal.azure.com/ Siraj Raval (Youtube) https://www.luis.ai/
  • 25. Resources Books Tv Shows/documentaries Hands-On Machine Learning with Scikit-Learn and TensorFlow Humans (2015-) Python Machine Learning, 1st Edition Persons of interest Everybody Lies: Big Data, New Data Intelligence Weapons of Math Destruction Minority report Big Data: A Revolution That Will Transform How We Live, Work, and Think Almost human Turing: Pioneer of the Information Age Robot and frank Avogadro Corp Her Code: The Hidden Language of Computer Hardware and Software Black Mirror Superintelligence: Paths, Dangers, Strategies iRobot Visual Explanations: Images and Quantities, Evidence and Narrative Ex Machina Pattern Recognition and Machine Learning (Information Science and Statistics) The Secret Rules of Modern Living: Algorithms Storytelling with Data: A Data Visualization Guide for Business Professionals An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani