The Colorful World of
Data Science
Sreejith C
Data Scientist
Calpine Labs
UVJ Technologies
Kochi
Overview
- Presentation:
Introduction to Data Science
- Demonstration :
Loan Prediction Problem
- Exploratory data analysis in Python
- Data Munging in Python
- Building a Predictive Model in Python
Logistic Regression
Decision Tree
Random Forest
What is Data Science ?
The Science of
- Discovering what we don’t know from data
- Obtaining predictive, actionable insight from data
- Creating Data Products that have business impact
now
- Communicating relevant business stories from data
- Building confidence in decisions that drive business
value
“Data science is clearly a blend of the hackers’ arts,
statistics and machine learning...
and the expertise in mathematics and the domain of
the data for the analysis to be interpretable...
It requires creative decisions and open-mindedness in
a scientific context.”
Hilary Mason and Chris Wiggins
Hilary Mason is an American data scientist and the founder of technology startup Fast Forward Labs as well as Data Scientist in Residence at Accel Partners. She was the Chief Scientist at bitly.
Christopher H. Wiggins is an associate professor of applied mathematics at Columbia University, the first Chief Data Scientist at The New York Times, and co-founder and co-organizer of hackNY (hackNY.org).
THE DATA SCIENCE VENN DIAGRAM
Who is a Data Scientist ?
“We realized that as our organizations grew, we both had to figure
out what to call the people on our teams.
Business analyst and Data analyst seemed too limiting.
The focus of our teams was to work on data applications that would
have an immediate and massive impact on the business.
The term that seemed to fit best was data scientist:
those who use both data and science to create something new.”
DJ Patil
Chief Data Scientist at the United States Office of Science and Technology Policy. Patil is credited with coining the term "data science".
What Does a Data Scientist
Do?
“... on any given day, a team member could author a multistage
processing pipeline in Python,
design a hypothesis test, perform a regression analysis over data
samples with R,
design and implement an algorithm for some data-intensive product
or service in Hadoop,
communicate the results of our analyses to other members of the
organization.”
Jeff Hammerbacher
Data scientist, chief scientist, and cofounder at Cloudera. Along with DJ Patil, Jeff Hammerbacher is credited with coining the term "data science".
Machine Learning
- Regression
- Classification
- Clustering
Big Data Analytics
How to become a data scientist ?
Data scientists need to know how to code
Python
R
Julia
Java
Scala
SQL / NoSQL
Spark / Hadoop
Data scientists need to be comfortable with
mathematics & statistics.
Data scientists need to know machine learning &
software engineering.
Putting the pieces together .....
SIMPLE (Students' Innovations in Morphology Phonology and
Language Engineering) groups
CLEAR (Computational Linguistics in Engineering And
Research) magazine
- Blog / Write about your experience
- Build sample projects
- Share ideas
Puzzle
A huntsman can hit a target with a probability of 0.8.
He sees a flock of birds (150 birds) atop a banyan tree.
He takes aim and fires 5 consecutive shots.
Question: How many birds remain on the tree?
Don't lose the big picture !!
0 !
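For contrast, the arithmetic the puzzle tempts you into can be written out. This is only a sketch of the naive reading: it assumes the five shots are independent, that each hit removes exactly one bird, and that the birds stay on the tree, none of which the puzzle actually states. The variable names are illustrative.

```python
# Naive reading of the puzzle: independent shots, one bird per hit,
# and birds that do not react to gunfire (the assumption the answer "0" rejects).
HIT_PROBABILITY = 0.8
SHOTS = 5
BIRDS = 150

# Expected number of hits over five independent shots.
expected_hits = HIT_PROBABILITY * SHOTS

# Probability that at least one of the five shots hits.
p_at_least_one_hit = 1 - (1 - HIT_PROBABILITY) ** SHOTS

# What the naive model would report as birds left on the tree.
naive_birds_remaining = BIRDS - expected_hits

print("expected hits:", expected_hits)
print("P(at least one hit):", round(p_at_least_one_hit, 5))
print("naive birds remaining:", naive_birds_remaining)
```

The calculation is internally correct and still answers the wrong question: the first shot scares the whole flock away, so the modelling assumption matters more than the arithmetic.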
Loan Prediction Problem
The challenge is to predict the approval status of a loan
(Approved / Rejected)
Link :
https://github.com/sreejithc321/ML_Regression/tree/master/loan_prediction
Demonstration
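To give a feel for the kind of model the demonstration builds, here is a minimal sketch of logistic regression, the first of the three models listed in the overview. It is not the code from the linked repository: it uses only the Python standard library, and the two features (`credit_history`, `income_to_loan_ratio`), the eight records, the learning rate and the epoch count are all invented for illustration.

```python
import math

# Hypothetical toy records: (credit_history, income_to_loan_ratio).
# Labels: 1 = Approved, 0 = Rejected. Values are invented, not real loan data.
X = [(1, 0.9), (1, 0.7), (1, 0.5), (0, 0.8),
     (0, 0.4), (0, 0.2), (1, 0.3), (0, 0.6)]
y = [1, 1, 1, 0, 0, 0, 1, 0]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, row):
    """Estimated probability that the loan in `row` is approved."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, target in zip(X, y):
            error = predict_proba(weights, bias, row) - target
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = fit(X, y)

# Classify each training record with a 0.5 probability threshold.
predictions = [int(predict_proba(weights, bias, row) >= 0.5) for row in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("training accuracy:", accuracy)
```

Accuracy on the training records says little about unseen applicants; a real workflow would hold out test data, and would typically start with the exploratory analysis and data munging steps listed in the overview.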
References
http://www.slideshare.net/ryanorban/how-to-become-a-data-scientist
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
https://speakerdeck.com/bargava/introduction-to-machine-learning
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/
Connect with me at : http://in.linkedin.com/in/sreejithc321
Follow me at : https://twitter.com/sreejithc321
