Data Science
  Data Meetup Jan. 12
What is data science?
Besides a reason to have beer and pizza…
What does the literature say?
Hacking
“Good data scientists understand, in a deep way, that the heavy lifting of cleanup and preparation isn’t something that gets in the way of solving the problem… it is the problem”
DJ Patil

bash/awk/sed
Statistics
What’s the probability that at least 2 people in
the front 2 rows share a birthday?
1. ~10%
2. ~20%
3. ~50%
4. ~90%
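The birthday question can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `shared_birthday_probability` is hypothetical, and it assumes 365 equally likely birthdays. The slide does not say how many people sit in the front two rows, so the group size is left as a parameter.

```python
def shared_birthday_probability(n_people: int, days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday.

    Assumes every day of the year is equally likely (an idealization).
    """
    if n_people > days:
        return 1.0  # pigeonhole: a shared birthday is guaranteed
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Group sizes here are illustrative guesses, not taken from the talk.
    for n in (10, 23, 40):
        print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions the probability passes 50% at about 23 people, which is why the intuitive answer is usually too low.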

What’s the probability that a positive result from a 99%
accurate test is correct, for a disease that affects 1 in 1,000 people?
1. ~10%
2. ~50%
3. ~90%
4. ~99%
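The test-accuracy question is an application of Bayes' rule. The sketch below is an illustration under a stated assumption: "99% accurate" is read as 99% sensitivity and 99% specificity, which the slide does not spell out. The function name is hypothetical.

```python
def posterior_given_positive(prevalence: float,
                             sensitivity: float,
                             specificity: float) -> float:
    """P(disease | positive test) by Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Assumption: "99% accurate" means sensitivity = specificity = 0.99.
    p = posterior_given_positive(prevalence=1 / 1000,
                                 sensitivity=0.99,
                                 specificity=0.99)
    print(f"{p:.3f}")
```

With these inputs the posterior is roughly 9%, because false positives from the 999 healthy people outnumber the true positives from the 1 sick person.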
Domain Expertise
Intelligence Cookbook
      Just follow the steps
The Recipe

First, make it valuable.
Then, make it possible.
Then, make it beautiful.
 Then, make it smart.
Example

E-Commerce website
Make it valuable

Find a KPI that is correlated
   with bottom-line revenue


e.g. number of products the
  visitor browses through
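One way to check whether a candidate KPI tracks revenue is to compute a correlation coefficient over past sessions. The following is a minimal sketch: the session numbers are invented for illustration, and Pearson correlation is only one possible measure of "correlated".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up per-session data: products browsed and revenue earned.
    products_browsed = [1, 3, 2, 8, 5, 12, 4]
    revenue = [0.0, 12.5, 0.0, 40.0, 18.0, 75.0, 9.9]
    print(round(pearson(products_browsed, revenue), 2))
```

A high coefficient suggests the KPI is worth optimizing as a proxy. It does not show that increasing the KPI causes revenue to rise, which is what the later split-testing step is for.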
Make it possible

Develop the simplest heuristic



e.g. show the visitor one of the
     top 10 selling products
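The top-sellers heuristic needs nothing more than a counter. This sketch assumes order history is available as `(product_id, quantity)` pairs; that shape and the function names are hypothetical.

```python
import random
from collections import Counter


def top_sellers(order_lines, k=10):
    """Return the k product ids with the most units sold.

    order_lines: iterable of (product_id, quantity) pairs (assumed shape).
    """
    units = Counter()
    for product_id, quantity in order_lines:
        units[product_id] += quantity
    return [product_id for product_id, _ in units.most_common(k)]


def recommend_top_seller(order_lines, k=10, rng=random):
    """Show the visitor one of the top-k sellers, picked at random."""
    candidates = top_sellers(order_lines, k)
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    orders = [("tent", 5), ("lamp", 1), ("stove", 3), ("tent", 2)]
    print(top_sellers(orders, k=2))
    print(recommend_top_seller(orders, k=2))
```

The point of this step is a baseline that ships quickly and that smarter models must later beat.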
Make it beautiful

Create a method to quickly test new
    algorithms against old ones


 e.g. create a framework that split-tests
   two models and reports
         which one is better
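A split-testing framework can start very small. The sketch below is an assumption-laden illustration rather than a description of any real system: visitors are bucketed by hashing their id, each model is a plain function from visitor id to recommendation, and `measure_kpi` is a hypothetical callback that returns the observed KPI for that visitor.

```python
import hashlib
from statistics import mean


def assign_variant(visitor_id: str) -> str:
    """Deterministically bucket a visitor into 'A' or 'B' by hashing the id."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def split_test(visitor_ids, model_a, model_b, measure_kpi):
    """Serve each visitor from one model and compare the mean KPI per variant.

    Returns (mean KPI per variant, name of the variant with the higher mean).
    """
    models = {"A": model_a, "B": model_b}
    observations = {"A": [], "B": []}
    for visitor_id in visitor_ids:
        variant = assign_variant(visitor_id)
        recommendation = models[variant](visitor_id)
        observations[variant].append(measure_kpi(visitor_id, recommendation))
    if not observations["A"] or not observations["B"]:
        raise ValueError("each variant needs at least one observation")
    means = {variant: mean(values) for variant, values in observations.items()}
    winner = max(means, key=means.get)
    return means, winner
```

Hashing keeps a returning visitor in the same bucket without storing any state. A real framework would also add a significance test before declaring a winner; this sketch only compares means.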
Make it smart

Figure out which field your problem belongs to
 and choose an off-the-shelf algorithm


    e.g. recognize that the problem
   is product recommendation and
       use collaborative filtering
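As an illustration of collaborative filtering, here is a minimal item-based sketch using cosine similarity over "who browsed what" sets. The slides do not name a specific variant, so this choice is an assumption, and the data and function names are hypothetical. In practice an off-the-shelf library would replace this code.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between items, based on which visitors touched them.

    baskets: dict mapping visitor id -> set of product ids (assumed shape).
    """
    item_users = defaultdict(set)
    for visitor, items in baskets.items():
        for item in items:
            item_users[item].add(visitor)
    sims = defaultdict(dict)
    items = list(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                s = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims


def recommend_for(baskets, visitor, k=3):
    """Rank unseen items by summed similarity to the visitor's own items."""
    sims = item_similarities(baskets)
    seen = baskets.get(visitor, set())
    scores = defaultdict(float)
    for item in seen:
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Sort by score descending, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    baskets = {
        "ann": {"tent", "stove"},
        "bob": {"tent", "stove", "lamp"},
        "cy": {"tent", "lamp"},
        "dee": {"stove"},
    }
    print(recommend_for(baskets, "dee"))
```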
Common ML problems
•   Supervised learning
    •   Classification
    •   Regression
    •   Anomaly detection
•   Unsupervised learning
    •   Clustering
    •   Separation
•   Recommendation
    •   Feature based recommendation
    •   Collaborative filtering
•   Search
    •   Indexing
    •   Ranking
To sum it all up
Real data science is hard

but …

Real data science is the last step in data
science, not the first

and besides …

The most important thing in data science is
the business, not the science
Questions?

email: vitalyp@liveperson.com

     Twitter: @bigdatasc
