DATA SCIENCE
ECOSYSTEM
M. TAMER ÖZSU NANCY REID RAYMOND NG
U. WATERLOO U. TORONTO UBC
Canadian Data Science Workshop
DATA SCIENCE/BIG DATA IN THE NEWS…
DATA SCIENCE EVERYWHERE!...
DATA SCIENCE VOCABULARY
WHAT IS DATA SCIENCE?
• “Data science, also known as data-driven science, is an
interdisciplinary field of scientific methods, processes,
algorithms and systems to extract knowledge or insights
from data in various forms, either structured or
unstructured, similar to data mining.”
• “Data science intends to analyze and understand actual
phenomena with ‘data’. In other words, the aim of data science
is to reveal the features or the hidden structure of complicated
natural, human, and social phenomena with data from a
different point of view from the established or traditional theory
and method.”
• Fourth paradigm
• “… change of all sciences moving from observational, to
theoretical, to computational and now to the 4th Paradigm –
Data-Intensive Scientific Discovery”
WHAT IS IMPORTANT?
Need to solve a real problem using data…
No applications, no data science.
DATA SCIENCE AS A UNIFIER
[Figure: Data Science as a unifier of Machine/Statistical Learning, Data Management, Visualization, Mathematical Optimization, Application Domain Expertise, Humanities, Social Science, and Law]
DATA SCIENCE AND BIG DATA
• They are not the “same thing”
• Big data = crude oil
• Big data is about extracting “crude oil”, transporting it in “mega tankers”,
siphoning it through “pipelines”, and storing it in “massive silos”
• Data science is about refining the “crude oil”
Carlos Somohano
Founder, Data Science London
DATA SCIENCE AND ARTIFICIAL INTELLIGENCE
[Figure: diagram relating Data Science, Artificial Intelligence, and ML/DM/Analytics]
“Data science produces insights.
Machine learning produces predictions”
DATA SCIENCE APPLICATION EXAMPLES
• Fraud detection
• Investigate fraud patterns in past data
• Early detection is important
• Before damage propagates
• Harder than late detection
• Precision is important
• False positives and false negatives are both bad
• Real-time analytics
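The point about false positives and false negatives can be made concrete with precision and recall. The sketch below is illustrative only: the counts are hypothetical and the function name `precision_recall` is an assumption, not something from the slides.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    tp: fraudulent transactions correctly flagged
    fp: legitimate transactions wrongly flagged (false positives)
    fn: fraudulent transactions that were missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, invented for illustration only.
tp, fp, fn = 80, 20, 40
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision falls as false positives grow (legitimate customers get blocked), while recall falls as false negatives grow (fraud slips through), which is why both error types matter here.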
• Recommender systems
• The ability to offer unique
personalized service
• Increase sales, click-through rates,
conversions, …
• Netflix recommender system valued at
$1B per year
• Amazon recommender system drives a
20-35% lift in sales annually
• Collaborative filtering at scale
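As a rough illustration of collaborative filtering, the sketch below predicts a user's rating for an unseen item from the ratings of similar users. It is a toy, in-memory version of the user-based variant; the ratings, user names, and function names are all hypothetical, and it does not represent how Netflix or Amazon implement their systems.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, invented for illustration.
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "cat": {"A": 1, "B": 5, "D": 2},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user: str, item: str) -> float:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine(ratings[user], their)
        num += weight * their[item]
        den += weight
    return num / den if den else 0.0


score = predict("ann", "D")
print(f"predicted rating for ann on item D: {score:.2f}")
```

Doing this "at scale" is the hard part: comparing every pair of users is quadratic, so production systems typically rely on approximate neighbour search or matrix factorization instead.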
• Predicting why patients are being
readmitted
• Reduce costs
• Improve population health
• Find the “why” behind specific
populations being readmitted
• Data lakes of multiple data sources
• Investigate ties between readmission and
socioeconomic data points, patient
history, genetics, …
• “Smart cities”
• Not well-defined
• Generally refers to using data and
ICT to
• Better plan communities
• Better manage assets
• Reduce costs
• Deploy open data to better engage
with community
• Moneyball
• How to build a baseball team on a very
low budget by relying on data
• Sabermetrics: the statistical analysis of
baseball data to objectively evaluate
performance
• Oakland Athletics' 2002 record of 103-59 was joint best in MLB
• Team salary budget: $40 million
• Comparison team: New York Yankees
• Team salary budget: $120 million
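The budget gap is easier to appreciate as payroll per win. The sketch below uses the two payroll figures quoted above. It assumes both teams finished with 103 wins, which is how "joint best" is read here, and the function name is hypothetical.

```python
def cost_per_win(payroll_musd: float, wins: int) -> float:
    """Payroll in millions of dollars divided by regular-season wins."""
    return payroll_musd / wins


# Payrolls are taken from the slide. The Yankees' win total is an
# assumption based on the "joint best" record.
low_budget = cost_per_win(40.0, 103)
yankees = cost_per_win(120.0, 103)
ratio = yankees / low_budget

print(f"low-budget team: ${low_budget:.2f}M per win")
print(f"Yankees:         ${yankees:.2f}M per win")
print(f"ratio:           {ratio:.1f}x")
```

Under these assumptions the low-budget team paid roughly a third as much per win, which is the core of the Moneyball argument.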
HOLISTIC APPROACH TO DATA SCIENCE
[Figure: Core; Data Acquisition; Making Data Trustable & Usable; Management of Big Data; Modeling & Analysis; Dissemination & Visualization; Data Preservation; Ethics, Policy & Social Impact; Data Security & Privacy; Applications]
CORE RESEARCH ISSUES & INTERACTIONS
Making Data Trustable & Usable
• Data cleaning
• Sampling
• Data provenance
Big Data Management
• Data lakes
• Batch & online access
• Platforms
Modelling & Analysis
• Models & methods for data lakes
• Unsupervised classification & AI
Data Visualization & Dissemination
• Visualization for wider audience
• Visualization for data exploration
• Open data technologies
Interactions
• DM support for provenance
• Data preparation for big data management
• Cleaning for data analysis
• DM for ML
• ML for DM
• Visual analytics
• …
