THE PYTHON ECOSYSTEM FOR DATA SCIENCE - LANDSCAPE OVERVIEW
Ananth Krishnamoorthy, Ph.D.
Outline Slides for Talk at Fifth Elephant 2017
25-Apr-2017
Summary
• In their day-to-day jobs, data science teams and data scientists face challenges in
many overlapping yet distinct areas, such as Reporting, Data Processing &
Storage, Scientific Computing, ML Modelling, and Application Development. To
succeed, data science teams, especially small ones, need a deep appreciation of
how much their success depends on each of these areas.
• The Python ecosystem for data science has a number of tools and libraries for various
aspects of data science, including Machine Learning, Cluster Computing, and
Scientific Computing.
• The idea of this talk is to understand what the Python data science ecosystem
offers (so that you don't reinvent it) and what some common gaps are (so that you
don't turn blue looking for answers).
• In this talk, we describe how different tools/libraries fit into the machine learning
model development and deployment workflow, and how these tools work (and
don’t work) with each other. It is intended as a landscape survey of the Python
data science ecosystem, along with a mention of some common gaps that
practitioners may notice as they put together a stack and/or an application for
their company.
The most important trait of the Analytics 3.0 era is that not only online firms, but virtually any type of firm
in any industry, can participate in the data economy. Banks, industrial manufacturers, health care
providers, retailers—any company in any industry that is willing to exploit the possibilities—can all
develop data-based offerings for customers, as well as support internal decisions with big data.
Analytics 1.0 vs Analytics 2.0 vs Analytics 3.0
Data
• Analytics 1.0: Enterprise data; structured transactional data
• Analytics 2.0: Bring in web and social data; complex, large, semistructured data sources
• Analytics 3.0: GPS, mobile device, clickstream, sensor data; unstructured, real time, streaming
Tools
• Analytics 1.0: Spreadsheets; BI, OLAP; ETL; on-premise servers
• Analytics 2.0: Visualization; NoSQL; Hadoop
• Analytics 3.0: Machine Learning, Artificial Intelligence; On-Demand Everything; Analytical Apps; integrated, embedded models
Activity
• Analytics 1.0: Majority of analytical activity was descriptive analytics, or reporting; creating analytical models was a time-consuming “batch” process
• Analytics 2.0: Visual analytics dominates predictive and prescriptive techniques; develop products, not PowerPoints or reports
• Analytics 3.0: Analytics integral to running the business, strategic asset; rapid and agile insight delivery; analytical tools available at point of decision
Source: THE RISE OF ANALYTICS 3.0, By Thomas H. Davenport, IIA, 2013
Evolving Role of Data Science Teams
Machine Learning vs Real World Data Science
• Machine Learning
• Deployment
• Application Development
• Big Data Processing
• Data Storage
• ETL
Challenges faced by Data Science Teams
• Data science requires many more competencies than can reasonably be expected
from one person
• Challenges are greater for smaller teams and smaller companies, e.g.
startups
• Challenges create dependencies on other teams, e.g. Development
• Dependencies slow down execution and benefits realization
Plethora of Choices
• Areas: Reporting; Data Processing & Storage; Scientific Computing; ML Modelling; Application Development
• Options: SQL, NoSQL, Graphdb, OLAP, ETL, Cluster Computing, Stream Processing, Charting, Statistics, Cloud, Front End, Microservices, Back End, ML, Deep Learning, Dim. Reduction, Signal Processing, Optimization, Time Series Analysis, Simulation, MapReduce
Data Science Workflow
Stages: ETL, Process, Model, Store, Deploy
DATA SCIENTIST SKILLS
Infrastructure and Provisioning ???
Python Ecosystem
Stages: ETL, Process, Model, Store, Deploy
Tools: Odo, Blaze, Pandas, Dask, Spark, Sklearn_Pandas, Scikit-learn, Keras, Spark MLlib, Bokeh, Jupyter
Review of Key Tools
(50% of talk time spent here, more slides to be added)
• Jupyter
• Pandas
• Scikit-Learn
• Keras / TensorFlow / Theano
• Matplotlib/Bokeh
• Blaze
• Odo
• Dask
• PySpark
We shall see some code snippets here to illustrate a few ideas.
The idea is to know enough to pick the right components for the job at hand.
Use Case 1: Small Data
This use case illustrates small data, i.e. desktop / in-memory processing.
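A minimal sketch of what in-memory processing looks like with Pandas. The table, its column names, and the values are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import pandas as pd

# Hypothetical toy data; a real project would load a CSV or a query result.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south"],
        "units": [10, 4, 7, 3, 6],
        "unit_price": [2.5, 4.0, 2.5, 10.0, 4.0],
    }
)

# Whole-column arithmetic is possible because the entire table sits in memory.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Aggregate revenue per region in a single call.
revenue_by_region = sales.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

This style stays convenient as long as the full dataset fits comfortably in the RAM of one machine.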
Use Case 2: ‘Medium’ Data
This use case illustrates medium data, handled with out-of-core processing.
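The idea behind out-of-core processing can be sketched with the standard library alone: read the data in fixed-size chunks and keep only running totals in memory. The helper names (`iter_chunks`, `mean_of_column`) and the tiny in-memory "file" are assumptions made for this sketch; tools such as Dask are designed to handle this kind of chunking on the user's behalf.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def mean_of_column(csv_file, column, chunk_size=2):
    """Compute a column mean while holding only one chunk in memory."""
    reader = csv.DictReader(csv_file)
    total = 0.0
    count = 0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical stand-in for a file that is too large to load at once.
big_file = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")
mean_value = mean_of_column(big_file, "value")
print(mean_value)
```

Only aggregations that can be updated incrementally (sums, counts, minima, maxima) fit this pattern directly; others need a different algorithm or a library that provides one.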
Use Case 3: Big Data
This use case illustrates big data, i.e. cluster computing.
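Cluster computing frameworks broadly follow a partition, map, reduce shape. The sketch below imitates that shape on a single machine, with threads standing in for cluster nodes; it is not Spark code, and the function names and sample lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, n_partitions):
    """Deal records round-robin into n_partitions lists."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def count_words(partition):
    """Map step: each worker counts words in its own partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


lines = [
    "pandas handles small data",
    "dask handles medium data",
    "spark handles big data",
]

partitions = split_into_partitions(lines, n_partitions=2)

# Threads stand in for cluster nodes in this single-machine sketch.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, partitions))

word_counts = reduce(merge_counts, partial_counts, Counter())
print(word_counts)
```

On a real cluster the partitions live on different machines, so the cost of moving data between the map and reduce steps becomes a central design concern.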
What Works
• Sklearn’s Consistent API, wide variety of ML algorithms
• Sklearn Pipelines
• Scikit-Keras Integration
• Pandas for Data Analysis
• ….
• ….
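As an illustration of the first two points, here is a minimal sketch in which two different estimators are swapped into the same Pipeline without changing any surrounding code. The synthetic dataset, the choice of estimators, and the `build_pipeline` helper are assumptions made for this example, not material from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def build_pipeline(estimator):
    """Chain preprocessing and a model behind one fit/predict interface."""
    return Pipeline([("scale", StandardScaler()), ("model", estimator)])


scores = {}
for name, estimator in [
    ("logistic_regression", LogisticRegression()),
    ("decision_tree", DecisionTreeClassifier(random_state=0)),
]:
    pipeline = build_pipeline(estimator)
    pipeline.fit(X_train, y_train)  # same call regardless of the model
    scores[name] = pipeline.score(X_test, y_test)  # mean accuracy on held-out data

print(scores)
```

Because the scaler is fitted inside the pipeline, it only ever sees the training split, which avoids leaking test-set statistics into preprocessing.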
Gaps – A Practitioner Perspective
• Uniform API Across Activities
• Separation of Data, Processing, and Instructions
• Single Data Structure Paradigm
• Support for in-memory, out-of-core, and distributed computing in the same paradigm, e.g. SFrame
• ETL
• Push heavy lifting to backend systems
• Monitoring workflows
• Application development
• Bokeh
• Deployment
• Application
• Web Services

Editor's notes

  1. Slide needs improvement 