SlideShare a Scribd company logo
1 of 36
Download to read offline
Tips and tricks for data
science projects with Python
José Manuel Ortega
Python Developer
Jose Manuel Ortega
Software engineer,
Freelance
1. Introducing Python for machine learning projects
2. Stages of a machine learning project
3. Selecting the best python library for your project
for each stage
4. Python tools for deep learning in data science
projects
Introducing Python for machine learning projects
● Simple and consistent
● Understandable by humans
● General-purpose programming language
● Extensive selection of libraries and
frameworks
Introducing Python for machine learning projects
● Spam filters
● Recommendation systems
● Search engines
● Ppersonal assistants
● Fraud detection systems
Introducing Python for machine learning projects
● Machine learning ● Keras, TensorFlow, and
Scikit-learn
● High-performance
scientific computing
● Numpy, Scipy
● Computer vision ● OpenCV
● Data analysis ● Numpy, Pandas
● Natural language
processing
● NLTK, spaCy
Introducing Python for machine learning projects
Introducing Python for machine learning projects
Introducing Python for machine learning projects
● Reading/writing many different data formats
● Selecting subsets of data
● Calculating across rows and down columns
● Finding and filling missing data
● Applying operations to independent groups within the data
● Reshaping data into different forms
● Combing multiple datasets together
● Advanced time-series functionality
● Visualization through Matplotlib and Seaborn
Introducing Python for machine learning projects
Introducing Python for machine learning projects
import pandas as pd
import pandas_profiling
# read the dataset
data = pd.read_csv('your-data')
prof = pandas_profiling.ProfileReport(data)
prof.to_file(output_file='output.html')
Stages of a machine learning project
Stages of a machine learning project
Stages of a machine learning project
Python libraries
Python libraries
● Supervised and unsupervised machine learning
● Classification, regression, Support Vector Machine
● Clustering, Kmeans, DBSCAN
● Random Forest
Python libraries
● Pipelines
● Grid-search
● Validation curves
● One-hot encoding of categorial data
● Dataset generators
● Principal Component Analysis (PCA)
Python libraries
Pipelines
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.naive_bayes import MultinomialNB
>>> from sklearn.preprocessing import Binarizer
>>> make_pipeline(Binarizer(), MultinomialNB())
Pipeline(steps=[('binarizer', Binarizer()),
('multinomialnb', MultinomialNB())])
http://scikit-learn.org/stable/modules/pipeline.html
Python libraries
Grid-search
estimator.get_params()
A search consists of:
● an estimator (regressor or classifier such as
sklearn.svm.SVC())
● a parameter space
● a method for searching or sampling candidates
● a cross-validation scheme
● a score function
https://scikit-learn.org/stable/modules/grid_search.html#grid-search
Python libraries
Validation curves
https://scikit-learn.org/stable/modules/learning_curve.html
Python libraries
Validation curves
>>> train_scores, valid_scores = validation_curve(
... Ridge(), X, y, param_name="alpha", param_range=np.logspace(-7, 3, 3),
... cv=5)
>>> train_scores
array([[0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
[0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
[0.51..., 0.52..., 0.49..., 0.47..., 0.49...]])
>>> valid_scores
array([[0.90..., 0.84..., 0.94..., 0.96..., 0.93...],
[0.90..., 0.84..., 0.94..., 0.96..., 0.93...],
[0.46..., 0.25..., 0.50..., 0.49..., 0.52...]])
Python libraries
One-hot encoding
https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
# importing sklearn one hot encoding
from sklearn.preprocessing import
OneHotEncoder
# initializing one hot encoding
encoding = OneHotEncoder()
# applying one hot encoding in python
transformed_data =
encoding.fit_transform(data[['Status']])
# head
print(transformed_data.toarray())
Python libraries
Dataset generators
https://scikit-learn.org/stable/datasets/sample_generators.html
Python libraries
Principal Component Analysis (PCA)
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
Python libraries
Principal Component Analysis (PCA)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
Python libraries
Python tools for deep learning
Python tools for deep learning
Python tools for deep learning
Python tools for deep learning
Python tools for deep learning
Python tools for deep learning
Python tools for deep learning
TensorFlow Keras Pytorch
API Level High and Low High Low
Architecture Not easy to use Simple, concise,
readable
Complex, less
readable
Speed Fast,
high-performance
Slow, low
performance
Fast,
high-performance
Trained
Models
Yes Yes Yes
Python tools for deep learning
● tight integration with NumPy – Use numpy.ndarray in Theano-compiled
functions.
● transparent use of a GPU – Perform data-intensive computations much faster
than on a CPU.
● efficient symbolic differentiation – Theano does your derivatives for
functions with one or many inputs.
● speed and stability optimizations – Get the right answer for log(1+x) even
when x is really tiny.
● dynamic C code generation – Evaluate expressions faster.
● extensive unit-testing and self-verification – Detect and diagnose many
types of error
Python tools for deep learning
● Synkhronos Extension to Theano for multi-GPU data
parallelism
● Theano-MPI Theano-MPI a distributed framework for training
models built in Theano based on data-parallelism.
● Platoon Multi-GPU mini-framework for Theano, single node.
● Elephas Distributed Deep Learning with Keras & Spark.
Tips and tricks for data
science projects with Python
@jmortegac
https://www.linkedin.com/in/jmortega1

More Related Content

Similar to Tips and tricks for data science projects with Python

Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Fwdays
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to pythonMohammed Rafi
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Simplilearn
 
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Viach Kakovskyi
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
Lecture 4: Deep Learning Frameworks
Lecture 4: Deep Learning FrameworksLecture 4: Deep Learning Frameworks
Lecture 4: Deep Learning FrameworksMohamed Loey
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisBrendan Gregg
 
New Capabilities in the PyData Ecosystem
New Capabilities in the PyData EcosystemNew Capabilities in the PyData Ecosystem
New Capabilities in the PyData EcosystemTuri, Inc.
 
PPT6: Neuron Demo
PPT6: Neuron DemoPPT6: Neuron Demo
PPT6: Neuron Demoakira-ai
 
Introduction to-python
Introduction to-pythonIntroduction to-python
Introduction to-pythonAakashdata
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughgabriellekuruvilla
 
How to integrate python into a scala stack
How to integrate python into a scala stackHow to integrate python into a scala stack
How to integrate python into a scala stackFliptop
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientistsStitch Fix Algorithms
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Simplilearn
 
Python in the real world : from everyday applications to advanced robotics
Python in the real world : from everyday applications to advanced roboticsPython in the real world : from everyday applications to advanced robotics
Python in the real world : from everyday applications to advanced roboticsJivitesh Dhaliwal
 
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPragueSql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPragueLuis Beltran
 

Similar to Tips and tricks for data science projects with Python (20)

Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Pyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdfPyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdf
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Lecture 4: Deep Learning Frameworks
Lecture 4: Deep Learning FrameworksLecture 4: Deep Learning Frameworks
Lecture 4: Deep Learning Frameworks
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance Analysis
 
New Capabilities in the PyData Ecosystem
New Capabilities in the PyData EcosystemNew Capabilities in the PyData Ecosystem
New Capabilities in the PyData Ecosystem
 
Automation tools: making things go... (March 2019)
Automation tools: making things go... (March 2019)Automation tools: making things go... (March 2019)
Automation tools: making things go... (March 2019)
 
PPT6: Neuron Demo
PPT6: Neuron DemoPPT6: Neuron Demo
PPT6: Neuron Demo
 
Introduction to-python
Introduction to-pythonIntroduction to-python
Introduction to-python
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthrough
 
How to integrate python into a scala stack
How to integrate python into a scala stackHow to integrate python into a scala stack
How to integrate python into a scala stack
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientists
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming APIFlink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
 
Python in the real world : from everyday applications to advanced robotics
Python in the real world : from everyday applications to advanced roboticsPython in the real world : from everyday applications to advanced robotics
Python in the real world : from everyday applications to advanced robotics
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPragueSql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague
 

More from Jose Manuel Ortega Candel

Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdfAsegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdfJose Manuel Ortega Candel
 
PyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdfPyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdfJose Manuel Ortega Candel
 
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...Jose Manuel Ortega Candel
 
Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops Jose Manuel Ortega Candel
 
Evolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdfEvolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdfJose Manuel Ortega Candel
 
Implementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfImplementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfJose Manuel Ortega Candel
 
Seguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloudSeguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloudJose Manuel Ortega Candel
 
Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud Jose Manuel Ortega Candel
 
Sharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8sSharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8sJose Manuel Ortega Candel
 
Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)Jose Manuel Ortega Candel
 
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodanShodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodanJose Manuel Ortega Candel
 
ELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue TeamELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue TeamJose Manuel Ortega Candel
 
Monitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source toolsMonitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source toolsJose Manuel Ortega Candel
 
Python memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collectorPython memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collectorJose Manuel Ortega Candel
 
Machine Learning para proyectos de seguridad(Pycon)
Machine Learning para proyectos de seguridad(Pycon)Machine Learning para proyectos de seguridad(Pycon)
Machine Learning para proyectos de seguridad(Pycon)Jose Manuel Ortega Candel
 

More from Jose Manuel Ortega Candel (20)

Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdfAsegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
 
PyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdfPyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdf
 
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
 
Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops
 
Evolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdfEvolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdf
 
Implementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfImplementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdf
 
Computación distribuida usando Python
Computación distribuida usando PythonComputación distribuida usando Python
Computación distribuida usando Python
 
Seguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloudSeguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloud
 
Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud
 
Sharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8sSharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8s
 
Implementing cert-manager in K8s
Implementing cert-manager in K8sImplementing cert-manager in K8s
Implementing cert-manager in K8s
 
Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)
 
Python para equipos de ciberseguridad
Python para equipos de ciberseguridad Python para equipos de ciberseguridad
Python para equipos de ciberseguridad
 
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodanShodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
 
ELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue TeamELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue Team
 
Monitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source toolsMonitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source tools
 
Python Memory Management 101(Europython)
Python Memory Management 101(Europython)Python Memory Management 101(Europython)
Python Memory Management 101(Europython)
 
SecDevOps containers
SecDevOps containersSecDevOps containers
SecDevOps containers
 
Python memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collectorPython memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collector
 
Machine Learning para proyectos de seguridad(Pycon)
Machine Learning para proyectos de seguridad(Pycon)Machine Learning para proyectos de seguridad(Pycon)
Machine Learning para proyectos de seguridad(Pycon)
 

Recently uploaded

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 

Recently uploaded (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 

Tips and tricks for data science projects with Python

  • 1. Tips and tricks for data science projects with Python José Manuel Ortega Python Developer
  • 2. Jose Manuel Ortega Software engineer, Freelance
  • 3. 1. Introducing Python for machine learning projects 2. Stages of a machine learning project 3. Selecting the best python library for your project for each stage 4. Python tools for deep learning in data science projects
  • 4. Introducing Python for machine learning projects ● Simple and consistent ● Understandable by humans ● General-purpose programming language ● Extensive selection of libraries and frameworks
  • 5. Introducing Python for machine learning projects ● Spam filters ● Recommendation systems ● Search engines ● Ppersonal assistants ● Fraud detection systems
  • 6. Introducing Python for machine learning projects ● Machine learning ● Keras, TensorFlow, and Scikit-learn ● High-performance scientific computing ● Numpy, Scipy ● Computer vision ● OpenCV ● Data analysis ● Numpy, Pandas ● Natural language processing ● NLTK, spaCy
  • 7. Introducing Python for machine learning projects
  • 8. Introducing Python for machine learning projects
  • 9. Introducing Python for machine learning projects ● Reading/writing many different data formats ● Selecting subsets of data ● Calculating across rows and down columns ● Finding and filling missing data ● Applying operations to independent groups within the data ● Reshaping data into different forms ● Combing multiple datasets together ● Advanced time-series functionality ● Visualization through Matplotlib and Seaborn
  • 10. Introducing Python for machine learning projects
  • 11. Introducing Python for machine learning projects import pandas as pd import pandas_profiling # read the dataset data = pd.read_csv('your-data') prof = pandas_profiling.ProfileReport(data) prof.to_file(output_file='output.html')
  • 12. Stages of a machine learning project
  • 13. Stages of a machine learning project
  • 14. Stages of a machine learning project
  • 16. Python libraries ● Supervised and unsupervised machine learning ● Classification, regression, Support Vector Machine ● Clustering, Kmeans, DBSCAN ● Random Forest
  • 17. Python libraries ● Pipelines ● Grid-search ● Validation curves ● One-hot encoding of categorial data ● Dataset generators ● Principal Component Analysis (PCA)
  • 18. Python libraries Pipelines >>> from sklearn.pipeline import make_pipeline >>> from sklearn.naive_bayes import MultinomialNB >>> from sklearn.preprocessing import Binarizer >>> make_pipeline(Binarizer(), MultinomialNB()) Pipeline(steps=[('binarizer', Binarizer()), ('multinomialnb', MultinomialNB())]) http://scikit-learn.org/stable/modules/pipeline.html
  • 19. Python libraries Grid-search estimator.get_params() A search consists of: ● an estimator (regressor or classifier such as sklearn.svm.SVC()) ● a parameter space ● a method for searching or sampling candidates ● a cross-validation scheme ● a score function https://scikit-learn.org/stable/modules/grid_search.html#grid-search
  • 21. Python libraries Validation curves >>> train_scores, valid_scores = validation_curve( ... Ridge(), X, y, param_name="alpha", param_range=np.logspace(-7, 3, 3), ... cv=5) >>> train_scores array([[0.93..., 0.94..., 0.92..., 0.91..., 0.92...], [0.93..., 0.94..., 0.92..., 0.91..., 0.92...], [0.51..., 0.52..., 0.49..., 0.47..., 0.49...]]) >>> valid_scores array([[0.90..., 0.84..., 0.94..., 0.96..., 0.93...], [0.90..., 0.84..., 0.94..., 0.96..., 0.93...], [0.46..., 0.25..., 0.50..., 0.49..., 0.52...]])
  • 22. Python libraries One-hot encoding https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features # importing sklearn one hot encoding from sklearn.preprocessing import OneHotEncoder # initializing one hot encoding encoding = OneHotEncoder() # applying one hot encoding in python transformed_data = encoding.fit_transform(data[['Status']]) # head print(transformed_data.toarray())
  • 24. Python libraries Principal Component Analysis (PCA) https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
  • 25. Python libraries Principal Component Analysis (PCA) from sklearn.preprocessing import StandardScaler sc = StandardScaler() X_train = sc.fit_transform(X_train) X_test = sc.transform(X_test) from sklearn.decomposition import PCA pca = PCA(n_components=2) X_train = pca.fit_transform(X_train) X_test = pca.transform(X_test)
  • 27. Python tools for deep learning
  • 28. Python tools for deep learning
  • 29. Python tools for deep learning
  • 30. Python tools for deep learning
  • 31. Python tools for deep learning
  • 32. Python tools for deep learning
  • 33. Python tools for deep learning TensorFlow Keras Pytorch API Level High and Low High Low Architecture Not easy to use Simple, concise, readable Complex, less readable Speed Fast, high-performance Slow, low performance Fast, high-performance Trained Models Yes Yes Yes
  • 34. Python tools for deep learning ● tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions. ● transparent use of a GPU – Perform data-intensive computations much faster than on a CPU. ● efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs. ● speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny. ● dynamic C code generation – Evaluate expressions faster. ● extensive unit-testing and self-verification – Detect and diagnose many types of error
  • 35. Python tools for deep learning ● Synkhronos Extension to Theano for multi-GPU data parallelism ● Theano-MPI Theano-MPI a distributed framework for training models built in Theano based on data-parallelism. ● Platoon Multi-GPU mini-framework for Theano, single node. ● Elephas Distributed Deep Learning with Keras & Spark.
  • 36. Tips and tricks for data science projects with Python @jmortegac https://www.linkedin.com/in/jmortega1