Machine learning
Vladimir Alekseichenko
“Rocket science” or daily bread?
Changes over time
10 min each
36,500,000 minutes
~70 years
Driver vs. mechanic
dataworkshop.eu
Bike Sharing Demand
Task - kaggle
Solution - github.com/dataworkshop
Understand Business & Data
Read and explore data
Feature Engineering
Create new features based on existing ones
Feature Selection
Select only useful features
Model Selection
Find the best model(s): model A, model B, model C, model D, model E
Tuning Hyperparameters
Find the best hyperparameters for a given model
Ensemble Modeling
Combine a few models into one better model: model B and model E, weighted x0.6 + x0.4

datetime             season  temp   count
2011-01-01 08:32:02  1       9.23   5
2012-04-02 12:10:00  2       18.78  32
2012-08-07 15:47:01  3       15.45  15

datetime             season  temp   hour  day  month  …  count  count_log
2011-01-01 08:32:02  1       9.23   8     1    1      …  5      1.609
2012-04-02 12:10:00  2       18.78  12    2    4      …  32     3.466
2012-08-07 15:47:01  3       15.45  15    7    8      …  15     2.708
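The step from the first table to the second can be sketched in a few lines of Python. This is only an illustration using the three sample rows from the slide; the function name `add_features` is made up here, and the natural logarithm is an assumption chosen because it reproduces the count_log values shown.

```python
import math
from datetime import datetime

# Sample rows copied from the slide: (datetime, season, temp, count).
rows = [
    ("2011-01-01 08:32:02", 1, 9.23, 5),
    ("2012-04-02 12:10:00", 2, 18.78, 32),
    ("2012-08-07 15:47:01", 3, 15.45, 15),
]


def add_features(row):
    """Return the original columns plus the derived ones (hypothetical helper)."""
    stamp, season, temp, count = row
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return {
        "datetime": stamp,
        "season": season,
        "temp": temp,
        "hour": dt.hour,
        "day": dt.day,
        "month": dt.month,
        "count": count,
        # Assumption: natural log, rounded to 3 decimals as on the slide.
        "count_log": round(math.log(count), 3),
    }


features = [add_features(r) for r in rows]
for f in features:
    print(f)
```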
Understand the business and the data
Working days
Weekend
Feature engineering
• quantitative => 1 to 10, 11 to 20…
• dates => day, month, year, hour, is it a weekend…
• categorical/qualitative (red, green, white)
  • assign a numeric identifier (1, 2, 3)
  • create n binary columns (is it red? etc.)
  • probabilities with respect to the target variable
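The transformations listed above can be sketched in plain Python. All function names and the toy data (`colors`, `bought`) are hypothetical and exist only for illustration; in practice a data-frame library would usually do this work.

```python
from collections import defaultdict


def bin_label(value, width=10):
    """Map a positive integer to its range label: 1-10, 11-20, ..."""
    low = ((value - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def label_encode(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance."""
    ids = {}
    for v in values:
        ids.setdefault(v, len(ids) + 1)
    return [ids[v] for v in values], ids


def one_hot(values):
    """Create one binary column per category (is it red? etc.)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def target_rate(values, target):
    """Share of positive outcomes per category, for a 0/1 target."""
    hits, totals = defaultdict(int), defaultdict(int)
    for v, t in zip(values, target):
        hits[v] += t
        totals[v] += 1
    return {c: hits[c] / totals[c] for c in totals}


colors = ["red", "green", "white", "red"]  # hypothetical categorical feature
bought = [1, 0, 1, 0]                      # hypothetical binary target

print(bin_label(7), bin_label(11))
print(label_encode(colors)[0])
print(one_hot(colors)[0])
print(target_rate(colors, bought))
```

Target-based encodings like `target_rate` should be computed on training data only, otherwise information about the target leaks into the features.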
Feature selection
• The fewer, the better (simpler model)
• Keep the most valuable ones (ideally just one :)
• Features are (usually) dependent on each other, so be careful… (verify empirically)
• Faster
Variance
Univariate
Recursive
xgbfir
https://github.com/limexp/xgbfir
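As a rough sketch of the first two approaches named above (variance-based and univariate selection), the following uses only the standard library. The column names and values are invented toy data, and absolute Pearson correlation is just one possible univariate score; recursive elimination and xgbfir are not covered here.

```python
from statistics import mean, pvariance


def variance_filter(columns, threshold=0.0):
    """Keep columns whose population variance exceeds the threshold."""
    return [name for name, vals in columns.items() if pvariance(vals) > threshold]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def univariate_rank(columns, target):
    """Rank features by absolute correlation with the target, best first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data.
columns = {
    "temp": [1, 2, 3, 4],
    "constant": [1, 1, 1, 1],
    "noise": [2, 1, 2, 1],
}
target = [10, 20, 30, 40]

print(variance_filter(columns))
print(univariate_rank(columns, target))
```

Because features are usually dependent on each other, a per-feature score like this can mislead, which is why the slide advises checking the effect empirically.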
Model selection
• Linear
• Decision Tree
• Random Forest
• Gradient Boosting
• Neural Network
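One way to read "find the best model" is: fit each candidate on training data and keep the one with the lowest error on held-out data. The sketch below does this for two deliberately tiny stand-ins, a one-feature linear fit and a depth-1 decision tree (a "stump"). The data and all function names are hypothetical, and real projects would use library implementations of the models listed above.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_stump(xs, ys):
    """Decision tree of depth 1: one threshold, one mean per side."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr


def rmse(model, xs, ys):
    """Root mean squared error of a model on the given data."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Hypothetical toy data with a step-shaped relationship.
train_x = [1, 2, 3, 4, 5, 6, 7, 8]
train_y = [5, 6, 5, 6, 30, 31, 30, 31]
valid_x = [2.5, 6.5]
valid_y = [5.5, 30.5]

candidates = {"linear": fit_linear, "stump": fit_stump}
scores = {
    name: rmse(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
print(scores, best_name)
```

On this step-shaped data the stump wins; on data with a linear trend the ranking would flip, which is the reason to compare models rather than assume one.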
Linear
https://github.com/dataworkshop/model_evaluation/blob/master/step1-regression.ipynb
Decision Tree
http://xgboost.readthedocs.io/en/latest/model.html
Ensemble trees
http://xgboost.readthedocs.io/en/latest/model.html
Ensemble trees
• Bagging (bootstrap aggregation)
  • Random Forest
  • Extra Trees
• Boosting
  • Gradient Boosting
XGBoost
(Extreme Gradient Boosting)
“When in doubt, use xgboost” (Owen Zhang)
Choosing a model
Hyperparameter tuning
• Grid Search
• Random Search
• Bayesian
hyperopt
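Grid search and random search differ only in how candidate settings are generated. The sketch below illustrates both on a made-up objective: `validation_error` is a hypothetical stand-in for "train a model with these hyperparameters and return its validation error", and the parameter names `depth` and `learning_rate` are assumptions. Bayesian optimization (for example with hyperopt) is not shown.

```python
import itertools
import random


def validation_error(depth, learning_rate):
    """Hypothetical stand-in for training a model and scoring it."""
    return (depth - 6) ** 2 + 100 * (learning_rate - 0.1) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        err = validation_error(**params)
        if best is None or err < best[0]:
            best = (err, params)
    return best


def random_search(space, n_iter, seed=0):
    """Sample random points from the ranges instead of walking a grid."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {
            "depth": rng.randint(*space["depth"]),
            "learning_rate": rng.uniform(*space["learning_rate"]),
        }
        err = validation_error(**params)
        if best is None or err < best[0]:
            best = (err, params)
    return best


grid_best = grid_search({"depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]})
random_best = random_search({"depth": (2, 10), "learning_rate": (0.01, 0.3)}, n_iter=20)
print(grid_best)
print(random_best)
```

The grid evaluates 12 combinations here; random search evaluates a fixed budget of 20 points regardless of how many hyperparameters there are, which is its main practical appeal.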
Ensemble modeling
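The simplest ensemble is a weighted average of the predictions of several models, as in the x0.6 / x0.4 combination of model B and model E in the pipeline diagram. In this sketch the prediction values are invented, and which weight belongs to which model is an assumption.

```python
def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*predictions)]


# Hypothetical predictions of two models for the same three rows.
model_b = [10.0, 20.0, 30.0]
model_e = [20.0, 10.0, 30.0]

# Assumption: model B gets weight 0.6 and model E gets weight 0.4.
blended = blend([model_b, model_e], [0.6, 0.4])
print(blended)
```

The weights themselves are hyperparameters and are best chosen on validation data, not on the data the models were trained on.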
Neuron
(Artificial) Neural Network
MNIST
Data
Neural Network
Error: 1.60%
http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
source
Challenges
Overfitting
http://mlwiki.org/index.php/Overfitting
Cross-validation
http://blog.goldenhelix.com/bchristensen/cross-validation-for-genomic-prediction-in-svs/
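A minimal sketch of k-fold cross-validation, assuming no shuffling and a trivial baseline model: the data is cut into k folds, each fold serves once as the validation set, and the errors are averaged. The helper names and the toy data are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs; each sample is validated once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m


def cross_val_rmse(xs, ys, fit, k=3):
    """Average RMSE over k validation folds."""
    errors = []
    for train, valid in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq = [(model(xs[i]) - ys[i]) ** 2 for i in valid]
        errors.append((sum(sq) / len(sq)) ** 0.5)
    return sum(errors) / len(errors)


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5, 6]
ys = [4, 4, 4, 4, 4, 4]

folds = list(k_fold_indices(7, 3))
print(folds)
print(cross_val_rmse(xs, ys, fit_mean, k=3))
```

A model that scores well on its training data but poorly across the validation folds is overfitting, which is the problem cross-validation is meant to expose.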
Creativity is worth a lot
https://techcrunch.com/2016/11/19/how-data-science-and-rocket-science-will-get-humans-to-mars
source
The wave is already coming…
are you ready?
Thank you
@slon1024
hello@vova.me
dataworkshop.eu

AIMeetup #3: Machine learning - rocket science or daily bread?
