2. Different fields of AI:
• AI: Using machines to make human-like
decisions
○ Ex. Chatbot
• ML: Using data to “learn” features of a problem
then make predictions
○ Ex. Decision trees, Logistic regression
• DL: Using Neural networks to “learn” correct
outputs given data
○ Ex. Self-driving cars
Artificial Intelligence
3. Machine Learning Pipeline
What does creating an ML project look like?
1. Gathering data
2. Preprocessing data
3. Model selection
4. Training
5. Evaluation
6. Parameter tuning
7. Prediction
4. What is SKLearn?
Machine Learning Library
● Python library for getting started with machine learning
● Began as a Google Summer of Code project; now developed by an open-source community
● Provides tools for several steps of the pipeline
○ E.g. turning data into information readable by the
computer
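As a rough sketch of how these pipeline steps look in sklearn, here is a minimal end-to-end example on one of the library's built-in toy datasets (the iris dataset; `max_iter=1000` is just a convergence safeguard, not a tuned value):

```python
# Minimal sklearn workflow: load data, pick a model, train, predict.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # data as numeric arrays the computer can read
model = LogisticRegression(max_iter=1000)  # model selection
model.fit(X, y)                            # training
print(model.predict(X[:3]))                # prediction
```

Every sklearn estimator follows this same `fit`/`predict` pattern, which is why the library maps so cleanly onto the pipeline above.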
6. Minmax Normalization
● Generally, a dataset's attributes are on very different scales.
○ Consider a house pricing dataset: Rooms (1 - 5), Price (200k - 1mil), Age (1 - 70).
● Many methods of normalization
○ Minmax normalization, log scaling, z-score
● Minmax normalization: scale all values based on the minimum and maximum values of
each attribute.
Data Normalization
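A quick sketch of minmax normalization with sklearn's `MinMaxScaler`, using made-up house numbers in the same spirit as the example above (columns: rooms, price, age):

```python
# Min-max normalization: x' = (x - min) / (max - min), computed per attribute (column).
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical house data: [rooms, price, age]
X = np.array([[1,   200_000,  1],
              [3,   500_000, 35],
              [5, 1_000_000, 70]])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)  # every column now spans [0, 1]
```

After scaling, no single attribute dominates just because its raw numbers are larger.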
7. 80/20 Split
● Datasets are split into two sets: Training + Validation
● The model is trained on the training set.
● Accuracy is measured on the validation set.
Dataset Splitting
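An 80/20 split is a one-liner with sklearn's `train_test_split`; the arrays below are placeholder data just to show the shapes:

```python
# 80/20 split: 80% of samples for training, 20% held out for validation.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # 50 samples, 2 features (dummy data)
y = np.arange(50)                  # 50 labels

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)  # random_state makes the shuffle repeatable
print(len(X_train), len(X_val))  # 40 10
```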
8. • Many different types of ML models
• This step is to figure out which one is best
for your application
Model Selection
Important Factors
● Type of problem: Classification, Regression, etc
● Type of data: images, text, numerical, audio
● How much data
9. Math/Theory Behind Log Reg
● The logistic function is a sigmoid function.
○ Used mainly for classification, despite the "regression" in the name.
● Better suited than linear regression when outputs must be probabilities.
● The function can be used as a probability
function when L = 1, since its output then lies in (0, 1).
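The general logistic function is f(x) = L / (1 + e^(-k(x - x0))); with L = 1, k = 1, x0 = 0 it reduces to the standard sigmoid. A small sketch:

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic function. With L = 1 the output lies in (0, 1),
    so it can be read as a probability."""
    return L / (1.0 + math.exp(-k * (x - x0)))

print(logistic(0.0))  # 0.5 -- the midpoint of the sigmoid
```

Large positive inputs push the output toward 1, large negative inputs toward 0, which is exactly the behavior a binary classifier needs.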
10. Training
Gradient Descent
● Gradient Descent is a first-order optimization algorithm.
○ The function should be convex and differentiable.
● The gradient generalizes the derivative of a function.
○ For f(x), it is just the derivative f'(x).
○ For f(x1, x2, x3, ….., xn), it is the n-vector of partial derivatives.
● Used to minimize error by stepping against the gradient, scaled by the learning rate.
○ The learning rate defines how fast the model learns
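The update rule is x ← x − (learning rate) × gradient. Here is a toy sketch on the convex function f(x) = (x − 3)², whose gradient is 2(x − 3), so the minimum is at x = 3:

```python
# Gradient descent on f(x) = (x - 3)^2; gradient f'(x) = 2(x - 3), minimum at x = 3.
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step in the direction opposite the gradient
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges toward 3.0
```

With too large a learning rate the steps overshoot and diverge; too small and convergence takes many more iterations — that trade-off is what "how fast the model learns" refers to.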
11. Cost Function
● A cost function is a “loss function”
● A loss function defines the error between
predictions and actual values.
● We can use the log loss function for logistic regression.
Training
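For binary labels, log loss is −(1/n) Σ [y·log(p) + (1 − y)·log(1 − p)]. A hand-rolled sketch (sklearn provides `sklearn.metrics.log_loss`, which additionally clips predictions away from exactly 0 and 1):

```python
import math

def log_loss(y_true, y_pred):
    """Binary log loss: heavily penalizes confident wrong predictions."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

# Mostly-correct, fairly confident predictions give a small loss.
print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))
```

Because the loss blows up as a confident prediction approaches the wrong label, minimizing it pushes the model toward well-calibrated probabilities.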
12. Evaluation
Selecting Evaluation Metrics
- Used to measure the quality of the
model
- Different evaluation metrics can skew
your perception of model
performance
- Selecting the incorrect metric will
make your model optimize
incorrectly
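A classic illustration of a metric skewing your perception, sketched with made-up imbalanced data: a model that always predicts the majority class scores high accuracy while catching zero positive cases.

```python
# On imbalanced data, accuracy can look great while the model is useless.
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 95 + [1] * 5   # 95% negatives, 5% positives (hypothetical)
y_pred = [0] * 100            # a "model" that always predicts the majority class

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks good
print(recall_score(y_true, y_pred))    # 0.0  -- misses every positive
```

This is why the metric must match the application: for rare-event detection, recall or F1 is usually more informative than accuracy.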
16. Parameter Tuning
- Hyperparameters are the fine details of the model
- Set before training, with initially assumed values
Examples of Hyperparameters:
- Number of Iterations
- Regularization
Tune Hyperparameters
Grid Search: Tests all the
parameter combinations
(Can take a long time)
Randomized Search: Tests random
combinations of the parameters
(Can be faster)
Grid Search and Randomized Search
What’s the difference? Who cares?
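A sketch of both searches with sklearn's `GridSearchCV` and `RandomizedSearchCV`, tuning logistic regression's regularization strength `C` on the iris toy dataset (the grid values are illustrative, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # hypothetical candidate values

# Grid search: tries every combination (here, all 4 values of C).
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)

# Randomized search: samples a fixed number of combinations -- faster on big grids.
rand = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_grid,
                          n_iter=2, cv=3, random_state=0)
rand.fit(X, y)
print(rand.best_params_)
```

With one small grid the difference is invisible, but with many hyperparameters the grid's size multiplies out and randomized search's fixed `n_iter` budget becomes the practical choice.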
19. Resources
- https://scikit-learn.org/stable/index.html SKLearn Docs
- https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ Popular textbook for machine learning
Youtube Channels
- https://www.youtube.com/user/sentdex
- https://www.youtube.com/user/keeroyz
- https://www.youtube.com/@YannicKilcher
Stable Diffusion
https://creator.nightcafe.studio/create/text-to-image?algo=stable
DALL E Mini
https://www.craiyon.com/