INTRODUCTION TO MACHINE LEARNING
Pruet Boonma
Faculty of Engineering
Chiang Mai University
pruet@eng.cmu.ac.th
http://screenprism.com/insights/article/why-does-hal-breakdown-and-become-hostile-to-humans-in-2001
AGENDA
▪Machine Learning Background
▪Introduction to R
▪Classification: Nearest Neighbors, Naïve Bayes, Decision Trees
▪Forecasting with Regression Methods
▪Pattern Recognition with Association Rules
▪Clustering with K-means
▪Black-box Models: Neural Networks, SVM
HORIZON OF DATA ANALYTICS
▪Data Mining/Data Science
Extract patterns from data
▪Big Data
3Vs: Volume, Velocity, Variety
▪Artificial Intelligence
Machine Learning
 Deep Learning
http://www.kdnuggets.com/2016/03/data-science-puzzle-explained.html
MACHINE LEARNING BACKGROUND
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
ARTIFICIAL INTELLIGENCE: A PURPOSE
▪Artificial intelligence was first proposed in the 1950s
to construct complex machines that exhibit some
characteristics of human intelligence.
General AI: machines that have all of the human senses,
reason, intuition, and imagination, and think just like us.
Narrow AI: technology that is able to perform
a specific task as well as, or better than,
humans can.
EXAMPLE OF NARROW AI
▪Face recognition on Facebook, image classification on Pinterest,
spam detection on Gmail, spelling suggestions on Google.
The computer makes a guess at what
we really want to search for.
MACHINE LEARNING: AN APPROACH
▪Machine learning started to flourish in the 1990s
as an approach to narrow AI.
▪“Ability to learn without being explicitly
programmed” – Arthur Samuel (1959)
▪“A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured by P,
improves with experience E” – Tom M. Mitchell (1997)
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html
EXAMPLE OF A MACHINE LEARNING PROBLEM
▪We have to separate chickens into two groups. I’ll
show you how, then you can do the rest.
Task: separate the chickens
Experience: I’ll show you how
Performance: the chickens are separated correctly
MACHINE LEARNING APPROACH
▪Instead of hard-coding a program to perform a task, the machine is
“trained” using a large amount of data and algorithms to learn how
to perform the task.
TYPES OF PROBLEMS/TASKS
▪Supervised Learning
The computer is presented with a training data set (example inputs
and desired outputs); the goal is to learn a general rule that maps inputs to outputs.
▪Unsupervised Learning
There is no labeled training data set; the computer needs to find structure in
the given input.
▪Reinforcement Learning
The computer produces an output for a given input; the output is then
evaluated and feedback is given to the computer.
MACHINE LEARNING ALGORITHMS
https://jixta.wordpress.com/2015/07/17/machine-learning-algorithms-mindmap/
MACHINE LEARNING APPLICATIONS
EXAMPLE APPLICATIONS
▪Online Marketing
Recommender system
Sentiment analysis
EXAMPLE APPLICATIONS
▪Computer Vision
Object recognition
Motion analysis
Image restoration
http://cs.stanford.edu/~taranlan/
EXAMPLE APPLICATIONS
▪Internet fraud
Pattern recognition
Bayesian inference
Profiling
https://siftscience.com/sift-edu/prevent-fraud/fraud-detection-solutions
EXAMPLE APPLICATIONS
▪Self-driving car
Localization
Mapping
Object recognition
DEEP LEARNING: A TECHNIQUE
▪The human brain is a network
of a huge number of small
computing devices (neurons).
▪An artificial neural network
emulates the human brain by
creating a network of simple
computing units.
From: Wikipedia
DEEP LEARNING
▪In 2012, Andrew Ng,
then with Google, proposed using a huge neural
network with many layers.
“Deep” in deep learning refers to the many layers.
▪An early application of deep learning was image
recognition.
Now it is used in many areas, including Go playing and
self-driving cars.
http://www.popularmechanics.com/technology/a19863/googles-alphago-ai-wins-second-game-go/
HOW MACHINES LEARN
▪In machine learning, machines make sense of
data by creating models, then generalize the
models so that they also apply to new data.
DATA STORAGE
▪Humans have recorded data since the beginning of history.
▪Electronic sensors contribute to the explosion of data.
Wireless sensor networks, Internet of Things, Big Data
▪The amount of data is
beyond human comprehension.
▪Garbage in, garbage out.
https://www.slideshare.net/Sparkhound/spinning-brown-donuts-why-storage-still-counts
ABSTRACTION
▪Abstraction assigns meaning to stored data.
▪The computer summarizes stored raw data using a
model.
A model is an explicit description of the patterns within the data.
Math equations, trees/graphs, logical rules, clusters.
▪Typically, a human chooses the model.
The process of fitting a model to a data set is called training.
https://arnesund.com/2015/05/31/using-amazon-machine-learning-to-predict-the-weather/
Chick
GENERALIZATION
▪An abstraction by itself is of limited use; the learned idea
needs to be generalized so that it can be used in the future,
on tasks that are similar, but not identical.
Infer from the existing model to new input.
Heuristics, i.e., educated guesses, can be used to find the most
useful inferences.
Infer: “This is also a chick.”
https://blogs.nvidia.com/blog/2016/08/22/difference-deep-learning-training-inference-ai/
EVALUATION
▪With limited training data, bias can occur in the
model.
Does a chicken need to be yellow?
▪After training, the model is evaluated with a test data set
to judge how well it generalizes to unseen data.
The model may have picked up noise, i.e., unexplained variation in the data.
MACHINE LEARNING IN PRACTICE
▪Machine learning process
Data collection: gather data in a consumable format.
Data exploration and preparation: learn the characteristics of
the data, eliminate unnecessary data, change formats, etc.
Model training: a human chooses the algorithm to use and
observes the resulting model.
Model evaluation: the model is tested against a test data set.
Model improvement: change the algorithm, add more data, etc.
TYPES OF INPUT DATA
▪Data is expressed in units of observation.
A unit is described by a set of properties of a person, object, transaction, geographic
region, time point, or measurement.
Units can be combined, e.g., person-year: one person’s
data for one year.
▪A collection of data consists of:
Examples: instances of the unit of observation
Features: recorded properties or attributes
TYPES OF FEATURES
▪Numerical: measured in numbers, i.e., a quantitative
property.
▪Categorical/nominal: a set of categories, i.e., a
qualitative property.
▪Ordinal: a nominal variable whose categories fall
in an ordered list, e.g., {small, medium, large}.
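The three feature types map onto R's basic column types. The following is a minimal sketch; the `shirts` data frame and its values are made up for illustration and do not come from the slides.

```r
# Hypothetical example data: three T-shirts described by three features.
shirts <- data.frame(
  price = c(9.5, 12.0, 15.25),                        # numerical
  color = factor(c("red", "blue", "red")),            # categorical/nominal
  size  = factor(c("small", "large", "medium"),
                 levels = c("small", "medium", "large"),
                 ordered = TRUE)                       # ordinal
)

str(shirts)               # shows the type of each feature
levels(shirts$color)      # nominal levels carry no order
shirts$size < "large"     # ordinal levels can be compared
```

An ordered factor keeps the order given in `levels`, which is why the comparison in the last line is meaningful for `size` but would not be for `color`.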
TYPES OF MACHINE LEARNING ALGORITHMS
▪Predictive model: predicts one value using the other values
in the data set.
What is the chance of rain tomorrow?
▪Classification model: predicts which category an example
belongs to.
Is this email spam?
▪Descriptive model: gains insight by summarizing data in new
and interesting ways.
What do people also buy when they buy milk?
▪Clustering: identifies groups of examples with similar properties.
How many types of customers does a grocery store have?
INTRODUCTION TO R
▪Open source programming language and software
environment for statistical computing.
Used by statisticians and data miners for developing
statistical software and data analysis.
From: Wikipedia
USING R
▪Start Rgui
▪Load internal data
Iris data set
▪Load external data
From csv file
▪Preview data
▪Plot the data
R GUI
Screenshot labels: tool bar, status bar, command line with text output.
LOAD INTERNAL DATA
Screenshot labels: data set name, list of example data sets, data set variables (features).
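The screenshot for this slide is not part of the transcript, so the commands below are only a sketch of how a built-in data set is typically loaded and inspected, assuming the iris data set named in the outline.

```r
# iris ships with base R (package "datasets"); data() makes it available by name.
data(iris)

dim(iris)      # number of examples (rows) and features (columns)
names(iris)    # feature names
head(iris)     # first few examples

# data() with no arguments describes the example data sets that are installed.
available <- data()$results[, "Item"]
head(available)
```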
LOAD EXTERNAL DATA
▪We will use data from
https://www.data.go.th/DatasetDetail.aspx?id=7049410f-5bb8-4c75-9e94-112ca18b63e2
▪Download the CSV file and reformat the data to make it
consumable.
LOAD EXTERNAL DATA
▪Save to a CSV file, e.g., household.csv
▪Load into R using the read_csv() function from the readr
library
Screenshot labels: file name, assignment operator, data set name.
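A minimal sketch of the loading step follows. To keep it runnable without extra packages it uses base R's `read.csv()` instead of readr's `read_csv()`, and it writes a small stand-in file first; the column names and values are assumptions, not the real household data.

```r
# Hypothetical stand-in for household.csv so the sketch runs anywhere;
# the column names and values are made up.
csv_path <- file.path(tempdir(), "household.csv")
writeLines(c("province,y2557,y2558",
             "A,100,110",
             "B,200,190",
             "C,150,175"), csv_path)

# "<-" is the assignment operator: the data set is stored under the name "household".
# Base R's read.csv() is used here; with readr installed, read_csv(csv_path)
# would be the equivalent call.
household <- read.csv(csv_path, stringsAsFactors = FALSE)

str(household)
```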
PREVIEW DATA
▪names(): get or change the feature names
▪head(): show the feature names and the first few
examples
▪c(…): combine values into a vector
PREVIEW
▪summary(): show summary statistics of the
data frame (data set)
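The preview functions can be tried on the built-in iris data. This is a sketch only: the new feature names are arbitrary choices for illustration.

```r
# A copy of the built-in iris data, so the original stays untouched.
df <- iris

names(df)                  # current feature names
# Assigning to names() changes them; c() builds the vector of new names.
names(df) <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid", "species")

head(df)                   # feature names plus the first 6 examples
head(df, 3)                # only the first 3 examples
summary(df)                # per-feature summary (quartiles, counts)
```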
PREVIEW
▪A data set (data frame) is indexed like a matrix: df[row, col]
▪To change the order of the columns, use c() to give the new
order
Leaving the index before the comma empty means
that we need all rows.
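The indexing rules can be sketched with the built-in iris data; using iris here is an assumption, since the slide's own screenshot is not available.

```r
df <- iris                 # built-in example data

df[1, 2]                   # row 1, column 2
df[1:3, ]                  # rows 1-3, all columns (nothing after the comma)
df[, 5]                    # all rows (nothing before the comma), column 5

# Reorder the columns: put Species (column 5) first, then columns 1-4.
reordered <- df[, c(5, 1, 2, 3, 4)]
names(reordered)
```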
PLOT THE DATA
▪Boxplot
Shows the distribution of the data
http://www.physics.csbsju.edu/stats/box2.html
Leaving the index before the comma empty means
that we need all rows,
but only columns 2-11.
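A boxplot call of this shape might look as follows. The sketch uses the four numeric columns of iris rather than columns 2-11 of the household data, which is an assumption made so that it runs without the external file.

```r
pdf(NULL)   # null device so the sketch runs without a display; omit in Rgui

# The empty index before the comma keeps all rows; 1:4 keeps only the numeric columns.
b <- boxplot(iris[, 1:4], main = "iris measurements")
dev.off()

# boxplot() also returns the numbers it drew: for each column the lower whisker,
# lower hinge, median, upper hinge and upper whisker.
b$stats
```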
BARPLOT
Only column 11, transposed from a column to a row
Use column 1 as the X-axis labels
Too many examples
BARPLOT: TOP 10
Order the rows by the value in column 2558
Only the top 10
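The ordering and top-10 selection can be sketched like this. The data frame, its `region` and `y2558` columns, and the random values are all hypothetical stand-ins for the household data.

```r
# Hypothetical data: 26 regions with a made-up value for year 2558.
set.seed(1)
df <- data.frame(region = LETTERS, y2558 = round(runif(26, 100, 1000)))

# Order the rows by the y2558 column, largest first, then keep only the top 10.
top10 <- df[order(df$y2558, decreasing = TRUE), ][1:10, ]

pdf(NULL)   # null device so the sketch runs without a display; omit in Rgui
# t() turns the single column into a row; names.arg uses column 1 as X-axis labels.
barplot(t(top10[, "y2558", drop = FALSE]), names.arg = top10$region, las = 2)
dev.off()
```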
SCATTER PLOT
▪Tutorial:
http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/iris_plots
CLASSIFICATION
▪Assign each example to a class, e.g., the class of the labeled examples at minimal distance.
▪Applications
Computer vision: is this animal a cat or a dog?
Recommender system: which book will this user enjoy?
▪Examples of classification techniques
k-Nearest Neighbors (k-NN), Naïve Bayes, Decision Trees
CLASSIFICATION EXAMPLE
CLASSIFICATION WITH DECISION TREE
▪Make complex decisions based on a series of simple
conditions.
▪Utilize a tree structure to model
the relationships among the features and the class.
▪Based on the concept of recursive
partitioning.
RECURSIVE PARTITIONING
DECISION TREE WITH IRIS DATASET
▪Tutorial:
http://www.rdatamining.com/examples/decision-tree
K-NN
▪Nearest neighbor classifiers assign unlabeled
examples to the class of similar labeled examples.
▪k-NN is a simple yet effective classifier.
Makes no assumptions about the underlying data distribution
Fast training but slow classification
Requires selection of an appropriate k
Not suitable for nominal data without additional processing
K-NN EXAMPLE
Figure labels: training phase, classification phase, distance
SIMILARITY MEASUREMENT
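▪Similarity can be measured by a distance function in
n-dimensional space.
Traditionally, Euclidean distance is used:
dist(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + … + (pn - qn)^2)
Here p1 refers to the value of the first feature of p.
For the tomato (sweetness = 6, crunchiness = 4) and the green
bean (sweetness = 3, crunchiness = 7),
dist(tomato, green bean) = sqrt((6 - 3)^2 + (4 - 7)^2) = sqrt(18) ≈ 4.2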
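The distance between the tomato and the green bean can be checked in R. This is a small sketch; the helper name `euclidean` is introduced here for illustration.

```r
# Euclidean distance between two feature vectors of equal length.
euclidean <- function(p, q) sqrt(sum((p - q)^2))

tomato     <- c(sweetness = 6, crunchiness = 4)
green_bean <- c(sweetness = 3, crunchiness = 7)

euclidean(tomato, green_bean)          # sqrt(18), about 4.24

# Base R's dist() gives the same value for the rows of a matrix.
dist(rbind(tomato, green_bean))
```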
K-NN EXAMPLE
▪tomato (sweetness = 6, crunchiness = 4)
▪If we classify the tomato using only its single nearest
neighbor, this is called 1-NN because k = 1.
Closest single
neighbor
K-NN EXAMPLE
▪If k = 3, k-NN performs a vote among the three
nearest neighbors.
The majority class among them is fruit, so the tomato is classified as a fruit.
▪So the question is: what is an appropriate k?
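The voting step can be sketched in a few lines of base R. The four labeled ingredients and their feature values are hypothetical, since the slide's figure is not part of the transcript, and `knn_predict` is a helper written for this sketch rather than a library function.

```r
# Hypothetical labeled examples (sweetness, crunchiness); the values are
# illustrative and are not taken from the slides.
train <- data.frame(
  name        = c("grape", "green bean", "nuts", "orange"),
  sweetness   = c(8, 3, 3, 7),
  crunchiness = c(5, 7, 6, 3),
  class       = c("fruit", "vegetable", "protein", "fruit")
)
tomato <- c(sweetness = 6, crunchiness = 4)

knn_predict <- function(train, query, k) {
  # Euclidean distance from the query to every training example.
  d <- sqrt((train$sweetness - query["sweetness"])^2 +
            (train$crunchiness - query["crunchiness"])^2)
  nearest <- order(d)[seq_len(k)]          # indices of the k closest examples
  votes <- table(train$class[nearest])     # count the classes among them
  names(votes)[which.max(votes)]           # majority class
}

knn_predict(train, tomato, k = 1)   # 1-NN: class of the single closest neighbor
knn_predict(train, tomato, k = 3)   # 3-NN: majority vote among three neighbors
```

Ties are broken by taking the first class in the table, which is a simplification; packaged k-NN implementations may handle ties differently.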
APPROPRIATE K
▪A large k can lead to underfitting.
▪A small k can lead to overfitting.
▪One common practice is to use the square root of the
number of training examples as k.
▪Try many values of k and observe the results.
PREPARING DATA FOR K-NN
▪Rescaling features
Min-max normalization: x_new = (x - min(x)) / (max(x) - min(x))
▪For nominal data, dummy coding can be used.
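Both preparation steps can be sketched with base R on the iris data. The helper `normalize` is defined here for illustration, and `model.matrix()` is one of several ways to produce dummy columns.

```r
# Min-max normalization: rescale a numeric feature to the range [0, 1].
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- as.data.frame(lapply(iris[, 1:4], normalize))
summary(scaled$Sepal.Length)     # now runs from 0 to 1

# Dummy coding: one 0/1 column per level of a nominal feature.
# "- 1" drops the intercept so every level gets its own column.
dummies <- model.matrix(~ Species - 1, data = iris)
head(dummies, 3)
```

Rescaling matters for k-NN because a feature with a large numeric range would otherwise dominate the distance.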
K-NN WITH IRIS DATA SET
▪If you don’t have the iris data set, load it from
http://archive.ics.uci.edu/ml/
▪Tutorial
https://www.datacamp.com/community/tutorials/machine-learning-in-r#one
NAÏVE BAYES
▪“70 percent chance of rain”
▪Use data about past events to extrapolate to future events.
▪A 70 percent chance of rain implies that in 7 out of 10 past
cases with similar conditions, it rained in the area.
▪The Naïve Bayes classifier is based on Bayesian methods.
BAYESIAN METHODS
▪Developed by Thomas Bayes in the 18th century to describe
the probability of events.
▪Estimate the likelihood of an event based on the evidence at
hand across multiple trials.
BAYESIAN METHODS
▪Classifiers use training data to calculate an observed
probability of each outcome based on the evidence
provided by feature values.
▪Later, when applied to unlabeled data, the classifier uses the observed
probabilities to predict the most likely class.
▪A probability is a number between 0 and 1, i.e., a 0% to
100% chance.
PROBABILITY
▪If it rained on 3 out of 10 days with similar conditions
to today, the probability of rain is estimated as
3/10 = 0.30
▪The probability of event A is written P(A), e.g., P(rain) = 0.30
▪So if a trial has only two outcomes, e.g., rain and not
rain, P(not rain) = 1 – 0.30 = 0.70
The outcomes are mutually exclusive and exhaustive.
Figure labels: Not rain (0.70), Rain (0.30)
PROBABILITY
▪If a second event is observed together with the first
event, the two may have a joint probability.
The chance of wind is 0.10.
It partly overlaps with rain: not all windy
days are rainy days, and vice versa.
▪So, with P(rain) = 0.30 and P(windy) = 0.10, the chance
that both rain and wind occur is written as
P(rain ∩ windy)
Figure labels: Not rain (0.70), Rain (0.30), Windy (0.10)
PROBABILITY
▪Calculating P(rain ∩ windy) depends on how the
two events are related.
If the two events are totally unrelated, they are called
independent events.
But if all events were independent, it would be impossible
to predict one by observing another.
Dependent events are the basis of predictive modeling.
PROBABILITY
▪Calculating the joint probability of independent events is simple:
P(rain ∩ windy) = P(rain) * P(windy) = 0.30 * 0.10 = 0.03
P(A ∩ B) = P(A) * P(B)
▪Calculating it for dependent events is more complex;
that is where Bayes’ theorem comes in.
BAYES’ THEOREM
▪The relationship between dependent events is
P(A|B) = P(A ∩ B) / P(B)
▪P(A|B) = probability of event A given that event B
occurred (also known as conditional probability)
▪P(A ∩ B) = probability that A and B occur together
▪P(B) = probability of B alone
BAYES’ THEOREM
▪By definition, P(A ∩ B) = P(A|B) * P(B) = P(B|A) * P(A), so
▪P(A|B) = P(B|A) * P(A) / P(B)
P(A|B): posterior probability
P(B|A): likelihood
P(A): prior probability
P(B): marginal likelihood
EXAMPLE: SPAM DETECTION
▪What is the probability that an email is spam
given that it contains the word “Viagra”?
EXAMPLE: SPAM DETECTION
▪Construct a frequency table.
Record the number of times Viagra appeared in spam and ham
messages.
▪Observation:
P(Viagra=Yes|spam) = 4/20 = 0.20 indicates a 20% chance
that a spam message contains the word Viagra.
EXAMPLE: SPAM DETECTION
▪Since P(A ∩ B) = P(B|A) * P(A), P(spam ∩ Viagra) =
P(Viagra|spam) * P(spam) = 4/20 * 20/100 = 0.04
Here P(spam ∩ Viagra) means P(spam=yes ∩ Viagra=yes).
▪So, P(spam|Viagra) = P(Viagra|spam) *
P(spam)/P(Viagra) = (4/20)*(20/100)/(5/100) = 0.80
So there is an 80% chance that a message is spam, given that it
contains the word Viagra.
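The calculation can be reproduced from a frequency table. The table image is not in the transcript, so the counts below are the ones implied by the fractions quoted on the slides (100 messages, 20 spam, 5 containing the word, 4 of those spam); treat the layout as an assumption.

```r
# Frequency table implied by the figures on the slides:
# 100 messages, 20 spam, 5 containing "Viagra", 4 of those spam.
freq <- matrix(c(4, 16,
                 1, 79),
               nrow = 2, byrow = TRUE,
               dimnames = list(class = c("spam", "ham"),
                               viagra = c("yes", "no")))

n <- sum(freq)
p_spam   <- sum(freq["spam", ]) / n                               # prior: 20/100
p_viagra <- sum(freq[, "yes"]) / n                                # marginal likelihood: 5/100
p_viagra_given_spam <- freq["spam", "yes"] / sum(freq["spam", ])  # likelihood: 4/20

# Bayes' theorem: posterior = likelihood * prior / marginal likelihood
p_spam_given_viagra <- p_viagra_given_spam * p_spam / p_viagra
p_spam_given_viagra                                               # 0.8
```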
NAÏVE BAYES ALGORITHM
▪Applies Bayes’ theorem to classification problems.
Simple, fast, effective
Works well with large numbers of examples
Easy to understand and explain
Assumes that features are independent, which might not truly hold.
NAÏVE BAYES WITH IRIS DATASET
▪Tutorial
http://rischanlab.github.io/NaiveBayes.html
FORECASTING NUMERIC DATA
▪Mathematical relationships help us understand
aspects of everyday life.
Body weight is a function of one’s calorie intake.
In more detail, 250 kcal consumed may result in nearly a
kilogram of weight gain.
▪This might not fit every situation perfectly, but it can
be reasonably correct.
REGRESSION
▪Regression is concerned with specifying the relationship
between a dependent variable and one or more independent
variables.
The variables are numeric.
▪We use the independent variables to predict the
dependent variable.
▪There are many forms of regression; the simplest form is a
straight line,
i.e., linear regression.
LINEAR REGRESSION
▪Slope-intercept form: y = a + bx
y: dependent variable, x: independent variable
a: intercept, b: slope
LINEAR REGRESSION
▪Simple linear regression: a single independent
variable.
▪Multiple linear regression: multiple independent
variables,
also known as multiple regression.
SIMPLE LINEAR REGRESSION EXAMPLE
▪Distress events (dependent) vs. temperature
(independent)
y = 3.70 – 0.048x
SIMPLE LINEAR REGRESSION
▪How do we find the best a and b?
Ordinary least squares (OLS): the slope and intercept
are chosen to minimize the sum of the squared errors:
minimize Σ (yi - (a + b*xi))^2
SIMPLE LINEAR REGRESSION
▪Generally, the solution for a depends on the value of b:
a = mean(y) - b * mean(x)
▪The value of b can be calculated from
b = Σ (xi - mean(x)) * (yi - mean(y)) / Σ (xi - mean(x))^2
SIMPLE LINEAR REGRESSION
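▪The variance of x, i.e., Var(x), can be expressed as
Var(x) = Σ (xi - mean(x))^2 / n
▪The covariance of x and y, i.e., Cov(x, y), can be
expressed as
Cov(x, y) = Σ (xi - mean(x)) * (yi - mean(y)) / n
▪So
b = Cov(x, y) / Var(x)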
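The closed-form solution can be compared against R's built-in fitting function. This sketch uses two numeric iris features as stand-in data, which is an assumption; it is not the distress/temperature data from the earlier slide.

```r
# Illustrative data: two numeric iris features (an assumption, not the
# distress/temperature data from the slides).
x <- iris$Petal.Length
y <- iris$Petal.Width

# Slope and intercept from the closed-form OLS solution.
# R's cov() and var() divide by n - 1 rather than n; the factor cancels in the ratio.
b <- cov(x, y) / var(x)
a <- mean(y) - b * mean(x)
c(intercept = a, slope = b)

# lm() fits the same line by ordinary least squares.
fit <- lm(y ~ x)
coef(fit)

# Sum of squared errors that OLS minimizes.
sum((y - (a + b * x))^2)
```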
LINEAR REGRESSION WITH IRIS DATASET
▪Tutorial:
http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/iris_lm/
OTHER REGRESSION
▪Logistic regression: models a binary categorical
outcome.
▪Poisson regression: models integer count data.
▪Multinomial logistic regression: models a
categorical outcome with more than two categories.
ASSOCIATION RULES
▪Market basket analysis = barcode + inventory
system + personalized shopping profile
▪Itemset = a group of items that people bought together
{bread, peanut butter, jelly}
▪The result of market basket analysis is a set of
association rules:
patterns found in the relationships among items in itemsets.
{peanut butter, jelly} -> {bread}
ASSOCIATION RULES
▪The Apriori algorithm, introduced in 1994, is an
efficient algorithm for finding association rules.
Works with large amounts of data
The resulting rules are easy to understand
Not good for small data sets
APRIORI ALGORITHM
▪Typical buying pattern:
get well card and flowers
APRIORI ALGORITHM: SUPPORT AND CONFIDENCE
▪Support measures how frequently an itemset occurs in the
data.
support({get well card, flowers}) = 3/5 = 0.6
support({flowers} -> {get well card}) = 3/5 = 0.6
support({flowers}) = 4/5 = 0.8
▪Confidence measures a rule’s
predictive power/accuracy.
confidence({flowers} -> {get well card}) = 0.6/0.8 = 0.75
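Support and confidence can be computed directly from a list of transactions. The five baskets below are hypothetical; they were chosen only so that the counts match the fractions on the slide. The helpers `support` and `confidence` are written for this sketch and are not taken from an association-rules package.

```r
# Hypothetical transactions, chosen so that flowers appear in 4 of 5
# baskets and {get well card, flowers} in 3 of 5, as on the slide.
baskets <- list(
  c("flowers", "get well card", "soda"),
  c("plush toy bear", "flowers", "balloons", "candy bar"),
  c("get well card", "candy bar", "flowers"),
  c("plush toy bear", "balloons", "soda"),
  c("flowers", "get well card", "soda")
)

# Support: fraction of baskets that contain every item of the itemset.
support <- function(itemset, baskets) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

# Confidence of lhs -> rhs: support of both sides together over support of lhs.
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

support(c("get well card", "flowers"), baskets)     # 0.6
support("flowers", baskets)                         # 0.8
confidence("flowers", "get well card", baskets)     # 0.75
```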
APRIORI PRINCIPLE: BUILDING SET OF RULES
▪Identify all the itemsets that meet a minimum
support threshold.
▪Create rules from these itemsets, keeping those that
meet a minimum confidence threshold.
ASSOCIATION RULES WITH GROCERIES DATASET
▪Tutorial:
http://www.salemmarafi.com/code/market-basket-analysis-with-r/
CLUSTERING
▪Clustering is an unsupervised learning task
that automatically divides the data into clusters.
There is no need to provide classes as training output.
▪Data can be grouped (clustered) by the similarity
(pattern) of their features.
K-MEANS CLUSTERING ALGORITHM
▪The most commonly used and well-studied clustering algorithm.
Simple to learn
Might not find the optimal clusters
Needs a guess for the value of k
▪K-means assigns each example to one of the k clusters,
minimizing the differences within each cluster and maximizing
the differences between the clusters.
K-MEANS CLUSTERING
1. Choose k marker points randomly.
2. Calculate the distance from each example
to each marker point, and assign the example
to the closest marker point.
3. Update each marker’s location by calculating
the centroid of its group of data.
4. Repeat steps 2-3 until stable.
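The loop described on the k-means slides can be sketched in base R as follows. Using two iris features and k = 3 is an assumption for illustration, and the hand-written loop is a teaching sketch rather than a replacement for the built-in `kmeans()` shown at the end.

```r
# Minimal k-means sketch on two iris features; choosing iris and k = 3 is an
# assumption for illustration.
set.seed(42)
x <- as.matrix(iris[, c("Petal.Length", "Petal.Width")])
k <- 3

# Choose k marker points randomly (here: k random examples).
centers <- x[sample(nrow(x), k), , drop = FALSE]

for (iter in 1:100) {
  # Squared distance from each example to each marker, then assign
  # each example to its closest marker.
  d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  cluster <- max.col(-d, ties.method = "first")

  # Move each marker to the centroid of its group.
  new_centers <- centers
  for (j in seq_len(k)) {
    if (any(cluster == j)) new_centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
  }

  # Stop once the markers no longer move.
  if (all(abs(new_centers - centers) < 1e-9)) break
  centers <- new_centers
}

table(cluster, iris$Species)

# Base R ships the same technique as kmeans().
fit <- kmeans(x, centers = k, nstart = 10)
fit$centers
```

Because the starting markers are random, different seeds can give different clusters, which is the "might not find the optimal clusters" caveat; `nstart` in `kmeans()` reruns the algorithm from several random starts to reduce that risk.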
K-MEANS WITH IRIS DATA SET
▪Tutorial: http://rischanlab.github.io/Kmeans.html
NEURAL NETWORK
▪Biological neuron
▪Artificial neuron
y = f(w1*x1 + w2*x2 + … + wn*xn)
x1 … xn are the inputs, w1 … wn are the weights, and f is the activation function.
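A single artificial neuron is small enough to write out. This is a sketch under assumptions: the weights and inputs are made-up numbers, and the sigmoid is just one common choice of activation function, not necessarily the one pictured on the slide.

```r
# One artificial neuron: weighted sum of the inputs passed through an
# activation function f. The weights and inputs below are made-up numbers.
sigmoid <- function(z) 1 / (1 + exp(-z))      # one common choice for f

neuron <- function(x, w, f = sigmoid) f(sum(w * x))

x <- c(1.0, 0.5, -1.5)    # input signals
w <- c(0.4, -0.2, 0.1)    # connection weights

neuron(x, w)                                      # sigmoid activation
neuron(x, w, f = function(z) as.numeric(z > 0))   # threshold (step) activation
```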
NEURAL NETWORK
▪Network topology
Number of layers and number of neurons in each layer
▪Single-layer network
▪Multi-layer network
NEURAL NETWORK WITH BOSTON DATA SET
▪Tutorial https://datascienceplus.com/fitting-neural-network-in-r/
SUPPORT VECTOR MACHINES (SVM)
▪An SVM finds a surface (hyperplane) that creates a boundary
between points of data in multidimensional space.
SVM
▪An SVM searches for the maximum margin hyperplane (MMH), which
creates the greatest separation between the two classes.
SVM WITH IRIS DATASET
▪Tutorial: https://www.r-bloggers.com/using-support-vector-machines-as-flower-finders-name-that-iris/
R PACKAGES
▪R can be extended by installing packages.
CRAN = Comprehensive R Archive Network
▪Installing R packages
INSTALL REQUIRED PACKAGE
▪Repo -> Thailand
▪Install -> ggvis
▪https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
▪http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/r/r2_summarystats-graphs/R2_SummaryStats-Graphs_print.html
▪https://www.datacamp.com/community/tutorials/machine-learning-in-r#one
▪https://www.autodeskresearch.com/publications/samestats
▪https://arnesund.com/2015/05/31/using-amazon-machine-learning-to-predict-the-weather/
▪K-means
▪Cluster
▪Decision tree
▪Regression
▪Association rules
▪Logistic regression

Contenu connexe

Tendances

Machine Learning
Machine LearningMachine Learning
Machine LearningShrey Malik
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRaveen Perera
 
Machine Learning
Machine LearningMachine Learning
Machine LearningKumar P
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningKoundinya Desiraju
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learningKnoldus Inc.
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
Machine Learning
Machine LearningMachine Learning
Machine LearningVivek Garg
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceEdureka!
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningDr. Radhey Shyam
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.pptneelamoberoi1030
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.ASHOK KUMAR
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningHaptik
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 

Tendances (20)

Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 

Similaire à Introduction to machine learning

Big Data And Machine Learning Using MATLAB.pdf
Big Data And Machine Learning Using MATLAB.pdfBig Data And Machine Learning Using MATLAB.pdf
Big Data And Machine Learning Using MATLAB.pdfssuserb2837a
 
Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010Knowerce
 
lecture-intro-pet-nams-ai-in-toxicology.pptx
lecture-intro-pet-nams-ai-in-toxicology.pptxlecture-intro-pet-nams-ai-in-toxicology.pptx
lecture-intro-pet-nams-ai-in-toxicology.pptxMarc Teunis
 
Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)ShehryarSH1
 
AI with Azure Machine Learning
AI with Azure Machine LearningAI with Azure Machine Learning
AI with Azure Machine LearningGeert Baeke
 
AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...
AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...
AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...GeeksLab Odessa
 
Spark ml streaming
Spark ml streamingSpark ml streaming
Spark ml streamingAdam Doyle
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaChetan Khatri
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Databricks
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
Model Experiments Tracking and Registration using MLflow on Databricks
Model Experiments Tracking and Registration using MLflow on DatabricksModel Experiments Tracking and Registration using MLflow on Databricks
Model Experiments Tracking and Registration using MLflow on DatabricksDatabricks
 
Knobbe practice webinar series intellectual property strategies for artific...
Knobbe practice webinar series   intellectual property strategies for artific...Knobbe practice webinar series   intellectual property strategies for artific...
Knobbe practice webinar series intellectual property strategies for artific...Knobbe Martens - Intellectual Property Law
 
Manoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentManoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentAgile Impact Conference
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Rodney Joyce
 

Similaire à Introduction to machine learning (20)

Big Data And Machine Learning Using MATLAB.pdf
Big Data And Machine Learning Using MATLAB.pdfBig Data And Machine Learning Using MATLAB.pdf
Big Data And Machine Learning Using MATLAB.pdf
 
Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010
 
lecture-intro-pet-nams-ai-in-toxicology.pptx
lecture-intro-pet-nams-ai-in-toxicology.pptxlecture-intro-pet-nams-ai-in-toxicology.pptx
lecture-intro-pet-nams-ai-in-toxicology.pptx
 
Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)
 
AI with Azure Machine Learning
AI with Azure Machine LearningAI with Azure Machine Learning
AI with Azure Machine Learning
 
AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...
AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...
AI&BigData Lab 2016. Руденко Петр: Особенности обучения, настройки и использо...
 



Introduction to machine learning

  • 1. INTRODUCTION TO MACHINE LEARNING
    Pruet Boonma
    Faculty of Engineering
    Chiang Mai University
    pruet@eng.cmu.ac.th
    http://screenprism.com/insights/article/why-does-hal-breakdown-and-become-hostile-to-humans-in-2001
  • 2. AGENDA
    ▪ Machine Learning Background
    ▪ Introduction to R
    ▪ Classification: Nearest Neighbors, Naïve Bayes, Decision Trees
    ▪ Forecasting with Regression Methods
    ▪ Pattern Recognition with Association Rules
    ▪ Clustering with K-means
    ▪ Black-Box Models: Neural Network, SVM
  • 3. HORIZON OF DATA ANALYTICS
    ▪ Data Mining/Data Science
      Extract patterns from data
    ▪ Big Data
      3Vs: Volume, Velocity, Variety
    ▪ Artificial Intelligence
      Machine Learning
        Deep Learning
    http://www.kdnuggets.com/2016/03/data-science-puzzle-explained.html
  • 5. ARTIFICIAL INTELLIGENCE: A PURPOSE
    ▪ Artificial Intelligence was first proposed in the 1950s to construct complex machines that exhibit some characteristics of human intelligence.
      General AI: machines that have all of a human's senses, reasoning, intuition, and imagination, and think just like us.
      Narrow AI: technology that is able to perform a specific task as well as, or better than, humans can.
  • 6. EXAMPLE OF NARROW AI
    ▪ Face recognition on Facebook, image classification on Pinterest, spam detection on Gmail, spelling suggestions on Google.
    [Figure caption: The computer makes a guess at what we really want to search for.]
  • 7. MACHINE LEARNING: AN APPROACH
    ▪ Machine learning started to flourish in the 1990s as an approach to narrow AI.
    ▪ “Ability to learn without being explicitly programmed” – Arthur Samuel (1959)
    ▪ “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E” – Tom M. Mitchell (1997)
    http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html
  • 8. EXAMPLE OF MACHINE LEARNING PROBLEM
    ▪ We have to separate chickens into two groups. I'll show you how, then you can do the rest.
      Task: separate chickens
      Experience: I'll show you how
      Performance: separate chickens correctly
  • 9. MACHINE LEARNING APPROACH
    ▪ Instead of hard-coding a program to perform a task, the machine is “trained” using a large amount of data and algorithms to learn how to perform the task.
  • 10. TYPE OF PROBLEMS/TASKS
    ▪ Supervised Learning
      The computer is presented with a training data set (example inputs and desired outputs); the goal is to learn from it.
    ▪ Unsupervised Learning
      There is no training data set; the computer needs to find structure in the given input.
    ▪ Reinforcement Learning
      The computer produces output from a given input, then the output is evaluated and feedback is given to the computer.
  • 14. EXAMPLE APPLICATIONS
    ▪ Computer Vision
      Object recognition
      Motion analysis
      Image restoration
    http://cs.stanford.edu/~taranlan/
  • 15. EXAMPLE APPLICATIONS
    ▪ Internet fraud
      Pattern recognition
      Bayesian inference
      Profiling
    https://siftscience.com/sift-edu/prevent-fraud/fraud-detection-solutions
  • 17. DEEP LEARNING: A TECHNIQUE
    ▪ The human brain is a network of a huge number of small computing units (neurons).
    ▪ An artificial neural network emulates the human brain by creating a network of simple computing units.
    Wikipedia
  • 18. DEEP LEARNING
    ▪ In 2012, Andrew Ng, then with Google, proposed using a huge neural network with many layers. “Deep” in deep learning refers to the many layers.
    ▪ One of the first applications of deep learning was image recognition. Now it is used in many areas, including Go playing and self-driving cars.
    http://www.popularmechanics.com/technology/a19863/googles-alphago-ai-wins-second-game-go/
  • 19. HOW MACHINE LEARNS
    ▪ In machine learning, machines make sense of data by creating models, then generalize the models so that they apply to new data.
  • 20. DATA STORAGE
    ▪ Humans have recorded data since the dawn of history.
    ▪ Electronic sensors contribute to the explosion of data.
      Wireless sensor networks, Internet of Things, Big Data
    ▪ The amount of data is beyond human comprehension.
    ▪ Garbage in, garbage out.
    https://www.slideshare.net/Sparkhound/spinning-brown-donuts-why-storage-still-counts
  • 21. ABSTRACTION
    ▪ Assigning meaning to stored data.
    ▪ The computer summarizes stored raw data using a model.
      A model is an explicit description of the patterns within the data.
      Examples: math equations, trees/graphs, logical rules, clusters.
    ▪ Typically, a human chooses the model.
      The process of fitting a model to a dataset is called training.
    https://arnesund.com/2015/05/31/using-amazon-machine-learning-to-predict-the-weather/
  • 22. GENERALIZATION
    ▪ The product of abstraction can be limited; the idea needs to be generalized to be useful in the future.
      It should work on tasks that are similar, but not identical.
      Infer from the existing models to the new input.
      Heuristics, i.e., educated guesses, can be used to find the most useful inferences.
    [Figure: from a known chick, infer “This is also a chick.”]
  • 24. EVALUATION
    ▪ With limited training data, bias can occur in the model.
      Does a chicken need to be yellow?
    ▪ After training, the model is evaluated with a test dataset.
      This judges how well the model generalizes to unseen data.
      The model may have captured noise, i.e., random variation in the data.
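The train-then-test idea on the evaluation slide can be sketched as a simple holdout split. This is only an illustration in Python (the deck's tutorials use R); the function name `holdout_split`, the 80/20 ratio, and the fixed seed are assumptions made for the example and do not come from the slides.

```python
import random

def holdout_split(examples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and split it into (train, test)."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled examples are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data set: 10 numbered examples
train, test = holdout_split(range(10))
print(len(train), len(test))  # 8 2
```

The model would be fitted on `train` only, and its predictions compared with the known answers in `test`.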
  • 25. MACHINE LEARNING IN PRACTICE
    ▪ Machine learning process
      Data collection: gather data in a consumable format.
      Data exploration and preparation: learn the character of the data, eliminate unnecessary data, change formats, etc.
      Model training: a human chooses the algorithms to use and observes the resulting model.
      Model evaluation: models are tested against a test dataset.
      Model improvement: change the algorithm, add more data, etc.
  • 26. TYPE OF INPUT DATA
    ▪ Data is expressed in units of observation.
      A unit is a set of properties of a person, object, transaction, geographic region, time point, or measurement.
      Units can be combined, e.g., person-year: one person's data for one year.
    ▪ A collection of data consists of:
      Examples: instances of the unit of observation
      Features: recorded properties or attributes
  • 27. TYPE OF FEATURES
    ▪ Numerical: measured in numbers, i.e., a quantitative property.
    ▪ Categorical/nominal: a set of categories, i.e., a qualitative property.
    ▪ Ordinal: a nominal variable whose categories fall in an ordered list, e.g., {small, medium, large}.
  • 28. TYPE OF MACHINE LEARNING ALGORITHMS
    ▪ Predictive model: predicts one value using the other values in the data set.
      What is the chance of rain tomorrow?
    ▪ Classification model: predicts which category an example belongs to.
      Is this email spam?
    ▪ Descriptive model: gains insight by summarizing data in new and interesting ways.
      What do people also buy when they buy milk?
    ▪ Clustering: identifies groups of examples with similar properties.
      How many types of customers does a grocery store have?
  • 29. INTRODUCTION TO R
    ▪ Open source programming language and software environment for statistical computing.
      Used by statisticians and data miners for developing statistical software and for data analysis.
    From: Wikipedia
  • 30. USING R
    ▪ Start Rgui
    ▪ Load internal data
      Iris data set
    ▪ Load external data
      From a CSV file
    ▪ Preview data
    ▪ Plot the data
  • 31. R GUI
    [Screenshot labels: tool bar; status bar; command line + text output]
  • 32. LOAD INTERNAL DATA
    [Screenshot labels: data set name; list of data set examples; data set variables (features)]
  • 33. LOAD EXTERNAL DATA
    ▪ We will use data from https://www.data.go.th/DatasetDetail.aspx?id=7049410f-5bb8-4c75-9e94-112ca18b63e2
    ▪ Download the CSV file and reformat the data to make it consumable.
  • 34. LOAD EXTERNAL DATA
    ▪ Save to a CSV file, e.g., household.csv
    ▪ Load it into R using the read_csv() function from the readr package
    [Screenshot labels: file name; assignment operator; data set name]
  • 35. PREVIEW DATA
    ▪ names(): get or change feature names
    ▪ head(): show the feature names and the first few examples
    ▪ c(…): combine values into a vector
  • 36. PREVIEW
    ▪ summary(): show summarized information about a data frame (data set)
  • 37. PREVIEW
    ▪ A data set is indexed like a matrix: df[row, col]
    ▪ To change the order of columns, use c() to assign the new order
    [Screenshot label: leaving the index before the comma empty selects all rows]
  • 38. PLOT THE DATA
    ▪ Boxplot
      Shows the distribution of the data
    http://www.physics.csbsju.edu/stats/box2.html
    [Screenshot labels: leaving the index before the comma empty selects all rows; only columns 2 to 11 are selected]
  • 39. BARPLOT
    [Screenshot labels: only column 11, transposed from column to row; column 1 is used as the X axis label; too many examples]
  • 40. BARPLOT: TOP 10
    [Screenshot labels: order rows by the value in column 2558; keep only the top 10]
  • 42. CLASSIFICATION
    ▪ Group data based on minimal distance.
    ▪ Applications
      Computer vision: is this animal a cat or a dog?
      Recommender system: which book will this user enjoy?
    ▪ Examples of classification techniques
      k-Nearest Neighbors (k-NN), Naïve Bayes, Decision Trees
  • 44. CLASSIFICATION WITH DECISION TREE
    ▪ Makes complex decisions based on a series of simple conditions.
    ▪ Uses a tree structure to model the relationships among features and classes.
    ▪ Based on the concept of recursive partitioning.
  • 46. DECISION TREE WITH IRIS DATASET
    ▪ Tutorial: http://www.rdatamining.com/examples/decision-tree
  • 47. K-NN
    ▪ Nearest neighbor classifiers assign unlabeled examples to the class of similar labeled examples.
    ▪ k-NN is one of the simplest yet effective classifiers.
      Makes no assumptions about the underlying data distribution
      Fast training but slow classification
      Requires selection of an appropriate k
      Not suitable for nominal data without additional processing
  • 49. SIMILARITY MEASUREMENT
    ▪ Similarity can be measured by a distance function in n-dimensional space.
      Traditionally, Euclidean distance is used. Let p1 refer to the value of the first feature of p, and so on:
      $\mathrm{dist}(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2}$
      Example: the tomato (sweetness = 6, crunchiness = 4) and the green bean (sweetness = 3, crunchiness = 7).
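As a worked instance of the Euclidean distance, the tomato and green bean values from the slide can be plugged in directly. The sketch below is in Python rather than the R used in the tutorials, and the function name `euclidean_distance` is chosen only for this illustration.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of features."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

tomato = (6, 4)      # (sweetness, crunchiness), from the slide
green_bean = (3, 7)  # (sweetness, crunchiness), from the slide

d = euclidean_distance(tomato, green_bean)
print(round(d, 2))  # 4.24, i.e., sqrt(3^2 + 3^2) = sqrt(18)
```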
  • 50. K-NN EXAMPLE
    ▪ tomato (sweetness = 6, crunchiness = 4)
    ▪ If we classify it using only its single nearest neighbor, this is called 1-NN because k = 1.
    [Figure label: closest single neighbor]
  • 51. K-NN EXAMPLE
    ▪ If k = 3, k-NN performs a vote among the three nearest neighbors.
      The majority class among them is fruit, so the tomato is classified as a fruit.
    ▪ So the question is: what is the appropriate k?
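The majority vote among the k nearest neighbors can be sketched as follows. This is a minimal Python illustration, not the deck's R tutorial code. Only the tomato and green bean coordinates appear in the slides; the other labeled foods and their values are hypothetical, chosen so that the vote reproduces the "fruit" outcome described above.

```python
import math
from collections import Counter

def distance(p, q):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(query, labeled_points, k):
    """Return the majority class among the k labeled points closest to query."""
    by_distance = sorted(labeled_points, key=lambda item: distance(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# (sweetness, crunchiness) -> class. Green bean is from the slide;
# grape, nuts, and orange are hypothetical training examples.
training = [
    ((8, 5), "fruit"),      # grape (assumed)
    ((3, 7), "vegetable"),  # green bean
    ((3, 6), "protein"),    # nuts (assumed)
    ((7, 3), "fruit"),      # orange (assumed)
]

tomato = (6, 4)
print(knn_classify(tomato, training, k=1))  # fruit
print(knn_classify(tomato, training, k=3))  # fruit
```

With k = 3 the three closest points are the orange, the grape, and the nuts, so "fruit" wins the vote two to one.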
  • 52. APPROPRIATE K
    ▪ A large k can lead to underfitting.
    ▪ A small k can lead to overfitting.
    ▪ One common practice is to use the square root of the number of training examples as k.
    ▪ Try many values of k and observe the results.
  • 53. PREPARING DATA FOR K-NN
    ▪ Rescaling features
      Min-max normalization
    ▪ For nominal data, dummy coding can be used.
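The two preparation steps can be sketched in a few lines of Python (again an illustration, since the tutorials use R). Min-max normalization is taken here in its usual form, (x - min) / (max - min). The dummy coding below creates one 0/1 column per category, which is a simplifying assumption; in practice one of the columns is often dropped. The function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numbers to the range [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def dummy_code(values):
    """Turn a nominal feature into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

print(min_max_normalize([3, 6, 8]))        # [0.0, 0.6, 1.0]
print(dummy_code(["hot", "cold", "hot"]))  # [[0, 1], [1, 0], [0, 1]]
```

Rescaling matters for k-NN because a feature with a large numeric range would otherwise dominate the distance calculation.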
  • 54. K-NN WITH IRIS DATA SET
    ▪ If you don't have the iris data set, load it from http://archive.ics.uci.edu/ml/
    ▪ Tutorial: https://www.datacamp.com/community/tutorials/machine-learning-in-r#one
  • 55. NAÏVE BAYES
    ▪ “70 percent chance of rain”
    ▪ Use data about past events to extrapolate future events.
    ▪ A 70 percent chance of rain implies that in 7 out of 10 past cases with similar conditions, it rained in the area.
    ▪ The Naïve Bayes classifier is based on Bayesian methods.
  • 56. BAYESIAN METHODS
    ▪ Developed by Thomas Bayes in the 18th century to describe the probability of events.
    ▪ Estimate the likelihood of an event based on the evidence at hand across multiple trials.
  • 57. BAYESIAN METHODS
    ▪ Classifiers use training data to calculate an observed probability of each outcome based on the evidence provided by feature values.
    ▪ Later, when applied to unlabeled data, the classifier uses the observed probabilities to predict the most likely class.
    ▪ A probability is a number between 0 and 1, i.e., a 0% to 100% chance.
  • 58. PROBABILITY
    ▪ If it rained on 3 out of 10 days with conditions similar to today's, the probability of rain is estimated as 3/10 = 0.30.
    ▪ The probability of event A is written P(A), e.g., P(rain) = 0.30.
    ▪ If a trial has only two outcomes, e.g., rain and not rain, then P(not rain) = 1 - 0.30 = 0.70.
      The outcomes are mutually exclusive and exhaustive.
    [Figure: Rain (0.30) | Not rain (0.70)]
  • 59. PROBABILITY
    ▪ If a second event is observed together with the first event, they may have a joint probability.
      The chance of wind is 0.10 and it overlaps with rain; this implies that not every windy day is a rainy day, and vice versa.
    ▪ So, with P(rain) = 0.30 and P(windy) = 0.10, the chance that both rain and wind occur is written as P(rain ∩ windy).
    [Figure: Rain (0.30) and Not rain (0.70), with Windy (0.10) partly overlapping Rain (0.30)]
  • 60. PROBABILITY
    ▪ Calculating P(rain ∩ windy) depends on the joint probability of the two events, i.e., how the probability of one event is related to the probability of the other.
      If the two events are totally unrelated, they are called independent events.
      But if all events were independent, it would be impossible to predict one by observing another.
      Dependent events are the basis of predictive models.
  • 61. PROBABILITY
    ▪ Calculating the joint probability of independent events is simple:
      P(rain ∩ windy) = P(rain) * P(windy)
      P(A ∩ B) = P(A) * P(B)
    ▪ Calculating the probability of dependent events is more complex; this is where Bayes' theorem comes in.
  • 62. BAYES’ THEOREM
    ▪ The relationship between dependent events is
      P(A|B) = P(A ∩ B) / P(B)
    ▪ P(A|B) = probability of event A given that event B occurred (also known as conditional probability)
    ▪ P(A ∩ B) = probability that A and B occurred together
    ▪ P(B) = probability of B alone
  • 63. BAYES’ THEOREM
    ▪ By definition, P(A ∩ B) = P(A|B) * P(B), and likewise P(A ∩ B) = P(B|A) * P(A), so
    ▪ P(A|B) = P(B|A) * P(A) / P(B)
      P(A|B): posterior probability
      P(B|A): likelihood
      P(A): prior probability
      P(B): marginal likelihood
  • 64. EXAMPLE: SPAM DETECTION
    ▪ What is the probability that an email is spam, given that it contains the word “Viagra”?
  • 65. EXAMPLE: SPAM DETECTION
    ▪ Construct a frequency table.
      Record the number of times Viagra appeared in spam and ham messages.
    ▪ Observation: P(Viagra=Yes|spam) = 4/20 = 0.20 indicates a 20% chance that a spam message contains the word Viagra.
  • 66. EXAMPLE: SPAM DETECTION
    ▪ Since P(A ∩ B) = P(B|A) * P(A),
      P(spam ∩ Viagra) = P(Viagra|spam) * P(spam) = 4/20 * 20/100 = 0.04
      Here P(spam ∩ Viagra) is shorthand for P(spam=yes ∩ Viagra=yes).
    ▪ So, P(spam|Viagra) = P(Viagra|spam) * P(spam) / P(Viagra) = (4/20) * (20/100) / (5/100) = 0.80
      So there is an 80% chance that a message is spam, given that it contains the word Viagra.
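The arithmetic of the spam example can be checked with a short Python sketch. The counts (100 messages, 20 spam, Viagra in 5 messages overall and in 4 spam messages) are read off the fractions in the slides; the variable names are chosen for this illustration only.

```python
# Counts implied by the fractions in the slides.
total_messages = 100
spam_messages = 20
viagra_messages = 5          # messages containing "Viagra", spam or ham
viagra_and_spam = 4          # spam messages containing "Viagra"

p_spam = spam_messages / total_messages                # prior probability
p_viagra = viagra_messages / total_messages            # marginal likelihood
p_viagra_given_spam = viagra_and_spam / spam_messages  # likelihood

# Bayes' theorem: posterior = likelihood * prior / marginal likelihood
p_spam_given_viagra = p_viagra_given_spam * p_spam / p_viagra
print(round(p_spam_given_viagra, 2))  # 0.8
```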
  • 67. NAÏVE BAYES ALGORITHM
    ▪ Applies Bayes' theorem to classification problems.
      Simple, fast, effective
      Works well with large numbers of examples
      Easy to understand and explain
      Weakness: the events might not truly be independent, as the algorithm assumes.
  • 68. NAÏVE BAYES WITH IRIS DATASET
    ▪ Tutorial: http://rischanlab.github.io/NaiveBayes.html
  • 69. FORECASTING NUMERIC DATA
    ▪ Mathematical relationships help us understand aspects of everyday life.
      Body weight is a function of one's calorie intake.
      In more detail, every 250 kcal consumed results in nearly a kilogram of weight gain.
    ▪ This might not fit every situation perfectly, but it can be reasonably correct.
  • 70. REGRESSION
    ▪ Regression is concerned with specifying the relationship between a dependent variable and independent variables.
      The variables are numeric.
    ▪ We use the independent variables to predict the dependent variable.
    ▪ There are many forms of regression; the simplest form is a straight line, i.e., linear regression.
  • 71. LINEAR REGRESSION
    ▪ Slope-intercept form: y = a + bx
      y: dependent variable
      x: independent variable
      a: intercept
      b: slope
  • 72. LINEAR REGRESSION
    ▪ Simple linear regression: a single independent variable.
    ▪ Multiple linear regression: multiple independent variables.
      Also known as multiple regression.
  • 73. SIMPLE LINEAR REGRESSION EXAMPLE
    ▪ Distress events (dependent) vs. temperature (independent)
      y = 3.70 - 0.048x
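  • 74. SIMPLE LINEAR REGRESSION
    ▪ How do we find the best a and b?
      Ordinary least squares (OLS): the slope and intercept are chosen to minimize the sum of the squared errors, i.e.,
      minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$
  • 75. SIMPLE LINEAR REGRESSION
    ▪ Generally, the solution for a depends on the value of b:
      $a = \bar{y} - b\bar{x}$
    ▪ The value of b can be calculated from
      $b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
  • 76. SIMPLE LINEAR REGRESSION
    ▪ The variance of x, i.e., Var(x), can be expressed as
      $\mathrm{Var}(x) = \frac{\sum (x_i - \bar{x})^2}{n}$
    ▪ The covariance of x and y, i.e., Cov(x, y), can be expressed as
      $\mathrm{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$
    ▪ So
      $b = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$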
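The OLS estimates for simple linear regression, with the slope as covariance over variance and the intercept derived from the means, can be sketched in plain Python. The slides rely on R for this; the function name `fit_simple_ols` and the four data points are hypothetical, with the points chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def fit_simple_ols(xs, ys):
    """Return (a, b) for y = a + b*x using b = Cov(x, y) / Var(x), a = mean(y) - b*mean(x)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_mean) ** 2 for x in xs) / n
    b = cov_xy / var_x          # slope
    a = y_mean - b * x_mean     # intercept
    return a, b

# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

a, b = fit_simple_ols(xs, ys)
print(a, b)  # 1.0 2.0
```

Dividing both the covariance and the variance by n (rather than n - 1) does not affect the slope, because the factor cancels in the ratio.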
  • 77. LINEAR REGRESSION WITH IRIS DATASET
    ▪ Tutorial: http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/iris_lm/
  • 78. OTHER REGRESSION
    ▪ Logistic regression: models a binary categorical outcome.
    ▪ Poisson regression: models integer count data.
    ▪ Multinomial logistic regression: models a categorical outcome.
  • 79. ASSOCIATION RULES
    ▪ Market basket analysis = barcodes + inventory system + personalized shopping profiles
    ▪ Itemset = a group of items people bought together
      {bread, peanut butter, jelly}
    ▪ The result of market basket analysis is a set of association rules.
      A rule is a pattern found in the relationships among items in itemsets.
      {peanut butter, jelly} -> {bread}
  • 80. ASSOCIATION RULES
    ▪ The Apriori algorithm, introduced in 1994, is an efficient algorithm for finding association rules.
      Works with large amounts of data
      The resulting rules are easy to understand
      Not good for small data sets
  • 81. APRIORI ALGORITHM
    ▪ Typical buying pattern.
      Get well card and flowers
  • 82. APRIORI ALGORITHM: SUPPORT AND CONFIDENCE
    ▪ Support measures how frequently an itemset occurs in the data.
      support({get well card, flowers}) = 3/5 = 0.6
      support({flowers} -> {get well card}) = 3/5 = 0.6
      support({flowers}) = 4/5 = 0.8
    ▪ Confidence measures a rule's predictive power/accuracy.
      confidence({flowers} -> {get well card}) = support({flowers, get well card}) / support({flowers}) = 0.6/0.8 = 0.75
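Support and confidence can be computed directly from a list of transactions. The Python sketch below is an illustration only; the five transactions are hypothetical, chosen so that flowers appear in 4 of 5 baskets and flowers together with a get well card in 3 of 5, which matches the fractions in the slide. The actual transaction table from the deck is not reproduced in the transcript.

```python
# Hypothetical baskets consistent with the 3/5 and 4/5 counts in the slide.
transactions = [
    {"flowers", "get well card", "soda"},
    {"plush toy bear", "flowers", "balloons", "candy bar"},
    {"get well card", "candy bar", "flowers"},
    {"plush toy bear", "balloons", "soda"},
    {"flowers", "get well card", "soda"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(lhs, rhs):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"get well card", "flowers"}))                   # 0.6
print(support({"flowers"}))                                    # 0.8
print(round(confidence({"flowers"}, {"get well card"}), 2))    # 0.75
```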
  • 83. APRIORI PRINCIPLE: BUILDING SET OF RULES
    ▪ Identify all the itemsets that meet a minimum support threshold.
    ▪ Create rules from these itemsets, keeping those that meet a minimum confidence threshold.
  • 84. ASSOCIATION RULES WITH GROCERIES DATASET
    ▪ Tutorial: http://www.salemmarafi.com/code/market-basket-analysis-with-r/
  • 85. CLUSTERING
    ▪ Clustering is an unsupervised learning task that automatically divides the data into clusters.
      There is no need to provide classes as training output.
    ▪ Data can be grouped (clustered) by the similarity (pattern) of their features.
  • 86. K-MEANS CLUSTERING ALGORITHM
    ▪ Most commonly used and well-studied algorithm.
      Simple to learn
      Might not find the optimal clusters
      Need to guess the value of k
    ▪ K-means assigns each example to one of the k clusters.
      It minimizes the differences within each cluster and maximizes the differences between clusters.
  • 87. K-MEANS CLUSTERING
    1. Choose k marker points randomly.
    2. Calculate the distance from each example to each marker point, and assign the example to the closest marker point.
    3. Update each marker's location by calculating the centroid of its group of data.
    4. Repeat steps 2 and 3 until the assignments are stable.
  • 90. K-MEANS WITH IRIS DATA SET
    ▪ Tutorial: http://rischanlab.github.io/Kmeans.html
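The k-means loop described on the clustering slides (random markers, assignment to the closest marker, centroid update, repeat until stable) can be sketched in plain Python. This is a minimal illustration rather than the R `kmeans()` workflow from the linked tutorial; the function name, the seed, the iteration cap, and the six sample points are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance; enough for finding the closest marker."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (centers, assignment) where assignment[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # pick k examples as the initial markers
    assignment = None
    for _ in range(max_iter):
        # Assign each example to its closest marker.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break                     # assignments are stable, so stop
        assignment = new_assignment
        # Move each marker to the centroid of the examples assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment

# Hypothetical 2-D data: two well-separated groups of three points each
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, assignment = kmeans(points, k=2)
print(assignment)
```

The cluster numbers themselves are arbitrary and depend on the random initial markers; what matters is which examples end up sharing a number.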
  • 91. NEURAL NETWORK
    ▪ Biological neuron
    ▪ Artificial neuron
      f is the activation function
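A single artificial neuron can be sketched as a weighted sum of its inputs passed through the activation function f. The slide's own formula is not preserved in the transcript, so the weighted-sum form, the sigmoid choice for f, and the input and weight values below are assumptions for illustration (in Python, not the R used in the tutorials).

```python
import math

def sigmoid(z):
    """A common activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, activation=sigmoid):
    """Weighted sum of the inputs, passed through the activation function f."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return activation(z)

# Hypothetical inputs and weights: 0.5*1.0 + (-0.25)*2.0 = 0.0
y = neuron([1.0, 2.0], [0.5, -0.25])
print(y)  # 0.5, because sigmoid(0) = 0.5
```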
  • 92. NEURAL NETWORK
    ▪ Network topology
      Number of layers and number of neurons in each layer
    ▪ Single-layer network
    ▪ Multi-layer network
  • 93. NEURAL NETWORK WITH BOSTON DATA SET
    ▪ Tutorial: https://datascienceplus.com/fitting-neural-network-in-r/
  • 94. SUPPORT VECTOR MACHINES (SVM)
    ▪ An SVM finds a surface (hyperplane) that creates a boundary between data points in a multidimensional space.
  • 95. SVM
    ▪ An SVM searches for the maximum margin hyperplane (MMH), which creates the greatest separation between the two classes.
  • 96. SVM WITH IRIS DATASET
    ▪ Tutorial: https://www.r-bloggers.com/using-support-vector-machines-as-flower-finders-name-that-iris/
  • 97. R PACKAGE
    ▪ R can be extended by installing packages.
      CRAN = Comprehensive R Archive Network
    ▪ Installing R packages
  • 98. INSTALL REQUIRED PACKAGE
    ▪ Repo -> Thailand
    ▪ Install -> ggvis