UNIT- 4
Machine Learning
What is Machine Learning?
 Adapt to / learn from data to optimize a performance function
Can be used to:
Extract knowledge from data
Learn tasks that are difficult to formalise
Create software that improves over time
When to learn
 Human expertise does not exist (navigating on Mars)
 Humans are unable to explain their expertise (speech
recognition)
 Solution changes in time (routing on a computer network)
 Solution needs to be adapted to particular cases (user biometrics)
Learning involves
 Learning general models from data
 Data is cheap and abundant. Knowledge is expensive and scarce
 E.g., from customer transaction data to consumer behaviour
 Build a model that is a good and useful approximation to the data
Applications
 Speech and hand-writing recognition
 Autonomous robot control
 Data mining and bioinformatics: motifs, alignment, …
 Playing games
 Fault detection
 Clinical diagnosis
 Spam email detection
 Credit scoring, fraud detection
 Web mining: search engines
 Market basket analysis
Applications are diverse, but the methods are generic
Generic methods
 Learning from labelled data (supervised learning)
E.g. classification, regression, prediction, function approximation
 Learning from unlabelled data (unsupervised learning)
E.g. clustering, visualisation, dimensionality reduction
 Learning from sequential data
E.g. speech recognition, DNA data analysis
 Associations
 Reinforcement Learning
Statistical Learning
Machine learning methods can be unified within the
framework of statistical learning:
Data is considered to be a sample from a probability
distribution.
Typically, we don’t expect perfect learning but only
“probably correct” learning.
Statistical concepts are the key to measuring our expected
performance on novel problem instances.
Induction and inference
 Induction: Generalizing from specific examples.
 Inference: Drawing conclusions from possibly incomplete
knowledge.
Learning machines need to do both.
Inductive learning
 Data produced by “target”.
 Hypothesis learned from data in order to “explain”, “predict”, “model”
or “control” the target.
 Generalisation ability is essential.
Inductive learning hypothesis:
“If the hypothesis works for enough data
then it will work on new examples.”
Example 1: Hand-written digits
Data representation: Greyscale images
Task: Classification (0,1,2,3…..9)
Problem features:
 Highly variable inputs from same class including some
“weird” inputs,
 imperfect human classification,
 high cost associated with errors so “don’t know” may be
useful.
Example 2: Speech recognition
Data representation: features from spectral analysis of
speech signals (two in this simple example).
Problem features:
 Highly variable data with same classification.
 Good feature selection is very important.
 Speech recognition is often broken into a number of
smaller tasks like this.
Example 3: DNA microarrays
 DNA from ~10000 genes attached to a glass slide (the
microarray).
 Green and red labels attached to mRNA from two
different samples.
 mRNA is hybridized (stuck) to the DNA on the chip and
green/red ratio is used to measure relative abundance of
gene products.
DNA microarrays
Data representation: ~10000 green/red intensity levels ranging
from 10 to 10000.
Tasks: Sample classification, gene classification, visualisation and
clustering of genes/samples.
Problem features:
 High-dimensional data but relatively small number of examples.
 Extremely noisy data (noise ~ signal).
 Lack of good domain knowledge.
Projection of 10000 dimensional data onto 2D using PCA
effectively separates cancer subtypes.
Probabilistic models
A large part of the module will deal with methods
that have an explicit probabilistic interpretation:
 Good for dealing with uncertainty
e.g. is a handwritten digit a three or an eight?
 Provides interpretable results
 Unifies methods from different fields
Face Detection
1. Image pyramid used to locate faces of different sizes
2. Image lighting compensation
3. Neural Network detects rotation of face candidate
4. Final face candidate de-rotated ready for detection
Face Detection (cont’d)
5. Submit image to Neural Network
a. Break image into segments
b. Each segment is a unique input to the network
c. Each segment looks for certain patterns (eyes,
mouth, etc)
6. Output is likelihood of a face
Supervised Learning: Uses
 Prediction of future cases
 Knowledge extraction
 Compression of Data & knowledge
Unsupervised Learning
 Clustering: grouping similar instances
 Example applications
Customer segmentation in CRM
Learning patterns in bioinformatics
Clustering items based on similarity
Clustering users based on interests
Reinforcement Learning
 Learning a policy: A sequence of outputs
 No supervised output but delayed reward
 Credit assignment problem
 Game playing
 Robot in a maze
 Multiple agents, partial observability
ID3 Decision Tree
 It is particularly interesting for
Its representation of learned knowledge
Its approach to the management of complexity
Its heuristic for selecting candidate concepts
Its potential for handling noisy data
ID3 Decision Tree
ID3 Decision Tree
 The previous table can be represented as the following
decision tree:
ID3 Decision Tree
 In a decision tree, each internal node represents a test on some property
 Each possible value of that property corresponds to a branch of the tree
 Leaf nodes represent classifications, such as low or moderate risk
ID3 Decision Tree
 A simplified decision tree for credit risk management
ID3 Decision Tree
 ID3 constructs decision trees in a top-down fashion.
 ID3 selects a property to test at the current node of the
tree and uses this test to partition the set of examples
 The algorithm recursively constructs a sub-tree for each
partition
 This continues until all members of the partition are in
the same class
ID3 Decision Tree
 For example, ID3 selects income as the root property for
the first step
ID3 Decision Tree
ID3 Decision Tree
 How to select the 1st node? (and the following
nodes)
 ID3 measures the information gained by making
each property the root of the current subtree
 It picks the property that provides the greatest
information gain
ID3 Decision Tree
 If we assume that all the examples in the table occur
with equal probability, then:
P(risk is high)=6/14
P(risk is moderate)=3/14
P(risk is low)=5/14
ID3 Decision Tree
 Based on the general information measure
I(M) = −Σ (i = 1..n) p(mi) · log2 p(mi)
 the information in the table is
Info(D) = I(6,3,5)
        = −(6/14) log2(6/14) − (3/14) log2(3/14) − (5/14) log2(5/14)
        ≈ 1.531 bits
ID3 Decision Tree
 The information gain from income is:
Gain(income) = I(6,3,5) − E(income) = 1.531 − 0.564 = 0.967
Similarly,
 Gain(credit history) = 0.266
 Gain(debt) = 0.063
 Gain(collateral) = 0.206
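To make these numbers concrete, here is a minimal Python sketch of the information measure and the gain computation. The per-bracket class counts for income are not given above; the ones below were chosen to be consistent with the quoted E(income) = 0.564 and are illustrative only.

```python
import math

def information(counts):
    # I(M) = -sum_i p(m_i) * log2 p(m_i), computed from raw class counts
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent_counts, partitions):
    # Information gain = I(parent) - expected information of the partitions
    total = sum(parent_counts)
    expected = sum(sum(part) / total * information(part) for part in partitions)
    return information(parent_counts) - expected

# 6 high-, 3 moderate-, 5 low-risk examples in the table
print(information([6, 3, 5]))                 # ~1.531 bits

# Illustrative partition by income bracket (counts are [high, moderate, low]):
income_partitions = [[4, 0, 0],   # lowest income bracket
                     [2, 2, 0],   # middle bracket
                     [0, 1, 5]]   # highest bracket
print(gain([6, 3, 5], income_partitions))     # ~0.967 bits
```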
ID3 Decision Tree
 Since income provides the greatest information gain, ID3
will select it as the root of the tree
ID3 Decision Tree
Pseudo Code
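The pseudocode itself is not reproduced above; below is a rough Python sketch of the top-down construction it describes, assuming examples are dictionaries of property values and reusing the information()/gain() helpers from the previous sketch.

```python
from collections import Counter

def property_gain(examples, prop, target):
    # Information gain of splitting `examples` on `prop` (uses gain() from above)
    parent = list(Counter(ex[target] for ex in examples).values())
    values = {ex[prop] for ex in examples}
    partitions = [list(Counter(ex[target] for ex in examples if ex[prop] == v).values())
                  for v in values]
    return gain(parent, partitions)

def id3(examples, properties, target="risk"):
    classes = [ex[target] for ex in examples]
    if len(set(classes)) == 1:           # all members of the partition share a class
        return classes[0]
    if not properties:                   # no properties left: majority-class leaf
        return Counter(classes).most_common(1)[0][0]

    # Test the property with the greatest information gain at this node
    best = max(properties, key=lambda p: property_gain(examples, p, target))

    # One branch (recursively built sub-tree) per value of the chosen property
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, [p for p in properties if p != best], target)
    return tree
```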
Unsupervised Learning
 The learning algorithms discussed so far implement
forms of supervised learning
 They assume the existence of a teacher, some fitness
measure, or other external method of classifying training
instances
 Unsupervised learning eliminates the teacher and
requires that the learners form and evaluate concepts on
their own
Unsupervised Learning
 Science is perhaps the best example of unsupervised
learning in humans
 Scientists do not have the benefit of a teacher.
 Instead, they propose hypotheses to explain their
observations.
Unsupervised Learning
 The result of this algorithm is a Binary Tree whose leaf
nodes are instances and whose internal nodes are
clusters of increasing size
 We may also extend this algorithm to objects
represented as sets of symbolic features.
Unsupervised Learning
 Object1 = {small, red, rubber, ball}
 Object2 = {small, blue, rubber, ball}
 Object3 = {large, black, wooden, ball}
 This metric would compute the similarity
values:
Similarity(object1, object2) = 3/4
Similarity(object1, object3) = 1/4
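The metric above appears to be the fraction of features the two objects have in common; a minimal sketch under that assumption:

```python
def similarity(a, b):
    # Fraction of shared features (objects are sets of symbolic features)
    return len(a & b) / max(len(a), len(b))

object1 = {"small", "red", "rubber", "ball"}
object2 = {"small", "blue", "rubber", "ball"}
object3 = {"large", "black", "wooden", "ball"}

print(similarity(object1, object2))   # 0.75 (shares small, rubber, ball)
print(similarity(object1, object3))   # 0.25 (shares only ball)
```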
Machine Learning
 Up till now: how to search or reason using a model
 Machine learning: how to select a model on the basis of
data / experience
Learning parameters (e.g. probabilities)
Learning hidden concepts (e.g. clustering)
Classification
 In classification, we learn to predict labels (classes) for
inputs
 Examples:
 Spam detection (input: document, classes: spam / ham)
 OCR (input: images, classes: characters)
 Medical diagnosis (input: symptoms, classes: diseases)
 Automatic essay grader (input: document, classes: grades)
 Fraud detection (input: account activity, classes: fraud / no fraud)
 Customer service email routing
 … many more
 Classification is an important commercial technology!
Classification
 Data:
 Inputs x, class labels y
 We imagine that x is something that has a lot of structure, like an
image or document
 In the basic case, y is a simple N-way choice
 Basic Setup:
 Training data: D = bunch of <x,y> pairs
 Feature extractors: functions fi which provide attributes of an
example x
 Test data: more x’s, we must predict y’s
Bayes Nets for Classification
 One method of classification:
Features are values for observed variables
Y is a query variable
Use probabilistic inference to compute most likely Y
Simple Classification
 Simple example: two binary features
This is a naïve Bayes model
[Diagram: class variable M with observed features S and F, comparing a direct estimate, a Bayes estimate with no assumptions, and the conditional-independence (naïve Bayes) estimate]
General Naïve Bayes
 A general naïve Bayes model has a class variable C and evidence
variables E1, E2, …, En
 The prior P(C) has |C| parameters
 The conditionals P(Ei | C) together have n × |E| × |C| parameters
 By contrast, the full joint distribution over C and the Ei would
need |C| × |E|^n parameters
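As a hedged illustration with assumed numbers (they are not in the slides): with |C| = 10 digit classes and n = 16 binary pixel features (|E| = 2), the full joint distribution would need 10 × 2^16 = 655,360 parameters, while the naïve Bayes factorisation needs only 10 + 16 × 2 × 10 = 330.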
Inference for Naïve Bayes
 Goal: compute posterior over causes
 Step 1: get joint probability of causes and evidence
 Step 2: get probability of evidence
 Step 3: renormalize
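A minimal Python sketch of those three steps; the two-class model and its numbers are made up purely for illustration:

```python
def naive_bayes_posterior(prior, cpts, evidence):
    # Step 1: joint probability of each cause with the observed evidence
    joint = {}
    for cause, p_cause in prior.items():
        p = p_cause
        for feature, observed in evidence.items():
            p *= cpts[cause][feature] if observed else 1.0 - cpts[cause][feature]
        joint[cause] = p
    # Step 2: probability of the evidence (sum over causes)
    p_evidence = sum(joint.values())
    # Step 3: renormalize to get the posterior over causes
    return {cause: p / p_evidence for cause, p in joint.items()}

# Hypothetical spam example: prior over causes and P(feature is present | cause)
prior = {"spam": 0.3, "ham": 0.7}
cpts = {"spam": {"f1": 0.8, "f2": 0.4},
        "ham":  {"f1": 0.1, "f2": 0.5}}
print(naive_bayes_posterior(prior, cpts, {"f1": True, "f2": False}))
```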
A Digit Recognizer
 Input: pixel grids
 Output: a digit 0-9
Examples: CPTs
A prior over the digit class and conditional probability tables for two example pixel features:

y                    1     2     3     4     5     6     7     8     9     0
P(Y = y)             0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1
P(F1 = on | Y = y)   0.01  0.05  0.05  0.30  0.80  0.90  0.05  0.60  0.50  0.80
P(F2 = on | Y = y)   0.05  0.01  0.90  0.80  0.90  0.90  0.25  0.85  0.60  0.80
Parameter Estimation
 Estimating the distribution of a random variable X or X|Y
 Empirically: use training data
 For each value x, look at the empirical rate of that value (see the estimate below):
 This estimate maximizes the likelihood of the data
 Elicitation: ask a human!
 Usually need domain experts, and sophisticated ways of eliciting
probabilities (e.g. betting games)
 Trouble calibrating
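The estimate referred to above is, presumably, the empirical frequency, i.e. the maximum-likelihood estimate for a categorical distribution: P_ML(x) = count(x) / N, where N is the total number of observations.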
Handwritten character classification
Gray-level pictures: object classification
Gray-level pictures: human action classification
Expectation Maximization (EM)
When to use:
 data is only partially observable
 unsupervised clustering: target value unobservable
 supervised learning: some instance attributes
unobservable
applications
 training Bayesian Belief Networks
 unsupervised clustering
 learning hidden Markov models
Generating Data from Mixture of
Gaussians
Each instance x generated by
 choosing one of the k Gaussians at random
 Generating an instance according to that Gaussian
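A minimal sketch of this generative process; the mixture means, the shared variance, and the sample size are arbitrary illustrative choices:

```python
import random

def sample_mixture(means, sigma=1.0, n=5):
    samples = []
    for _ in range(n):
        mean = random.choice(means)                 # choose one of the k Gaussians at random
        samples.append(random.gauss(mean, sigma))   # generate an instance from that Gaussian
    return samples

print(sample_mixture(means=[0.0, 5.0, 10.0]))       # k = 3 Gaussians
```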
EM for Estimating k Means
Given:
 instances from X generated by mixture of k Gaussians
 unknown means <m1,…,mk> of the k Gaussians
 don’t know which instance xi was generated by which
Gaussian
Determine:
 maximum likelihood estimates of <m1,…,mk>
Think of full description of each instance as yi=<xi,zi1,zi2>
 zij is 1 if xi generated by j-th Gaussian
 xi observable
 zij unobservable
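A hedged sketch of the two EM steps for this problem, for one-dimensional data and a known, shared variance; the slides do not write the update equations out, so the E- and M-step formulas below follow the standard treatment of this model and should be read as an illustration:

```python
import math

def em_k_means(xs, k, sigma=1.0, iters=50):
    means = xs[:k]   # crude initialisation: first k data points
    for _ in range(iters):
        # E step: E[z_ij] proportional to exp(-(x_i - m_j)^2 / (2 sigma^2)), normalised over j
        resp = []
        for x in xs:
            w = [math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) for m in means]
            total = sum(w)
            resp.append([wi / total for wi in w])
        # M step: each mean becomes the responsibility-weighted average of the data
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / sum(r[j] for r in resp)
                 for j in range(k)]
    return means

data = [0.2, -0.3, 0.1, 4.8, 5.1, 5.3]   # made-up sample from two well-separated clusters
print(em_k_means(data, k=2))              # converges to means near 0 and 5 (in some order)
```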
EM Algorithm
Converges to a local maximum of the likelihood and provides
estimates of the hidden variables zij.
In fact, it is a local maximum of E[ln P(Y|h)]
 Y is the complete data (observable plus unobservable variables)
 The expected value is taken over the possible values of the
unobserved variables in Y
General EM Problem
Given:
 observed data X = {x1,…,xm}
 unobserved data Z = {z1,…,zm}
 parameterized probability distribution P(Y|h), where
Y = {y1,…,ym} is the full data, with yi = <xi,zi>, and
h are the parameters
Determine:
 h that (locally) maximizes E[ln P(Y|h)]
Applications:
 train Bayesian Belief Networks
 unsupervised clustering
 hidden Markov models
General EM Method
Define a likelihood function Q(h’|h) over the full data
Y = X ∪ Z, using the observed X and the current parameters h
to estimate Z:
Q(h’|h) = E[ln P(Y|h’) | h, X]
EM algorithm:
Estimation (E) step: Calculate Q(h’|h) using the current
hypothesis h and the observed data X to estimate the
probability distribution over Y:
Q(h’|h) = E[ln P(Y|h’) | h, X]
Maximization (M) step: Replace hypothesis h by the
hypothesis h’ that maximizes this Q function:
h ← argmax_{h’∈H} Q(h’|h)