Machine Learning for Sequential Data: A Review
MD2K Reading Group
March 12, 2015
Classical Supervised Learning
Given a train set {(x1, y1), (x2, y2), ..., (xn, yn)}
x – features (independent variables), scalar or vector, i.e. |x| ≥ 1; y ∈ Y – label/class (dependent variable), scalar, i.e. |y| = 1
Learn a model h ∈ H such that y = h(x)
Example: character classification, x – image of a handwritten character, y ∈ {A, B, ..., Z}
[Figure: independent training pairs (x1, y1), ..., (xn, yn) drawn as separate label/feature nodes]
Sequential Supervised Learning (SSL)
Given a train set {(x1, y1), (x2, y2), ..., (xl, yl)}, where each xi = (xi,1, ..., xi,n) and yi = (yi,1, ..., yi,n) are sequences
l – number of training instances, each of length n (instances need not all have the same length, i.e. n may vary)
x – features (independent variables), scalar/vector; y ∈ Y – labels/classes (dependent variables)
Learn a model h ∈ H such that yi = h(xi), i.e. a whole label sequence is predicted from a whole feature sequence
SSL is different from time-series prediction and from sequence classification
Leverages sequential patterns and interactions (solid lines – left to right; dotted lines – right to left in the figure below)
Example: POS tagging, x – ‘the dog saw a cat’ (English sentence), y = {D, N, V, D, N} (a data-layout sketch in code follows the figure below)
[Figure: label sequence yl,1, ..., yl,t−1, yl,t, yl,t+1, ..., yl,n with corresponding features xl,1, ..., xl,n; solid arrows between labels run left to right and dotted arrows right to left]
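To make the data layout concrete, here is a minimal sketch (Python, not from the slides; the sentences and tags are invented) of a sequential training set whose instances have different lengths.

```python
# Hypothetical layout of an SSL training set (POS-tagging style): each instance
# pairs a feature sequence with a label sequence of the same length, and
# different instances may have different lengths.
train_set = [
    (["the", "dog", "saw", "a", "cat"], ["D", "N", "V", "D", "N"]),
    (["dogs", "bark"],                  ["N", "V"]),
]

for x_seq, y_seq in train_set:
    assert len(x_seq) == len(y_seq)   # one label per sequence position
```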
1 Hidden Markov Models
2 Window-based Approaches
3 Maximum Entropy Models
4 Conditional Random Fields
Hidden Markov Models (HMM)
p(y|x) = p(x|y) × p(y) / p(x)   (Bayes' rule, single class)
       ∝ p(x|y) × p(y)   (since p(x) is the same across all classes)
       = p(x, y)
       = p(x1|x2, ..., xn, y) × p(x2|x3, ..., xn, y) × ... × p(y)
       = p(x1|y) × p(x2|y) × ... × p(y)   (Naive Bayes assumption)
       ∝ p(y) × ∏_{i=1}^{n} p(xi|y)   (Naive Bayes model, single class)

p(y|x) ∝ ∏_{i=1}^{n} p(yi) × p(xi|yi)   (predict the whole sequence; x, y are vectors)
       = ∏_{i=1}^{n} p(yi|yi−1) × p(xi|yi)   (first-order Markov property; tack on a start state y0)

p(x) = Σ_{y∈Y} ∏_{i=1}^{n} p(yi|yi−1) × p(xi|yi)   (Y – all possible combinations of y sequences)
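As a concrete illustration of the factorisation above, the sketch below (Python; the toy states, transition, and emission probabilities are made up for illustration) scores a sequence as ∏ p(yi|yi−1) × p(xi|yi) and computes p(x) by brute-force summation over all label sequences.

```python
from itertools import product

states = ["N", "V"]                              # toy label set
start = {"N": 0.6, "V": 0.4}                     # p(y1), the initial distribution
trans = {("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.8, ("V", "V"): 0.2}       # p(y_i | y_{i-1})
emit = {("N", "dog"): 0.6, ("N", "barks"): 0.4,
        ("V", "dog"): 0.1, ("V", "barks"): 0.9}  # p(x_i | y_i)

def joint(x, y):
    """p(x, y) = p(y1) p(x1|y1) * prod_{i>1} p(y_i|y_{i-1}) p(x_i|y_i)."""
    p = start[y[0]] * emit[(y[0], x[0])]
    for i in range(1, len(x)):
        p *= trans[(y[i - 1], y[i])] * emit[(y[i], x[i])]
    return p

x = ["dog", "barks"]
# p(x): sum the joint over every possible label sequence (fine for toy sizes).
p_x = sum(joint(x, y) for y in product(states, repeat=len(x)))
# Most likely label sequence by brute force.
y_best = max(product(states, repeat=len(x)), key=lambda y: joint(x, y))
print(p_x, y_best)
```

Brute-force enumeration is exponential in the sequence length; in practice the forward algorithm computes p(x) and the Viterbi algorithm finds the best label sequence, both in time linear in n.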
HMM (contd...)
[Figure: HMM graphical model – label chain y1 → ... → yt−1 → yt → yt+1 → ... → yn, with each yt emitting its observation xt]
HMMs are generative models, i.e. they model the joint probability p(x, y)
Predict the whole sequence
Model only the first-order Markov property, which is not sufficient for many real-world applications
xt influences only yt; dependencies such as p(xt|yt−1, yt, yt+1), in which xt influences {yt−1, yt, yt+1}, cannot be modeled
Sliding Window Approach
Sliding windows consider a window of features when making each decision, e.g. yt is predicted by looking at xt−1, xt, xt+1
Predicts a single class at a time
Can use any existing supervised learning algorithm without modification, e.g. SVM, logistic regression, etc. (a minimal sketch follows the figure below)
Cannot model dependencies between the y labels (either short or long range)
[Figure: sliding window – each label yt is predicted from a window of features centred on xt; the labels themselves are not connected]
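A minimal sketch of the sliding-window idea (Python; assumes numpy and scikit-learn are available, and the window radius and toy data are invented): concatenate xt−1, xt, xt+1 into one feature vector per position and train any off-the-shelf classifier on the result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # any standard classifier works

def window_features(x_seq, radius=1):
    """Stack x_{t-radius}..x_{t+radius} for every position t, zero-padding the ends."""
    x = np.asarray(x_seq, dtype=float)              # shape (n, d)
    n, d = x.shape
    padded = np.vstack([np.zeros((radius, d)), x, np.zeros((radius, d))])
    return np.hstack([padded[i:i + n] for i in range(2 * radius + 1)])

# Toy data: one sequence of 1-D features with binary labels.
x_seq = [[0.1], [0.9], [0.8], [0.2], [0.7]]
y_seq = [0, 1, 1, 0, 1]

X = window_features(x_seq)                          # shape (5, 3): [x_{t-1}, x_t, x_{t+1}]
clf = LogisticRegression().fit(X, y_seq)
print(clf.predict(window_features([[0.85], [0.15]])))
```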
Recurrent Sliding Window Approach
Similar to the sliding-window approach
Models short-range dependencies by using the previous decision (yt−1) when making the current decision (yt); see the sketch after the figure below
Problem: y values for earlier positions are needed both when training and when testing; the true labels are available during training, but at test time the model's own (possibly erroneous) predictions must be fed back
[Figure: recurrent sliding window – each yt is predicted from a feature window together with the previous decision yt−1]
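A sketch of the recurrent variant (Python; reuses x_seq, y_seq, and the window_features helper from the sliding-window sketch above): the previous label is appended to each feature vector, true labels are used while training, and predictions are fed back one step at a time when testing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training: append the *true* previous label y_{t-1} to each window (define y_0 := 0).
X = window_features(x_seq)                          # helper from the previous sketch
prev_y = np.array([0] + y_seq[:-1]).reshape(-1, 1)
clf = LogisticRegression().fit(np.hstack([X, prev_y]), y_seq)

def predict_sequence(x_test):
    """Testing: true labels are unavailable, so feed back our own predictions."""
    Xw = window_features(x_test)
    preds, prev = [], 0                             # assume y_0 = 0 at test time too
    for t in range(len(x_test)):
        feat = np.hstack([Xw[t], [prev]]).reshape(1, -1)
        prev = int(clf.predict(feat)[0])
        preds.append(prev)
    return preds

print(predict_sequence([[0.85], [0.15], [0.9]]))
```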
Maximum Entropy Model (MaxEnt)
Based on the Principle of Maximum Entropy (Jaynes, 1957): if only incomplete information about a probability distribution is available, the least biased assumption is the distribution that is as uniform as possible given the available information
Keep the distribution as uniform as possible – maximize entropy (primal problem)
Model the available information – expressed as constraints over the training data (dual problem)
Discriminative model, i.e. models p(y|x)
Predicts a single class
MaxEnt (contd...)
I. Model the known (dual problem)

Train set = {(x1, y1), (x2, y2), ..., (xN, yN)}   (given; N training pairs)

p̃(x, y) = (1/N) × number of times (x, y) occurs in the train set   (i.e. the empirical joint probability table)

fi(x, y) = 1 if y = k AND x = xk, and 0 otherwise
(e.g. y = physical activity AND x = HR ≥ 110 bpm; 1 ≤ i ≤ m, m – number of features)

Ẽ(fi) = Σ_{x,y} p̃(x, y) × fi(x, y)   (expected value of fi under the training data)

E(fi) = Σ_{x,y} p(x, y) × fi(x, y)   (expected value of fi under the model distribution)
      = Σ_{x,y} p(y|x) × p(x) × fi(x, y)
      ≈ Σ_{x,y} p(y|x) × p̃(x) × fi(x, y)   (approximate p(x) by p̃(x))

so we only need to learn the conditional probability rather than the joint probability.
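The empirical quantities above are simple counts over the training data; the sketch below (Python; the heart-rate/activity pairs are invented for illustration) computes p̃(x, y), p̃(x), and the empirical feature expectation Ẽ(f1) for one indicator feature.

```python
from collections import Counter

# Toy train set of (x, y) pairs: x is a discretised heart-rate bucket, y an activity label.
train = [("HR>=110", "active"), ("HR>=110", "active"),
         ("HR<110", "rest"), ("HR<110", "rest"), ("HR>=110", "rest")]
N = len(train)

p_xy = {pair: c / N for pair, c in Counter(train).items()}         # empirical joint p~(x, y)
p_x = {x: c / N for x, c in Counter(x for x, _ in train).items()}  # empirical marginal p~(x)

def f1(x, y):
    """Example indicator feature: fires when y = 'active' AND x = 'HR>=110'."""
    return 1.0 if (y == "active" and x == "HR>=110") else 0.0

E_tilde_f1 = sum(p * f1(x, y) for (x, y), p in p_xy.items())       # E~(f1)
print(E_tilde_f1)   # 2/5 = 0.4
```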
MaxEnt (contd...)
Ẽ(fi) = E(fi)
Σ_{x,y} p̃(x, y) × fi(x, y) = Σ_{x,y} p(y|x) × p̃(x) × fi(x, y)
(the goal is to find the best conditional distribution p∗(y|x) satisfying these constraints)

II. Make zero assumptions about the unknown (primal problem)

H(y|x) = − Σ_{(x,y)∈X×Y} p(x, y) log p(y|x)   (conditional entropy)

III. Objective function and Lagrange multipliers

Λ(p∗(y|x), λ̄) = H(y|x) + Σ_{i=1}^{m} λi (E(fi) − Ẽ(fi)) + λm+1 (Σ_{y∈Y} p(y|x) − 1)   (objective function)

p∗λ̄(y|x) = (1/Zλ̄(x)) × exp(Σ_{i=1}^{m} λi fi(x, y))
(the conditional distribution of maximum entropy subject to the constraints)

p∗λ̄(yt|yt−1, x) = (1/Zλ̄(yt−1, x)) × exp(Σ_{i=1}^{m} λi fi(x, y))
(inducing the Markov property results in the Maximum Entropy Markov Model (MEMM))
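The solution is a log-linear (softmax) distribution over classes; the sketch below (Python; the two indicator features and the weights are hand-set for illustration, whereas in practice the λi would be learned, e.g. by iterative scaling or gradient methods) evaluates p∗λ(y|x) = exp(Σi λi fi(x, y)) / Zλ(x).

```python
import math

classes = ["active", "rest"]

def features(x, y):
    """Two illustrative indicator features f1, f2."""
    return [1.0 if (y == "active" and x == "HR>=110") else 0.0,
            1.0 if (y == "rest" and x == "HR<110") else 0.0]

lam = [1.5, 2.0]   # weights lambda_i, hand-set here for illustration

def p_star(y, x):
    """p*(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x)."""
    score = lambda yy: math.exp(sum(l * f for l, f in zip(lam, features(x, yy))))
    Z = sum(score(yy) for yy in classes)            # partition function Z(x)
    return score(y) / Z

print(p_star("active", "HR>=110"))   # ~0.82 with these weights
```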
Conditional Random Fields (CRF)
Discriminative model, i.e. models p(y|x)
The conditional probability p(y|x) is modeled as a product of factors ψk(xk, yk)
Factors have a log-linear representation: ψk(xk, yk) = exp(λk × φk(xk, yk))
Predicts the whole sequence

p(y|x) = (1/Z(x)) × ∏_{C∈C} ΨC(xC, yC)   (general CRF form; the product runs over the cliques C of the graph)
Linear Chain CRF
[Figure: linear-chain CRF – observation factors φF link each yt to xt, transition factors φT link consecutive labels yt−1 and yt]
p(yt|xt) = (1/Z(x)) × exp(λF × φF(yt, xt) + λT × φT(yt, yt−1))   (individual prediction)

p(y|x) = (1/Z(x)) × ∏_{i=1}^{n} exp(λF × φF(yi, xi) + λT × φT(yi, yi−1))   (predict the whole sequence; tack on a start state y0)

p(y|x) = (1/Z(x)) × ∏_{i=1}^{n} exp(Σ_{j=1}^{k} λj × φj(yi, yi−1, xi))   (general form of linear-chain CRFs)
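A brute-force sketch of the linear-chain CRF distribution (Python; the label set, feature functions, and weights are invented for illustration): each position contributes exp(λF φF(yi, xi) + λT φT(yi, yi−1)), and Z(x) sums this product over every label sequence. In practice Z(x) is computed efficiently with the forward–backward algorithm rather than by enumeration.

```python
import math
from itertools import product

labels = ["N", "V"]
lam_F, lam_T = 1.0, 1.0   # hand-set weights for the observation and transition factors

def phi_F(y, x):
    # Observation feature: nouns tend to co-occur with 'dog', verbs with 'barks'.
    return 1.0 if (y == "N" and x == "dog") or (y == "V" and x == "barks") else 0.0

def phi_T(y, y_prev):
    # Transition feature: reward a noun followed by a verb (y_0 is a start symbol).
    return 1.0 if (y_prev, y) == ("N", "V") else 0.0

def score(y_seq, x_seq):
    """Unnormalised score: prod_i exp(lam_F*phi_F(y_i, x_i) + lam_T*phi_T(y_i, y_{i-1}))."""
    s, y_prev = 1.0, "START"
    for y, x in zip(y_seq, x_seq):
        s *= math.exp(lam_F * phi_F(y, x) + lam_T * phi_T(y, y_prev))
        y_prev = y
    return s

x = ["dog", "barks"]
Z = sum(score(y, x) for y in product(labels, repeat=len(x)))   # partition function Z(x)
for y in product(labels, repeat=len(x)):
    print(y, score(y, x) / Z)                                  # p(y | x)
```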
CRF (contd...)
[Figure: CRF with additional factors φ1, φ2, ... linking yt to non-adjacent variables such as x2, xt−1, xt+1 and y1, which induces loops in the graph]
p(yt|x, y1:t−1) = (1/Z(x)) × exp(λ1 × φ1(yt, xt) + λ2 × φ2(yt, yt−1) + λ3 × φ3(yt, x2) + λ4 × φ4(yt, xt−1) + λ5 × φ5(yt, xt+1) + λ6 × φ6(yt, y1))
(additional features; these induce loops in the graph)
Figure: Sample CRF
Model Space
Figure: Graphical models for sequential data [4]
For further reading, refer to [3, 4, 2, 1]
[1] Berger, A. A brief maxent tutorial. www.cs.cmu.edu/afs/cs/user/aberger/www/html/tutorial/tutorial.html.
[2] Blake, A., Kohli, P., and Rother, C. Markov Random Fields for Vision and Image Processing. MIT Press, 2011.
[3] Dietterich, T. G. Machine learning for sequential data: A review. In Structural, Syntactic, and Statistical Pattern Recognition. Springer, 2002, pp. 15–30.
[4] Klinger, R., and Tomanek, K. Classical probabilistic models and conditional random fields. Technical report, TU Dortmund, Algorithm Engineering, 2007.