Multiclass Logistic Regression: Derivation and Apache Spark Examples
Author: Marjan Sterjev
Logistic regression is a supervised classification algorithm. Algorithms of this kind work as follows:
1. A classification model is trained on a training set of N samples whose class labels are known (the labels are usually assigned manually by a human).
2. The class labels of new, previously unseen samples are predicted with the model built in the previous step. This is known as sample classification.
Each sample in the training set has $M$ numerical features (coordinates) and a class label $y$. The number of classes is $K$, and the class label takes one of the values $0, 1, 2, \ldots, K-1$. If $K = 2$ the classifier is binary; for $K \geq 3$ it is multiclass.
A particular sample $X_i$ is represented as a column vector of length $M+1$:

$$
X_i^T = [1, x_1, x_2, \ldots, x_M]
\tag{1}
$$
Note that the first feature $x_0$ of every sample vector is 1. It is added “artificially” to the “native” sample features so that the model vectors can include an intercept.
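The model is represented by $K-1$ vectors of size $M+1$ or, equivalently, by a matrix of dimension $(K-1) \times (M+1)$:

$$
W = \begin{bmatrix}
w_{1,0} & w_{1,1} & \cdots & w_{1,M} \\
w_{2,0} & w_{2,1} & \cdots & w_{2,M} \\
\vdots & \vdots & \ddots & \vdots \\
w_{K-1,0} & w_{K-1,1} & \cdots & w_{K-1,M}
\end{bmatrix}
\tag{2}
$$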
We will denote the model column vectors as:

$$
\begin{aligned}
W_1^T &= [w_{1,0}, w_{1,1}, \ldots, w_{1,M}] \\
W_2^T &= [w_{2,0}, w_{2,1}, \ldots, w_{2,M}] \\
&\ \ \vdots \\
W_{K-1}^T &= [w_{K-1,0}, w_{K-1,1}, \ldots, w_{K-1,M}]
\end{aligned}
\tag{3}
$$
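The representation above can be sketched with plain Scala arrays, assuming each row of `w` holds one model vector $W_k$ and samples are stored as `Array[Double]`. The helper names `augment`, `dot` and `scores` are hypothetical and not part of the original code; the sketch only shows how the artificial feature $x_0 = 1$ turns the first coefficient of each row into an intercept.

```scala
// Hypothetical helper: prepend the artificial feature x0 = 1,
// so that w(k)(0) acts as the intercept of model vector W_{k+1}.
def augment(features: Array[Double]): Array[Double] = 1.0 +: features

// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (p, q) => p * q }.sum

// W is stored as K-1 rows (one per model vector) of M+1 coefficients.
// Returns the linear scores W_k^T X for k = 1..K-1 (array index k-1).
def scores(w: Array[Array[Double]], x: Array[Double]): Array[Double] =
  w.map(wk => dot(wk, x))

// Example with K = 3 classes and M = 2 native features.
val w = Array(
  Array(0.5, 1.0, -1.0), // W_1
  Array(-0.5, 0.0, 3.0)  // W_2
)
val x = augment(Array(2.0, 1.0)) // X^T = [1, 2, 1]
val s = scores(w, x)             // W_1^T X = 0.5 + 2 - 1, W_2^T X = -0.5 + 0 + 3
println(s.mkString(", "))
```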
The purpose of logistic regression is to build a model that predicts the class label of a given sample. In multiclass logistic regression this is a two-step process. First, the sample $X$ is mapped to $K$ probabilities, one probability per class:
$$
P(0 \mid X, W),\ P(1 \mid X, W),\ \ldots,\ P(K-1 \mid X, W)
\tag{4}
$$
Each probability is a function of the sample and the model vectors:

$$
P(i \mid X, W) = f(X, W_1, W_2, \ldots, W_{K-1})
\tag{5}
$$
Second, the predicted class is the class with the highest probability for that particular sample. The class probabilities are defined as:
$$
P(0 \mid X, W) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{W_j^T X}}
\tag{6}
$$

and for $k = 1, 2, \ldots, K-1$:

$$
P(k \mid X, W) = \frac{e^{W_k^T X}}{1 + \sum_{j=1}^{K-1} e^{W_j^T X}}
\tag{7}
$$
Note that $\sum_{j=0}^{K-1} P(j \mid X, W) = 1$, i.e. (6) and (7) define a probability distribution.
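A minimal Scala sketch of (6) and (7) and of the prediction rule, assuming the model is stored as `Array[Array[Double]]` with row `k-1` holding $W_k$ and the sample is already augmented with $x_0 = 1$. The names `classProbabilities` and `predictClass` are hypothetical; the code simply evaluates the formulas, with class 0 as the reference class.

```scala
// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (p, q) => p * q }.sum

// Class probabilities (6) and (7); w holds W_1..W_{K-1}, x is an augmented sample.
// Returns [P(0|X,W), P(1|X,W), ..., P(K-1|X,W)].
def classProbabilities(w: Array[Array[Double]], x: Array[Double]): Array[Double] = {
  val exps = w.map(wk => math.exp(dot(wk, x))) // e^{W_k^T X}, k = 1..K-1
  val denom = 1.0 + exps.sum                    // 1 + sum_j e^{W_j^T X}
  (1.0 / denom) +: exps.map(_ / denom)
}

// The predicted class is the one with the highest probability.
def predictClass(w: Array[Array[Double]], x: Array[Double]): Int = {
  val p = classProbabilities(w, x)
  p.indices.maxBy(i => p(i))
}

// K = 3 classes, M = 1 native feature; X^T = [1, 2].
val w = Array(Array(0.0, 1.0), Array(0.0, -1.0))
val x = Array(1.0, 2.0)
val probs = classProbabilities(w, x)
println(probs.mkString(", "))
println(predictClass(w, x))
```

Here $W_1^T X = 2$ is the largest score and the reference class has an implicit score of 0, so class 1 is predicted.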
The training set consists of N samples with known class labels:
$$
\text{Sample}_0 = (X_0, y_0),\quad \text{Sample}_1 = (X_1, y_1),\quad \ldots,\quad \text{Sample}_{N-1} = (X_{N-1}, y_{N-1})
\tag{8}
$$
The joint likelihood of the training set is:

$$
\prod_{i=0}^{N-1} P(y_i \mid X_i, W)
\tag{9}
$$
Training a logistic regression model means searching for the model vectors $W_1, W_2, \ldots, W_{K-1}$ that maximize this joint probability. The procedure is known as maximum likelihood estimation (MLE).
Because the logarithm is strictly increasing, maximizing the logarithm of a positive function is equivalent to maximizing the function itself. If we also divide the logarithm by the number of samples, so that we work with the average log-likelihood, the result is:
$$
L = \frac{1}{N}\log\left(\prod_{i=0}^{N-1} P(y_i \mid X_i, W)\right) = \frac{1}{N}\sum_{i=0}^{N-1} \log P(y_i \mid X_i, W)
\tag{10}
$$
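The equality between (9) and (10) can be checked numerically with a small Scala sketch. It assumes the same array layout as before and a hypothetical toy training set of three augmented samples; `avgLogLikelihood` and `jointLikelihood` are illustrative names. With all model vectors set to zero, every class gets probability $1/K$, so the average log-likelihood is $\log(1/3)$ for $K = 3$.

```scala
// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (p, q) => p * q }.sum

// Class probabilities (6) and (7): [P(0|X,W), ..., P(K-1|X,W)].
def classProbabilities(w: Array[Array[Double]], x: Array[Double]): Array[Double] = {
  val exps = w.map(wk => math.exp(dot(wk, x)))
  val denom = 1.0 + exps.sum
  (1.0 / denom) +: exps.map(_ / denom)
}

// Average log-likelihood (10): L = (1/N) * sum_i log P(y_i | X_i, W).
def avgLogLikelihood(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Double =
  xs.zip(ys).map { case (x, y) => math.log(classProbabilities(w, x)(y)) }.sum / xs.length

// Joint likelihood (9): product of P(y_i | X_i, W) over the training set.
def jointLikelihood(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Double =
  xs.zip(ys).map { case (x, y) => classProbabilities(w, x)(y) }.product

// Hypothetical toy training set: N = 3 augmented samples, K = 3 classes.
val xs = Array(Array(1.0, 0.3), Array(1.0, -1.2), Array(1.0, 2.5))
val ys = Array(0, 1, 2)

// All-zero model vectors: every class has probability 1/3.
val w0 = Array.fill(2, 2)(0.0)
println(avgLogLikelihood(w0, xs, ys))
```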
Substituting the probability formulas (6) and (7) gives:
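$$
\begin{aligned}
L &= \frac{1}{N}\sum_{i=0}^{N-1} \log P(y_i \mid X_i, W) \\
&= \frac{1}{N}\sum_{i=0}^{N-1} \left( I(y_i=0)\log\frac{1}{1+\sum_{j=1}^{K-1} e^{W_j^T X_i}} + \bigl(1-I(y_i=0)\bigr)\log\frac{e^{W_{y_i}^T X_i}}{1+\sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N}\sum_{i=0}^{N-1} \left( \bigl(1-I(y_i=0)\bigr) W_{y_i}^T X_i + \log\frac{1}{1+\sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N}\sum_{i=0}^{N-1} \left( \bigl(1-I(y_i=0)\bigr) W_{y_i}^T X_i - \log\Bigl(1+\sum_{j=1}^{K-1} e^{W_j^T X_i}\Bigr) \right)
\end{aligned}
\tag{11}
$$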
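where $I$ is the indicator function, defined as $I(\text{true}) = 1$, $I(\text{false}) = 0$. When $y_i = 0$ the term $W_{y_i}^T X_i$ is multiplied by zero, so the undefined vector $W_0$ never contributes.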
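To confirm that the closed form in the last line of (11) matches the direct logarithm of (6) and (7), a short Scala sketch can compare the two per-sample terms. The functions `logProbDirect` and `logProbClosedForm` are hypothetical names, and the model and samples are arbitrary illustrative values; for $y = 0$ the closed form skips the linear term, exactly as the indicator factor does.

```scala
// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (p, q) => p * q }.sum

// log P(y | X, W) computed directly from (6) and (7).
def logProbDirect(w: Array[Array[Double]], x: Array[Double], y: Int): Double = {
  val exps = w.map(wk => math.exp(dot(wk, x)))
  val denom = 1.0 + exps.sum
  if (y == 0) math.log(1.0 / denom) else math.log(exps(y - 1) / denom)
}

// One summand of the last line of (11):
// (1 - I(y=0)) W_y^T X - log(1 + sum_j e^{W_j^T X}).
// W_y is row y-1 of w; for y = 0 the indicator factor is zero, so W_0 is never needed.
def logProbClosedForm(w: Array[Array[Double]], x: Array[Double], y: Int): Double = {
  val linear = if (y == 0) 0.0 else dot(w(y - 1), x)
  linear - math.log(1.0 + w.map(wk => math.exp(dot(wk, x))).sum)
}

val w = Array(Array(0.2, 0.5), Array(-0.1, 1.0))
val xs = Array(Array(1.0, 0.3), Array(1.0, -1.2), Array(1.0, 2.5))
for (x <- xs; y <- 0 until 3)
  println(f"y=$y direct=${logProbDirect(w, x, y)}%.6f closed=${logProbClosedForm(w, x, y)}%.6f")
```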
The likelihood depends on every coefficient of every model vector. Its partial derivative with respect to the $m$-th coefficient of the $k$-th model vector, where $k = 1, 2, \ldots, K-1$ and $m = 0, 1, 2, \ldots, M$, is:
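$$
\begin{aligned}
\frac{\partial L}{\partial w_{k,m}} &= \frac{1}{N}\sum_{i=0}^{N-1}\left( I(y_i=k)\,X_{i,m} - \frac{e^{W_k^T X_i}\,X_{i,m}}{1+\sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N}\sum_{i=0}^{N-1} X_{i,m}\left( I(y_i=k) - \frac{e^{W_k^T X_i}}{1+\sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N}\sum_{i=0}^{N-1} X_{i,m}\bigl( I(y_i=k) - P(k \mid X_i, W) \bigr)
\end{aligned}
\tag{12}
$$

Here $X_{i,m}$ is the $m$-th component of $X_i$ (so $X_{i,0} = 1$). The first term comes from $\bigl(1-I(y_i=0)\bigr) W_{y_i}^T X_i$, which contains $w_{k,m}$ only when $y_i = k$; the second comes from differentiating the logarithm.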
The gradient with respect to a whole model vector is the vector of partial derivatives with respect to each of its coefficients:
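$$
\frac{\partial L}{\partial W_k} = \left[\frac{\partial L}{\partial w_{k,0}}, \frac{\partial L}{\partial w_{k,1}}, \frac{\partial L}{\partial w_{k,2}}, \ldots, \frac{\partial L}{\partial w_{k,M}}\right]^T = \frac{1}{N}\sum_{i=0}^{N-1} X_i\bigl(I(y_i=k) - P(k \mid X_i, W)\bigr)
\tag{13}
$$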
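A common way to check a derived gradient is to compare it with a central finite difference of the likelihood. The Scala sketch below, using hypothetical names (`gradient`, `numericPartial`) and arbitrary toy data, computes (13) for every model vector at once and compares each entry with the numerical derivative of the closed-form likelihood (11). The finite-difference check is an added verification technique, not part of the original derivation.

```scala
// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (p, q) => p * q }.sum

// Average log-likelihood in the closed form (11).
def avgLogLikelihood(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Double =
  xs.zip(ys).map { case (x, y) =>
    val linear = if (y == 0) 0.0 else dot(w(y - 1), x)
    linear - math.log(1.0 + w.map(wk => math.exp(dot(wk, x))).sum)
  }.sum / xs.length

// Gradient (13): row k holds dL/dW_{k+1} = (1/N) sum_i X_i (I(y_i = k+1) - P(k+1 | X_i, W)).
def gradient(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Array[Array[Double]] = {
  val g = Array.fill(w.length, w(0).length)(0.0)
  for ((x, y) <- xs.zip(ys)) {
    val exps = w.map(wk => math.exp(dot(wk, x)))
    val denom = 1.0 + exps.sum
    for (k <- w.indices; m <- x.indices) {
      val indicator = if (y == k + 1) 1.0 else 0.0 // row k is model vector W_{k+1}
      g(k)(m) += x(m) * (indicator - exps(k) / denom) / xs.length
    }
  }
  g
}

// Central finite difference of L with respect to coefficient w(k)(m), used to check (12).
def numericPartial(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int],
                   k: Int, m: Int, h: Double = 1e-6): Double = {
  val plus = w.map(_.clone); plus(k)(m) += h
  val minus = w.map(_.clone); minus(k)(m) -= h
  (avgLogLikelihood(plus, xs, ys) - avgLogLikelihood(minus, xs, ys)) / (2 * h)
}

val w = Array(Array(0.2, 0.5), Array(-0.1, 1.0))
val xs = Array(Array(1.0, 0.3), Array(1.0, -1.2), Array(1.0, 2.5))
val ys = Array(0, 1, 2)
val g = gradient(w, xs, ys)
println(g.map(_.mkString("[", ", ", "]")).mkString("\n"))
```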
If a coefficient's partial derivative is positive, the likelihood increases when that coefficient increases; if it is negative, the likelihood decreases when the coefficient increases.
The likelihood is therefore increased by updating every coefficient in the direction of its gradient, by an amount proportional to it:
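$$
w_{k,m} \leftarrow w_{k,m} + \lambda\,\frac{\partial L}{\partial w_{k,m}}, \qquad W_k \leftarrow W_k + \lambda\,\frac{\partial L}{\partial W_k}
\tag{14}
$$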
This procedure is known as gradient ascent. If we were instead minimizing a loss function (quadratic loss, negative log-likelihood), the update in (14) would go in the opposite direction (minus sign); that variant is known as gradient descent.
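A single update (14) can be sketched in Scala as follows, assuming the same toy data and array layout as in the earlier sketches; `ascentStep` is a hypothetical name. With a small enough step size, one step starting from all-zero model vectors should raise the average log-likelihood, which is what the usage lines print.

```scala
// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (p, q) => p * q }.sum

// Average log-likelihood in the closed form (11).
def avgLogLikelihood(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Double =
  xs.zip(ys).map { case (x, y) =>
    val linear = if (y == 0) 0.0 else dot(w(y - 1), x)
    linear - math.log(1.0 + w.map(wk => math.exp(dot(wk, x))).sum)
  }.sum / xs.length

// Gradient (13); row k corresponds to model vector W_{k+1}.
def gradient(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Array[Array[Double]] = {
  val g = Array.fill(w.length, w(0).length)(0.0)
  for ((x, y) <- xs.zip(ys)) {
    val exps = w.map(wk => math.exp(dot(wk, x)))
    val denom = 1.0 + exps.sum
    for (k <- w.indices; m <- x.indices) {
      val indicator = if (y == k + 1) 1.0 else 0.0
      g(k)(m) += x(m) * (indicator - exps(k) / denom) / xs.length
    }
  }
  g
}

// One gradient-ascent update (14): W_k <- W_k + lambda * dL/dW_k for every k.
// Builds new arrays; the input model is left unchanged.
def ascentStep(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int],
               lambda: Double): Array[Array[Double]] = {
  val g = gradient(w, xs, ys)
  w.zip(g).map { case (wk, gk) => wk.zip(gk).map { case (a, b) => a + lambda * b } }
}

val xs = Array(Array(1.0, 0.3), Array(1.0, -1.2), Array(1.0, 2.5))
val ys = Array(0, 1, 2)
val w0 = Array.fill(2, 2)(0.0)
val w1 = ascentStep(w0, xs, ys, 0.1)
println(s"L before: ${avgLogLikelihood(w0, xs, ys)}, after: ${avgLogLikelihood(w1, xs, ys)}")
```

Gradient descent on the negative log-likelihood would produce the same new model, because subtracting the gradient of $-L$ is the same as adding the gradient of $L$.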
Logistic regression training is iterative. The model vectors start from some initial state (for example, all zeros) and are recalculated in each iteration; the updated vectors are the input to the gradient calculation in the next iteration. The procedure stops after a maximum number of iterations, or earlier when the model vectors no longer change substantially from one iteration to the next.
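The iterative procedure can be sketched as a plain Scala loop, assuming a constant step size and a hypothetical toy data set with one native feature whose three classes are centred near -2, 0 and 2. The name `train` and the stopping tolerance are illustrative choices; the loop stops either at `maxIterations` or when every model vector moves by less than `tol` (Euclidean norm) in one iteration, mirroring the two stopping conditions described above.

```scala
// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (p, q) => p * q }.sum

// Average log-likelihood in the closed form (11).
def avgLogLikelihood(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Double =
  xs.zip(ys).map { case (x, y) =>
    val linear = if (y == 0) 0.0 else dot(w(y - 1), x)
    linear - math.log(1.0 + w.map(wk => math.exp(dot(wk, x))).sum)
  }.sum / xs.length

// Gradient (13); row k corresponds to model vector W_{k+1}.
def gradient(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Array[Array[Double]] = {
  val g = Array.fill(w.length, w(0).length)(0.0)
  for ((x, y) <- xs.zip(ys)) {
    val exps = w.map(wk => math.exp(dot(wk, x)))
    val denom = 1.0 + exps.sum
    for (k <- w.indices; m <- x.indices) {
      val indicator = if (y == k + 1) 1.0 else 0.0
      g(k)(m) += x(m) * (indicator - exps(k) / denom) / xs.length
    }
  }
  g
}

// Start from zeros and apply (14) until maxIterations is reached or every
// model vector changes by less than tol in one iteration.
def train(xs: Array[Array[Double]], ys: Array[Int], numClasses: Int, lambda: Double,
          maxIterations: Int, tol: Double): (Array[Array[Double]], Int) = {
  var w = Array.fill(numClasses - 1, xs(0).length)(0.0)
  var iteration = 0
  var finished = false
  while (iteration < maxIterations && !finished) {
    val g = gradient(w, xs, ys)
    val next = w.zip(g).map { case (wk, gk) => wk.zip(gk).map { case (a, b) => a + lambda * b } }
    val change = next.zip(w).map { case (nk, wk) =>
      math.sqrt(nk.zip(wk).map { case (a, b) => (a - b) * (a - b) }.sum)
    }
    finished = change.forall(_ < tol)
    w = next
    iteration += 1
  }
  (w, iteration)
}

// Hypothetical 1-D data set, augmented with x0 = 1.
val xs = Array(-2.5, -2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0, 2.5).map(v => Array(1.0, v))
val ys = Array(0, 0, 0, 1, 1, 1, 2, 2, 2)
val (w, iterations) = train(xs, ys, numClasses = 3, lambda = 0.1, maxIterations = 500, tol = 1e-4)
println(s"Stopped after $iterations iterations, L = ${avgLogLikelihood(w, xs, ys)}")
```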
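The parameter $\lambda$ is the update step size. It is usually a constant such as 0.1, or a value that decreases with each iteration, for example:

$$
a = -2\log 5, \qquad \lambda = \text{step}\cdot e^{\frac{a\,i}{\text{maxIterations}}}
\tag{15}
$$

where $i$ is the iteration number. Since $e^{-2\log 5} = 1/25$, this schedule decays $\lambda$ from step at $i = 0$ to step/25 at $i = \text{maxIterations}$.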
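The schedule (15) is a one-line function; the Scala sketch below (with the hypothetical name `stepSize`) evaluates it at a few iterations for step = 1.0 and maxIterations = 200, the values used in the Spark example that follows. At the midpoint the factor is $e^{-\log 5} = 1/5$, and at the last iteration it is $1/25$.

```scala
// Step-size schedule (15): lambda_i = step * e^{a * i / maxIterations}, with a = -2 log 5.
def stepSize(step: Double, i: Int, maxIterations: Int): Double = {
  val a = -2 * math.log(5)
  step * math.exp(a * i / maxIterations)
}

// lambda at iterations 0, 50, 100, 150 and 200.
val schedule = (0 to 200 by 50).map(i => stepSize(1.0, i, 200))
println(schedule.mkString(", "))
```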
The Iris Data Set Preparation
The examples below train a multiclass logistic regression model on the Iris data set, which is well known and freely available online. The algorithm requires numeric class labels, so the labels Iris-setosa, Iris-versicolor and Iris-virginica must be replaced with 0, 1 and 2, respectively. Most text editors support this kind of find/replace.
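If you prefer not to edit the file by hand, the label replacement can also be done while parsing. The Scala sketch below assumes the usual comma-separated Iris format (four measurements followed by the class name); `labelIndex` and `parseIrisLine` are hypothetical names. Lines with the wrong number of fields, an unknown class name or a non-numeric measurement yield `None`.

```scala
import scala.util.Try

// Hypothetical mapping from Iris class names to the numeric labels 0, 1, 2.
val labelIndex = Map("Iris-setosa" -> 0, "Iris-versicolor" -> 1, "Iris-virginica" -> 2)

// Parses one line into (features, label); returns None for blank or malformed lines.
def parseIrisLine(line: String): Option[(Array[Double], Int)] = {
  val parts = line.trim.split(",")
  if (parts.length != 5) None
  else for {
    label <- labelIndex.get(parts(4).trim)
    features <- Try(parts.take(4).map(_.trim.toDouble)).toOption
  } yield (features, label)
}

println(parseIrisLine("5.1,3.5,1.4,0.2,Iris-setosa").map { case (f, l) => f.mkString(",") + " -> " + l })
```

In the Spark shell this could, for example, be applied as `sc.textFile(...).flatMap(line => parseIrisLine(line).toList)`, after which the features would still need the leading 1.0 used by the first example.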
You can try the following examples in the Spark shell.
Apache Spark Multiclass Logistic Regression Example
import scala.util._
import org.apache.spark.sql._

//Element-wise arithmetic on vectors (Array[Double]) and matrices (Array[Array[Double]])
object ArrayExt extends Serializable {
  implicit class ArrayIntOperations(a: Array[Int]) extends Serializable {
    def +(b: Array[Int]): Array[Int] = (a, b).zipped.map(_ + _)
  }
  implicit class ArrayDoubleOperations(a: Array[Double]) extends Serializable {
    def +(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ + _)
    def -(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ - _)
    def *(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ * _)
    def *(b: Double): Array[Double] = a.map(_ * b)
    def /(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ / _)
    def /(b: Double): Array[Double] = a.map(_ / b)
    def dot(b: Array[Double]): Double = (a, b).zipped.map(_ * _).sum
  }
  implicit class ArrayDouble2Operations(a: Array[Array[Double]]) extends Serializable {
    def +(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ + _)
    def -(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ - _)
    def *(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ * _)
    def *(b: Double): Array[Array[Double]] = a.map(_ * b)
    def /(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ / _)
    def /(b: Int): Array[Array[Double]] = a.map(_ / b)
  }
}
import ArrayExt._

//A sample: augmented features [1, x1, x2, x3, x4], true label, predicted label
//(-1 = not predicted yet) and a uniform random number used for the train/test split
case class IrisSample(values: Array[Double], label: Int, var predicted: Int, sampler: Double)
//Running sum of the per-sample gradients and the number of samples seen
case class Accumulator(var gradient: Array[Array[Double]], var count: Int)
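//Load the Iris data set (numeric labels 0, 1, 2 in the fifth column), skipping blank lines
val data = sc.textFile("C:/ml/iris-mv.data").filter(line => line.trim.nonEmpty).map(line => {
  val parts = line.split(",")
  //Prepend the constant feature x0 = 1 for the intercept
  val values = Array(1.0, parts(0).toDouble, parts(1).toDouble, parts(2).toDouble,
    parts(3).toDouble)
  val label = parts(4).toInt
  IrisSample(values, label, -1, Random.nextDouble)
})
//Caching keeps the random sampler values fixed while the partitions stay in memory
data.cache()
//Roughly 60% of the samples for training and 40% for testing
val trainData = data.filter(sample => sample.sampler >= 0.4)
trainData.cache()
val testData = data.filter(sample => sample.sampler < 0.4)
testData.cache()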
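//MLE Logistic Regression
val numFeatures = 4
val numClasses = 3
val maxNumIterations = 200
val step = 1.0
//L2 regularization strength: subtracting mi * W from the gradient maximizes L - (mi/2)*||W||^2.
//This term is not part of the derivation above; mi = 0 gives plain MLE.
val mi = 0.01
//Decay constant of the step-size schedule (15)
val a = -2 * Math.log(5)
//K-1 model vectors of size M+1, initialized with zeros
var w = Array.ofDim[Double](numClasses - 1, numFeatures + 1)
var finished = false
for (i <- 0 to maxNumIterations if !finished) {
  val lambda = step * Math.exp(a * i / maxNumIterations)
  println(s"Round $i with lambda = $lambda ...")
  //One pass over the training set: sum the per-sample gradients (13) and count the samples
  val accumulator = trainData.aggregate(Accumulator(Array.ofDim[Double](numClasses - 1,
    numFeatures + 1), 0))(
    (acc, sample) => {
      //e^(W_k^T X) for k = 1..K-1
      val exponents: Array[Double] = w.map(v => Math.exp(v dot sample.values))
      //P(0|X,W) from (6)
      val referenceClassProbability = 1.0 / (1.0 + exponents.sum)
      //P(k|X,W) from (7), k = 1..K-1
      val classProbabilities = exponents * referenceClassProbability
      for (k <- acc.gradient.indices) {
        //Row k of the gradient belongs to model vector W_(k+1), i.e. class k+1
        val indicator = if (sample.label == (k + 1)) 1 else 0
        val probability = classProbabilities(k)
        acc.gradient(k) = acc.gradient(k) + sample.values * (indicator - probability)
      }
      acc.count = acc.count + 1
      acc
    },
    (x, y) => Accumulator(x.gradient + y.gradient, x.count + y.count)
  )
  val w_old = w.clone
  //Average gradient (13) minus the L2 penalty term
  val gradient = accumulator.gradient / accumulator.count
  val update = gradient - w * mi
  //Gradient ascent step (14)
  w = w + update * lambda
  val w_diff = w - w_old
  //Stop when every model vector moved by less than 0.01 (Euclidean norm)
  finished = w_diff.map(x => Math.sqrt(x dot x)).forall(_ < 0.01)
}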
//Returns the class with the highest probability according to (6) and (7)
def predict(w: Array[Array[Double]], sample: IrisSample): Int = {
  val exponents: Array[Double] = w.map(v => Math.exp(v dot sample.values))
  val referenceClassProbability = 1.0 / (1.0 + exponents.sum)
  val classProbabilities = exponents * referenceClassProbability
  val classMaxProbability = classProbabilities.max
  //Class 0 wins only if its probability exceeds that of every other class
  if (classMaxProbability < referenceClassProbability) 0
  else classProbabilities.indexOf(classMaxProbability) + 1
}
//Copy each test sample with its predicted label instead of mutating the cached RDD elements
val irisPredictions = testData.map(sample => sample.copy(predicted = predict(w, sample)))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val irisPredictionsDF = irisPredictions.zipWithIndex.map({ case (sample, i) =>
  (i, sample.label, sample.predicted)
}).toDF("id", "label", "predicted")
irisPredictionsDF.registerTempTable("iris_predictions")
//Show the misclassified test samples
sqlContext.sql("SELECT id, label, predicted FROM iris_predictions WHERE label != predicted").show(100)
Apache Spark MLlib Logistic Regression Example
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql._

//Load the Iris data set, skipping blank lines; MLlib expects the label as a Double
val data = sc.textFile("C:/ml/iris-mv.data").filter(line => line.trim.nonEmpty).map(line => {
  val parts = line.split(",")
  val values = Vectors.dense(parts(0).toDouble, parts(1).toDouble, parts(2).toDouble,
    parts(3).toDouble)
  val label = parts(4).toDouble
  LabeledPoint(label, values)
})
//60% training, 40% test
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1).cache()
//Multiclass logistic regression trained with the L-BFGS optimizer
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(3)
  .run(training)
val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
//The tuples are (prediction, label), so the column names must follow the same order
val irisPredictionsDF = predictionAndLabels.toDF("predicted", "label")
irisPredictionsDF.registerTempTable("iris_predictions")
//Show the misclassified test samples
sqlContext.sql("SELECT label, predicted FROM iris_predictions WHERE label != predicted").show(100)
7

Contenu connexe

Tendances

MLHEP 2015: Introductory Lecture #2
MLHEP 2015: Introductory Lecture #2MLHEP 2015: Introductory Lecture #2
MLHEP 2015: Introductory Lecture #2arogozhnikov
 
Matrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsMatrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsDmitriy Selivanov
 
Recsys matrix-factorizations
Recsys matrix-factorizationsRecsys matrix-factorizations
Recsys matrix-factorizationsDmitriy Selivanov
 
Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]Palak Sanghani
 
MLHEP Lectures - day 3, basic track
MLHEP Lectures - day 3, basic trackMLHEP Lectures - day 3, basic track
MLHEP Lectures - day 3, basic trackarogozhnikov
 
MLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic trackMLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic trackarogozhnikov
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习AdaboostShocky1
 
Simplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution AlgorithmsSimplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution AlgorithmsPK Lehre
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture modelsVu Pham
 
2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussian2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussiannozomuhamada
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydSri Ambati
 
Additive model and boosting tree
Additive model and boosting treeAdditive model and boosting tree
Additive model and boosting treeDong Guo
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modeljins0618
 
Support Vector Machines Simply
Support Vector Machines SimplySupport Vector Machines Simply
Support Vector Machines SimplyEmad Nabil
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic netVivian S. Zhang
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackarogozhnikov
 
Reweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEPReweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEParogozhnikov
 
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...MLconf
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machinenozomuhamada
 

Tendances (20)

MLHEP 2015: Introductory Lecture #2
MLHEP 2015: Introductory Lecture #2MLHEP 2015: Introductory Lecture #2
MLHEP 2015: Introductory Lecture #2
 
Matrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsMatrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender Systems
 
Recsys matrix-factorizations
Recsys matrix-factorizationsRecsys matrix-factorizations
Recsys matrix-factorizations
 
Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]
 
MLHEP Lectures - day 3, basic track
MLHEP Lectures - day 3, basic trackMLHEP Lectures - day 3, basic track
MLHEP Lectures - day 3, basic track
 
MLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic trackMLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic track
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Simplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution AlgorithmsSimplified Runtime Analysis of Estimation of Distribution Algorithms
Simplified Runtime Analysis of Estimation of Distribution Algorithms
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture models
 
2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussian2012 mdsp pr12 k means mixture of gaussian
2012 mdsp pr12 k means mixture of gaussian
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
 
Additive model and boosting tree
Additive model and boosting treeAdditive model and boosting tree
Additive model and boosting tree
 
Mechanical Engineering Assignment Help
Mechanical Engineering Assignment HelpMechanical Engineering Assignment Help
Mechanical Engineering Assignment Help
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture model
 
Support Vector Machines Simply
Support Vector Machines SimplySupport Vector Machines Simply
Support Vector Machines Simply
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic net
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic track
 
Reweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEPReweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEP
 
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine
 

Similaire à Multiclass Logistic Regression: Derivation and Apache Spark Examples

Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlabBilawalBaloch1
 
Principles of functional progrmming in scala
Principles of functional progrmming in scalaPrinciples of functional progrmming in scala
Principles of functional progrmming in scalaehsoon
 
Intelligent System Optimizations
Intelligent System OptimizationsIntelligent System Optimizations
Intelligent System OptimizationsMartin Zapletal
 
Graph convolutional networks in apache spark
Graph convolutional networks in apache sparkGraph convolutional networks in apache spark
Graph convolutional networks in apache sparkEmiliano Martinez Sanchez
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Languageleague
 
Introduction à Scala - Michel Schinz - January 2010
Introduction à Scala - Michel Schinz - January 2010Introduction à Scala - Michel Schinz - January 2010
Introduction à Scala - Michel Schinz - January 2010JUG Lausanne
 
Introduction to scala
Introduction to scalaIntroduction to scala
Introduction to scalaMichel Perez
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_reportRavi Gupta
 
X01 Supervised learning problem linear regression one feature theorie
X01 Supervised learning problem linear regression one feature theorieX01 Supervised learning problem linear regression one feature theorie
X01 Supervised learning problem linear regression one feature theorieMarco Moldenhauer
 

Similaire à Multiclass Logistic Regression: Derivation and Apache Spark Examples (20)

Introducing scala
Introducing scalaIntroducing scala
Introducing scala
 
Scala Bootcamp 1
Scala Bootcamp 1Scala Bootcamp 1
Scala Bootcamp 1
 
Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
 
Scala Collections
Scala CollectionsScala Collections
Scala Collections
 
Scala collections
Scala collectionsScala collections
Scala collections
 
Principles of functional progrmming in scala
Principles of functional progrmming in scalaPrinciples of functional progrmming in scala
Principles of functional progrmming in scala
 
I stata
I stataI stata
I stata
 
Intelligent System Optimizations
Intelligent System OptimizationsIntelligent System Optimizations
Intelligent System Optimizations
 
Sparse autoencoder
Sparse autoencoderSparse autoencoder
Sparse autoencoder
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
An introduction to scala
An introduction to scalaAn introduction to scala
An introduction to scala
 
Chapter 16
Chapter 16Chapter 16
Chapter 16
 
Graph convolutional networks in apache spark
Graph convolutional networks in apache sparkGraph convolutional networks in apache spark
Graph convolutional networks in apache spark
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Language
 
Introduction à Scala - Michel Schinz - January 2010
Introduction à Scala - Michel Schinz - January 2010Introduction à Scala - Michel Schinz - January 2010
Introduction à Scala - Michel Schinz - January 2010
 
Introduction to scala
Introduction to scalaIntroduction to scala
Introduction to scala
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_report
 
ICPR 2016
ICPR 2016ICPR 2016
ICPR 2016
 
X01 Supervised learning problem linear regression one feature theorie
X01 Supervised learning problem linear regression one feature theorieX01 Supervised learning problem linear regression one feature theorie
X01 Supervised learning problem linear regression one feature theorie
 

Dernier

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 

Dernier (20)

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 

Multiclass Logistic Regression: Derivation and Apache Spark Examples

  • 1. Multiclass Logistic Regression : Derivation and Apache Spark Examples Author: Marjan Sterjev Logistic Regression is supervised classification algorithm. These kind of algorithms work this way: 1. Classification model is trained based on a provided training set of N samples those class labels are known (the class labels are usually provided manually by human). 2. The class labels for new, previously unseen samples are predicted based on the model generated in the previous step. This is known as sample classification. Each sample in the training data set has M numerical features (coordinates) as well as class label y . The number of classes is K and the class label can have one of the following values 0,1,2,... , K−1 . If K=2 the classifier is binary. For K≥3 the classifier is multiclass. Particular sample Xi is represented as column vector of length M +1 : Xi T =[1, x1 , x2 ,... xM ] (1) Note that the first feature x0 for each sample vector is 1 and it is “artificially” added to the “native” sample features in order to support the intercept in the model vectors. The model is represented with K−1 vectors of size M +1 or equivalently, a matrix of dimension [ K−1][ M +1] : W =[ w1,0 w1,1 ... w1,M w2, 0 w2,1 ... w2,M ... ... ... ... wK−1,0 wK −1,1 ... wK −1,M ] (2) We will denote the model column vectors as: W 1 T =[w1,0 ,w1, 1 ,..,w1,M ] W 2 T =[w2,0 ,w2,1 ,..,w2, M ] ... W K −1 T =[wK−1,0 ,wK −1,1 ,.., wK−1, M ] (3) The purpose of the Logistic Regression algorithm is to build a model that predicts the class label for a given sample. In Multiclass Logistic Regression this is a two step process. First, the sample X is projected into K probabilities, one probability per class: 1
$$P(0 \mid X, W),\; P(1 \mid X, W),\; \ldots,\; P(K-1 \mid X, W) \tag{4}$$

Each probability is obtained as a function of the sample and the model vectors:

$$P(i \mid X, W) = f(X, W_1, W_2, \ldots, W_{K-1}) \tag{5}$$

Second, the class with the maximum probability for that particular sample is returned as the predicted class.

The class probabilities are defined as:

$$P(0 \mid X, W) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{W_j^T X}} \tag{6}$$

and, for $k = 1, 2, \ldots, K-1$:

$$P(k \mid X, W) = \frac{e^{W_k^T X}}{1 + \sum_{j=1}^{K-1} e^{W_j^T X}} \tag{7}$$

Note that $\sum_{j=0}^{K-1} P(j \mid X, W) = 1$, i.e. (6) and (7) define a probability distribution. Class 0 plays the role of a reference class: (6) is exactly what (7) would give for $k = 0$ with $W_0 = 0$, which is why the model needs only $K-1$ vectors.

The training set consists of $N$ samples with known class labels:

$$\text{Sample}_0 = (X_0, y_0), \quad \text{Sample}_1 = (X_1, y_1), \quad \ldots, \quad \text{Sample}_{N-1} = (X_{N-1}, y_{N-1}) \tag{8}$$

Assuming the samples are independent, the joint likelihood of the training set is:

$$\prod_{i=0}^{N-1} P(y_i \mid X_i, W) \tag{9}$$

Training the Logistic Regression model is a procedure that searches for the model vectors $W_1, W_2, \ldots, W_{K-1}$ that maximize this joint probability. The procedure is known as MLE (Maximum Likelihood Estimation). Because the logarithm is monotonically increasing, maximizing the logarithm of a function is the same as maximizing the function itself. If we also divide the logarithm by the number of samples, so that we work with the average log-likelihood, the result is:
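$$L = \frac{1}{N} \log\left( \prod_{i=0}^{N-1} P(y_i \mid X_i, W) \right) = \frac{1}{N} \sum_{i=0}^{N-1} \log P(y_i \mid X_i, W) \tag{10}$$

Substituting the probability formulas (6) and (7), we get:

$$\begin{aligned}
L &= \frac{1}{N} \sum_{i=0}^{N-1} \log P(y_i \mid X_i, W) \\
&= \frac{1}{N} \sum_{i=0}^{N-1} \left( I(y_i = 0) \log \frac{1}{1 + \sum_{j=1}^{K-1} e^{W_j^T X_i}} + \bigl(1 - I(y_i = 0)\bigr) \log \frac{e^{W_{y_i}^T X_i}}{1 + \sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N} \sum_{i=0}^{N-1} \left( \bigl(1 - I(y_i = 0)\bigr) W_{y_i}^T X_i + \log \frac{1}{1 + \sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N} \sum_{i=0}^{N-1} \left( \bigl(1 - I(y_i = 0)\bigr) W_{y_i}^T X_i - \log \Bigl(1 + \sum_{j=1}^{K-1} e^{W_j^T X_i}\Bigr) \right)
\end{aligned} \tag{11}$$

where $I$ is the indicator function, defined as $I(\text{true}) = 1$ and $I(\text{false}) = 0$. When $y_i = 0$ the term $W_{y_i}^T X_i$ is multiplied by zero, so the nonexistent vector $W_0$ never contributes.

The likelihood depends on each model vector and on each coefficient therein. Let $X_{i,m}$ denote the $m$-th coordinate of sample $X_i$. The gradient with respect to the $m$-th coefficient of the $k$-th model vector, where $k = 1, 2, \ldots, K-1$ and $m = 0, 1, 2, \ldots, M$, is:

$$\begin{aligned}
\frac{\partial L}{\partial w_{k,m}} &= \frac{1}{N} \sum_{i=0}^{N-1} \left( I(y_i = k)\, X_{i,m} - \frac{e^{W_k^T X_i}\, X_{i,m}}{1 + \sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N} \sum_{i=0}^{N-1} X_{i,m} \left( I(y_i = k) - \frac{e^{W_k^T X_i}}{1 + \sum_{j=1}^{K-1} e^{W_j^T X_i}} \right) \\
&= \frac{1}{N} \sum_{i=0}^{N-1} X_{i,m} \bigl( I(y_i = k) - P(k \mid X_i, W) \bigr)
\end{aligned} \tag{12}$$

The first term comes from $\bigl(1 - I(y_i = 0)\bigr) W_{y_i}^T X_i$, which contains $w_{k,m}$ only when $y_i = k$; the second comes from differentiating the logarithm in (11).

The gradient with respect to a whole model vector is the vector of the gradients with respect to each of its coefficients, i.e.: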
$$\frac{\partial L}{\partial W_k} = \left[ \frac{\partial L}{\partial w_{k,0}}, \frac{\partial L}{\partial w_{k,1}}, \frac{\partial L}{\partial w_{k,2}}, \ldots, \frac{\partial L}{\partial w_{k,M}} \right]^T = \frac{1}{N} \sum_{i=0}^{N-1} X_i \bigl( I(y_i = k) - P(k \mid X_i, W) \bigr) \tag{13}$$

If a coefficient's gradient is positive, the likelihood increases when that coefficient increases. Conversely, if the gradient is negative, the likelihood decreases when the coefficient increases. The likelihood is therefore increased by moving each coefficient in the direction of its gradient, by a step proportional to the gradient:

$$w_{k,m} \leftarrow w_{k,m} + \lambda \frac{\partial L}{\partial w_{k,m}}, \qquad W_k \leftarrow W_k + \lambda \frac{\partial L}{\partial W_k} \tag{14}$$

This procedure is known as Gradient Ascent. If we were minimizing a loss function instead (quadratic loss, negative log-likelihood), the update in (14) would go in the opposite direction (a minus sign instead of a plus), which is known as Gradient Descent.

Logistic Regression training is an iterative algorithm. The model vectors start from some initial state (all zeros, for example) and are recalculated in each iteration; the updated vectors are the input for the gradient calculation in the next iteration. The procedure ends after a maximum number of iterations, or earlier if the model vectors no longer change substantially from one iteration to the next.

The parameter $\lambda$ is the update step size. It is usually a constant such as 0.1, or a number that decreases with each iteration $i$, for example:

$$a = -2 \log 5, \qquad \lambda = \text{step} \cdot e^{\,a \, i / \text{maxIterations}} \tag{15}$$

With this schedule, $\lambda$ decays exponentially from $\text{step}$ at $i = 0$ to $\text{step} \cdot e^{-2 \log 5} = \text{step}/25$ at $i = \text{maxIterations}$.

The Iris Data Set Preparation

The examples below demonstrate training a Multiclass Logistic Regression model on the Iris data set, which is well known and can be found online. The algorithm requires numeric class labels, so the labels Iris-setosa, Iris-versicolor and Iris-virginica must be replaced with 0, 1 and 2 respectively. Most text editors support this kind of find/replace operation. You can try the following examples in the Spark shell.
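The derivation can also be checked without a cluster. The following is a minimal plain-Scala sketch (it also runs in the Spark shell) of the class probabilities (6) and (7), the average log-likelihood (10), the gradient (13), the step size schedule (15) and the Gradient Ascent update (14). It trains on three made-up samples rather than the Iris data and ends with a finite-difference check of (13). The helper names (`probs`, `logLikelihood`, `gradient`, `stepSize`, `train`) and the stopping tolerance `tol` are illustrative assumptions, not part of the original examples. Unlike the Apache Spark example, the sketch has no regularization term.

```scala
// Dot product of two equally long vectors.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (u, v) => u * v }.sum

// P(0|X,W), ..., P(K-1|X,W) from (6) and (7); class 0 is the reference class.
// w has K-1 rows of length M+1 as in (2); x has x0 = 1 prepended as in (1).
def probs(w: Array[Array[Double]], x: Array[Double]): Array[Double] = {
  val exps = w.map(row => math.exp(dot(row, x)))
  val denominator = 1.0 + exps.sum
  (1.0 / denominator) +: exps.map(_ / denominator)
}

// Average log-likelihood (10): the mean of log P(y_i | X_i, W).
def logLikelihood(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Double =
  xs.zip(ys).map { case (x, y) => math.log(probs(w, x)(y)) }.sum / xs.length

// Gradient (13): row k-1 holds dL/dW_k, the mean of X_i (I(y_i = k) - P(k | X_i, W)).
def gradient(w: Array[Array[Double]], xs: Array[Array[Double]], ys: Array[Int]): Array[Array[Double]] = {
  val g = Array.ofDim[Double](w.length, xs(0).length)
  for ((x, y) <- xs.zip(ys)) {
    val p = probs(w, x)
    for (k <- g.indices; m <- x.indices) {
      val indicator = if (y == k + 1) 1.0 else 0.0
      g(k)(m) += x(m) * (indicator - p(k + 1)) / xs.length
    }
  }
  g
}

// Step size schedule (15): decays from `step` at i = 0 to step / 25 at i = maxIterations.
def stepSize(step: Double, i: Int, maxIterations: Int): Double =
  step * math.exp(-2 * math.log(5) * i / maxIterations)

// Batch Gradient Ascent with update (14), starting from an all-zero model.
// Stops after maxIterations rounds or when no model vector moves by more than tol.
def train(xs: Array[Array[Double]], ys: Array[Int], numClasses: Int, step: Double = 1.0,
          maxIterations: Int = 200, tol: Double = 1e-4): Array[Array[Double]] = {
  var w = Array.ofDim[Double](numClasses - 1, xs(0).length)
  var i = 0
  var finished = false
  while (i < maxIterations && !finished) {
    val lambda = stepSize(step, i, maxIterations)
    val g = gradient(w, xs, ys)
    val newW = w.zip(g).map { case (row, gRow) =>
      row.zip(gRow).map { case (wv, gv) => wv + lambda * gv }
    }
    finished = newW.zip(w).forall { case (updated, old) =>
      val diff = updated.zip(old).map { case (u, v) => u - v }
      math.sqrt(dot(diff, diff)) < tol
    }
    w = newW
    i += 1
  }
  w
}

// Three made-up samples (x0 = 1 prepended), one per class; not the Iris data.
val toyX = Array(Array(1.0, 0.0, 0.0), Array(1.0, 1.0, 0.0), Array(1.0, 0.0, 1.0))
val toyY = Array(0, 1, 2)
val zeroW = Array.ofDim[Double](2, 3)
val trained = train(toyX, toyY, numClasses = 3)
println(s"log-likelihood: ${logLikelihood(zeroW, toyX, toyY)} -> ${logLikelihood(trained, toyX, toyY)}")
val predicted = toyX.map { x => val p = probs(trained, x); p.indices.maxBy(k => p(k)) }
println(predicted.mkString(", "))

// Finite-difference check of (13) for one coefficient of an arbitrary model.
val wCheck = Array(Array(0.1, -0.2, 0.3), Array(-0.4, 0.5, 0.2))
val eps = 1e-6
val plus = wCheck.map(_.clone()); plus(1)(2) += eps
val minus = wCheck.map(_.clone()); minus(1)(2) -= eps
val numeric = (logLikelihood(plus, toyX, toyY) - logLikelihood(minus, toyX, toyY)) / (2 * eps)
println(s"analytic ${gradient(wCheck, toyX, toyY)(1)(2)}, numeric $numeric")
```

Plain batch Gradient Ascent is used here because it maps one-to-one onto (14); the Apache Spark example computes the same gradient sum in a distributed way with `RDD.aggregate`.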
Apache Spark Multiclass Logistic Regression Example

```scala
import scala.util._
import org.apache.spark.sql._

// Element-wise arithmetic on vectors (Array[Double]) and matrices (Array[Array[Double]]),
// used to accumulate gradients and to update the model.
object ArrayExt extends Serializable {
  implicit class ArrayIntOperations(a: Array[Int]) extends Serializable {
    def +(b: Array[Int]): Array[Int] = (a, b).zipped.map(_ + _)
  }
  implicit class ArrayDoubleOperations(a: Array[Double]) extends Serializable {
    def +(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ + _)
    def -(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ - _)
    def *(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ * _)
    def *(b: Double): Array[Double] = a.map(_ * b)
    def /(b: Array[Double]): Array[Double] = (a, b).zipped.map(_ / _)
    def /(b: Double): Array[Double] = a.map(_ / b)
    def dot(b: Array[Double]): Double = (a, b).zipped.map(_ * _).sum
  }
  implicit class ArrayDouble2Operations(a: Array[Array[Double]]) extends Serializable {
    def +(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ + _)
    def -(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ - _)
    def *(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ * _)
    def *(b: Double): Array[Array[Double]] = a.map(_ * b)
    def /(b: Array[Array[Double]]): Array[Array[Double]] = (a, b).zipped.map(_ / _)
    def /(b: Int): Array[Array[Double]] = a.map(_ / b)
  }
}
import ArrayExt._

// values: features with the intercept feature x0 = 1 prepended, see (1);
// predicted: -1 until the sample is classified; sampler: random number for the train/test split
case class IrisSample(values: Array[Double], label: Int, predicted: Int, sampler: Double)
// Partial result of RDD.aggregate: summed gradient matrix and number of samples
case class Accumulator(var gradient: Array[Array[Double]], var count: Int)

//Load the Iris data set (class labels already replaced with 0, 1, 2); blank lines are skipped
val data = sc.textFile("C:/ml/iris-mv.data").filter(line => line.trim.nonEmpty).map(line => {
  val parts = line.split(",")
  val values = Array(1.0, parts(0).toDouble, parts(1).toDouble, parts(2).toDouble, parts(3).toDouble)
  val label = parts(4).trim.toInt
  IrisSample(values, label, -1, Random.nextDouble())
})
data.cache()
// About 60% of the samples for training and 40% for testing
val trainData = data.filter(sample => sample.sampler >= 0.4)
trainData.cache()
val testData = data.filter(sample => sample.sampler < 0.4)
testData.cache()

//MLE Logistic Regression
val numFeatures = 4
val numClasses = 3
val maxNumIterations = 200
val step = 1.0            // initial step size in (15)
val mi = 0.01             // regularization strength used in the update below
val a = -2 * Math.log(5)  // decay constant in (15)

// Model matrix (2): K-1 rows of length M+1, starting from all zeros
var w = Array.ofDim[Double](numClasses - 1, numFeatures + 1)
var finished = false
for (i <- 0 until maxNumIterations if !finished) {
  val lambda = step * Math.exp(a * i / maxNumIterations)
  println(s"Round $i with lambda = $lambda ...")
  // Sum over the training samples of X_i (I(y_i = k) - P(k | X_i, W)), see (13)
  val accumulator = trainData.aggregate(Accumulator(Array.ofDim[Double](numClasses - 1, numFeatures + 1), 0))(
    (acc, sample) => {
      val exponents: Array[Double] = w.map(v => Math.exp(v dot sample.values))
      val referenceClassProbability = 1.0 / (1.0 + exponents.sum)   // (6)
      val classProbabilities = exponents * referenceClassProbability // (7)
      for (k <- acc.gradient.indices) {
        // Gradient row k belongs to model vector W_{k+1}, i.e. class k + 1
        val indicator = if (sample.label == (k + 1)) 1.0 else 0.0
        acc.gradient(k) = acc.gradient(k) + sample.values * (indicator - classProbabilities(k))
      }
      acc.count = acc.count + 1
      acc
    },
    (x, y) => Accumulator(x.gradient + y.gradient, x.count + y.count)
  )
  val w_old = w.clone
  val gradient = accumulator.gradient / accumulator.count
  // Log-likelihood gradient minus mi * W (L2 regularization), then update (14)
  val update = gradient - w * mi
  w = w + update * lambda
  // Stop when every model vector moved by less than 0.01 (Euclidean norm)
  val w_diff = w - w_old
  finished = w_diff.map(x => Math.sqrt(x dot x)).forall(_ < 0.01)
}

// Return the class with the highest probability, see (6) and (7)
def predict(w: Array[Array[Double]], sample: IrisSample): Int = {
  val exponents: Array[Double] = w.map(v => Math.exp(v dot sample.values))
  val referenceClassProbability = 1.0 / (1.0 + exponents.sum)
  val classProbabilities = exponents * referenceClassProbability
  val classMaxProbability = classProbabilities.max
  if (classMaxProbability < referenceClassProbability) 0
  else classProbabilities.indexOf(classMaxProbability) + 1
}

// Copy each sample instead of mutating the cached test objects
val irisPredictions = testData.map(sample => sample.copy(predicted = predict(w, sample)))

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val irisPredictionsDF = irisPredictions.zipWithIndex.map({ case (sample, i) => (i, sample.label, sample.predicted) }).toDF("id", "label", "predicted")
irisPredictionsDF.registerTempTable("iris_predictions")
sqlContext.sql("SELECT id, label, predicted FROM iris_predictions WHERE label != predicted").show(100)
```
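The update in this example differs slightly from (14): the line `val update = gradient - w * mi` also subtracts `mi` times the current model. Assuming the intent is L2 regularization (weight decay), this is Gradient Ascent on a penalized average log-likelihood, where $\mu$ stands for `mi`:

$$L_{\mu} = L - \frac{\mu}{2} \sum_{k=1}^{K-1} \lVert W_k \rVert^2, \qquad \frac{\partial L_{\mu}}{\partial W_k} = \frac{\partial L}{\partial W_k} - \mu W_k$$

$$W_k \leftarrow W_k + \lambda \left( \frac{\partial L}{\partial W_k} - \mu W_k \right)$$

Because each $W_k$ includes the intercept coefficient $w_{k,0}$, this penalty also shrinks the intercepts. Setting `mi = 0.0` recovers the plain update (14).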
Apache Spark MLlib Logistic Regression Example

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql._

//Load the Iris data set (class labels already replaced with 0, 1, 2); blank lines are skipped
val data = sc.textFile("C:/ml/iris-mv.data").filter(line => line.trim.nonEmpty).map(line => {
  val parts = line.split(",")
  // Only the four native features: no artificial x0 = 1 here
  val values = Vectors.dense(parts(0).toDouble, parts(1).toDouble, parts(2).toDouble, parts(3).toDouble)
  val label = parts(4).trim.toDouble
  LabeledPoint(label, values)
})

// 60% training, 40% test, with a fixed seed
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1).cache()

// Multinomial model trained with L-BFGS. No intercept is fitted by default;
// .setIntercept(true) enables one (comparable to x0 = 1 in the previous example).
val model = new LogisticRegressionWithLBFGS().setNumClasses(3).run(training)

val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
// Column names follow the tuple order (prediction, label)
val irisPredictionsDF = predictionAndLabels.toDF("predicted", "label")
irisPredictionsDF.registerTempTable("iris_predictions")
sqlContext.sql("SELECT label, predicted FROM iris_predictions WHERE label != predicted").show(100)
```
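Listing the misclassified rows is one way to inspect the results; a compact summary is a confusion matrix plus overall accuracy. The following is a minimal plain-Scala sketch, assuming the (prediction, label) pairs have been brought to the driver (for example with `predictionAndLabels.collect().toSeq`). The helper names `confusionMatrix` and `accuracy` and the sample pairs are hypothetical, for illustration only. For RDD-based evaluation, MLlib also provides `org.apache.spark.mllib.evaluation.MulticlassMetrics`, which takes an RDD of (prediction, label) pairs and exposes a `confusionMatrix`.

```scala
// Confusion matrix: rows are true labels, columns are predicted labels (0.0, 1.0, 2.0 as in the Iris data).
def confusionMatrix(pairs: Seq[(Double, Double)], numClasses: Int): Array[Array[Int]] = {
  val matrix = Array.ofDim[Int](numClasses, numClasses)
  for ((prediction, label) <- pairs) matrix(label.toInt)(prediction.toInt) += 1
  matrix
}

// Fraction of pairs on the diagonal, i.e. correctly classified.
def accuracy(matrix: Array[Array[Int]]): Double =
  matrix.indices.map(i => matrix(i)(i)).sum.toDouble / matrix.map(_.sum).sum

// Made-up (prediction, label) pairs, in the same order as predictionAndLabels.
val pairs = Seq((0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 1.0))
val cm = confusionMatrix(pairs, 3)
cm.foreach(row => println(row.mkString(" ")))
println(s"accuracy = ${accuracy(cm)}")   // 4 of 5 pairs are correct: 0.8
```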