Université des Actuaires
Machine learning for massive data:
randomized, online, and distributed algorithms
Stéphan Clémençon
Institut Mines-Télécom - Télécom ParisTech
July 7, 2014
Agenda
- Big Data technologies: context and opportunities
- Machine learning: a brief overview
- The challenges of ML applications in industry and services:
  toward generalization?
"Big Data" - The context
An accumulation of massive data in many domains:
- Biology/medicine (genomics, metabolomics, clinical trials, imaging, etc.)
- Retail, marketing (CRM), e-commerce
- Internet search engines (multimedia content)
- Social networks (Facebook, Twitter, ...)
- Banking/finance (market/liquidity risk, credit access)
- Security (e.g. biometrics, video surveillance)
- Government (public health, customs)
- Operational risks
"Big Data" - The context
A data deluge that overwhelms:
- the basic tools for
  - data storage
  - database management (MySQL)
- preprocessing based on human expertise
  - indexing, semantic analysis
- modeling
- decision intelligence
"Big Data" - The context
A multitude of technological building blocks and services available for:
- massive parallelization (Velocity)
- distributed computing (Volume)
- managing data without a predefined schema (Variety)
among which:
- the MapReduce programming model: parallelized/distributed computations
- the Hadoop framework
- NoSQL: the Cassandra and MongoDB DBMSs, graph-oriented databases, the Elasticsearch search engine, etc.
- Clouds: Infrastructure, Platform, and Software as a Service, promoted by Google, Amazon, Facebook, etc.
"Big Data" - The opportunities
Spectacular advances in
- (distributed) data collection and storage
- automatic search for objects and content
- sharing of loosely structured data
Big Data: an engine for technology, science, and the economy
- Search engines, recommendation engines
- Predictive maintenance
- Viral marketing through social networks
- Fraud detection
- Personalized medicine
- Online advertising (retargeting)
"Big Data" - The opportunities
Ubiquity
Many business sectors are concerned:
- (e-)Commerce
- CRM
- Healthcare
- Defense, intelligence (e.g. cybersecurity, biometrics)
- Banking/finance
- "Smart" transportation
- etc.
Big Data - Research
To exploit the data (prediction, interpretation), develop mathematical technologies that solve the related computational problems:
- near-real-time constraints
  → sequential ("online") machine learning ≠ batch, reinforcement learning
- the distributed nature of data/resources
  → distributed machine learning
- data volume
  → impact of sampling techniques on algorithm performance
"Big Data" - Research
Techniques for visualizing and representing complex data
- (evolving) graphs - clustering, graph mining
- image, audio, video - filtering, compression
- text data (e.g. web pages, tweets)
Domains
- Probability, statistics
- Machine learning
- Optimization
- Signal and image processing
- Computational harmonic analysis
- Semantic analysis
- etc.
Goals of Statistical Learning
- Statistical issues cast as M-estimation problems:
  - Classification
  - Regression
  - Density level set estimation
  - Compression, sparse representation
  - ... and their variants
- Minimal assumptions on the distribution
- Build realistic M-estimators for special criteria
- Questions
  - Theory: optimal elements, consistency, non-asymptotic excess risk bounds, fast rates of convergence, oracle inequalities
  - Practice: numerical optimization, convexification, randomization, relaxation, constraints (distributed architectures, real-time, memory, etc.)
Main Example: Classification (Pattern Recognition)
- $(X, Y)$ random pair with unknown distribution $P$
- $X \in \mathcal{X}$ observation vector
- $Y \in \{-1, +1\}$ binary label/class
- A posteriori probability $\sim$ regression function:
  $\forall x \in \mathcal{X}, \quad \eta(x) = \mathbb{P}\{Y = +1 \mid X = x\}$
- $g: \mathcal{X} \to \{-1, +1\}$ classifier
- Performance measure = classification error:
  $L(g) = \mathbb{P}\{g(X) \neq Y\} \to \min_g$
- Solution: the Bayes rule
  $\forall x \in \mathcal{X}, \quad g^*(x) = 2\,\mathbb{I}\{\eta(x) > 1/2\} - 1$
- Bayes error $L^* = L(g^*)$
Main Paradigm - Empirical Risk Minimization
- Sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ of i.i.d. copies of $(X, Y)$; class $\mathcal{G}$ of classifiers
- Empirical Risk Minimization principle:
  $\hat{g}_n = \operatorname{Argmin}_{g \in \mathcal{G}} \hat{L}_n(g) := \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\{g(X_i) \neq Y_i\}$
- Best classifier in the class:
  $\bar{g} = \operatorname{Argmin}_{g \in \mathcal{G}} L(g)$
- Concentration inequality: with probability at least $1 - \delta$,
  $\sup_{g \in \mathcal{G}} |\hat{L}_n(g) - L(g)| \leq C\sqrt{\frac{V}{n}} + \sqrt{\frac{2 \log(1/\delta)}{n}}$
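As a concrete illustration (not part of the slides), here is a minimal sketch of the ERM principle over a toy class of one-dimensional threshold classifiers; the data-generating model and the class $\mathcal{G}$ are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: X uniform on [0, 1], Y = +1 with probability eta(x) = x
n = 1000
X = rng.uniform(0.0, 1.0, size=n)
Y = np.where(rng.uniform(size=n) < X, 1, -1)

# Class G: threshold classifiers g_t(x) = sign(x - t)
thresholds = np.linspace(0.0, 1.0, 101)

def empirical_risk(t):
    preds = np.where(X > t, 1, -1)
    return np.mean(preds != Y)

risks = np.array([empirical_risk(t) for t in thresholds])
t_hat = thresholds[np.argmin(risks)]  # the ERM solution over G
print(f"ERM threshold: {t_hat:.2f}, empirical risk: {risks.min():.3f}")
# Here eta(x) = x, so the Bayes rule thresholds at 1/2 and t_hat should land near 0.5
```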
Machine-Learning - Achievements
- Numerous applications:
  - Supervised anomaly detection
  - Handwritten digit recognition
  - Face recognition
  - Medical diagnosis
  - Credit-risk screening
  - CRM
  - Speech recognition
  - Monitoring of complex systems
  - etc.
- Many "off-the-shelf" methods
  - Neural Networks
  - Support Vector Machines
  - Boosting
  - Vector quantization
  - etc.
Machine-Learning - Challenges
- Ongoing intense research activity, motivated by
  - The need for increasing performance
  - The evolution of computing environments (data centers, clouds, HDFS)
  - New applications/problems: recommender systems, search engines, medical imagery, yield management, etc.
  - The Big Data era
- Mathematical/computational challenges
  - Volume (data deluge): ubiquity of sensors, high dimension, distributed storage/processing systems
  - Variety (of data structures): text, graphs, images, signals
  - Velocity (real-time): online prediction, evolving environments, reinforcement learning strategies (exploration vs. exploitation)
Statistical Learning - Milestones
- The 30's - Fisher's (parametric/Gaussian) statistics
  - Linear Discriminant Analysis
  - Linear (logistic) regression
  - PCA, . . .
Statistical Learning - Milestones
- The 60's and 70's - F. Rosenblatt's perceptron & VC theory
  - First "machine-learning" algorithm (linear binary classification)
  - Inspired by cognitive sciences
  - Convexification, one-pass/online (stochastic gradient descent)
  - Relaxation, large-margin linear classifiers, structural ERM
Statistical Learning - Milestones
- The 80's - Neural Networks & Decision Trees
  - Artificial Intelligence: "A theory of the learnable", Valiant '84
  - The backpropagation algorithm
  - The CART algorithm ('84)
Statistical Learning - Milestones
- From the 90's - Kernels & Boosting
  - Kernel trick: SVM, nonlinear PCA, . . .
  - AdaBoost ('95)
  - Lasso, compressed sensing
  - A comprehensive theory beyond VC concepts
  - Rebirth of Q-learning
Applications
Supervised Learning - Pattern Recognition/Regression
- Data with labels, e.g. $(X_i, Y_i) \in \mathbb{R}^d \times \{-1, +1\}$, $i = 1, \ldots, n$.
  Learn to predict $Y$ based on $X$
- Example: in quality control, $X$ = features of the product and/or production factors, $Y = +1$ if "defect" and $Y = -1$ otherwise.
  Build a decision rule $C$ minimizing $L(C) = \mathbb{P}\{Y \neq C(X)\}$
Applications
Supervised Learning - Scoring
- Data with labels, e.g. $(X_i, Y_i) \in \mathbb{R}^d \times \{-1, +1\}$, $i = 1, \ldots, n$.
  Learn to rank all possible observations $X$ in the same order as that induced by $\mathbb{P}\{Y = +1 \mid X\}$, through a scoring function $s(X)$
Applications
Supervised Learning - Image Recognition
- Objects are assigned to data (pixels), e.g. biometrics
- Goal: learn to assign objects to new data
Empirical Risk Minimization and Stochastic Approximation
- Most learning problems consist of minimizing a functional
  $L(f) = \mathbb{E}[\ell(Z, f)]$,
  where $Z$ is the observation and $f$ a candidate decision rule
- In general, a stochastic approximation inductive method must be implemented:
  $f_{t+1} = f_t - \rho_t \widehat{\nabla}_f L(f_t)$,
  where $\widehat{\nabla}_f L$ is a statistical estimate of $L$'s gradient based on the training data $Z_1, \ldots, Z_n$
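A minimal sketch of this recipe for logistic regression, with one-sample ("online") gradient estimates and a decreasing step size $\rho_t = 1/\sqrt{t}$; the synthetic data and the step-size schedule are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (assumed setup)
n, d = 5000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
Y = np.where(rng.uniform(size=n) < 1 / (1 + np.exp(-X @ w_true)), 1, -1)

# Stochastic approximation: f_{t+1} = f_t - rho_t * (gradient estimate),
# here with the logistic loss log(1 + exp(-y w.x)) and one sample per step
w = np.zeros(d)
for t in range(1, 50_001):
    i = rng.integers(n)
    margin = Y[i] * (X[i] @ w)
    grad = -Y[i] * X[i] / (1 + np.exp(margin))  # gradient of the logistic loss
    w -= grad / np.sqrt(t)                      # decreasing step size rho_t
print(f"training accuracy: {np.mean(np.sign(X @ w) == Y):.3f}")
```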
Empirical Risk Minimization and Stochastic Approximation
- Popular algorithms are based on these principles
- Examples: Logit, Neural Networks, linear SVM, etc.
- Computational advantages, but too rigid (underfitting)
Kernel Trick
- Apply a simple algorithm but... in a transformed space:
  $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$
- Examples: nonlinear SVM, kernel PCA, SVR
- Kernels for images, text data, biological sequences, etc.
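As a hedged illustration (the concentric-circles dataset and the kernel parameters are assumptions made for the example), scikit-learn's SVC applies exactly this trick: the RBF kernel separates data that no linear rule in the input space can.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in R^2, but separable
# in the feature space implicitly induced by the Gaussian (RBF) kernel
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print(f"linear kernel accuracy: {linear.score(X, y):.2f}")  # near chance
print(f"RBF kernel accuracy:    {rbf.score(X, y):.2f}")     # near 1.0
```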
Greedy Algorithms
- Recursive methods exhaustively exploring a structured space at each step
- Examples: CART, projection pursuit, matching pursuit, etc.
- Highly interpretable/visualizable, but poor performance
No Free Lunch
Ensemble learning
Heuristic: combine the predictions output by weak decision rules
Amit & Geman ('97) for image recognition
- Example, committee-based binary classification:
  predict $Y \in \{-1, +1\}$ based on $X$ with
  $C_{\mathrm{agg}}(X) = \mathrm{sgn}\left(\sum_{m=1}^{M} \omega_m C_m(X)\right)$,
  where $\omega_m$ controls the impact of the vote of weak rule $C_m$
- The Bootstrap Aggregating method - Breiman ('96)
  The $C_m$'s are learnt from bootstrap versions of the training data and $\omega_m \equiv 1$
  ⇒ Bagging reduces the instability of prediction rules
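A minimal sketch of bagging by hand (the dataset is an assumption; fully grown trees are deliberately unstable, so that averaging helps):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
M, n = 50, len(X_tr)
votes = np.zeros((M, len(X_te)))
for m in range(M):
    idx = rng.integers(n, size=n)      # bootstrap sample (drawn with replacement)
    tree = DecisionTreeClassifier(random_state=m).fit(X_tr[idx], y_tr[idx])
    votes[m] = tree.predict(X_te)      # weak rule C_m, weight omega_m = 1

y_bag = (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(f"single tree: {np.mean(single.predict(X_te) == y_te):.3f}")
print(f"bagging:     {np.mean(y_bag == y_te):.3f}")
```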
Ensemble learning
- The Adaptive Boosting algorithm for binary classification
  Freund & Schapire ('95) - Slow learning
- AdaBoost can be interpreted as a forward additive stagewise modelling strategy to minimize a convexified version of the risk:
  $\mathbb{E}\left[\exp\left(-Y \sum_{m=1}^{M} \alpha_m C_m(X)\right)\right]$
- A serious competitor: Random Forest, Breiman ('01)
  Bagging applied to randomized decision trees
- Boosting methods and Random Forests outperform older methods in most cases
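Off-the-shelf versions of both methods are available; a quick, hedged comparison on an assumed synthetic dataset (scores will of course vary with the data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
for clf in (AdaBoostClassifier(n_estimators=100, random_state=0),
            RandomForestClassifier(n_estimators=100, random_state=0)):
    score = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold CV accuracy
    print(f"{type(clf).__name__}: {score:.3f}")
```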
Applications
Unsupervised Learning - Anomaly/Novelty Detection
- Data with no labels, e.g. $X_i \in \mathbb{R}^d$, $i = 1, \ldots, n$
- Example: monitoring of complex systems,
  e.g. aircraft systems, fraud detection, predictive maintenance, cybersecurity
- Detect abnormal observations - rarity replaces labeling
- 1-class SVM: $\hat{G}_\alpha = \{x \in \mathcal{X}: \sum_{i=1}^{n} \alpha_i K(x, X_i) \geq t_\mu\}$
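A minimal sketch with scikit-learn's OneClassSVM (the two-dimensional data and the kernel parameters are assumptions made for the example):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Unlabeled data: mostly "normal" points plus a few outliers (assumed setup)
normal = rng.normal(size=(500, 2))
outliers = rng.uniform(low=-6, high=6, size=(10, 2))
X = np.vstack([normal, outliers])

# nu upper-bounds the fraction of training points flagged as abnormal
oc_svm = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X)
flags = oc_svm.predict(X)                 # +1 = normal, -1 = abnormal
print(f"flagged {np.sum(flags == -1)} of {len(X)} points as abnormal")
```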
Applications
Unsupervised Learning - Anomaly/Novelty Ranking
- Data with no labels, e.g. $X_i \in \mathbb{R}^d$, $i = 1, \ldots, n$
- Rank data by degree of novelty/abnormality
- Distributed fleet monitoring: check the 5% most abnormal, then the next 5%, etc.
[Figure: mass-volume curves of optimal, good, and bad scoring functions, ranging from extremal behavior to the modes of the distribution]
Feature Selection
A fast algorithm was proposed by Efron et al. (2002) to compute
$$\hat{\beta}_n(\lambda) = \operatorname{Argmin}_\beta \left[\frac{1}{n}\sum_{i=1}^{n} (y_i - x_i^T \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|\right],$$
for all $\lambda > 0$.
As $\lambda$ decreases, one obtains more and more active (i.e. nonzero) coefficients. The regularization path
$$\lambda \mapsto \hat{\beta}_n(\lambda)$$
thus defines, as $\lambda \downarrow 0$, a sequence of models of increasing dimension.
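A hedged sketch of the regularization path with scikit-learn's lasso_path (the sparse synthetic regression problem is an assumption made for the example):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

# Sparse ground truth: only 5 of 50 features are informative (assumed setup)
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

# lasso_path computes beta_hat(lambda) on a decreasing grid of lambda values
alphas, coefs, _ = lasso_path(X, y)
active = (coefs != 0).sum(axis=0)         # number of nonzero coefficients
for a, k in list(zip(alphas, active))[::20]:
    print(f"lambda = {a:10.4f}  ->  {k:2d} active coefficients")
# As lambda decreases, variables enter the model one after the other
```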
Variable selection
Under $\ell_1$ constraints, the points that are $\ell_2$-furthest away from the origin lie on the axes (zero coefficients):
[Figure: the $\ell_1$ unit ball, with its containing and contained $\ell_2$ balls]
Industrial example (Renault Technocentre)
Regularization path: $\hat{\beta}_{\lambda,1}, \ldots, \hat{\beta}_{\lambda,p}$ vs. $\lambda$
[Figure: coefficient paths of 15 variables v1, ..., v15 as the regularization parameter ranges over [0, 3]]
Lasso - L1 penalty
- Many variants: group Lasso, Lasso and elastic net
- The L1 penalty ensures sparsity
- Compressed sensing: Candès & Tao ('04), Donoho ('04)
- Numerous applications, e.g. matrix completion, recommender systems
Spectral clustering - Ng et al. ('01)
- Partition the vertices of a graph, clustering
- Graph Laplacian $L = D - W$, where $D$ is the degree matrix and $W$ is the adjacency/weight matrix
- Spectral clustering using the normalized version $\tilde{L} = D^{-1/2} L D^{-1/2}$:
  (i) Find the $k$ smallest eigenvectors $V_k = (v_1, \ldots, v_k)$ of $\tilde{L}$
  (ii) Normalize $V_k$'s rows: $V_k \leftarrow \mathrm{diag}(V_k V_k^t)^{-1/2}\, V_k$
  (iii) Cluster the rows of $V_k$ with the k-means algorithm
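A minimal numpy sketch of steps (i)-(iii); the weakly connected two-block toy graph is an assumption made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    # Normalized spectral clustering, following steps (i)-(iii) above;
    # W is the symmetric adjacency/weight matrix of the graph.
    d = W.sum(axis=1)
    L = np.diag(d) - W                           # graph Laplacian L = D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt         # normalized version of L
    _, vecs = np.linalg.eigh(L_norm)             # (i) eigh sorts eigenvalues ascending
    V = vecs[:, :k]                              #     keep the k smallest
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # (ii) normalize rows
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(V)  # (iii)

# Toy graph: two dense blocks, weakly connected (assumed example)
rng = np.random.default_rng(0)
A = (rng.uniform(size=(40, 40)) < 0.05).astype(float)
A[:20, :20] = rng.uniform(size=(20, 20)) < 0.6
A[20:, 20:] = rng.uniform(size=(20, 20)) < 0.6
W = np.triu(A, 1)
W = W + W.T                                      # symmetric, zero diagonal
print(spectral_clustering(W, k=2))               # recovers the two blocks
```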
Spectral clustering - Ng et al. ('01)
[Figure: spectral clustering illustration]
Spectral clustering - Ng et al. ('01)
[Figure: community detection on a graph with 39 numbered vertices]
Scaling-up Machine-Learning Algorithms
- "Smart" randomization/sampling
- Massive parallelization: break a large optimization problem into smaller problems,
  e.g. Cascade SVM, parallel large-scale feature selection, parallel clustering
- Distributed optimization
- Many frameworks are available:
  MapReduce (+ in-memory = PLANET, IBM PML, Mahout), DryadLINQ, MADlib, Storm, etc.
How to apply the ERM paradigm to Big Data?
- Suppose that $n$ is too large to evaluate the empirical risk $\hat{L}_n(g)$
- Common sense: run your preferred learning algorithm on a subsample of "reasonable" size $B \ll n$, e.g. obtained by drawing with replacement from the original training data set...
- ... but of course, statistical performance is downgraded: $1/\sqrt{n} \ll 1/\sqrt{B}$
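A quick, hedged check of this trade-off (the dataset, the model, and the subsample sizes are assumptions made for the example): accuracy degrades gracefully but measurably as B shrinks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200_000, n_features=20, random_state=0)
X_tr, y_tr = X[:150_000], y[:150_000]
X_te, y_te = X[150_000:], y[150_000:]

rng = np.random.default_rng(0)
for B in (500, 5_000, 50_000, len(X_tr)):
    idx = rng.integers(len(X_tr), size=B)   # draw B points with replacement
    clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    print(f"B = {B:7d}: test accuracy = {clf.score(X_te, y_te):.4f}")
```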
"Smart sampling"
- Use side information and implement your mini-batch SGD with a Horvitz-Thompson estimate of the local gradient
- In various situations, the performance criterion is no longer a basic sample mean statistic but a U-statistic
- Examples:
  - Clustering: the within-cluster point scatter related to a partition $\mathcal{P}$,
    $\frac{2}{n(n-1)} \sum_{i<j} D(X_i, X_j) \sum_{\mathcal{C} \in \mathcal{P}} \mathbb{I}\{(X_i, X_j) \in \mathcal{C}^2\}$
  - Graph inference (link prediction)
  - Ranking
  - ...
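A direct numpy computation of the clustering criterion above (the data and the partition are assumptions made for the example); note the $O(n^2)$ number of terms, which is precisely what motivates the incomplete, sampled versions discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
labels = rng.integers(3, size=n)   # an assumed partition P into 3 cells

# Within-cluster point scatter:
# (2 / (n(n-1))) * sum_{i<j} D(Xi, Xj) * I{Xi, Xj in the same cell}
total = 0.0
for i in range(n):
    for j in range(i + 1, n):
        if labels[i] == labels[j]:
            total += np.linalg.norm(X[i] - X[j])   # D = Euclidean distance
W_P = 2.0 * total / (n * (n - 1))
print(f"within-cluster point scatter: {W_P:.4f}")
```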
Example: Ranking
- Data with ordinal labels:
  $(X_1, Y_1), \ldots, (X_n, Y_n) \in (\mathcal{X} \times \{1, \ldots, K\})^{\otimes n}$
- Goal: rank $X_1, \ldots, X_n$ through a scoring function $s: \mathcal{X} \to \mathbb{R}$ such that $s(X)$ and $Y$ tend to increase/decrease together with high probability
- Quantitative formulation: maximize the criterion
  $L(s) = \mathbb{P}\{s(X^{(1)}) < \ldots < s(X^{(K)}) \mid Y^{(1)} = 1, \ldots, Y^{(K)} = K\}$
- Observations: $n_k$ i.i.d. copies of $X$ given $Y = k$,
  $X^{(k)}_1, \ldots, X^{(k)}_{n_k}$, with $n = n_1 + \ldots + n_K$
Example: Ranking
- A natural empirical counterpart of $L(s)$ is
  $\hat{L}_n(s) = \frac{\sum_{i_1=1}^{n_1} \cdots \sum_{i_K=1}^{n_K} \mathbb{I}\{s(X^{(1)}_{i_1}) < \ldots < s(X^{(K)}_{i_K})\}}{n_1 \times \cdots \times n_K}$
- But the number of terms to be summed, $n_1 \times \ldots \times n_K$, is prohibitive!
- Maximization of $\hat{L}_n(s)$ is computationally unfeasible...
Generalized U-statistics
- $K \geq 1$ samples and degrees $(d_1, \ldots, d_K) \in \mathbb{N}^{*K}$
- $(X^{(k)}_1, \ldots, X^{(k)}_{n_k})$, $1 \leq k \leq K$: $K$ independent i.i.d. samples drawn from $F_k(dx)$ on $\mathcal{X}_k$ respectively
- Kernel $H: \mathcal{X}_1^{d_1} \times \cdots \times \mathcal{X}_K^{d_K} \to \mathbb{R}$, square integrable w.r.t. $\mu = F_1^{\otimes d_1} \otimes \cdots \otimes F_K^{\otimes d_K}$
Generalized U-statistics
Definition
The $K$-sample U-statistic of degrees $(d_1, \ldots, d_K)$ with kernel $H$ is
$$U_n(H) = \frac{\sum_{I_1} \cdots \sum_{I_K} H(X^{(1)}_{I_1}; X^{(2)}_{I_2}; \ldots; X^{(K)}_{I_K})}{\binom{n_1}{d_1} \times \cdots \times \binom{n_K}{d_K}},$$
where $\sum_{I_k}$ refers to summation over all $\binom{n_k}{d_k}$ subsets $X^{(k)}_{I_k} = (X^{(k)}_{i_1}, \ldots, X^{(k)}_{i_{d_k}})$ related to a set $I_k$ of $d_k$ indexes $1 \leq i_1 < \ldots < i_{d_k} \leq n_k$.
It is said to be symmetric when $H$ is permutation symmetric in each set of $d_k$ arguments $X^{(k)}_{I_k}$.
References: Lee (1990)
Generalized U-statistics
- Unbiased estimator of
  $\theta(H) = \mathbb{E}[H(X^{(1)}_1, \ldots, X^{(1)}_{d_1}, \ldots, X^{(K)}_1, \ldots, X^{(K)}_{d_K})]$
  with minimum variance
- Asymptotically Gaussian as $n_k/n \to \lambda_k > 0$ for $k = 1, \ldots, K$
- Its computation requires the summation of $\prod_{k=1}^{K} \binom{n_k}{d_k}$ terms
- $K$-partite ranking: $d_k = 1$ for $1 \leq k \leq K$ and
  $H_s(x_1, \ldots, x_K) = \mathbb{I}\{s(x_1) < s(x_2) < \cdots < s(x_K)\}$
Incomplete U-statistics
- Replace $U_n(H)$ by an incomplete version, involving far fewer terms
- Build a set $\mathcal{D}_B$ of cardinality $B$ by sampling with replacement from the set $\Lambda$ of index tuples
  $((i^{(1)}_1, \ldots, i^{(1)}_{d_1}), \ldots, (i^{(K)}_1, \ldots, i^{(K)}_{d_K}))$
  with $1 \leq i^{(k)}_1 < \ldots < i^{(k)}_{d_k} \leq n_k$, $1 \leq k \leq K$
- Compute the Monte-Carlo version based on the $B$ terms:
  $\tilde{U}_B(H) = \frac{1}{B} \sum_{(I_1, \ldots, I_K) \in \mathcal{D}_B} H(X^{(1)}_{I_1}, \ldots, X^{(K)}_{I_K})$
- An incomplete U-statistic is NOT a U-statistic
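A minimal Monte-Carlo sketch for the $K$-partite ranking kernel above, with $d_k = 1$; the Gaussian class-conditional samples and the identity scoring function are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# K-partite ranking, d_k = 1: kernel H_s = I{s(x1) < s(x2) < s(x3)}
K = 3
samples = [rng.normal(loc=k, size=300) for k in range(K)]  # X^(k) given Y = k
s = lambda x: x                                            # assumed scoring function

# The complete U-statistic has n1*n2*n3 = 2.7e7 terms; the incomplete
# version averages the kernel over B index tuples drawn with replacement
B = 10_000
idx = [rng.integers(len(samples[k]), size=B) for k in range(K)]
vals = np.ones(B, dtype=bool)
for k in range(K - 1):
    vals &= s(samples[k][idx[k]]) < s(samples[k + 1][idx[k + 1]])
print(f"incomplete U-statistic (B = {B}): {vals.mean():.3f}")
```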
M-Estimation based on incomplete U-statistics
- Replace the criterion by a tractable incomplete version based on $B = O(n)$ terms:
  $\min_{H \in \mathcal{H}} \tilde{U}_B(H)$
- This leads to investigating the maximal deviations
  $\sup_{H \in \mathcal{H}} |\tilde{U}_B(H) - U_n(H)|$
Main Result
Theorem
Let $\mathcal{H}$ be a VC major class of bounded symmetric kernels of finite VC dimension $V < +\infty$. Set $M_{\mathcal{H}} = \sup_{(H,x) \in \mathcal{H} \times \mathcal{X}} |H(x)|$. Then,
(i) $\mathbb{P}\left\{\sup_{H \in \mathcal{H}} |\tilde{U}_B(H) - U_n(H)| > \eta\right\} \leq 2(1 + \#\Lambda)^V \, e^{-B\eta^2/M_{\mathcal{H}}^2}$
(ii) for all $\delta \in (0, 1)$, with probability at least $1 - \delta$, we have:
$$\frac{1}{M_{\mathcal{H}}} \sup_{H \in \mathcal{H}} \left|\tilde{U}_B(H) - \mathbb{E}\big[\tilde{U}_B(H)\big]\right| \leq 2\sqrt{\frac{2V \log(1+\kappa)}{\kappa}} + \sqrt{\frac{\log(2/\delta)}{\kappa}} + \sqrt{\frac{V \log(1+\#\Lambda) + \log(4/\delta)}{B}},$$
where $\kappa = \min\{\lfloor n_1/d_1 \rfloor, \ldots, \lfloor n_K/d_K \rfloor\}$
Consequences
- Empirical risk sampling with $B = O(n)$ yields a rate bound of order $O(\sqrt{\log n / n})$
- One suffers no loss in terms of learning rate, while drastically reducing the computational cost
Example: Ranking
Empirical ranking performance for SVMrank based on 1%, 5%, 10%, 20% and 100% of the "LETOR 2007" dataset.
Context
Consider a network composed of N agents:
- Agents process local data
- Agents cooperate to estimate some global parameter
A regression example
Data set formed by $n$ samples $(X_i, Y_i)$, $i = 1, \ldots, n$
- $Y_i$ = variable to be explained
- $X_i$ = explanatory features
Looking for a model. Linear regression example:
$$\min_x \sum_{i=1}^{n} \|Y_i - x^T X_i\|^2$$
A regression example
Data set formed by $n$ samples $(X_i, Y_i)$, $i = 1, \ldots, n$
- $Y_i$ = variable to be explained
- $X_i$ = explanatory features
Looking for a model, more generally:
$$\min_x \sum_{i=1}^{n} \ell(x^T X_i, Y_i) + r(x)$$
Distributed processing: the problem is separable,
$$\min_x \sum_{v \in V} \sum_{i=1}^{n} \ell(x^T X_{i,v}, Y_{i,v}) + r(x)$$
[Boyd'11, Agarwal'11]
Formally
$$\min_x \sum_{v \in V} f_v(x)$$
- $G = (V, E)$ is the graph modelling the network
- $f_v$ is the cost function of agent $v$
Difficulty: $\sum_v f_v$ is nowhere observed.
Methods: from distributed gradient algorithms to advanced proximal methods.
Common principle:
1. process local data
2. exchange information with neighbors
3. iterate.
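A minimal numpy sketch of this common principle: decentralized gradient descent on an assumed ring network of N = 5 agents, each holding a local least-squares cost $f_v$; the mixing weights and step size are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each agent v holds local data (A_v, b_v) defining f_v(x) = ||A_v x - b_v||^2
N, d = 5, 3
x_true = rng.normal(size=d)
A = [rng.normal(size=(50, d)) for _ in range(N)]
b = [Av @ x_true + 0.1 * rng.normal(size=50) for Av in A]

# Ring network: doubly stochastic mixing matrix (each agent talks to 2 neighbors)
W = np.zeros((N, N))
for v in range(N):
    W[v, v] = 0.5
    W[v, (v - 1) % N] = 0.25
    W[v, (v + 1) % N] = 0.25

x = np.zeros((N, d))                  # one local iterate per agent
step = 1e-3
for t in range(2000):
    grads = np.stack([2 * A[v].T @ (A[v] @ x[v] - b[v]) for v in range(N)])
    x = W @ x - step * grads          # steps 1-2: local gradient + neighbor averaging
                                      # step 3: iterate
print("agent 0 estimate:", np.round(x[0], 3))
print("ground truth:    ", np.round(x_true, 3))
```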
Key issues
- Distribute cutting-edge optimization algorithms (e.g. primal-dual methods, fast ADMM, etc.)
- Include stochastic perturbations:
  - online algorithms
  - asynchronism
- Investigate specific ML applications, e.g. ranking
More Related Content

Viewers also liked

Détection de profils, application en santé et en économétrie geissler
Détection de profils, application en santé et en économétrie   geisslerDétection de profils, application en santé et en économétrie   geissler
Détection de profils, application en santé et en économétrie geisslerKezhan SHI
 
Insurance fraud through collusion - Pierre Picard
Insurance fraud through collusion - Pierre PicardInsurance fraud through collusion - Pierre Picard
Insurance fraud through collusion - Pierre PicardKezhan SHI
 
Les enjeux de la dépendance – laure de montesquieu, scor
Les enjeux de la dépendance – laure de montesquieu, scorLes enjeux de la dépendance – laure de montesquieu, scor
Les enjeux de la dépendance – laure de montesquieu, scorKezhan SHI
 
Optimal discretization of hedging strategies rosenbaum
Optimal discretization of hedging strategies   rosenbaumOptimal discretization of hedging strategies   rosenbaum
Optimal discretization of hedging strategies rosenbaumKezhan SHI
 
Confidentialité des données michel béra
Confidentialité des données   michel béraConfidentialité des données   michel béra
Confidentialité des données michel béraKezhan SHI
 
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des ActuairesNorme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des ActuairesKezhan SHI
 
Eurocroissance arnaud cohen
Eurocroissance arnaud cohenEurocroissance arnaud cohen
Eurocroissance arnaud cohenKezhan SHI
 
L'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data scienceL'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data scienceKezhan SHI
 
Arbres de régression et modèles de durée
Arbres de régression et modèles de duréeArbres de régression et modèles de durée
Arbres de régression et modèles de duréeKezhan SHI
 
Loi hamon sébastien bachellier
Loi hamon sébastien bachellierLoi hamon sébastien bachellier
Loi hamon sébastien bachellierKezhan SHI
 
Présentation Françoise Soulié Fogelman
Présentation Françoise Soulié FogelmanPrésentation Françoise Soulié Fogelman
Présentation Françoise Soulié FogelmanKezhan SHI
 
Big data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuairesBig data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuairesKezhan SHI
 
Big data en (ré)assurance régis delayet
Big data en (ré)assurance   régis delayetBig data en (ré)assurance   régis delayet
Big data en (ré)assurance régis delayetKezhan SHI
 

Viewers also liked (13)

Détection de profils, application en santé et en économétrie geissler
Détection de profils, application en santé et en économétrie   geisslerDétection de profils, application en santé et en économétrie   geissler
Détection de profils, application en santé et en économétrie geissler
 
Insurance fraud through collusion - Pierre Picard
Insurance fraud through collusion - Pierre PicardInsurance fraud through collusion - Pierre Picard
Insurance fraud through collusion - Pierre Picard
 
Les enjeux de la dépendance – laure de montesquieu, scor
Les enjeux de la dépendance – laure de montesquieu, scorLes enjeux de la dépendance – laure de montesquieu, scor
Les enjeux de la dépendance – laure de montesquieu, scor
 
Optimal discretization of hedging strategies rosenbaum
Optimal discretization of hedging strategies   rosenbaumOptimal discretization of hedging strategies   rosenbaum
Optimal discretization of hedging strategies rosenbaum
 
Confidentialité des données michel béra
Confidentialité des données   michel béraConfidentialité des données   michel béra
Confidentialité des données michel béra
 
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des ActuairesNorme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
Norme IFRS - Pierre Thérond - Université d'été de l'Institut des Actuaires
 
Eurocroissance arnaud cohen
Eurocroissance arnaud cohenEurocroissance arnaud cohen
Eurocroissance arnaud cohen
 
L'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data scienceL'émergence d'une nouvelle filière de formation : data science
L'émergence d'une nouvelle filière de formation : data science
 
Arbres de régression et modèles de durée
Arbres de régression et modèles de duréeArbres de régression et modèles de durée
Arbres de régression et modèles de durée
 
Loi hamon sébastien bachellier
Loi hamon sébastien bachellierLoi hamon sébastien bachellier
Loi hamon sébastien bachellier
 
Présentation Françoise Soulié Fogelman
Présentation Françoise Soulié FogelmanPrésentation Françoise Soulié Fogelman
Présentation Françoise Soulié Fogelman
 
Big data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuairesBig data analytics focus technique et nouvelles perspectives pour les actuaires
Big data analytics focus technique et nouvelles perspectives pour les actuaires
 
Big data en (ré)assurance régis delayet
Big data en (ré)assurance   régis delayetBig data en (ré)assurance   régis delayet
Big data en (ré)assurance régis delayet
 

Similar to Machine learning pour les données massives algorithmes randomis´es, en ligne et distribu´es

06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptxSaharA84
 
Machine Learning: an Introduction and cases
Machine Learning: an Introduction and casesMachine Learning: an Introduction and cases
Machine Learning: an Introduction and casesHendri Karisma
 
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...Artificial Intelligence Institute at UofSC
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsIstituto nazionale di statistica
 
Ml topic1 a
Ml topic1 aMl topic1 a
Ml topic1 abosycs1
 
Machine Learning
Machine LearningMachine Learning
Machine LearningVivek Garg
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Predictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataPredictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataArthur Charpentier
 
Machine learning
Machine learningMachine learning
Machine learningRohit Kumar
 
Eick/Alpaydin Introduction
Eick/Alpaydin IntroductionEick/Alpaydin Introduction
Eick/Alpaydin Introductionbutest
 
Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...Kim Flintoff
 
Brief Tour of Machine Learning
Brief Tour of Machine LearningBrief Tour of Machine Learning
Brief Tour of Machine Learningbutest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New BossAndreas Dewes
 
introduction to machin learning
introduction to machin learningintroduction to machin learning
introduction to machin learningnilimapatel6
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningAI Summary
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifePeea Bal Chakraborty
 
Clase 0
Clase 0Clase 0
Clase 0butest
 

Similar to Machine learning pour les données massives algorithmes randomis´es, en ligne et distribu´es (20)

06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx
 
Machine Learning: an Introduction and cases
Machine Learning: an Introduction and casesMachine Learning: an Introduction and cases
Machine Learning: an Introduction and cases
 
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statistics
 
Ml topic1 a
Ml topic1 aMl topic1 a
Ml topic1 a
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Predictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataPredictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big data
 
Machine learning
Machine learningMachine learning
Machine learning
 
Eick/Alpaydin Introduction
Eick/Alpaydin IntroductionEick/Alpaydin Introduction
Eick/Alpaydin Introduction
 
Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...
 
Brief Tour of Machine Learning
Brief Tour of Machine LearningBrief Tour of Machine Learning
Brief Tour of Machine Learning
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New Boss
 
introduction to machin learning
introduction to machin learningintroduction to machin learning
introduction to machin learning
 
i2ml3e-chap1.pptx
i2ml3e-chap1.pptxi2ml3e-chap1.pptx
i2ml3e-chap1.pptx
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
 
Clase 0
Clase 0Clase 0
Clase 0
 

More from Kezhan SHI

Big data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-septBig data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-septKezhan SHI
 
Big data fiche data science 15 09 14
Big data fiche data science 15 09 14Big data fiche data science 15 09 14
Big data fiche data science 15 09 14Kezhan SHI
 
Big data ads gouvernance ads v2[
Big data ads   gouvernance ads v2[Big data ads   gouvernance ads v2[
Big data ads gouvernance ads v2[Kezhan SHI
 
Big data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-septBig data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-septKezhan SHI
 
B -technical_specification_for_the_preparatory_phase__part_ii_
B  -technical_specification_for_the_preparatory_phase__part_ii_B  -technical_specification_for_the_preparatory_phase__part_ii_
B -technical_specification_for_the_preparatory_phase__part_ii_Kezhan SHI
 
A -technical_specification_for_the_preparatory_phase__part_i_
A  -technical_specification_for_the_preparatory_phase__part_i_A  -technical_specification_for_the_preparatory_phase__part_i_
A -technical_specification_for_the_preparatory_phase__part_i_Kezhan SHI
 
20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standard20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standardKezhan SHI
 
20140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-201420140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-2014Kezhan SHI
 
20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilan20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilanKezhan SHI
 
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_Kezhan SHI
 
Qis5 technical specifications-20100706
Qis5 technical specifications-20100706Qis5 technical specifications-20100706
Qis5 technical specifications-20100706Kezhan SHI
 
Directive solvabilité 2
Directive solvabilité 2Directive solvabilité 2
Directive solvabilité 2Kezhan SHI
 
Directive omnibus 2
Directive omnibus 2Directive omnibus 2
Directive omnibus 2Kezhan SHI
 
Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2Kezhan SHI
 
Rapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNILRapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNILKezhan SHI
 
Xavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régressionXavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régressionKezhan SHI
 

More from Kezhan SHI (16)

Big data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-septBig data fp prez nouv. formation_datascience_15-sept
Big data fp prez nouv. formation_datascience_15-sept
 
Big data fiche data science 15 09 14
Big data fiche data science 15 09 14Big data fiche data science 15 09 14
Big data fiche data science 15 09 14
 
Big data ads gouvernance ads v2[
Big data ads   gouvernance ads v2[Big data ads   gouvernance ads v2[
Big data ads gouvernance ads v2[
 
Big data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-septBig data f prez formation_datascience_14-sept
Big data f prez formation_datascience_14-sept
 
B -technical_specification_for_the_preparatory_phase__part_ii_
B  -technical_specification_for_the_preparatory_phase__part_ii_B  -technical_specification_for_the_preparatory_phase__part_ii_
B -technical_specification_for_the_preparatory_phase__part_ii_
 
A -technical_specification_for_the_preparatory_phase__part_i_
A  -technical_specification_for_the_preparatory_phase__part_i_A  -technical_specification_for_the_preparatory_phase__part_i_
A -technical_specification_for_the_preparatory_phase__part_i_
 
20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standard20140806 traduction hypotheses_sous-jacentes_formule_standard
20140806 traduction hypotheses_sous-jacentes_formule_standard
 
20140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-201420140613 focus-specifications-techniques-2014
20140613 focus-specifications-techniques-2014
 
20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilan20140516 traduction spec_tech_eiopa_2014_bilan
20140516 traduction spec_tech_eiopa_2014_bilan
 
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_C  -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
C -annexes_to_technical_specification_for_the_preparatory_phase__part_i_
 
Qis5 technical specifications-20100706
Qis5 technical specifications-20100706Qis5 technical specifications-20100706
Qis5 technical specifications-20100706
 
Directive solvabilité 2
Directive solvabilité 2Directive solvabilité 2
Directive solvabilité 2
 
Directive omnibus 2
Directive omnibus 2Directive omnibus 2
Directive omnibus 2
 
Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2Tableau de comparaison bilan S1 et bilan S2
Tableau de comparaison bilan S1 et bilan S2
 
Rapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNILRapport d'activité 2013 - CNIL
Rapport d'activité 2013 - CNIL
 
Xavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régressionXavier Milaud - Techniques d'arbres de classification et de régression
Xavier Milaud - Techniques d'arbres de classification et de régression
 

Recently uploaded

Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 

Recently uploaded (20)

Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 

Machine learning pour les données massives algorithmes randomis´es, en ligne et distribu´es

  • 1. Universit´e des Actuaires Machine-Learning pour les donn´ees massives: algorithmes randomis´es, en ligne et distribu´es St´ephan Cl´emen¸con Institut Mines T´el´ecom - T´el´ecom ParisTech July 7, 2014
  • 2. Agenda I Technologies Big Data: contexte et opportunit´es I Machine-Learning: un bref tour d’horizon I Les d´efis des applications du ML dans l’industrie et les services: Vers une g´en´eralisation?
  • 3. ”Big Data” - Le contexte Une accumulation de donn´ees massives dans de nombreux domaines: I Biologie/M´edecine (g´enomique, m´etabolomique, essais cliniques, imagerie, etc. I Grande distribution, marketing (CRM), e-commerce I Moteurs de recherche internet (contenu multimedia) I R´eseaux sociaux (Facebook, Tweeter, ...) I Banque/Finance (risque de march´e/liquidit´e, acc`es au cr´edit) I S´ecurit´e (ex: biom´etrie, vid´eosurveillance) I Administrations (Sant´e Publique, Douanes) I Risques op´erationnels
  • 4. ”Big Data” - Le contexte Un d´eluge de donn´ees qui rend inop´erant: I les outils basiques de I stockage de donn´ees I gestion de base de donn´ees (MySQL) I le pr´etraitement reposant sur l’expertise humaine I indexation, analyse s´emantique I mod´elisation I intelligence d´ecisionnelle
  • 5. ”Big Data” - Le contexte Une multitude de briques technologiques et de services disponibles pour: I La parall´elisation massive (Velocity) I Le calcul distribu´e (Volume) I La gestion de donn´ees sans sch´ema pr´ed´efini (Variety) parmi lesquels: I Le mod`ele de programmation MapReduce: calculs parall´elis´es/distribu´ees I Framework Hadoop I NoSQL: SGBD Cassandra, MongoDB, bases de donn´ees orient´ees graphe, moteur de recherche Elasticsearch, etc. I Clouds: infrastructures, plate-formes, logiciels as a Service promus par Google, Amazon, Facebook, etc.
  • 6. ”Big Data” - Les opportunit´es Des avanc´ees spectaculaires pour I la collecte et le stockage (distribu´e) des donn´ees I la recherche automatique d’objets, de contenu I le partage de donn´ees peu structur´ees Le Big Data: un moteur pour la technologie, la science, l’´economie I Moteurs de recherche, moteurs de recommandation I Maintenance pr´edictive I Marketing viral `a travers les r´eseaux sociaux I D´etection des fraudes I M´edecine individualis´ee I Publicit´e en ligne (retargeting)
  • 7. ”Big Data” - Les opportunit´es Ubiquit´e De nombreux secteurs d’activit´e sont concern´es: I (e-) Commerce I CRM I Sant´e I D´efense, renseignement (e.g. cybers´ecurit´e, biom´etrie) I Banque/Finance I Transports ”intelligents” I etc.
  • 8. Big Data - Recherche Afin d’exploiter les donn´ees (pr´ediction, interp´etation), d´evelopper des technologies math´ematiques permettant de r´esoudre les probl`emes computationnels li´es: I aux contraintes du quasi-temps r´eel ! apprentissage automatique s´equentiel (”on-line”) 6= batch, par renforcement I au caract`ere distribu´e des donn´ees/ressources ! apprentissage automatique distribu´e I `a la volum´etrie des donn´ees ! impact des techniques de sondages sur la performance des algorithmes
  • 9. ”Big Data” - Recherche Des techniques de visualisation, repr´esentation de donn´ees complexes I Graphes (´evolutifs) - clustering, graph-mining I Image, audio, video - filtrage, compression I Donn´ees textuelles (e.g. page web, tweet) Domaines I Probabilit´e, Statistique I Machine-Learning I Optimisation I Traitement du signal et de l’image I Analyse Harmonique Computationnelle I Analyse s´emantique I etc.
  • 10. Goals of Statistical Learning I Statistical issues cast as M-estimation problems: I Classification I Regression I Density level set estimation I Compression, sparse representation I ... and their variants I Minimal assumptions on the distribution I Build realistic M-estimators for special criteria I Questions I Theory: optimal elements, consistency, non-asymptotic excess risk bounds, fast rates of convergence, oracle inequalities I Practice: numerical optimization, convexification, randomization, relaxation, constraints (distributed architectures, real-time, memory, etc.)
  • 11. Main Example: Classification (Pattern Recognition) I (X, Y ) random pair with unknown distribution P I X 2 X observation vector I Y 2 { 1, +1} binary label/class I A posteriori probability ⇠ regression function 8x 2 X, ⌘(x) = {Y = 1 | X = x} I g : X ! { 1, +1} classifier I Performance measure = classification error L(g) = g(X) 6= Y ! min g I Solution: Bayes rule 8x 2 X, g⇤ (x) = 2{⌘(x) > 1/2} 1 I Bayes error L⇤ = L(g⇤)
  • 12. Main Paradigm - Empirical Risk Minimization I Sample (X1, Y1), . . . , (Xn, Yn) with i.i.d. copies of (X, Y ),class of classifiers I Empirical Risk Minimization principle bgn = Argmin g2 Ln(g) := 1 n nX i=1 {g(Xi ) 6= Yi } I Best classifier in the class ¯g = Argmin g2 L(g) I Concentration inequality With probability 1 : sup g2 | Ln(g) L(g) | C r V n + r 2 log(1/ ) n
  • 13. Machine-Learning - Achievements I Numerous applications: I Supervised anomaly detection I Handwritten digit recognition I Face recognition I Medical diagnosis I Credit-risk screening I CRM I Speech recognition I Monitoring of complex systems I etc. I Many ”o↵-the-shelf” methods I Neural Networks I Support Vector Machines I Boosting I Vector quantization I etc.
  • 14. Machine-Learning - Challenges I Ongoing intense research activity, motivated by I Need for increasing performance I Evolution of computing environments (data centers, clouds, HDFS) I New applications/problems: recommending systems, search engines, medical imagery, yield management etc. I The Big Data era I Mathematical/computational challenges I Volume (data deluge): ubiquity of sensors, high dimension, distributed storage/processing systems I Variety (of data structures): text, graphs, images, signals I Velocity (real-time): on-line prediction, evolutionary environment, reinforcement learning strategies (exploration vs exploitation)
  • 15. Statistical Learning - Milestones I The 30’s - Fisher’s (parametric/Gaussian) statistics I Linear Discriminant Analysis I Linear (logistic) regression I PCA, . . .
  • 16. Statistical Learning - Milestones I The 60’s and 70’s - F. Rosenblatt’s perceptron & VC theory I First ”machine-learning” algorithm (linear binary classification) I Inspired by congnitive sciences I Convexification, one-pass/on-line (stochastic gradient descent) I Relaxation, large margin linear classifiers, structural ERM
  • 17. Statistical Learning - Milestones I The 80’s - Neural Networks & Decision Trees I Artificial Intelligence ”A theory of learnability” Valiant ’84 I The Backpropagation algorithm I The CART algorithm (’84)
  • 18. Statistical Learning - Milestones I From the 90’s - Kernels & Boosting I Kernel trick: SVM, nonlinear PCA, . . . I AdaBoost (’95) I Lasso, compressed sensing I A comprehensive theory beyond VC concepts I Rebirth of Q-learning
  • 19. Applications Supervised Learning - Pattern Recognition/Regression I Data with labels, e.g. (Xi , Yi ) 2 Rd ⇥ { 1, +1}, i = 1, . . . , n. Learn to predict Y based on X I Example: in Quality Control, X features of the product and/or production factors, Y = +1 if ”defect” and Y = 1 otherwise. Build a decision rule C minimizing L(C) = P{Y 6= C(X)}
  • 20. Applications Supervised Learning - Scoring I Data with labels, e.g. (Xi , Yi ) 2 Rd ⇥ { 1, +1}, i = 1, . . . , n. Learn to rank all possible observations X in the same order as that induced by P{Y = +1 | X} through a scoring function s(X)
  • 21. Applications Supervised Learning - Image Recognition I Objects are assigned to data (pixels), e.g. biometrics I Goal: learn to assign objects to new data
  • 22. Empirical Risk Minimization and Stochastic Approximation I Most learning problems consists of minimizing a functional L(f ) = E[ (Z, f )] where Z is the observation, f a decision rule candidate I In general, a stochastic approximation inductive method must be implemented ft+1 = ft ⇢t drf L(ft), where drf L is a statistical estimate of L’s gradient based on training data Z1, . . . , Zn
  • 23. Empirical Risk Minimization and Stochastic Approximation I Popular algorithms are based on these principles I Examples: Logit, Neural Networks, linear SVM, etc. I Computational advantages but too rigid (underfitting)
  • 24. Kernel Trick I Apply a simple algorithm but... in a transformed space K(x, x0 ) = h (x), (x0 )i I Examples: Nonlinear SVM, Kernel PCA, SVR I Kernels for images, text data, biological sequences, etc.
  • 25. Greedy Algorithms I Recursive methods exploring exhaustively a structured space at each step I Examples: CART, projection pursuit, matching pursuit, etc. I Highly interpretable/visualizable but poor performance
  • 27. Ensemble learning Heuristic: combine predictions output by weak decision rules Amit & Geman (’97) for image recognition I Example committee-based binary classification: predict Y 2 { 1, +1} based on X Cagg (X) = sgn MX m=1 !mCm(X) ! , where !m controls the impact of the vote of weak rule Cm I The Bootstrap Aggregating method - Breiman (’96) The Cm’s re learnt from bootstrap versions of the training data and !m ⌘ 1 ) Bagging reduces instability of prediction rules
  • 28. Ensemble learning I The Adaptive Boosting algorithm for binary classification Freund & Shapire (’95) - Slow learning I AdaBoost can be interpreted as a forward additive stagewise modelling strategy to minimize a convexified version of the risk E[exp( Y MX m=1 ↵mCm(X))] I A serious competitor: Random Forest, Breiman (’01) Bagging applied to randomized decision trees I Boosting methods and Random Forests outperform older methods in most cases
  • 29. Applications Unsupervised Learning - Anomaly/Novelty Detection I Data with no labels, e.g. Xi 2 Rd , i = 1, . . . , n I Example: monitoring of complex systems, e.g. aircraft systems, fraud detection, predictive maintenance, cybersecurity I Detect abnormal observations - Rarity replaces labeling I 1-class SVM: bG↵ = {x 2 X : Pn i=1 ↵i K(x, Xi ) tµ}
  • 30. Applications Unsupervised Learning - Anomaly/Novelty Ranking I Data with no labels, e.g. Xi 2 Rd , i = 1, . . . , n I Rank data by degree of novelty/abnormality I Distributed Fleet Monitoring: check the 5 % the most abnormal, then the next 5 %, etc. 10 Optimal Good Bad (Extremal behavior) MASS VOLUME (Modes)
  • 31. Feature Selection A quick algorithm has been proposed by Efron et al (2002) to compute bn( ) = Argmin " 1 n nX i=1 (yi xT i )2 + pX i=1 | i | # , for all > 0. As decreases one obtains more and more active (i.e. non zero) coe cients. The regularization path 7! bn( ) thus defines as # 0 a sequence of models with increasing dimension.
• 32. Variable Selection
Under ℓ1 constraints, the points that are the ℓ2-furthest away from the origin lie on the axes (zero coefficients):
[Figure: the ℓ1 unit ball, containing an ℓ2 ball and contained in a larger ℓ2 ball]
• 33. Industrial Example (Renault Technocentre)
[Figure: regularization path - coefficient values versus λ for 15 variables v1, ..., v15]
• 34. Lasso - ℓ1 Penalty
- Many variants: group Lasso, elastic net, etc.
- The ℓ1 penalty ensures sparsity
- Compressed sensing: Candès & Tao ('04), Donoho ('04)
- Numerous applications, e.g. matrix completion, recommender systems
• 35. Spectral Clustering - Ng et al. ('01)
- Partition the vertices of a graph (clustering)
- Graph Laplacian L = D − W, where D is the degree matrix and W the adjacency/weight matrix
- Spectral clustering using the normalised version L̃ = D^{−1/2} L D^{−1/2}:
(i) find the k smallest eigenvectors V_k = (v_1, ..., v_k) of L̃
(ii) normalise V_k's rows: V_k ← diag(V_k V_k^T)^{−1/2} V_k
(iii) cluster the rows of V_k with the k-means algorithm
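A from-scratch sketch of steps (i)-(iii), assuming a symmetric, nonnegative weight matrix W with no isolated vertices (NumPy plus scikit-learn's KMeans; an illustration, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """Steps (i)-(iii) above for a weight matrix W and k clusters."""
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W                        # graph Laplacian L = D - W
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    L_norm = d_inv_sqrt @ L @ d_inv_sqrt            # normalized Laplacian
    _, vecs = np.linalg.eigh(L_norm)                # eigenvalues in ascending order
    V = vecs[:, :k]                                 # (i) k smallest eigenvectors
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # (ii) row normalization
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)  # (iii) k-means

# Example: two 5-node cliques joined by a single weak edge
W = np.zeros((10, 10))
W[:5, :5] = 1.0
W[5:, 5:] = 1.0
np.fill_diagonal(W, 0.0)
W[4, 5] = W[5, 4] = 0.1
print(spectral_clustering(W, 2))   # recovers the two cliques
```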
  • 36. Spectral clustering - Ng et al. (’01)
• 37. Spectral clustering - Ng et al. ('01)
[Figure: example graph with numbered vertices]
• 39. Spectral clustering - Ng et al. ('01) (recap of the algorithm of slide 35)
• 40. Scaling-up Machine-Learning Algorithms
- "Smart randomization/sampling"
- Massive parallelization: break a large optimization problem into smaller problems, e.g. Cascade SVM, parallel large-scale feature selection, parallel clustering
- Distributed optimization
- Many frameworks are available: MapReduce (+ in-memory = PLANET, IBM PML, Mahout), DryadLINQ, MADlib, Storm, etc.
• 41-42. How to apply the ERM paradigm to Big Data?
- Suppose that n is too large to evaluate the empirical risk L_n(g)
- Common sense: run your preferred learning algorithm on a subsample of "reasonable" size B ≪ n, e.g. drawn with replacement from the original training data set...
- ... but of course, statistical performance is downgraded: 1/√n ≪ 1/√B
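A sketch of this "common sense" strategy (the synthetic data, the value of B and the choice of learner are illustrative assumptions): draw B indexes with replacement and train on the subsample only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, B = 1_000_000, 10_000                   # B << n
X = rng.normal(size=(n, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

idx = rng.integers(n, size=B)              # sampling with replacement
clf = LogisticRegression().fit(X[idx], y[idx])
print("accuracy on the full sample:", clf.score(X, y))
```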
• 43. "Smart sampling"
- Use side information and implement your mini-batch SGD with a Horvitz-Thompson estimate of the local gradient
- In various situations, the performance criterion is no longer a basic sample-mean statistic but a U-statistic
- Examples:
  - Clustering: the within-cluster point scatter related to a partition P (see the sketch below), (2/(n(n−1))) Σ_{i<j} D(X_i, X_j) Σ_{C∈P} I{(X_i, X_j) ∈ C²}
  - Graph inference (link prediction)
  - Ranking
  - ...
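To see why such a criterion is a U-statistic rather than a sample mean, here is a sketch of the within-cluster point scatter (the Euclidean choice of D and the random labels are illustrative assumptions): it averages over all pairs of observations, not over single observations.

```python
import numpy as np

def within_cluster_scatter(X, labels):
    """Average of D(X_i, X_j) over all pairs in the same cluster:
    a U-statistic of degree 2, with n(n-1)/2 terms."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:                 # (X_i, X_j) in C^2
                total += np.linalg.norm(X[i] - X[j])   # D(X_i, X_j)
    return 2.0 * total / (n * (n - 1))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
labels = rng.integers(3, size=100)
print(within_cluster_scatter(X, labels))
```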
• 44-47. Example: Ranking
- Data with ordinal labels: (X_1, Y_1), ..., (X_n, Y_n) ∈ (X × {1, ..., K})^n
- Goal: rank X_1, ..., X_n through a scoring function s : X → ℝ such that s(X) and Y tend to increase/decrease together with high probability
- Quantitative formulation: maximize the criterion L(s) = P{s(X^(1)) < ... < s(X^(K)) | Y^(1) = 1, ..., Y^(K) = K}
- Observations: n_k i.i.d. copies of X given Y = k, namely X_1^(k), ..., X_{n_k}^(k), with n = n_1 + ... + n_K
• 48-50. Example: Ranking
- A natural empirical counterpart of L(s) is
  L̂_n(s) = (1/(n_1 × ... × n_K)) Σ_{i_1=1}^{n_1} ... Σ_{i_K=1}^{n_K} I{ s(X_{i_1}^(1)) < ... < s(X_{i_K}^(K)) }
- But the number of terms to be summed, n_1 × ... × n_K, is prohibitive!
- Maximization of L̂_n(s) is computationally unfeasible...
• 51. Generalized U-statistics
- K ≥ 1 samples and degrees (d_1, ..., d_K) ∈ ℕ*^K
- (X_1^(k), ..., X_{n_k}^(k)), 1 ≤ k ≤ K: K independent i.i.d. samples drawn from F_k(dx) on X_k respectively
- Kernel H : X_1^{d_1} × ... × X_K^{d_K} → ℝ, square integrable w.r.t. μ = F_1^{⊗d_1} ⊗ ... ⊗ F_K^{⊗d_K}
• 52. Generalized U-statistics
Definition. The K-sample U-statistic of degrees (d_1, ..., d_K) with kernel H is
U_n(H) = ( Σ_{I_1} ... Σ_{I_K} H(X_{I_1}^(1); X_{I_2}^(2); ...; X_{I_K}^(K)) ) / ( C(n_1, d_1) × ... × C(n_K, d_K) ),
where Σ_{I_k} refers to summation over all C(n_k, d_k) subsets X_{I_k}^(k) = (X_{i_1}^(k), ..., X_{i_{d_k}}^(k)) related to a set I_k of d_k indexes 1 ≤ i_1 < ... < i_{d_k} ≤ n_k. It is said symmetric when H is permutation-symmetric within each set of d_k arguments X_{I_k}^(k).
Reference: Lee (1990)
• 53. Generalized U-statistics
- Unbiased estimator of θ(H) = E[H(X_1^(1), ..., X_{d_1}^(1), ..., X_1^(K), ..., X_{d_K}^(K))] with minimum variance
- Asymptotically Gaussian as n_k/n → λ_k > 0 for k = 1, ..., K
- Its computation requires the summation of Π_{k=1}^K C(n_k, d_k) terms
- K-partite ranking: d_k = 1 for 1 ≤ k ≤ K and H_s(x_1, ..., x_K) = I{s(x_1) < s(x_2) < ... < s(x_K)}
• 54. Incomplete U-statistics
- Replace U_n(H) by an incomplete version, involving far fewer terms
- Build a set D_B of cardinality B by sampling with replacement from the set Λ of index tuples ((i_1^(1), ..., i_{d_1}^(1)), ..., (i_1^(K), ..., i_{d_K}^(K))) with 1 ≤ i_1^(k) < ... < i_{d_k}^(k) ≤ n_k, 1 ≤ k ≤ K
- Compute the Monte-Carlo version based on B terms: Ũ_B(H) = (1/B) Σ_{(I_1, ..., I_K) ∈ D_B} H(X_{I_1}^(1), ..., X_{I_K}^(K))
- An incomplete U-statistic is NOT a U-statistic
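A sketch in the simplest case, the bipartite ranking kernel (K = 2, d_1 = d_2 = 1, H_s(x, x′) = I{s(x) < s(x′)}, i.e. the AUC); the Gaussian score distributions are an illustrative assumption. Instead of the n_1 × n_2 pairs of the complete statistic, the incomplete version averages over B pairs drawn with replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
s_neg = rng.normal(loc=0.0, size=5000)    # scores of "negative" examples
s_pos = rng.normal(loc=1.0, size=5000)    # scores of "positive" examples

# Complete U-statistic: all n1 * n2 = 25 million pairs.
complete = np.mean(s_neg[:, None] < s_pos[None, :])

# Incomplete version: B pairs sampled with replacement.
B = 10_000
i = rng.integers(len(s_neg), size=B)
j = rng.integers(len(s_pos), size=B)
incomplete = np.mean(s_neg[i] < s_pos[j])
print(f"complete U_n = {complete:.4f}, incomplete U_B = {incomplete:.4f}")
```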
• 55. M-Estimation based on incomplete U-statistics
- Replace the criterion by a tractable incomplete version based on B = O(n) terms: min_{H∈H} Ũ_B(H)
- This leads to investigating the maximal deviations sup_{H∈H} |Ũ_B(H) − U_n(H)|
• 56. Main Result
Theorem. Let H be a VC-major class of bounded symmetric kernels of finite VC dimension V < +∞. Set M_H = sup_{(H,x)∈H×X} |H(x)|. Then,
(i) P{ sup_{H∈H} |Ũ_B(H) − U_n(H)| > η } ≤ 2(1 + #Λ)^V e^{−Bη²/M_H²}
(ii) for all δ ∈ (0, 1), with probability at least 1 − δ:
(1/M_H) sup_{H∈H} |Ũ_B(H) − E[Ũ_B(H)]| ≤ 2√(2V log(1+κ)/κ) + √(log(2/δ)/κ) + √( (V log(1+#Λ) + log(4/δ)) / B ),
where κ = min{⌊n_1/d_1⌋, ..., ⌊n_K/d_K⌋}
• 57. Consequences
- Empirical risk sampling with B = O(n) yields a rate bound of order O(√(log n / n))
- One suffers no loss in terms of learning rate, while drastically reducing the computational cost
• 58. Example: Ranking
[Figure: empirical ranking performance of SVMrank trained on 1%, 5%, 10%, 20% and 100% of the "LETOR 2007" dataset]
• 59. Context
Consider a network composed of N agents:
- agents process local data
- agents cooperate to estimate some global parameter
• 60-62. A regression example
Data set formed by n samples (X_i, Y_i), i = 1, ..., n:
- Y_i = variable to be explained
- X_i = explanatory features
Looking for a model. Linear regression example: min_x Σ_{i=1}^n ‖Y_i − x^T X_i‖²
More generally, with a loss ℓ and a regularizer r: min_x Σ_{i=1}^n ℓ(x^T X_i, Y_i) + r(x)
Distributed processing: the problem is separable across agents v ∈ V:
min_x Σ_{v∈V} Σ_{i=1}^n ℓ(x^T X_{i,v}, Y_{i,v}) + r(x) [Boyd'11, Agarwal'11]
• 63. Formally
min_x Σ_{v∈V} f_v(x)
- G = (V, E) is the graph modelling the network
- f_v is the cost function of agent v
Difficulty: Σ_v f_v is nowhere observed.
Methods: from distributed gradient algorithms to advanced proximal methods.
Common principle (sketched below):
1. process local data
2. exchange information with neighbors
3. iterate
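A toy sketch of this common principle, assuming a ring network, local least-squares costs and a simple gossip-averaging exchange step (all illustrative choices, not a specific algorithm from the talk): each agent takes a gradient step on its own data, then averages its iterate with its neighbors'.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, n_local = 10, 3, 50
x_true = rng.normal(size=d)
A = [rng.normal(size=(n_local, d)) for _ in range(N)]              # local data
b = [A[v] @ x_true + 0.1 * rng.normal(size=n_local) for v in range(N)]

x = np.zeros((N, d))                       # one local iterate per agent
for t in range(500):
    # (1) process local data: gradient of f_v(x) = ||A_v x - b_v||^2 / (2 n_local)
    grads = np.stack([A[v].T @ (A[v] @ x[v] - b[v]) / n_local for v in range(N)])
    x = x - 0.05 * grads
    # (2) exchange with ring neighbors: doubly stochastic mixing weights
    x = 0.5 * x + 0.25 * np.roll(x, 1, axis=0) + 0.25 * np.roll(x, -1, axis=0)
    # (3) iterate

print("max disagreement between agents:", np.abs(x - x.mean(axis=0)).max())
print("error of consensus iterate:     ", np.linalg.norm(x.mean(axis=0) - x_true))
```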
• 64. Key issues
- Distribute cutting-edge optimization algorithms (e.g. primal-dual methods, fast ADMM, etc.)
- Include stochastic perturbations: on-line algorithms, asynchronism
- Investigate specific ML applications, e.g. ranking