SlideShare une entreprise Scribd logo
Margaux Brégère - April, 24th 2024
Sequential and reinforcement learning for
demand side management
Paris Women in Machine Learning & Data Science @Criteo
Introduction
Demand side management
Current solution: forecast demand and
adapt production accordingly
• Renewable energies development
production harder to adjust
• New (smart) meters access to data
and instantaneous communication
Prospective solutions: manage demand
Send incentive signals (prices)
Control flexible devices
→
→
→
→
Electricity is hard to store production - demand balance must be strictly maintained
→
Demand side management with incentive signals
The environment (consumer behavior) is discovered through
interactions (incentive signal choices) Reinforcement learning
How to develop automatic solutions to chose incentive signals dynamically?
→
Exploration: Learn
consumer behavior
Exploitation: Optimize
signal sending
« Smart Meter Energy Consumption
Data in London Households »
Stochastic multi-armed bandits
Stochastic multi-armed bandits
In a multi-armed bandit problem, a gambler facing a row of slot machines
(also called one-armed bandits) has to decide which machines to play to
maximize her reward
K
Exploration:Test
many arms
Exploitation: Play
the « best » arm
Exploration - Exploitation trade-off
Stochastic multi-armed bandit
Each arm is defined by an unknown probability distribution
For
• Pick an arm
• Receive a random reward with
Maximize the cumulative reward Minimize the regret, i.e., the difference, in expectation,
between the cumulative reward of the best strategy and that of ours:
, with
A good bandit algorithm has a sub-linear regret:
k νk
t = 1,…, T
It ∈ {1,…, K}
Yt Yt |It = k ∼ νk
⇔
RT = T max
k=1,…,K
μk −
𝔼
[
T
∑
t=1
μIt]
μk =
𝔼
[ νk ]
RT
T
→ 0
Upper Confidence Bound algorithm1
Initialization: pick each arm once
For :
• Estimate the expected reward of each arm with
(empirical mean of past rewards)
• Build some confidence intervals around these
estimations: with high
probability
• Be optimistic and act as if the best possible probable
reward was the true reward and choose the next arm
accordingly
t = K + 1,…, T
k ̂
μk,t−1
μk ∈ [ ̂
μk,t−1 − αk,t , ̂
μk,t−1 + αk,t]
It = arg max
k {
̂
μk,t−1 + αk,t}
[1] Finite-time analysis of the multiarmed bandit problem, Peter Auer, Nicolo Cesa-Bianchi, Paul Fischer, Machine
learning, 2002
UCB regret bound
The empirical means based on past rewards are:
with
With Hoeffding-Azuma Inequality, we get
with
And be optimistic ensures that
̂
μk,t−1 =
1
Nk,t−1
t−1
∑
s=1
Ys1{Is=k} Nk,t−1 =
t−1
∑
s=1
1{Is=k}
ℙ ( μk ∈ [ ̂
μk,t−1 − αk,t , ̂
μk,t−1 + αk,t] ) ≥ 1 − t−3
αk,t =
2 log t
Nk,t−1
RT ≲ TK log T
Modeling demand side management
Demand side management with incentive signals
For
• Observe a context and a target
• Choose price levels
• Observe the resulting electricity demand
and suffer the loss
t = 1,…, T
xt ct
pt
Yt = f(xt, pt) + noise(pt)
ℓ(Yt, ct )
Assumptions:
• Homogenous population, tariffs,
• with a known mapping
function and an unknown vector to estimate
• with
•
K pt ∈ ΔK
f(xt, pt) = ϕ(xt, pt)
T
θ ϕ
θ
noise(pt) = pT
t εt
𝕍
[εt] = Σ
ℓ(Yt, ct ) = ( Yt − ct )
2
pt
ct
xt
Yt
Bandit algorithm for target tracking
Under these assumptions:
☞ Estimate parameters and to estimate losses and reach a bias-variance trade-off
Optimistic algorithm:
For
• Select price levels deterministically to estimate offline with
For
• Estimate based on past observation with thanks to a Ridge regression
• Estimate future expected loss for each price level :
• Get confidence bound on these estimations:
• Select price levels optimistically:
𝔼
[ (Yt − ct)
2
past, xt, pt ] = (ϕ(xt, pt)T
θ − ct)
2
+ pT
t Σpt
θ Σ
t = 1,…, τ
Σ ̂
Στ
t = τ + 1,…, T
θ ̂
θt−1
p ̂
ℓp,t = (ϕ(xt, p)T ̂
θt−1 − ct)
2
+ pT ̂
Στ p
| ̂
ℓp,t − ℓp | ≤ αp,t
pt ∈ arg min
p
{ ̂
ℓp,t − αp,t}
Regret bound2
RT =
𝔼
[
T
∑
t=1
(Yt − ct)
2
− min
p
(Y(p) − ct)
2
]
=
T
∑
t=1
(ϕ(xt, pt)T
θ − ct)
2
+ pT
t Σpt −
T
∑
t=1
min
p
(ϕ(xt, p)T
θ − ct)
2
+ pT
Σp
[2] Target Tracking for Contextual Bandits : Application to Demand Side Management, Margaux Brégère, Pierre Gaillard, Yannig
Goude and Gilles Stoltz, ICML, 2019
[3] Laplace’s method on supermartingales: Improved algorithms for linear stochastic bandits, Yasin Abbasi-Yadkori, Dávid Pál, and
Csaba Szepesvári, NeuRIPS, 2011
[4] Contextual bandits with linear payoff functions , Wei Chu, Li Lihong, Lev Reyzin, and Robert Schapire., JMLR 2011
Theorem
For proper choices of confidence levels and number of exploration rounds , with high
probability
If is known,
αp,t τ
RT ≤
𝒪
(T2/3
)
Σ RT ≤
𝒪
( T ln T)
Elements of proof
• Deviation inequalities on 3 and on
• Inspired from LinUCB regret bound analysis4
̂
θt
̂
Στ
Application
Extension: personalized demand side management
Others approaches and prospects
Control of flexible devices4
water-heaters to be controlled
without compromising service
quality
N
At each round
• Observe a target
• Send to all thermostatically controlled loads a probability
of switching on
• Observe the demand
t = 1,…, T
ct
pt ∈ [0,1]
Assumptions:
• water-heaters with same characteristics
• Demand of water-heater is zero if OFF and constant if ON
• State of water-heater
follows an unknown Markov Decision Process (MDP)
• It is possible to control demand if the MDP is known
N
i
xi,t = (Temperaturet, ON/OFFt) i
Learn MDP
(drain law)
Follow the
target
[4] (Online) Convex Optimization for Demand-Side Management: Application to Thermostatically Controlled Loads,
Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard and Nadia Oudjane, 2024
Hyper-parameter optimization5
Train a neural network is expensive and time-consuming
Aim: for a set of hyper-parameters (number of neurons, activation
functions etc.) and a budget , find the best neural network:
At each round
• Choose hyper-parameters
• Train network on
• Observe the forecast error
Output (best arm identification):
Λ
T
arg min
λ∈Λ
ℓ(fλ(
𝒟
TEST))
t = 1,…, T
λt ∈ Λ
fλt
𝒟
TRAIN
ℓt = ℓ(fλt(
𝒟
VALID))
arg min
fλt
ℓ(fλt(
𝒟
VALID))
𝒟
TRAIN
𝒟
VALID
𝒟
TEST
[5] A bandit approach with evolutionary operators for model selection : Application to neural architecture optimization for
image classification, Margaux Brégère and Julie Keisler, 2024
That's all folks!
Questions

Contenu connexe

Similaire à Sequential and reinforcement learning for demand side management by Margaux Brégère

Slides ensae 9
Slides ensae 9Slides ensae 9
Slides ensae 9
Arthur Charpentier
 
Bayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic ModelsBayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic Models
Colin Gillespie
 
Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
Kim Hammar
 
Learning for exploration-exploitation in reinforcement learning. The dusk of ...
Learning for exploration-exploitation in reinforcement learning. The dusk of ...Learning for exploration-exploitation in reinforcement learning. The dusk of ...
Learning for exploration-exploitation in reinforcement learning. The dusk of ...Université de Liège (ULg)
 
DRL #2-3 - Multi-Armed Bandits .pptx.pdf
DRL #2-3 - Multi-Armed Bandits .pptx.pdfDRL #2-3 - Multi-Armed Bandits .pptx.pdf
DRL #2-3 - Multi-Armed Bandits .pptx.pdf
GulamSarwar31
 
Meta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learningMeta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learningUniversité de Liège (ULg)
 
Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...
Asma Ben Slimene
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...
Asma Ben Slimene
 
Slide for ATD --- Tiansong Wang
Slide for ATD --- Tiansong WangSlide for ATD --- Tiansong Wang
Slide for ATD --- Tiansong WangTiansong Wang
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
JeremyHeng10
 
Yield curve estimation in Costa Rica
Yield curve estimation in Costa RicaYield curve estimation in Costa Rica
Yield curve estimation in Costa Rica
Facultad de Ciencias, UCR
 
Dynpri brno
Dynpri brnoDynpri brno
Dynpri brno
Kjetil Haugen
 
Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
Albert Bifet
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
Albert Bifet
 
Sampling strategies for Sequential Monte Carlo (SMC) methods
Sampling strategies for Sequential Monte Carlo (SMC) methodsSampling strategies for Sequential Monte Carlo (SMC) methods
Sampling strategies for Sequential Monte Carlo (SMC) methodsStephane Senecal
 
Integration of renewable energy sources and demand-side management into distr...
Integration of renewable energy sources and demand-side management into distr...Integration of renewable energy sources and demand-side management into distr...
Integration of renewable energy sources and demand-side management into distr...
Université de Liège (ULg)
 
07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan
AMDSeminarSeries
 
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmission
JeremyHeng10
 

Similaire à Sequential and reinforcement learning for demand side management by Margaux Brégère (20)

Slides ensae 9
Slides ensae 9Slides ensae 9
Slides ensae 9
 
Bayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic ModelsBayesian Experimental Design for Stochastic Kinetic Models
Bayesian Experimental Design for Stochastic Kinetic Models
 
Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Learning for exploration-exploitation in reinforcement learning. The dusk of ...
Learning for exploration-exploitation in reinforcement learning. The dusk of ...Learning for exploration-exploitation in reinforcement learning. The dusk of ...
Learning for exploration-exploitation in reinforcement learning. The dusk of ...
 
DRL #2-3 - Multi-Armed Bandits .pptx.pdf
DRL #2-3 - Multi-Armed Bandits .pptx.pdfDRL #2-3 - Multi-Armed Bandits .pptx.pdf
DRL #2-3 - Multi-Armed Bandits .pptx.pdf
 
Meta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learningMeta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learning
 
Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...
 
Slide for ATD --- Tiansong Wang
Slide for ATD --- Tiansong WangSlide for ATD --- Tiansong Wang
Slide for ATD --- Tiansong Wang
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
Yield curve estimation in Costa Rica
Yield curve estimation in Costa RicaYield curve estimation in Costa Rica
Yield curve estimation in Costa Rica
 
Dynpri brno
Dynpri brnoDynpri brno
Dynpri brno
 
Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Sampling strategies for Sequential Monte Carlo (SMC) methods
Sampling strategies for Sequential Monte Carlo (SMC) methodsSampling strategies for Sequential Monte Carlo (SMC) methods
Sampling strategies for Sequential Monte Carlo (SMC) methods
 
Integration of renewable energy sources and demand-side management into distr...
Integration of renewable energy sources and demand-side management into distr...Integration of renewable energy sources and demand-side management into distr...
Integration of renewable energy sources and demand-side management into distr...
 
07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan
 
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmission
 
COTS_poster
COTS_posterCOTS_poster
COTS_poster
 
PhDretreat2011
PhDretreat2011PhDretreat2011
PhDretreat2011
 

Plus de Paris Women in Machine Learning and Data Science

How and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe DaudierHow and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe Daudier
Paris Women in Machine Learning and Data Science
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
Paris Women in Machine Learning and Data Science
 
Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
Paris Women in Machine Learning and Data Science
 
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria KnorpsOptimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
Paris Women in Machine Learning and Data Science
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
Paris Women in Machine Learning and Data Science
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Paris Women in Machine Learning and Data Science
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
Paris Women in Machine Learning and Data Science
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Paris Women in Machine Learning and Data Science
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
Paris Women in Machine Learning and Data Science
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
Paris Women in Machine Learning and Data Science
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Paris Women in Machine Learning and Data Science
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
Paris Women in Machine Learning and Data Science
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Paris Women in Machine Learning and Data Science
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
Paris Women in Machine Learning and Data Science
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Paris Women in Machine Learning and Data Science
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
Paris Women in Machine Learning and Data Science
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Paris Women in Machine Learning and Data Science
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Paris Women in Machine Learning and Data Science
 
Iana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdfIana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdf
Paris Women in Machine Learning and Data Science
 

Plus de Paris Women in Machine Learning and Data Science (20)

How and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe DaudierHow and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe Daudier
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
 
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria KnorpsOptimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Perspectives, by M. Pannegeon
 
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
 
Iana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdfIana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdf
 

Dernier

一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 

Dernier (20)

一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 

Sequential and reinforcement learning for demand side management by Margaux Brégère

  • 1. Margaux Brégère - April, 24th 2024 Sequential and reinforcement learning for demand side management Paris Women in Machine Learning & Data Science @Criteo
  • 3. Demand side management Current solution: forecast demand and adapt production accordingly • Renewable energies development production harder to adjust • New (smart) meters access to data and instantaneous communication Prospective solutions: manage demand Send incentive signals (prices) Control flexible devices → → → → Electricity is hard to store production - demand balance must be strictly maintained →
  • 4. Demand side management with incentive signals The environment (consumer behavior) is discovered through interactions (incentive signal choices) Reinforcement learning How to develop automatic solutions to chose incentive signals dynamically? → Exploration: Learn consumer behavior Exploitation: Optimize signal sending « Smart Meter Energy Consumption Data in London Households »
  • 6. Stochastic multi-armed bandits In a multi-armed bandit problem, a gambler facing a row of slot machines (also called one-armed bandits) has to decide which machines to play to maximize her reward K Exploration:Test many arms Exploitation: Play the « best » arm Exploration - Exploitation trade-off
  • 7. Stochastic multi-armed bandit Each arm is defined by an unknown probability distribution For • Pick an arm • Receive a random reward with Maximize the cumulative reward Minimize the regret, i.e., the difference, in expectation, between the cumulative reward of the best strategy and that of ours: , with A good bandit algorithm has a sub-linear regret: k νk t = 1,…, T It ∈ {1,…, K} Yt Yt |It = k ∼ νk ⇔ RT = T max k=1,…,K μk − 𝔼 [ T ∑ t=1 μIt] μk = 𝔼 [ νk ] RT T → 0
  • 8. Upper Confidence Bound algorithm1 Initialization: pick each arm once For : • Estimate the expected reward of each arm with (empirical mean of past rewards) • Build some confidence intervals around these estimations: with high probability • Be optimistic and act as if the best possible probable reward was the true reward and choose the next arm accordingly t = K + 1,…, T k ̂ μk,t−1 μk ∈ [ ̂ μk,t−1 − αk,t , ̂ μk,t−1 + αk,t] It = arg max k { ̂ μk,t−1 + αk,t} [1] Finite-time analysis of the multiarmed bandit problem, Peter Auer, Nicolo Cesa-Bianchi, Paul Fischer, Machine learning, 2002
  • 9. UCB regret bound The empirical means based on past rewards are: with With Hoeffding-Azuma Inequality, we get with And be optimistic ensures that ̂ μk,t−1 = 1 Nk,t−1 t−1 ∑ s=1 Ys1{Is=k} Nk,t−1 = t−1 ∑ s=1 1{Is=k} ℙ ( μk ∈ [ ̂ μk,t−1 − αk,t , ̂ μk,t−1 + αk,t] ) ≥ 1 − t−3 αk,t = 2 log t Nk,t−1 RT ≲ TK log T
  • 10. Modeling demand side management
  • 11. Demand side management with incentive signals For • Observe a context and a target • Choose price levels • Observe the resulting electricity demand and suffer the loss t = 1,…, T xt ct pt Yt = f(xt, pt) + noise(pt) ℓ(Yt, ct ) Assumptions: • Homogenous population, tariffs, • with a known mapping function and an unknown vector to estimate • with • K pt ∈ ΔK f(xt, pt) = ϕ(xt, pt) T θ ϕ θ noise(pt) = pT t εt 𝕍 [εt] = Σ ℓ(Yt, ct ) = ( Yt − ct ) 2 pt ct xt Yt
  • 12. Bandit algorithm for target tracking Under these assumptions: ☞ Estimate parameters and to estimate losses and reach a bias-variance trade-off Optimistic algorithm: For • Select price levels deterministically to estimate offline with For • Estimate based on past observation with thanks to a Ridge regression • Estimate future expected loss for each price level : • Get confidence bound on these estimations: • Select price levels optimistically: 𝔼 [ (Yt − ct) 2 past, xt, pt ] = (ϕ(xt, pt)T θ − ct) 2 + pT t Σpt θ Σ t = 1,…, τ Σ ̂ Στ t = τ + 1,…, T θ ̂ θt−1 p ̂ ℓp,t = (ϕ(xt, p)T ̂ θt−1 − ct) 2 + pT ̂ Στ p | ̂ ℓp,t − ℓp | ≤ αp,t pt ∈ arg min p { ̂ ℓp,t − αp,t}
  • 13. Regret bound2 RT = 𝔼 [ T ∑ t=1 (Yt − ct) 2 − min p (Y(p) − ct) 2 ] = T ∑ t=1 (ϕ(xt, pt)T θ − ct) 2 + pT t Σpt − T ∑ t=1 min p (ϕ(xt, p)T θ − ct) 2 + pT Σp [2] Target Tracking for Contextual Bandits : Application to Demand Side Management, Margaux Brégère, Pierre Gaillard, Yannig Goude and Gilles Stoltz, ICML, 2019 [3] Laplace’s method on supermartingales: Improved algorithms for linear stochastic bandits, Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári, NeuRIPS, 2011 [4] Contextual bandits with linear payoff functions , Wei Chu, Li Lihong, Lev Reyzin, and Robert Schapire., JMLR 2011 Theorem For proper choices of confidence levels and number of exploration rounds , with high probability If is known, αp,t τ RT ≤ 𝒪 (T2/3 ) Σ RT ≤ 𝒪 ( T ln T) Elements of proof • Deviation inequalities on 3 and on • Inspired from LinUCB regret bound analysis4 ̂ θt ̂ Στ
  • 17. Control of flexible devices4 water-heaters to be controlled without compromising service quality N At each round • Observe a target • Send to all thermostatically controlled loads a probability of switching on • Observe the demand t = 1,…, T ct pt ∈ [0,1] Assumptions: • water-heaters with same characteristics • Demand of water-heater is zero if OFF and constant if ON • State of water-heater follows an unknown Markov Decision Process (MDP) • It is possible to control demand if the MDP is known N i xi,t = (Temperaturet, ON/OFFt) i Learn MDP (drain law) Follow the target [4] (Online) Convex Optimization for Demand-Side Management: Application to Thermostatically Controlled Loads, Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard and Nadia Oudjane, 2024
  • 18. Hyper-parameter optimization5 Train a neural network is expensive and time-consuming Aim: for a set of hyper-parameters (number of neurons, activation functions etc.) and a budget , find the best neural network: At each round • Choose hyper-parameters • Train network on • Observe the forecast error Output (best arm identification): Λ T arg min λ∈Λ ℓ(fλ( 𝒟 TEST)) t = 1,…, T λt ∈ Λ fλt 𝒟 TRAIN ℓt = ℓ(fλt( 𝒟 VALID)) arg min fλt ℓ(fλt( 𝒟 VALID)) 𝒟 TRAIN 𝒟 VALID 𝒟 TEST [5] A bandit approach with evolutionary operators for model selection : Application to neural architecture optimization for image classification, Margaux Brégère and Julie Keisler, 2024