Learning bounds for Risk-sensitive learning
… or, “Robust and Fair ML with Vapnik & Chervonenkis”
Jaeho Lee, Sejun Park, Jinwoo Shin
Korea Advanced Institute of Science and Technology (KAIST)
Contact: jaeho-lee@kaist.ac.kr
Code: https://github.com/jaeho-lee/oce
Motivation: Robust and fair learning
Truth. Empirical risk minimization (ERM) is a theoretical foundation for ML:
$\hat{f}_{\mathsf{erm}} \triangleq \mathop{\mathsf{argmin}}_{f\in\mathcal{F}} \sum_{i=1}^{n} \frac{1}{n} \cdot f(Z_i)$
Also Truth. Modern-day ML is more than just ERM: we weigh samples differently, based on their loss values!
$\hat{f} \triangleq \mathop{\mathsf{argmin}}_{f\in\mathcal{F}} \sum_{i=1}^{n} w_i \cdot f(Z_i)$
Here $w_i$ depends on $f(Z_i)$, relative to $f(Z_1), f(Z_2), \dots, f(Z_n)$.
Examples.
- Robust learning with outliers / noisy labels (high-loss samples are ignored) [1]
- Curriculum learning (low-loss samples are prioritized) [2]
- Fair ML, with individual fairness criteria (low-loss samples are ignored) [3]
[1] e.g., Han et al., “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” NeurIPS 2018.
[2] e.g., Pawan Kumar et al., “Self-paced learning for latent variable models,” NeurIPS 2010.
[3] e.g., Williamson et al., “Fairness risk measures,” ICML 2019.
Question. Can we give convergence guarantees for algorithms with loss-dependent weights?
Challenge. What theoretical framework should we use?
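To make the loss-dependent weighting concrete, here is a minimal sketch with hypothetical weighting rules for the three example settings above (the names, weights, and the `keep` parameter are illustrative, not an algorithm from the paper):

```python
import numpy as np

def loss_dependent_weights(losses, scheme="robust", keep=0.8):
    """Hypothetical loss-dependent weighting rules for the three examples."""
    n = len(losses)
    order = np.argsort(losses)              # sample indices, lowest loss first
    w = np.zeros(n)
    m = max(1, int(keep * n))
    if scheme == "robust":                  # ignore high-loss samples
        w[order[:m]] = 1.0 / m
    elif scheme == "curriculum":            # prioritize low-loss ("easy") samples
        w = np.exp(-losses)                 # soft weights; typically annealed over training
        w /= w.sum()
    elif scheme == "fair":                  # ignore low-loss samples
        w[order[n - m:]] = 1.0 / m
    return w

losses = np.array([0.1, 0.4, 0.2, 3.0, 0.3])     # f(Z_1), ..., f(Z_n)
print(loss_dependent_weights(losses, "robust"))  # the outlier Z_4 gets zero weight
```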
Framework: Optimized Certainty Equivalents (OCE)
History. Invented by Ben-Tal and Teboulle (1986) to characterize risk-aversion.

- Extends the utility-theoretic perspective of von Neumann and Morgenstern.
[Figure: a utility curve illustrating diminishing marginal utility — objective income on the horizontal axis, subjective utility on the vertical axis; equal income increments yield shrinking utility gains (Δ1, Δ2, Δ3).]
Definition. Capture the risk-averse behavior using a convex disutility function $\phi$ (i.e., a negative utility):
$\mathsf{oce}(f, P) \triangleq \inf_{\lambda\in\mathbb{R}} \left\{ \lambda + \mathbb{E}_P[\phi(f(Z) - \lambda)] \right\}$
Here $\lambda$ is the certain present loss, and $\mathbb{E}_P[\phi(f(Z) - \lambda)]$ is the uncertain future disutility.
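As a minimal numerical sketch (assuming a generic convex disutility $\phi$ passed as a Python callable; `empirical_oce` is an illustrative name, not the paper's code), the empirical OCE reduces to a one-dimensional convex minimization over $\lambda$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def empirical_oce(losses, phi):
    """Empirical OCE: inf over lambda of  lambda + mean(phi(losses - lambda)).
    The objective is convex in lambda because phi is convex."""
    objective = lambda lam: lam + np.mean(phi(losses - lam))
    lo, hi = losses.min() - 1.0, losses.max() + 1.0  # the minimizer lies near the loss range
    res = minimize_scalar(objective, bounds=(lo, hi), method="bounded")
    return res.fun

losses = np.random.rand(1000)                   # toy losses f(Z_i)
phi_cvar = lambda t: np.maximum(t, 0.0) / 0.1   # CVaR disutility, tail level 0.1
print(empirical_oce(losses, phi_cvar))          # ≈ mean of the worst 10% of losses
```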
ML view. We are penalizing the average loss + deviation!
$\mathsf{oce}(f, P) = \mathbb{E}_P[f(Z)] + \inf_{\lambda\in\mathbb{R}} \left\{ \mathbb{E}_P[\varphi(f(Z) - \lambda)] \right\}$
… for some convex $\varphi(t) = \phi(t) - t$.
[Figure: both $f(Z_{\mathsf{low\text{-}loss}})$ and $f(Z_{\mathsf{high\text{-}loss}})$ incur a “deviation penalty” from the optimized anchor $\lambda^*$.]
Examples. This framework covers a wide range of “risk-averse” measures of loss (illustrative disutility choices are sketched after the references below):
- Average + variance penalty [1]
- Conditional value-at-risk (i.e., ignore low-loss samples) [2]
- Entropic risk measure (i.e., exponentially tilted loss) [3]
Note: OCE is complementary to rank-based approaches (come to our poster session for details!)
[1] e.g., Maurer and Pontil, “Empirical Bernstein bounds and sample variance penalization,” COLT 2009.
[2] e.g., Curi et al., “Adaptive sampling for stochastic risk-averse learning,” NeurIPS 2020.
[3] e.g., Li et al., “Tilted empirical risk minimization,” arXiv 2020.
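For reference, here are textbook disutility choices that realize the three examples as OCEs (a sketch under our own naming and parameterization; the cited papers use their own settings):

```python
import numpy as np

# Standard disutility choices (illustrative parameterizations):
phi_meanvar  = lambda t, c=1.0: t + c * t**2             # oce = mean + c * variance
phi_cvar     = lambda t, a=0.1: np.maximum(t, 0.0) / a   # oce = CVaR at tail level a
phi_entropic = lambda t: np.exp(t) - 1.0                 # oce = log E[exp(loss)]

# e.g., with the empirical_oce sketch above:
# empirical_oce(losses, phi_entropic) ≈ np.log(np.mean(np.exp(losses)))
```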
Inverted OCE. A new notion to address “risk-seeking” algorithms (e.g., ignore high-loss samples):
$\overline{\mathsf{oce}}(f, P) \triangleq \mathbb{E}_P[f(Z)] - \inf_{\lambda\in\mathbb{R}} \left\{ \mathbb{E}_P[\varphi(\lambda - f(Z))] \right\}$
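Under the same assumptions as the `empirical_oce` sketch above (illustrative names; a generic convex $\varphi$), the inverted OCE admits the analogous plug-in estimate:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def empirical_inverted_oce(losses, varphi):
    """Plug-in inverted OCE: mean loss minus an optimized deviation credit."""
    objective = lambda lam: np.mean(varphi(lam - losses))
    lo, hi = losses.min() - 1.0, losses.max() + 1.0
    res = minimize_scalar(objective, bounds=(lo, hi), method="bounded")
    return losses.mean() - res.fun
```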
Results: Two learning bounds.
What we do. We analyze the empirical OCE minimization procedure, just as Vapnik & Chervonenkis studied empirical risk minimization (we also give an inverted-OCE version):
$\hat{f}_{\mathsf{eom}} \triangleq \mathop{\mathsf{argmin}}_{f\in\mathcal{F}} \mathsf{oce}(f, P_n)$
In a nutshell. We give learning bounds of two different types:
Theorem 3 (Excess OCE bound).
$\mathsf{oce}(\hat{f}_{\mathsf{eom}}, P) - \inf_{f\in\mathcal{F}} \mathsf{oce}(f, P) \approx \mathcal{O}\!\left(\frac{\mathsf{Lip}(\phi)\cdot\mathsf{comp}(\mathcal{F})}{\sqrt{n}}\right)$
Theorem 6 (Excess expected loss bound).
$\mathbb{E}_P[\hat{f}_{\mathsf{eom}}(Z)] - \inf_{f\in\mathcal{F}} \mathbb{E}_P[f(Z)] \approx \mathcal{O}\!\left(\frac{\mathsf{comp}(\mathcal{F})}{\sqrt{n}}\right)$
(come to our poster session for details!)
Also… We discover a relationship to the sample variance penalization (SVP) procedure, and find that SVP is a nice baseline strategy for batch-based OCE minimization (a sketch follows).
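For batch-based OCE minimization, a common and simple recipe (an illustrative PyTorch sketch under our assumptions, not the paper's exact procedure) is to treat $\lambda$ as one extra trainable scalar and descend on the plug-in OCE of each mini-batch:

```python
import torch

def oce_training_step(model, lam, phi, batch_x, batch_y, loss_fn, optimizer):
    """One SGD step on the plug-in OCE of a mini-batch.
    loss_fn must return per-sample losses (e.g., reduction='none')."""
    losses = loss_fn(model(batch_x), batch_y)   # f(Z_i) for the batch
    oce = lam + phi(losses - lam).mean()        # lambda + mean(phi(loss - lambda))
    optimizer.zero_grad()
    oce.backward()
    optimizer.step()
    return oce.item()

# lam is trained jointly with the model parameters:
# lam = torch.zeros(1, requires_grad=True)
# optimizer = torch.optim.SGD(list(model.parameters()) + [lam], lr=1e-2)
# phi = lambda t: torch.clamp(t, min=0) / 0.1   # e.g., the CVaR disutility
```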
TL;DR.
- We give an OCE-based theoretical framework to address robust/fair ML.
- We give excess risk bounds for empirical OCE minimizers.
Come to our zoom session for interesting details, including…
- Further implications of our theoretical results
- Proof ideas
- Experiment details
- Comparisons with alternative frameworks
