High-performance computing



High performance
computing in Artificial
Intelligence & Optimization
Olivier.Teytaud@inria.fr + many people

TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.


NCHC, Taiwan.
November 2010.
Disclaimer

Many works in parallelism are about
technical tricks: SMP programming,
message-passing, network organization.
==> often moderate improvements, but
    ones that benefit all users of a given
    library/methodology.
Here, the opposite point of view:
  Don't worry about a 10% loss due to
   suboptimal programming;
  Try to benefit from huge machines.
Outline

Parallelism
Bias & variance
AI & Optimization
  Optimization
  Supervised machine learning
  Multistage decision making
Conclusions
Parallelism

Basic principle (here!):
  Using more CPUs to be faster
Various cases:
  Many cores in one machine (shared memory)
   ==> your laptop
  Many cores on the same fast network
   (explicit fast communications)
     ==> your favorite cluster
  Many cores on a slow network
   (explicit slow communications)
     ==> your grid, your lab, or the internet
Parallelism

Definitions:
  p = number of processors
  Speed-up(P) = ratio

   Time for reaching precision ε when p=1
  -------------------------------------------------------------
   Time for reaching precision ε when p=P

  Efficiency(p) = speed-up(p)/p
    (usually at most 1)
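These definitions can be turned into a two-line helper; the timing numbers below are made up for illustration:

```python
def speedup(time_1cpu, time_pcpu):
    """Speed-up(P): time to reach precision epsilon with p=1,
    divided by the time to reach it with p=P."""
    return time_1cpu / time_pcpu

def efficiency(time_1cpu, time_pcpu, p):
    """Efficiency(p) = speed-up(p) / p; usually at most 1."""
    return speedup(time_1cpu, time_pcpu) / p

# Hypothetical timings: 100 s on 1 CPU, 8 s on 16 CPUs.
print(speedup(100.0, 8.0))         # 12.5
print(efficiency(100.0, 8.0, 16))  # 0.78125: decent but sub-linear
```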
Outline

Parallelism
Bias & variance
AI & Optimization
  Optimization
  Supervised machine learning
  Multistage decision making
Conclusions
Bias and variance

I compute x on a computer.
It's imprecise, I get x'.
How can I parallelize this to
    make it faster ?
Bias and variance

I compute x on a computer.
It's imprecise, I get x'.
What happens if I compute x
   1000 times,
   on 1000 different machines ?
I get x'1,...,x'1000.
x' = average( x'1,...,x'1000 )
Bias and variance

x' = average( x'1,...,x'1000 )
If the algorithm is deterministic:
   all x'i are equal
   no benefit
   Speed-up = 1, efficiency → 0
   ==> not good! (trouble = bias!)
If unbiased Monte-Carlo estimate:
   speed-up = p, efficiency = 1
   ==> ideal case! (trouble = variance)
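The two cases can be checked numerically. A toy sketch (the integrand and sample sizes are my choices): each "machine" returns one unbiased Monte-Carlo estimate x'_i, and the estimates are averaged:

```python
import random

def mc_estimate(n_samples, rng):
    """One unbiased Monte-Carlo estimate of E[U] = 0.5, U uniform on [0,1]."""
    return sum(rng.random() for _ in range(n_samples)) / n_samples

rng = random.Random(0)
# p = 1000 "machines", each computing its own estimate x'_i:
estimates = [mc_estimate(100, rng) for _ in range(1000)]
x_avg = sum(estimates) / len(estimates)
# Averaging p independent unbiased estimates divides the variance by p,
# so x_avg is as accurate as one machine running 1000 times longer:
# speed-up = p, efficiency = 1. A deterministic algorithm would return
# the same x'_i every time, and the average would gain nothing.
```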
Bias and variance, concluding

Two classical notions for an estimator x':
  Bias = E(x' − x)
  Variance = E((x' − E x')²)
Parallelism can easily reduce the variance;
parallelism cannot easily reduce the bias.
AI & optimization: bias &
variance everywhere

Parallelism
Bias & variance
AI & Optimization
  Optimization
  Supervised machine learning
  Multistage decision making
Conclusions
AI & optimization: bias &
variance everywhere

Many (parts of) algorithms can be rewritten
as follows:


  Generate sample x1,...,xλ using current
   knowledge
  Work on x1,...,xλ, get y1,...,yλ.
  Update knowledge.
Example 1: evolutionary
optimization

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update G (rank the xi's):
    m=mean(x1,...,xμ)
    σ²=var(x1,...,xμ)
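As a minimal runnable sketch of this loop (a 1-D EMNA-style estimation-of-distribution algorithm; the sphere fitness, λ=40, and μ=10 are my choices for illustration):

```python
import random

def eda_step(m, sigma, lam, mu, fitness, rng):
    """One iteration: sample lambda points from G(m, sigma^2),
    rank them by fitness, refit m and sigma on the mu best."""
    xs = [rng.gauss(m, sigma) for _ in range(lam)]
    xs.sort(key=fitness)                  # rank the x_i's (best first)
    best = xs[:mu]
    new_m = sum(best) / mu                # unweighted mean, as on the slide
    new_var = sum((x - new_m) ** 2 for x in best) / mu
    return new_m, max(new_var ** 0.5, 1e-12)

rng = random.Random(1)
m, sigma = 5.0, 3.0
for _ in range(50):
    m, sigma = eda_step(m, sigma, lam=40, mu=10,
                        fitness=lambda x: x * x, rng=rng)
# m should have moved toward the optimum 0 of x -> x^2, with sigma shrinking.
```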
Example 1: evolutionary
optimization

MANY EVOLUTIONARY ALGORITHMS
ARE WEAK FOR LAMBDA LARGE.
CAN BE EASILY OPTIMIZED
BY A BIAS / VARIANCE ANALYSIS.

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update G (rank the xi's):
    m=mean(x1,...,xμ)
    σ²=var(x1,...,xμ)
Ex. 1: bias & variance for EO

Initial knowledge = Gaussian
distribution G (mean m, variance σ²)
While (I have time)
  Generate sample x1,...,xλ using G
  Work on x1,...,xλ, get
   y1=fitness(x1),...,yλ=fitness(xλ).
  Update G (rank the xi's):
    m=mean(x1,...,xμ) <== unweighted!
    σ²=var(x1,...,xμ)
Ex. 1: bias & variance for EO

Huge improvement in EMNA for λ
large, just by taking the bias/variance
decomposition into account: reweighting is
necessary to cancel the bias.
Other improvements by classical statistical
tricks:
   Reducing σ for λ large;
   Using quasi-random mutations.

==> really simple, and crucial for large
 population sizes (not just for publishing :-) )
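The reweighting "one line" can be sketched like this: replace the unweighted mean of the selected points with a rank-weighted mean. The log-rank weights below are a common CMA-style choice, not necessarily the exact scheme of the paper:

```python
import math

def weighted_mean(selected):
    """selected: the mu best points, ranked best first.
    Log-rank weights give the best-ranked points more influence."""
    mu = len(selected)
    w = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    total = sum(w)
    w = [wi / total for wi in w]          # normalize: weights sum to 1
    return sum(wi * xi for wi, xi in zip(w, selected))

# Best-ranked points (near the optimum) pull the mean harder
# than the worst selected ones:
ranked = [0.1, 0.4, 1.0, 2.0]
print(weighted_mean(ranked) < sum(ranked) / len(ranked))  # True
```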
Example 2: supervised machine
learning (huge dataset)




  Generate sample x1,...,xλ using current
   knowledge
  Work on x1,...,xλ, get y1,...,yλ.
  Update knowledge.
Example 2: supervised machine
learning (huge dataset D)
 Generate data sets D1,...,Dλ using current
  knowledge (subsets of the database)
 Work on D1,...,Dλ, get f1,...,fλ (by learning).
 Average the fi's.
 ==> (su)bagging: Di = subset of D
 ==> random subspace: Di = projection of D on a
  random vector space
 ==> random noise: Di = D + noise
 ==> random forest: Di = D, but noisy algo
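All four variants fit one embarrassingly parallel skeleton. A sketch of the subagging case, with a deliberately trivial "learner" (the subset mean) standing in for a real one:

```python
import random

def subagging(D, n_learners, subset_size, learn, rng):
    """Fit one model per random subset D_i of D (each could run on
    its own machine), then predict by averaging the f_i's."""
    models = [learn(rng.sample(D, subset_size)) for _ in range(n_learners)]
    return sum(models) / len(models)      # here each "model" is a constant

rng = random.Random(0)
D = list(range(100))                      # toy dataset, true mean 49.5
f = subagging(D, n_learners=50, subset_size=20,
              learn=lambda Di: sum(Di) / len(Di), rng=rng)
# f is an average of 50 independent subset means, close to 49.5.
```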
Example 2: supervised machine
learning (huge dataset D)

Easy tricks for parallelizing supervised
machine learning:
  - use (su)bagging
  - use random subspaces
  - use averages of randomized algorithms
    (random forests)
  - do the cross-validation in parallel

==> from my experience, complicated parallel tools
  are not that important:
  - polemical issue: many papers on sophisticated parallel
    supervised machine learning algorithms;
  - I might be wrong :-)
Example 2: active supervised
machine learning (huge dataset)


While I have time
  Generate sample x1,...,xλ using current
   knowledge (e.g. sample the
   maxUncertainty region)
  Work on x1,...,xλ, get y1,...,yλ (labels by
   experts / expensive code)
  Update knowledge (approximate model).
Example 3: decision making
under uncertainty


While I have time
  Generate simulations x1,...,xλ using
   current knowledge
  Work on x1,...,xλ, get y1,...,yλ (get
   rewards)
  Update knowledge (approximate model).
UCT (Upper Confidence Trees)




Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvari (06)
Exploitation ...
            SCORE =
               5/7
             + k.sqrt( log(10)/7 )
... or exploration ?
              SCORE =
                 0/2
               + k.sqrt( log(10)/2 )
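The scores on these slides are the UCB rule used inside UCT; a sketch with k as the exploration constant:

```python
import math

def uct_score(wins, visits, parent_visits, k):
    """Exploitation (empirical win rate) + exploration bonus."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# The two candidate moves above, with 10 simulations at the parent:
exploit = uct_score(5, 7, 10, k=1.0)  # 5/7 + sqrt(log(10)/7)
explore = uct_score(0, 2, 10, k=1.0)  # 0/2 + sqrt(log(10)/2)
# A small k favors the 5/7 move; a large enough k favors the
# rarely tried 0/2 move.
```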
Example 3: decision making
under uncertainty


While I have time
  Generate simulations x1,...,xλ using
   current knowledge (= scoring rule based
   on statistics)
  Work on x1,...,xλ, get y1,...,yλ (get
   rewards)
  Update knowledge (= update statistics in
   memory ).
Example 3: decision making
under uncertainty: parallelizing

While I have time
  Generate simulations x1,...,xλ using
   current knowledge (= scoring rule based
   on statistics)
  Work on x1,...,xλ, get y1,...,yλ (get
   rewards)
  Update knowledge (= update statistics in
   memory ).
==> “easily” parallelized on multicore
 machines
Example 3: decision making
under uncertainty: parallelizing

While I have time

   Generate simulations x1,...,xλ using current knowledge (= scoring rule
     based on statistics)
   Work on x1,...,xλ, get y1,...,yλ (get rewards)
   Update knowledge (= update statistics in memory).

==> parallelized on clusters: one
 knowledge base per machine,
 average statistics only for crucial
 nodes:
   nodes with more than 5 % of the sims
   nodes at depth < 4
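A sketch of that cluster scheme, with per-machine statistics as plain dicts and the slide's two "crucial node" thresholds; the data layout is my invention:

```python
def is_crucial(sims, depth, total_sims):
    """Share a node iff it holds more than 5% of the simulations
    or sits at depth < 4 (the thresholds on the slide)."""
    return sims > 0.05 * total_sims or depth < 4

def merge_crucial(trees, total_sims):
    """trees: one {node_key: (sims, wins, depth)} dict per machine.
    Average statistics across machines, but only for crucial nodes."""
    merged = {}
    for key in set().union(*trees):
        stats = [t[key] for t in trees if key in t]
        sims = sum(s for s, _, _ in stats) / len(stats)
        depth = stats[0][2]
        if is_crucial(sims, depth, total_sims):
            wins = sum(w for _, w, _ in stats) / len(stats)
            merged[key] = (sims, wins, depth)
    return merged

# Two machines; node "a" is popular, node "b" is a rare deep node:
trees = [{"a": (100, 60, 1), "b": (1, 0, 10)},
         {"a": (80, 50, 1)}]
merged = merge_crucial(trees, total_sims=1000)
print(sorted(merged))  # ['a']: "b" is neither frequent nor shallow
```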
Example 3: decision making
under uncertainty: parallelizing

Good news first: it's simple and it
works on huge clusters ! ! !




       Comparison with voting schemes;
       40 machines, 2 seconds per move.
Example 3: decision making
under uncertainty: parallelizing

Good news first: it's simple and it
works on huge clusters ! ! !




     Comparing N machines and P machines
  ==> consistent with linear speed-up in 19x19 !
Example 3: decision making
under uncertainty: parallelizing

Good news first: it's simple and it
works on huge clusters ! ! !

When we produced these numbers, we
believed we were ready to play Go against very
strong players.


Unfortunately not at all :-)
Go: from 29 to 6 stones
1998: loss against amateur (6d) 19x19 H29
2008: win against a pro (8p) 19x19, H9        MoGo
2008: win against a pro (4p) 19x19, H8    CrazyStone
2008: win against a pro (4p) 19x19, H7    CrazyStone
2009: win against a pro (9p) 19x19, H7        MoGo
2009: win against a pro (1p) 19x19, H6        MoGo
2010: win against a pro (4p) 19x19, H6          Zen

2007: win against a pro (5p) 9x9 (blitz)     MoGo
2008: win against a pro (5p) 9x9 white       MoGo
2009: win against a pro (5p) 9x9 black       MoGo
2009: win against a pro (9p) 9x9 white       Fuego
2009: win against a pro (9p) 9x9 black       MoGoTW

==> still 6 stones at least!
Go: from 29 to 6 stones

(Same results as on the previous slide.)
Wins with H6 / H7 are lucky (rare) wins.

==> still 6 stones at least!
Example 3: decision making
under uncertainty: parallelizing

So what happened?

Great speed-up + moderate results
= a contradiction ?

OK: we can simulate the sequential algorithm very
quickly = success.
But even the sequential algorithm is limited, even
with huge computation time!
Example 3: decision making
under uncertainty: parallelizing

Poorly handled situation,
even with 10 days of CPU!
Example 3: decision making
under uncertainty: limited
scalability

(game of Havannah)




==> killed by the bias!
Example 3: decision making
under uncertainty: limited
scalability

(game of Go)




==> bias trouble ! ! !
we reduce the variance but not the
 systematic bias.
Conclusions

We have seen that “good old”
 bias/variance analysis is
  quite efficient;
  not widely known / used.
Conclusions

Easy tricks for evolutionary optimization on
 grids
==> we published papers with great
 speed-ups from just one line of code:
  Reweighting mainly,
  and also
    quasi-random,
    selective pressure modified for large pop size.
Conclusions
Easy tricks for supervised machine
 learning:
==> bias/variance analysis here boils
 down to: choose an algorithm with more
 variance than bias, and average:
  random subspace;
  random subset (subagging);
  noise introduction;
  “hyper”parameters to be tuned
   (cross-validation).
Conclusions

For sequential decision making under
 uncertainty, disappointing results:
 the best algorithms are not
“that” scalable.


A systematic bias remains.
Conclusions and references

Our experiments: often on Grid5000:
  ~5000 cores, Linux
  homogeneous environment
  union of high-performance clusters
  contains multi-core machines
Monte-Carlo Tree Search for decision
 making and uncertainty: Coulom, Kocsis
 & Szepesvari, Chaslot et al,...
For parallel evolutionary algorithms: Beyer
 et al, Teytaud et al (this Teytaud is not me...).

Contenu connexe

Dernier

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 

Dernier (20)

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 

En vedette

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Parallel Artificial Intelligence and Parallel Optimization: a Bias and Variance Point of View

  • 1. High-performance computing High performance computing in Artificial Intelligence & Optimization Olivier.Teytaud@inria.fr + many people TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. NCHC, Taiwan. November 2010.
  • 2. Disclaimer Many works in parallelism are about technical tricks on SMP programming, message-passing, network organization. ==> often moderate improvements, but for all users using a given library/methodology Here, opposite point of view: Don't worry for 10% loss due to suboptimal programming Try to benefit from huge machines
  • 3. Outline Parallelism Bias & variance AI & Optimization Optimization Supervised machine learning Multistage decision making Conclusions
  • 4. Parallelism Basic principle (here!): Using more CPUs for being faster
  • 5. Parallelism Basic principle (here!): Using more CPUs for being faster Various cases: Many cores in one machine (shared memory)
  • 6. Parallelism. Basic principle (here!): Using more CPUs for being faster. Various cases: Many cores in one machine (shared memory). Many cores on the same fast network (explicit fast communications).
  • 7. Parallelism. Basic principle (here!): Using more CPUs for being faster. Various cases: Many cores in one machine (shared memory). Many cores on the same fast network (explicit fast communications). Many cores on a network (explicit slow communications).
  • 8. Parallelism. Various cases: Many cores in one machine (shared memory) ==> your laptop. Many cores on the same fast network (explicit fast communications) ==> your favorite cluster. Many cores on a network (explicit slow communications) ==> your grid, your lab, or the internet.
  • 9. Parallelism. Definitions: p = number of processors. Speed-up(P) = (time for reaching precision ε when p=1) / (time for reaching precision ε when p=P). Efficiency(p) = speed-up(p)/p (usually at most 1).
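A minimal sketch of these two definitions in Python; the timing values used in the example are illustrative placeholders, not measurements:

```python
def speed_up(time_1, time_p):
    """Speed-up(p): time to reach precision epsilon with 1 processor,
    divided by the time to reach the same precision with p processors."""
    return time_1 / time_p

def efficiency(time_1, time_p, p):
    """Efficiency(p) = speed-up(p) / p; usually at most 1."""
    return speed_up(time_1, time_p) / p

# e.g. 100 s on one processor, 25 s on 8 processors:
su = speed_up(100.0, 25.0)        # 4.0
eff = efficiency(100.0, 25.0, 8)  # 0.5
```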
  • 10. Outline Parallelism Bias & variance AI & Optimization Optimization Supervised machine learning Multistage decision making Conclusions
  • 11. Bias and variance. I compute x on a computer. It's imprecise, I get x'. How can I parallelize this to make it faster?
  • 12. Bias and variance I compute x on a computer. It's imprecise, I get x'. What happens if I compute x 1000 times, on 1000 different machines ? I get x'1,...,x'1000. x' = average( x'1,...,x'1000 )
  • 13. Bias and variance. x' = average(x'1,...,x'1000). If the algorithm is deterministic: all x'i are equal, no benefit. Speed-up = 1, efficiency → 0 ==> not good! (trouble = bias!)
  • 14. Bias and variance. x' = average(x'1,...,x'1000). If the algorithm is deterministic: all x'i are equal, no benefit. Speed-up = 1, efficiency → 0 ==> not good! If unbiased Monte-Carlo estimate: speed-up = p, efficiency = 1 ==> ideal case! (trouble = variance)
  • 15. Bias and variance, concluding. Two classical notions for an estimator x': Bias = E(x' − x). Variance = E(x' − E x')². Parallelism can easily reduce the variance; parallelism cannot easily reduce the bias.
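This asymmetry can be checked with a small simulation: averaging over p machines divides the variance roughly by p, but the bias survives the average untouched. The bias and noise levels below are arbitrary assumptions for illustration:

```python
import random
import statistics

def noisy_estimate(x, bias, sigma, rng):
    # one machine's imprecise estimate x' of x
    return x + bias + rng.gauss(0.0, sigma)

def averaged_estimate(x, bias, sigma, p, rng):
    # x' = average(x'_1, ..., x'_p) over p machines
    return statistics.fmean(noisy_estimate(x, bias, sigma, rng) for _ in range(p))

rng = random.Random(0)
# 200 repetitions of the p = 1000 machine experiment, true x = 0, bias = 0.5:
runs = [averaged_estimate(0.0, bias=0.5, sigma=1.0, p=1000, rng=rng)
        for _ in range(200)]
# the runs have tiny variance (about sigma^2 / p) ...
# ... but they all still sit near 0.5: the bias is not reduced
```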
  • 16. AI & optimization: bias & variance everywhere Parallelism Bias & variance AI & Optimization Optimization Supervised machine learning Multistage decision making Conclusions
  • 17. AI & optimization: bias & variance everywhere. Many (parts of) algorithms can be rewritten as follows: Generate sample x1,...,xλ using current knowledge. Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 18. Example 1: evolutionary optimization. While (I have time): Generate sample x1,...,xλ using current knowledge. Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 19. Example 1: evolutionary optimization. Initial knowledge = Gaussian distribution. While (I have time): Generate sample x1,...,xλ using current knowledge. Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 20. Example 1: evolutionary optimization. Initial knowledge = Gaussian distribution G (mean m, variance σ²). While (I have time): Generate sample x1,...,xλ using G. Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 21. Example 1: evolutionary optimization. Initial knowledge = Gaussian distribution G (mean m, variance σ²). While (I have time): Generate sample x1,...,xλ using G. Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update knowledge.
  • 22. Example 1: evolutionary optimization. Initial knowledge = Gaussian distribution G (mean m, variance σ²). While (I have time): Generate sample x1,...,xλ using G. Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m = mean(x1,...,xμ), σ² = var(x1,...,xμ).
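A minimal one-dimensional sketch of this loop, EMNA-style with the plain unweighted update described above. The quadratic fitness, population sizes, and iteration count are illustrative assumptions, not the settings from the talk:

```python
import random
import statistics

def fitness(x):
    # toy objective to minimize; optimum at x = 3
    return (x - 3.0) ** 2

def emna_step(m, sigma, lam, mu, rng):
    # generate lambda samples from G(m, sigma^2)
    xs = [rng.gauss(m, sigma) for _ in range(lam)]
    # rank the xi's by fitness, keep the mu best
    best = sorted(xs, key=fitness)[:mu]
    # update G: unweighted mean / variance of the selected points
    return statistics.fmean(best), max(statistics.pstdev(best), 1e-9)

rng = random.Random(1)
m, sigma = 0.0, 5.0
for _ in range(30):
    m, sigma = emna_step(m, sigma, lam=50, mu=10, rng=rng)
# m drifts toward the optimum at 3 while sigma shrinks
```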
  • 23. Example 1: evolutionary optimization. MANY EVOLUTIONARY ALGORITHMS ARE WEAK FOR LAMBDA LARGE. CAN BE EASILY OPTIMIZED BY A BIAS / VARIANCE ANALYSIS. Initial knowledge = Gaussian distribution G (mean m, variance σ²). While (I have time): Generate sample x1,...,xλ using G. Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m = mean(x1,...,xμ), σ² = var(x1,...,xμ).
  • 24. Ex. 1: bias & variance for EO. Initial knowledge = Gaussian distribution G (mean m, variance σ²). While (I have time): Generate sample x1,...,xλ using G. Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m = mean(x1,...,xμ) <== unweighted! σ² = var(x1,...,xμ).
  • 25. Ex. 1: bias & variance for EO. Huge improvement in EMNA for λ large just by taking the bias/variance decomposition into account: reweighting is necessary for cancelling the bias. Other improvements by classical statistical tricks: reducing the selective pressure for λ large; using quasi-random mutations. ==> really simple and crucial for large population sizes. (not just for publishing :-) )
  • 26. Ex. 1: bias & variance for EO. Initial knowledge = Gaussian distribution G (mean m, variance σ²). While (I have time): Generate sample x1,...,xλ using G. Work on x1,...,xλ, get y1=fitness(x1),...,yλ=fitness(xλ). Update G (rank the xi's): m = mean(x1,...,xμ) <== unweighted! σ² = var(x1,...,xμ).
  • 27. Example 2: supervised machine learning (huge dataset). Generate sample x1,...,xλ using current knowledge. Work on x1,...,xλ, get y1,...,yλ. Update knowledge.
  • 28. Example 2: supervised machine learning (huge dataset D). Generate data sets D1,...,Dλ using current knowledge (subsets of the database). Work on D1,...,Dλ, get f1,...,fλ (by learning). Average the fi's. ==> (su)bagging: Di = subset of D. ==> random subspace: Di = projection of D on a random vector space. ==> random noise: Di = D + noise. ==> random forest: Di = D, but a noisy algo.
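A sketch of the subagging variant of this scheme. The "learner" below is a placeholder constant predictor standing in for any real model; the point is the structure: each fi is fit on its own random subset Di (so each could run on its own machine) and the final predictor averages the fi's:

```python
import random
import statistics

def train(Di):
    # placeholder learner: a constant predictor (stands for any model fi)
    mean_i = statistics.fmean(Di)
    return lambda: mean_i

def subagging(D, n_learners, subset_size, rng):
    # each Di is a random subset of D; the fi's are independent,
    # so they can be trained in parallel with no communication
    fs = [train(rng.sample(D, subset_size)) for _ in range(n_learners)]
    # final predictor: average of the fi's
    return lambda: statistics.fmean(f() for f in fs)

rng = random.Random(0)
D = [rng.gauss(10.0, 2.0) for _ in range(1000)]
f = subagging(D, n_learners=25, subset_size=100, rng=rng)
```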
  • 29. Example 2: supervised machine learning (huge dataset D). Generate data sets D1,...,Dλ using current knowledge (subsets of the database). Work on D1,...,Dλ, get f1,...,fλ (by learning). Average the fi's. ==> (su)bagging: Di = subset of D. ==> random subspace: Di = projection of D on a random vector space. ==> random noise: Di = D + noise. ==> random forest: Di = D, but a noisy algo. Easy tricks for parallelizing supervised machine learning: use (su)bagging; use random subspaces; use averages of randomized algorithms (random forests); do the cross-validation in parallel. ==> from my experience, complicated parallel tools are not that important... polemical issue: many papers on sophisticated parallel supervised machine learning algorithms; I might be wrong :-)
  • 30. Example 2: active supervised machine learning (huge dataset). While I have time: Generate sample x1,...,xλ using current knowledge (e.g. sample the max-uncertainty region). Work on x1,...,xλ, get y1,...,yλ (labels by experts / expensive code). Update knowledge (approximate model).
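A toy sketch of this active loop in one dimension. The distance-to-nearest-labeled-point rule is a hypothetical stand-in for a real uncertainty estimate, and the squaring function stands in for the expert / expensive code; with a batch of λ > 1 queries per round, the labeling step parallelizes:

```python
import random

def expert_label(x):
    # stands for the expert / expensive code being queried
    return x * x

def most_uncertain(labeled, pool):
    # crude uncertainty proxy: distance to the nearest already-labeled point
    return max(pool, key=lambda c: min(abs(c - x) for x, _ in labeled))

rng = random.Random(0)
labeled = [(0.0, expert_label(0.0)), (1.0, expert_label(1.0))]
pool = [rng.random() for _ in range(200)]   # candidate query points in [0, 1]
for _ in range(10):
    x = most_uncertain(labeled, pool)       # sample the max-uncertainty region
    labeled.append((x, expert_label(x)))    # label it; update the model here
```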
  • 31. Example 3: decision making under uncertainty. While I have time: Generate simulations x1,...,xλ using current knowledge. Work on x1,...,xλ, get y1,...,yλ (get rewards). Update knowledge (approximate model).
  • 32. UCT (Upper Confidence Trees). Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06).
  • 33. UCT
  • 34. UCT
  • 35. UCT
  • 36. UCT
  • 37. UCT Kocsis & Szepesvari (06)
  • 39. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 40. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 41. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 42. ... or exploration ? SCORE = 0/2 + k.sqrt( log(10)/2 )
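The two scores from the slides above can be computed directly; this also shows how the constant k arbitrates between exploitation and exploration:

```python
import math

def score(wins, visits, parent_visits, k):
    # exploitation term + exploration term (UCB1-style score, as used in UCT)
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# the two moves from the slides, with k = 1:
exploit = score(5, 7, 10, k=1.0)   # 5/7 + sqrt(log(10)/7)
explore = score(0, 2, 10, k=1.0)   # 0/2 + sqrt(log(10)/2)
# with k = 1 the well-sampled 5/7 move wins;
# a larger k makes the rarely-sampled 0/2 move win instead
```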
  • 43. Example 3: decision making under uncertainty. While I have time: Generate simulations x1,...,xλ using current knowledge (= scoring rule based on statistics). Work on x1,...,xλ, get y1,...,yλ (get rewards). Update knowledge (= update statistics in memory).
  • 44. Example 3: decision making under uncertainty: parallelizing. While I have time: Generate simulations x1,...,xλ using current knowledge (= scoring rule based on statistics). Work on x1,...,xλ, get y1,...,yλ (get rewards). Update knowledge (= update statistics in memory). ==> “easily” parallelized on multicore machines.
  • 45. Example 3: decision making under uncertainty: parallelizing. While I have time: Generate simulations x1,...,xλ using current knowledge (= scoring rule based on statistics). Work on x1,...,xλ, get y1,...,yλ (get rewards). Update knowledge (= update statistics in memory). ==> parallelized on clusters: one knowledge base per machine, average statistics only for crucial nodes: nodes with more than 5% of the sims; nodes at depth < 4.
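A sketch of that cluster-level merging rule, assuming each machine reports a dict of per-node statistics; the node names and the (visits, wins, depth) tuple layout are illustrative assumptions:

```python
def merge_crucial_nodes(per_machine, total_sims, min_share=0.05, max_depth=4):
    """per_machine: list of dicts, node -> (visits, wins, depth)."""
    merged = {}
    for stats in per_machine:
        for node, (visits, wins, depth) in stats.items():
            # exchange statistics only for "crucial" nodes:
            # shallow nodes (depth < 4) or nodes with more than 5% of the sims
            if depth < max_depth or visits > min_share * total_sims:
                v, w = merged.get(node, (0, 0))
                merged[node] = (v + visits, w + wins)
    return merged

# two machines, each with its own knowledge base:
machine_a = {"root": (1000, 520, 0), "rare_leaf": (10, 7, 9)}
machine_b = {"root": (1000, 480, 0), "rare_leaf": (12, 4, 9)}
merged = merge_crucial_nodes([machine_a, machine_b], total_sims=2000)
# the root is merged; the deep, rarely-visited leaf is never communicated
```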
  • 46. Example 3: decision making under uncertainty: parallelizing Good news first: it's simple and it works on huge clusters ! ! ! Comparison with voting schemes; 40 machines, 2 seconds per move.
  • 47. Example 3: decision making under uncertainty: parallelizing Good news first: it's simple and it works on huge clusters ! ! ! Comparing N machines and P machines ==> consistent with linear speed-up in 19x19 !
  • 48. Example 3: decision making under uncertainty: parallelizing Good news first: it's simple and it works on huge clusters ! ! ! When we have produced these numbers, we believed we were ready to play Go against very strong players. Unfortunately not at all :-)
  • 49. Go: from 29 to 6 stones 1998: loss against amateur (6d) 19x19 H29 2008: win against a pro (8p) 19x19, H9 MoGo 2008: win against a pro (4p) 19x19, H8 CrazyStone 2008: win against a pro (4p) 19x19, H7 CrazyStone 2009: win against a pro (9p) 19x19, H7 MoGo 2009: win against a pro (1p) 19x19, H6 MoGo 2010: win against a pro (4p) 19x19, H6 Zen 2007: win against a pro (5p) 9x9 (blitz) MoGo 2008: win against a pro (5p) 9x9 white MoGo 2009: win against a pro (5p) 9x9 black MoGo 2009: win against a pro (9p) 9x9 white Fuego 2009: win against a pro (9p) 9x9 black MoGoTW ==> still 6 stones at least!
  • 50. Go: from 29 to 6 stones. 1998: loss against amateur (6d) 19x19 H29. 2008: win against a pro (8p) 19x19, H9 MoGo. 2008: win against a pro (4p) 19x19, H8 CrazyStone. 2008: win against a pro (4p) 19x19, H7 CrazyStone. 2009: win against a pro (9p) 19x19, H7 MoGo. 2009: win against a pro (1p) 19x19, H6 MoGo. 2010: win against a pro (4p) 19x19, H6 Zen. Wins with H6 / H7 are lucky (rare) wins. 2007: win against a pro (5p) 9x9 (blitz) MoGo. 2008: win against a pro (5p) 9x9 white MoGo. 2009: win against a pro (5p) 9x9 black MoGo. 2009: win against a pro (9p) 9x9 white Fuego. 2009: win against a pro (9p) 9x9 black MoGoTW. ==> still 6 stones at least!
  • 51. Example 3: decision making under uncertainty: parallelizing So what happened ? great speed-up + moderate results; = contradiction ? ? ?
  • 52. Example 3: decision making under uncertainty: parallelizing So what happened ? great speed-up + moderate results; = contradiction ? ? ? Ok, we can simulate the sequential algorithm very quickly = success. But even the sequential algorithm is limited, even with huge computation time!
  • 53. Example 3: decision making under uncertainty: parallelizing Poorly handled situation, even with 10 days of CPU !
  • 54. Example 3: decision making under uncertainty: limited scalability (game of Havannah) ==> killed by the bias!
  • 55. Example 3: decision making under uncertainty: limited scalability (game of Go) ==> bias trouble!!! We reduce the variance but not the systematic bias.
  • 56. Conclusions We have seen that “good old” bias/variance analysis is quite efficient; not widely known / used.
  • 57. Conclusions. Easy tricks for evolutionary optimization on grids ==> we published papers with great speed-ups from just one line of code: mainly reweighting, and also quasi-random sampling and selective pressure modified for large population sizes.
  • 58. Conclusions. Easy tricks for supervised machine learning ==> bias/variance analysis here boils down to: choose an algorithm with more variance than bias and average: random subspace; random subset (subagging); noise introduction; “hyper”parameters to be tuned (cross-validation).
  • 59. Conclusions For sequential decision making under uncertainty, disappointing results: the best algorithms are not “that” scalable. A systematic bias remains.
  • 60. Conclusions and references Our experiments: often on Grid5000: ~5000 cores - Linux homogeneous environment union of high-performance clusters contains multi-core machines Monte-Carlo Tree Search for decision making and uncertainty: Coulom, Kocsis & Szepesvari, Chaslot et al,... For parallel evolutionary algorithms: Beyer et al, Teytaud et al (this Teytaud is not me...).