     Sequential Selection of Correlated Ads by POMDPs

           Shuai Yuan, Jun Wang

             University College London


              October 29, 2012
Motivations and contributions
Motivations,
  • help publishers gain more profit by displaying ads;
  • go further than offline, content-based matching of
       webpages and ads;
Contributions,
  • a framework of ad selection for revenue optimisation;
  • formulating the sequential selection problem as a partially
       observable Markov decision process (POMDP) and providing
       exact and approximate solutions;
  • a public keyword-bid-ad-webpage dataset for reproducible
       research¹.

  ¹ http://www.computational-advertising.org
Related works
Contextual advertising,
   • A semantic approach to contextual advertising [Broder 2007]
   • Impedance coupling in content-targeted advertising [Ribeiro 2005]
   • Contextual advertising by combining relevance with click feedback [Chakrabarti
      2008]
Inventory management (contracts),
   • Targeted advertising on the Web with inventory management [Chickering 2003]
   • Revenue management for online advertising: Impatient advertisers
      [Fridgeirsdottir 2007]
   • Dynamic revenue management for online display advertising [Roels 2009]
Optimal pricing model,
   • Pricing of Online Advertising: Cost-Per-Click-Through Vs. Cost-Per-Action [Hu
      2010]
   • Online advertising: Pay-per-view versus pay-per-click [Mangani 2004]
   • Online advertising: Pay-per-view versus pay-per-click A comment [Fjell 2009]
   • Single period balancing of pay-per-click and pay-per-view online display
      advertisements [Kwon 2011]
Related works (cont.)
Ad scheduling,
   • Scheduling advertisements on a web page to maximize revenue [Kumar 2006]
   • Scheduling of dynamic in-game advertising [Turner 2011]
Multi-armed bandits,
   • Using confidence bounds for exploitation-exploration trade-offs [Auer 2003]
   • Multi-armed bandit problems with dependent arms [Pandey 2007]
POMDPs,
   • A survey of POMDP applications [Cassandra 1998]
   • Monte Carlo POMDPs [Thrun 2000]
   • Perseus: Randomized point-based value iteration for POMDPs [Spaan 2005]
Problem statement - setup
Figure: 1 webpage, 1 ad slot, M impressions at each time step.
Payoff of ads follows X ∼ N(µ, σ0²·I). µ is generated by µ ∼ N(θ, Σ).
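The generative model above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `simulate_payoffs`, the example values of θ, Σ and σ0², and the fixed seed are all assumptions made for the sketch.

```python
import numpy as np

def simulate_payoffs(theta, Sigma, sigma0_sq, horizon, seed=0):
    """Draw hidden means mu ~ N(theta, Sigma), then payoffs X(t) ~ N(mu, sigma0_sq * I)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    # Hidden state: drawn once and kept fixed over the whole horizon.
    mu = rng.multivariate_normal(theta, Sigma)
    # Independent observation noise with variance sigma0_sq for every ad and step.
    noise = rng.normal(0.0, np.sqrt(sigma0_sq), size=(horizon, theta.size))
    return mu, mu + noise  # payoffs has shape (horizon, number of ads)

# Hypothetical prior: ads 0 and 1 are correlated, ad 2 is independent.
theta = [1.0, 2.0, 3.0]
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
mu, payoffs = simulate_payoffs(theta, Sigma, sigma0_sq=0.25, horizon=100)
print(payoffs.shape)  # (100, 3)
```

Only the payoff of the selected ad would be observed at each step; the full matrix is generated here so that different policies can be replayed on the same draws.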
Problem statement - graphical model

Figure: The payoff model illustrated by an influence diagram
representation with generative processes of a finite horizon POMDP.
s(t) is the selection action. θ(t), Σ(t) is the belief at some stage.
Nodes: belief and remaining steps (θ(t), Σ(t), T − t), action s(t),
mean payoff µ(t), payoff x(t), hidden state (θ, Σ) and noise σ0².
Problem statement - objective function
To maximise the expected cumulative payoff over time,

\[
\begin{aligned}
\pi^* &= \arg\max_{\pi} \mathbb{E}\left[R_{\pi}(T)\right]
       = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=1}^{T} X_{s(t)}(t)\right]
       = \arg\max_{\pi} \sum_{t=1}^{T} \mathbb{E}\left[X_{s(t)}(t)\right] \\
      &= \arg\max_{\pi} \sum_{t=1}^{T} \int_{x} x_{s(t)}(t)\, p\left(x_{s(t)}(t) \mid \Psi(t)\right) dx
       = \arg\max_{\pi} \sum_{t=1}^{T} \theta_{s(t)}(t)
\end{aligned}
\tag{1}
\]

where,
  • s(t) is the selection decision;
  • Ψ(t) is the available information;
  • π is a selection policy and π∗ is the optimal one;
  • “M impressions” is dropped from the objective function.
Belief update



Figure: Updating belief on ads’ performance over time.
Belief update - the selected ad
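We update the belief using Bayes’ theorem.

\[
p\left(x_1 \mid x_1(t), \Psi(t)\right)
= \int p\left(x_1 \mid x_1(t), \Psi(t), \mu_1\right)\, p\left(\mu_1 \mid x_1(t), \Psi(t)\right) d\mu_1
\tag{2}
\]

by “completing squares”,

\[
\begin{aligned}
p\left(\mu_1 \mid x_1(t), \Psi(t)\right)
&\propto p\left(x_1(t) \mid \mu_1, \Psi(t)\right)\, p\left(\mu_1 \mid \Psi(t)\right) \\
&\propto \exp\left(-\frac{\left(x_1(t) - \mu_1\right)^2}{2\sigma_0^2}
                   -\frac{\left(\mu_1 - \theta_1(t)\right)^2}{2\sigma_1^2(t)}\right)
\end{aligned}
\tag{3}
\]

we obtain the new belief,

\[
\mu_1 \mid x_1(t) \sim N\left(\theta_1(t+1),\, \sigma_1^2(t+1)\right)
\tag{4}
\]

\[
\theta_1(t+1) = \frac{\sigma_1^2(t)\, x_1(t) + \sigma_0^2\, \theta_1(t)}{\sigma_1^2(t) + \sigma_0^2},
\qquad
\sigma_1^2(t+1) = \frac{\sigma_1^2(t)\, \sigma_0^2}{\sigma_1^2(t) + \sigma_0^2}
\tag{5}
\]

we write θi(t) and σi²(t) as the shorthand for θi|Ψ(t) and σi²|Ψ(t).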
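As a worked illustration of the update for the selected ad, the sketch below applies the posterior mean and variance formulas three times to the same observed payoff. The function name `update_selected` and the numbers are assumptions chosen for the example.

```python
def update_selected(theta1, var1, x1, sigma0_sq):
    """Posterior mean and variance of mu_1 after observing payoff x1 of the selected ad."""
    denom = var1 + sigma0_sq
    # Precision-weighted average of the observation and the prior mean.
    theta_new = (var1 * x1 + sigma0_sq * theta1) / denom
    # The variance shrinks after every observation.
    var_new = var1 * sigma0_sq / denom
    return theta_new, var_new

# Hypothetical prior N(1, 4), noise variance 1, and three payoffs of 3.0.
theta, var = 1.0, 4.0
for x in [3.0, 3.0, 3.0]:
    theta, var = update_selected(theta, var, x, sigma0_sq=1.0)
    print(round(theta, 4), round(var, 4))
```

The mean moves from the prior towards the observed payoff while the variance decreases monotonically, which is what lets later selections rely less on exploration.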
Belief update - the correlated ad
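We also update the belief of non-selected ads,

\[
p\left(x_2 \mid x_1(t), \Psi(t)\right)
= \int p\left(x_2 \mid \mu_2, x_1(t), \Psi(t)\right)\, p\left(\mu_2 \mid x_1(t), \Psi(t)\right) d\mu_2
\tag{6}
\]

with linear Gaussian property,

\[
\mu_1 \mid \mu_2 \sim N\left(\theta_1 \mid \mu_2,\; \sigma_1^2 \mid \mu_2\right)
\tag{7}
\]

\[
\theta_1 \mid \mu_2 = \theta_1 + \frac{\sigma_{1,2}}{\sigma_2^2}\left(\mu_2 - \theta_2\right),
\qquad
\sigma_1^2 \mid \mu_2 = \sigma_1^2 - \frac{\sigma_{1,2}^2}{\sigma_2^2}
\tag{8}
\]

we obtain the new belief on a correlated ad,

\[
\mu_2 \mid x_1(t) \sim N\left(\theta_2(t+1),\, \sigma_2^2(t+1)\right)
\tag{9}
\]

\[
\theta_2(t+1) = \theta_2(t) + \sigma_{1,2}\, \frac{x_1(t) - \theta_1(t)}{\sigma_1^2(t) + \sigma_0^2},
\qquad
\sigma_2^2(t+1) = \sigma_2^2(t) - \frac{\sigma_{1,2}^2}{\sigma_1^2(t) + \sigma_0^2}
\tag{10}
\]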
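The selected-ad and correlated-ad updates can be written as one vectorised step. The sketch below is an illustration under assumptions: the name `update_belief` is hypothetical, and the off-diagonal covariance update is the standard Gaussian conditioning rule, which the slides do not spell out (they give only the means and the diagonal variances).

```python
import numpy as np

def update_belief(theta, Sigma, x, i, sigma0_sq):
    """Update the belief over all ads after observing payoff x from the selected ad i."""
    theta = np.asarray(theta, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    denom = Sigma[i, i] + sigma0_sq           # sigma_i^2(t) + sigma_0^2
    gain = Sigma[:, i] / denom                # covariance of each ad with ad i, scaled
    theta_new = theta + gain * (x - theta[i])
    # Assumption: full covariance update; its diagonal matches the variance formulas.
    Sigma_new = Sigma - np.outer(gain, Sigma[i, :])
    return theta_new, Sigma_new

# Hypothetical two-ad example: observe payoff 3.0 from ad 0.
theta = np.array([1.0, 1.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 4.0]])
theta_new, Sigma_new = update_belief(theta, Sigma, x=3.0, i=0, sigma0_sq=1.0)
print(theta_new)   # means of both ads move, the correlated one by a smaller amount
print(Sigma_new)
```

For the selected ad the result reduces to the single-ad update, and for a non-selected ad with zero covariance the belief is left unchanged.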
Belief update - expected payoff
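We also obtain the expected payoff of the selected ad,

\[
X_1 \mid x_1(t), \Psi(t) \sim N\left(\theta_1(t+1),\; \sigma_0^2 + \sigma_1^2(t+1)\right)
\tag{11}
\]

and the expected payoff of the correlated ad,

\[
X_2 \mid x_1(t), \Psi(t) \sim N\left(\theta_2(t+1),\; \sigma_0^2 + \sigma_2^2(t+1)\right)
\tag{12}
\]

The final objective function is,

\[
\pi^* = \arg\max_{\pi} \sum_{t=1}^{T} \theta_{s(t)}(t) \quad \text{subject to}
\tag{13}
\]

\[
\theta_{s(t+1)}(t+1) = \theta_{s(t+1)}(t) + \sigma_{s(t),s(t+1)}\,
\frac{x_{s(t)}(t) - \theta_{s(t)}(t)}{\sigma_{s(t)}^2(t) + \sigma_0^2}
\tag{14}
\]

\[
\sigma_{s(t+1)}^2(t+1) = \sigma_{s(t+1)}^2(t) -
\frac{\sigma_{s(t),s(t+1)}^2}{\sigma_{s(t)}^2(t) + \sigma_0^2}
\tag{15}
\]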
POMDP formulation and solution
Figure: The POMDP model for the revenue optimisation problem.
(θ(t), Σ(t)) is the belief state at some stage; x(t) is the observation
and reward; s(t) is the action; (θ, Σ) is the hidden state. There is no
state transition.
Value iteration and MAB approximation
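The value function could be expressed as,

\[
s(t) = \arg\max_{s(t) \in N} V_{s(t)}(\Psi(t))
= \arg\max_{i \in N} \Big(
\underbrace{\bar{x}_i}_{\text{the expected immediate reward}}
+ \underbrace{\xi(\Psi(t), i)}_{\text{the expected future reward}}
\Big)
\tag{16}
\]

The exact solution using value iteration²:

\[
V^*(\theta, \Sigma, T) = \max_{s(1) \in N} \mathbb{E}\left[
X_{s(1)}(1) + V^*\left(\theta \mid X_{s(1)}(1),\; \Sigma \mid X_{s(1)}(1),\; T - 1\right)
\right]
\tag{17}
\]

The approximation based on multi-armed bandit³:

\[
\xi_{\text{UCB1-NORMAL}} = \sqrt{16 \cdot \frac{q_i - t_i\, \theta_i^2(t)}{t_i - 1} \cdot \frac{\ln(t - 1)}{t_i}}
\tag{18}
\]

where qi is the sum of squared payoffs observed for ad i and ti is the
number of times ad i has been selected.

   ² R. E. Bellman (1957), “Dynamic Programming”.
   ³ Auer, P. et al. (2002), “Finite-time analysis of the multi-armed bandit
     problem”.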
Value iteration with Monte Carlo sampling⁴
We use sampling to reduce the computational complexity,

function VALUEFUNC(θ, Σ, t)
    array V ← 0                                   ▷ Expected reward vector.
    loop i ← 1 to N
        V[i] ← θi(t)                              ▷ Expected immediate reward.
        if t < T then
            for all s in SAMPLE(θ, Σ) do
                [θ′, Σ′] ← UPDATEBELIEF(θ, Σ, s, i)
                                                  ▷ New belief after selecting i and observing s.
                                                  ▷ Equations 13.
                V[i] ← V[i] + (1/M₀) · VALUEFUNC(θ′, Σ′, t + 1)
                                                  ▷ Average over the sampled payoffs.
            end for
        end if
    end loop
    return [MAX(V), MAXINDEX(V)]
end function

   ⁴ Thrun, S. (2000), “Monte Carlo POMDPs”.
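The recursion above can be sketched as runnable code. This is an illustration under assumptions, not the authors' implementation: the payoff samples are drawn from the predictive distribution N(θi, σi² + σ0²) of the candidate ad, the covariance update is standard Gaussian conditioning, and the names `update_belief`, `value_func` and `n_samples` are hypothetical.

```python
import numpy as np

def update_belief(theta, Sigma, x, i, sigma0_sq):
    """Condition the Gaussian belief N(theta, Sigma) on payoff x observed from ad i."""
    gain = Sigma[:, i] / (Sigma[i, i] + sigma0_sq)
    return theta + gain * (x - theta[i]), Sigma - np.outer(gain, Sigma[i, :])

def value_func(theta, Sigma, t, T, sigma0_sq, n_samples, rng):
    """Return (best value, best ad index) at step t under the belief (theta, Sigma)."""
    values = np.array(theta, dtype=float)          # expected immediate reward per ad
    if t < T:
        for i in range(len(theta)):
            # Assumption: sample payoffs of ad i from its predictive distribution.
            std = np.sqrt(Sigma[i, i] + sigma0_sq)
            for x in rng.normal(theta[i], std, size=n_samples):
                th2, S2 = update_belief(theta, Sigma, x, i, sigma0_sq)
                future, _ = value_func(th2, S2, t + 1, T, sigma0_sq, n_samples, rng)
                values[i] += future / n_samples    # Monte Carlo average of future value
    best = int(np.argmax(values))
    return float(values[best]), best

# Hypothetical two-ad problem with a short horizon.
rng = np.random.default_rng(0)
theta0 = np.array([1.0, 1.2])
Sigma0 = np.array([[4.0, 0.5],
                   [0.5, 0.1]])
value, ad = value_func(theta0, Sigma0, t=1, T=3, sigma0_sq=1.0, n_samples=5, rng=rng)
print(value, ad)
```

The cost grows as (N · n_samples)^(T − 1), so this direct recursion is practical only for a few ads and a short horizon, which is the motivation for the bandit-based approximation.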
Multi-armed bandit based approximation (cont.)
The UCB1-NORMAL-COR algorithm:

function PLAN(θ, Σ, Ψ(t))
    array V ← 0
    loop i ← 1 to N
        if ti < 8 log t then                      ▷ ti is the number of times ad i gets selected.
            return i
        end if
    end loop
    [θ′, Σ′] ← UPDATEBELIEF(θ, Σ, Ψ(t))
                                                  ▷ New belief of all ads with all available information.
                                                  ▷ Equations 13.
    loop i ← 1 to N
        V[i] ← θ′i + sqrt(16 · (qi − ti θ′i²) / (ti − 1) · ln(t − 1) / ti)
                                                  ▷ Expected reward.
    end loop
    return [MAX(V), MAXINDEX(V)]
end function
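A runnable sketch of the selection rule follows. It assumes the belief means `theta` have already been updated with all available information, so that correlations enter through them. Two guards are added for the sketch and are not in the slides: every ad is played at least twice so that ti − 1 is positive, and the variance term is clamped at zero because a belief mean, unlike a sample mean, can make qi − ti θi² negative. All names are hypothetical.

```python
import math
import numpy as np

def plan(theta, counts, sq_sums, t):
    """Pick an ad with a UCB1-NORMAL style index computed on the belief means.

    theta   : belief means after the (correlated) belief update
    counts  : t_i, number of times each ad has been selected
    sq_sums : q_i, sum of squared payoffs observed for each ad
    t       : current time step, 1 + total number of selections so far
    """
    n_ads = len(theta)
    # Forced exploration; the floor of 2 is an added guard (assumption).
    min_plays = max(2, math.ceil(8 * math.log(t)))
    for i in range(n_ads):
        if counts[i] < min_plays:
            return i
    scores = np.empty(n_ads)
    for i in range(n_ads):
        # Clamp at zero (assumption) so the square root is always defined.
        var_term = max(sq_sums[i] - counts[i] * theta[i] ** 2, 0.0)
        bonus = math.sqrt(16.0 * var_term / (counts[i] - 1)
                          * math.log(t - 1) / counts[i])
        scores[i] = theta[i] + bonus
    return int(np.argmax(scores))

# Hypothetical state after 100 selections split evenly between two ads.
print(plan(theta=[1.0, 2.0], counts=[50, 50], sq_sums=[100.0, 250.0], t=101))  # 1
```

With equal exploration bonuses, as in the usage line, the rule falls back to the ad with the higher belief mean.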
Experiment datasets
Figure: advertisers, ad network/exchange, and publishers; Google
AdWords Traffic Estimator service.


 • publishers gain 68% of advertisers’ spending (2003);
 • data was collected from 12/2011 to 05/2012;
 • 512 different keywords, 310 with non-zero mean payoff, 8
   categories;
 • 20% for training and 80% for testing;
 • we consider each keyword to be an ad.
Competing algorithms
We compare the following algorithms,
  • RANDOM policy, which selects candidates randomly
    (uniform);
  • MYOPIC policy, based on the expected immediate reward;
  • UCB1 policy, which assumes independence between arms
    and makes no assumption about the reward distribution;
  • UCB1-NORMAL policy, which assumes independence
    between arms and Gaussian-distributed rewards;
  • VI-COR policy, which solves value iteration using Monte
    Carlo sampling; and
  • UCB1-NORMAL-COR policy, which considers the
    dependencies between candidates.
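The two simplest baselines in the list above can be sketched directly. The function names and the example belief are assumptions made for illustration.

```python
import numpy as np

def random_policy(n_ads, rng):
    """RANDOM: pick a candidate uniformly at random."""
    return int(rng.integers(n_ads))

def myopic_policy(theta):
    """MYOPIC: pick the candidate with the highest expected immediate reward."""
    return int(np.argmax(theta))

rng = np.random.default_rng(0)
theta = np.array([0.5, 1.5, 1.0])       # hypothetical belief means
print(myopic_policy(theta))             # 1
print(random_policy(len(theta), rng))   # some index in {0, 1, 2}
```

MYOPIC never explores, so it relies entirely on the belief updates to correct an initially poor estimate.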
Results
 Datasets       MYOPIC     RANDOM      UCB1     UCB1-N      VI-COR     UCB1-N-COR
 Education      21.9       23.0        30.9     30.9        41.2*      27.6
 Finance-1      38.5       27.8        40.9     26.4        44.5       27.4
 Finance-2      22.1       16.5        30.6     22.8        38.0*      22.9
 Information    14.1       12.9        27.8     15.9        29.4       15.9
 P&O            41.6       30.4        50.5     31.4        72.9*      63.3
 Shopping-1     17.4       10.6        42.3     16.1        40.2       16.4
 Shopping-2     29.9       14.5        34.3     75.3        52.9       79.2*
 Shopping-3     9.7        4.3         21.9     18.3        27.3       19.4
 P&S            24.7       26.0        47.2     57.1        67.9*      59.9
 Medical        30.5       19.6        52.7     32.2        58.0*      33.5

Table: The cumulative payoffs are averaged over 8 chunks, then normalized w.r.t. the
GOLDEN policy for a better representation. The one with the highest cumulative payoff is
in bold, and marked with ∗ if the difference from the second best is significant by the
Wilcoxon signed-rank test. P&O is “People & organisations” and P&S is “Products & services”.
Results (cont.)

Figure: Cumulative payoff on the “People & organisations” category, 5
candidates. Policies shown: VI-COR, UCB1-NORMAL-COR, UCB1-NORMAL,
UCB1, GOLDEN, MYOPIC, RANDOM.
Results (cont.)
Figure: Comparison of accumulated payoffs (normalized cumulative
payoff) on the 10 datasets for MYOPIC, VI-COR, UCB1-NORMAL and
UCB1-NORMAL-COR.
VI-COR always performed better than MYOPIC and UCB1-NORMAL-COR
always performed better than UCB1-NORMAL across all datasets.
Results (cont.)
Figure: Special case: the daily payoff of two candidates (“best
phones” and “term insurance”) with a sudden change. Axes: day (x),
daily payoff (y).
Results (cont.)
Figure: The impact of the noise factor σ0² for the situation in the
previous figure. Axes: noise factor σ0² (x, logarithmic, 10⁻² to 10⁴),
cumulative payoff (y, ×10⁴). Policies shown: GOLDEN, MYOPIC, VI-COR,
UCB1-NORMAL-COR.

\[
\theta_{s(t+1)}(t+1) = \theta_{s(t+1)}(t) + \sigma_{s(t),s(t+1)}\,
\frac{x_{s(t)}(t) - \theta_{s(t)}(t)}{\sigma_{s(t)}^2(t) + \sigma_0^2}
\]
Future works
 • correlated update: if ad a1 on webpage w1 was shown to
   user u1 and we observed its performance, what’s the belief
   on performance of ad a2 on webpage w2 when showing to
   user u2 with correlations known?
 • multiple ads with diversification (another exploration and
   exploitation dilemma);
 • better solution for our continuous POMDP problem.

Optimal debt maturity management
 
A/B Testing for Game Design
A/B Testing for Game DesignA/B Testing for Game Design
A/B Testing for Game Design
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Chapter 1 introduction (Image Processing)
Chapter 1 introduction (Image Processing)Chapter 1 introduction (Image Processing)
Chapter 1 introduction (Image Processing)
 
Asset Prices in Segmented and Integrated Markets
Asset Prices in Segmented and Integrated MarketsAsset Prices in Segmented and Integrated Markets
Asset Prices in Segmented and Integrated Markets
 
ISI MSQE Entrance Question Paper (2010)
ISI MSQE Entrance Question Paper (2010)ISI MSQE Entrance Question Paper (2010)
ISI MSQE Entrance Question Paper (2010)
 
Markov Tutorial CDC Shanghai 2009
Markov Tutorial CDC Shanghai 2009Markov Tutorial CDC Shanghai 2009
Markov Tutorial CDC Shanghai 2009
 
Unit 1 Operation on signals
Unit 1  Operation on signalsUnit 1  Operation on signals
Unit 1 Operation on signals
 
Cs8092 computer graphics and multimedia unit 2
Cs8092 computer graphics and multimedia unit 2Cs8092 computer graphics and multimedia unit 2
Cs8092 computer graphics and multimedia unit 2
 
12.5. vector valued functions
12.5. vector valued functions12.5. vector valued functions
12.5. vector valued functions
 
matlab.docx
matlab.docxmatlab.docx
matlab.docx
 
Multi-keyword multi-click advertisement option contracts for sponsored search
Multi-keyword multi-click advertisement option contracts for sponsored searchMulti-keyword multi-click advertisement option contracts for sponsored search
Multi-keyword multi-click advertisement option contracts for sponsored search
 
Case Study (All)
Case Study (All)Case Study (All)
Case Study (All)
 
Convolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoConvolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in Theano
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
 
Fuzzy calculation
Fuzzy calculationFuzzy calculation
Fuzzy calculation
 
The convenience yield implied by quadratic volatility smiles presentation [...
The convenience yield implied by quadratic volatility smiles   presentation [...The convenience yield implied by quadratic volatility smiles   presentation [...
The convenience yield implied by quadratic volatility smiles presentation [...
 
Discussion of Matti Vihola's talk
Discussion of Matti Vihola's talkDiscussion of Matti Vihola's talk
Discussion of Matti Vihola's talk
 
K050 t分布f分布
K050 t分布f分布K050 t分布f分布
K050 t分布f分布
 

Dernier

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 

Dernier (20)

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 

Sequential Selection of Correlated Ads by POMDPs

  • 1. Sequential Selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang University College London October 29, 2012
  • 2. Motivations and contributions
      Motivations:
        • help publishers gain more profit by displaying ads;
        • go further than offline, content-based matching of webpages and ads.
      Contributions:
        • a framework of ad selection for revenue optimisation;
        • formulating the sequential selection problem as a partially observable Markov decision process (POMDP) and providing exact and approximate solutions;
        • a public keyword-bid-ad-webpage dataset for reproducible research (http://www.computational-advertising.org).
  • 3. Related works
      Contextual advertising:
        • A semantic approach to contextual advertising [Broder 2007]
        • Impedance coupling in content-targeted advertising [Ribeiro 2005]
        • Contextual advertising by combining relevance with click feedback [Chakrabarti 2008]
      Inventory management (contracts):
        • Targeted advertising on the Web with inventory management [Chickering 2003]
        • Revenue management for online advertising: Impatient advertisers [Fridgeirsdottir 2007]
        • Dynamic revenue management for online display advertising [Roels 2009]
      Optimal pricing model:
        • Pricing of Online Advertising: Cost-Per-Click-Through Vs. Cost-Per-Action [Hu 2010]
        • Online advertising: Pay-per-view versus pay-per-click [Mangani 2004]
        • Online advertising: Pay-per-view versus pay-per-click: A comment [Fjell 2009]
        • Single period balancing of pay-per-click and pay-per-view online display advertisements [Kwon 2011]
  • 4. Related works (cont.)
      Ad scheduling:
        • Scheduling advertisements on a web page to maximize revenue [Kumar 2006]
        • Scheduling of dynamic in-game advertising [Turner 2011]
      Multi-armed bandits:
        • Using confidence bounds for exploitation-exploration trade-offs [Auer 2003]
        • Multi-armed bandit problems with dependent arms [Pandey 2007]
      POMDPs:
        • A survey of POMDP applications [Cassandra 1998]
        • Monte Carlo POMDPs [Thrun 2000]
        • Perseus: Randomized point-based value iteration for POMDPs [Spaan 2005]
  • 5. Problem statement - setup
      Figure: 1 webpage, 1 ad slot, M impressions at each time step. The payoff of ads follows $X \sim \mathcal{N}(\mu, I \cdot \sigma_0^2)$, where $\mu$ is generated by $\mu \sim \mathcal{N}(\theta, \Sigma)$.
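The two-level generative model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cholesky` and `sample_payoffs` and all numeric values are hypothetical, and the covariance is assumed to be positive definite.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L


def sample_payoffs(theta, cov, sigma0_sq, rng):
    """Draw mu ~ N(theta, cov), then one payoff per ad, x ~ N(mu, sigma0_sq * I)."""
    n = len(theta)
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated means: mu = theta + L z
    mu = [theta[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    # Independent observation noise with variance sigma0_sq
    x = [m + rng.gauss(0.0, math.sqrt(sigma0_sq)) for m in mu]
    return mu, x


# Hypothetical prior for two correlated ads
rng = random.Random(0)
mu, x = sample_payoffs([10.0, 5.0], [[4.0, 3.0], [3.0, 9.0]], 1.0, rng)
print(mu, x)
```

The hidden means are drawn once from the prior, while each observed payoff adds fresh noise, which is why observing one ad carries information about the others only through the off-diagonal entries of Σ.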
  • 6. Problem statement - graphical model
      [Influence diagram. Belief nodes (θ(1), Σ(1), T-1), (θ(2), Σ(2), T-2), ..., (θ(T), Σ(T), 0); selection nodes s(1), s(2), ..., s(T); hidden state (θ, Σ); payoff means µ(1), µ(2), ..., µ(T); noise variance σ0²; observations x(1), x(2), ..., x(T).]
      Figure: The payoff model illustrated by an influence diagram representation with generative processes of a finite horizon POMDP. s(t) is the selection action. θ(t), Σ(t) is the belief at some stage.
  • 7. Problem statement - objective function
      To maximise the expected cumulative payoff over time,
      \[
      \pi^* = \arg\max_{\pi} \mathbb{E}\left[R^{\pi}(T)\right]
            = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=1}^{T} X_{s(t)}(t)\right]
            = \arg\max_{\pi} \sum_{t=1}^{T} \mathbb{E}\left[X_{s(t)}(t)\right]
      \]
      \[
            = \arg\max_{\pi} \sum_{t=1}^{T} \int_x x_{s(t)}(t)\, p\left(x_{s(t)}(t) \mid \Psi(t)\right) dx
            = \arg\max_{\pi} \sum_{t=1}^{T} \theta_{s(t)}(t) \tag{1}
      \]
      where
        • $s(t)$ is the selection decision;
        • $\Psi(t)$ is the available information;
        • $\pi$ is a selection policy and $\pi^*$ is the optimal one;
        • "M impressions" is dropped from the objective function.
  • 8. Belief update
      Figure: Updating belief on ads’ performance over time (panels for t=1, t=2, ...).
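  • 9. Belief update - the selected ad
      We update the belief using Bayes’ theorem.
      \[
      p\left(x_1 \mid x_1(t), \Psi(t)\right) = \int p\left(x_1 \mid x_1(t), \Psi(t), \mu_1\right) p\left(\mu_1 \mid x_1(t), \Psi(t)\right) d\mu_1 \tag{2}
      \]
      By "completing squares",
      \[
      p\left(\mu_1 \mid x_1(t), \Psi(t)\right) \propto p\left(x_1(t) \mid \mu_1, \Psi(t)\right) p\left(\mu_1 \mid \Psi(t)\right)
      \propto \exp\left(-\frac{\left(x_1(t) - \mu_1\right)^2}{2\sigma_0^2} - \frac{\left(\mu_1 - \theta_1(t)\right)^2}{2\sigma_1^2(t)}\right) \tag{3}
      \]
      we obtain the new belief,
      \[
      \mu_1 \mid x_1(t) \sim \mathcal{N}\left(\theta_1(t+1), \sigma_1^2(t+1)\right) \tag{4}
      \]
      \[
      \theta_1(t+1) = \frac{\sigma_1^2(t)\, x_1(t) + \sigma_0^2\, \theta_1(t)}{\sigma_1^2(t) + \sigma_0^2}
      \qquad
      \sigma_1^2(t+1) = \frac{\sigma_1^2(t)\, \sigma_0^2}{\sigma_1^2(t) + \sigma_0^2} \tag{5}
      \]
      We write $\theta_i(t)$ and $\sigma_i^2(t)$ as shorthand for $\theta_i \mid \Psi(t)$ and $\sigma_i^2 \mid \Psi(t)$.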
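The update in Equation (5) is a short computation. A minimal sketch, assuming scalar inputs; the function name `update_selected` and the numbers are hypothetical and chosen only for easy arithmetic:

```python
def update_selected(theta1, var1, x1, noise_var):
    """Posterior mean and variance of mu_1 after observing payoff x1 (Equation 5).

    theta1, var1: prior mean and variance of mu_1 (the belief at time t)
    x1:           observed payoff of the selected ad
    noise_var:    observation noise variance sigma_0^2
    """
    denom = var1 + noise_var
    new_theta = (var1 * x1 + noise_var * theta1) / denom
    new_var = var1 * noise_var / denom
    return new_theta, new_var


# Hypothetical belief: mean 10, variance 4; observed payoff 15; noise variance 1
theta, var = update_selected(theta1=10.0, var1=4.0, x1=15.0, noise_var=1.0)
print(theta, var)
```

The new mean is a precision-weighted average of the prior mean and the observation, so a noisier observation (larger `noise_var`) moves the belief less.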
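  • 10. Belief update - the correlated ad
      We also update the belief of non-selected ads,
      \[
      p\left(x_2 \mid x_1(t), \Psi(t)\right) = \int p\left(x_2 \mid \mu_2, x_1(t), \Psi(t)\right) p\left(\mu_2 \mid x_1(t), \Psi(t)\right) d\mu_2 \tag{6}
      \]
      With the linear Gaussian property,
      \[
      \mu_1 \mid \mu_2 \sim \mathcal{N}\left(\theta_1 \mid \mu_2,\ \sigma_1^2 \mid \mu_2\right) \tag{7}
      \]
      \[
      \theta_1 \mid \mu_2 = \theta_1 + \frac{\sigma_{1,2}}{\sigma_2^2}\left(\mu_2 - \theta_2\right)
      \qquad
      \sigma_1^2 \mid \mu_2 = \sigma_1^2 - \frac{\sigma_{1,2}^2}{\sigma_2^2} \tag{8}
      \]
      we obtain the new belief on a correlated ad,
      \[
      \mu_2 \mid x_1(t) \sim \mathcal{N}\left(\theta_2(t+1), \sigma_2^2(t+1)\right) \tag{9}
      \]
      \[
      \theta_2(t+1) = \theta_2(t) + \sigma_{1,2}\, \frac{x_1(t) - \theta_1(t)}{\sigma_1^2(t) + \sigma_0^2}
      \qquad
      \sigma_2^2(t+1) = \sigma_2^2(t) - \frac{\sigma_{1,2}^2}{\sigma_1^2(t) + \sigma_0^2} \tag{10}
      \]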
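A worked instance of Equations (5) and (10) may help to see how one observation moves both beliefs. The numbers below are made up for easy arithmetic and do not come from the dataset.

```latex
% Hypothetical values (illustration only)
\theta_1(t) = 10,\quad \theta_2(t) = 5,\quad \sigma_1^2(t) = 4,\quad \sigma_2^2(t) = 9,\quad
\sigma_{1,2} = 3,\quad \sigma_0^2 = 1,\quad x_1(t) = 15
\begin{align*}
\theta_1(t+1) &= \frac{4 \cdot 15 + 1 \cdot 10}{4 + 1} = 14, &
\sigma_1^2(t+1) &= \frac{4 \cdot 1}{4 + 1} = 0.8, \\
\theta_2(t+1) &= 5 + 3 \cdot \frac{15 - 10}{4 + 1} = 8, &
\sigma_2^2(t+1) &= 9 - \frac{3^2}{4 + 1} = 7.2.
\end{align*}
```

The selected ad's variance shrinks sharply (4 to 0.8), while the correlated ad, which was not shown, still gains some certainty (9 to 7.2) and its mean shifts in the same direction as the surprise in ad 1.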
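  • 11. Belief update - expected payoff
      We also obtain the expected payoff of the selected ad,
      \[
      X_1 \mid x_1(t), \Psi(t) \sim \mathcal{N}\left(\theta_1(t+1),\ \sigma_0^2 + \sigma_1^2(t+1)\right) \tag{11}
      \]
      and the expected payoff of the correlated ad,
      \[
      X_2 \mid x_1(t), \Psi(t) \sim \mathcal{N}\left(\theta_2(t+1),\ \sigma_0^2 + \sigma_2^2(t+1)\right) \tag{12}
      \]
      The final objective function is
      \[
      \pi^* = \arg\max_{\pi} \sum_{t=1}^{T} \theta_{s(t)}(t) \quad \text{subject to} \tag{13}
      \]
      \[
      \theta_{s(t+1)}(t+1) = \theta_{s(t+1)}(t) + \sigma_{s(t),s(t+1)}\, \frac{x_{s(t)}(t) - \theta_{s(t)}(t)}{\sigma_{s(t)}^2(t) + \sigma_0^2} \tag{14}
      \]
      \[
      \sigma_{s(t+1)}^2(t+1) = \sigma_{s(t+1)}^2(t) - \frac{\sigma_{s(t),s(t+1)}^2}{\sigma_{s(t)}^2(t) + \sigma_0^2} \tag{15}
      \]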
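Equations (14) and (15) can be applied to every ad at once. The sketch below is an illustration, not the authors' code: the name `update_belief` is hypothetical, and updating the off-diagonal covariances with the standard Gaussian conditioning rule is an assumption that goes beyond the slides, which only state the mean and variance updates.

```python
def update_belief(theta, cov, s, x_s, noise_var):
    """Update the belief over all ads after showing ad s and observing payoff x_s.

    theta: list of posterior means, cov: covariance matrix as a list of lists.
    Means follow Equation (14); the diagonal of the result follows Equation (15).
    The off-diagonal update is standard Gaussian conditioning (an assumption here).
    """
    n = len(theta)
    denom = cov[s][s] + noise_var
    gain = [cov[j][s] / denom for j in range(n)]
    new_theta = [theta[j] + gain[j] * (x_s - theta[s]) for j in range(n)]
    new_cov = [[cov[j][k] - gain[j] * cov[s][k] for k in range(n)] for j in range(n)]
    return new_theta, new_cov


# Hypothetical belief over two correlated ads; ad 0 is shown and pays 15
theta, cov = update_belief([10.0, 5.0], [[4.0, 3.0], [3.0, 9.0]], 0, 15.0, 1.0)
print(theta)
print(cov)
```

For the selected ad itself (j = s) the same formulas reduce to Equation (5), so one function covers both the selected and the correlated case.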
  • 12. POMDP formulation and solution
      Figure: The POMDP model for the revenue optimisation problem. (θ(t), Σ(t)) is the belief state at some stage; x(t) is the observation and reward; s(t) is the action; (θ, Σ) is the hidden state. There is no state transition.
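  • 13. Value iteration and MAB approximation
      The value function can be expressed as
      \[
      s(t) = \arg\max_{s(t) \in N} V_{s(t)}\left(\Psi(t)\right)
           = \arg\max_{i \in N} \Big( \underbrace{\bar{x}_i}_{\text{the expected immediate reward}} + \underbrace{\xi\left(\Psi(t), i\right)}_{\text{the expected future reward}} \Big) \tag{16}
      \]
      The exact solution using value iteration [2]:
      \[
      V^*(\theta, \Sigma, T) = \max_{s(1) \in N} \mathbb{E}\left[ X_{s(1)}(1) + V^*\left(\theta \mid X_{s(1)}(1),\ \Sigma \mid X_{s(1)}(1),\ T - 1\right) \right] \tag{17}
      \]
      The approximation based on the multi-armed bandit [3], written in the form of the UCB1-NORMAL index of the cited paper:
      \[
      \xi_{\text{UCB1-NORMAL}} = \sqrt{16 \cdot \frac{q_i - t_i\, \theta_i^2(t)}{t_i - 1} \cdot \frac{\ln(t - 1)}{t_i}} \tag{18}
      \]
      [2] R. E. Bellman. (1957) “Dynamic Programming”
      [3] Auer, P. et al. (2002) “Finite-time analysis of the multi-armed bandit problem”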
  • 14. Value iteration with Monte Carlo sampling [4]
      We use sampling to reduce the computational complexity:
        function VALUEFUNC(θ, Σ, t)
          array V ← 0                                  // Expected reward vector.
          loop i ← 1 to N
            V[i] ← θ_i(t)                              // Expected immediate reward.
            if t < T then
              for all s in SAMPLE(θ, Σ) do
                [θ', Σ'] ← UPDATEBELIEF(θ, Σ, s, i)    // New belief after selecting i and observing s. Equations (13)-(15).
                V[i] ← V[i] + (1/M_0) · VALUEFUNC(θ', Σ', t + 1)
              end for
            end if
          end loop
          return [MAX(V), MAXINDEX(V)]
        end function
      [4] Thrun, S. (2000) “Monte Carlo POMDPs”
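A runnable Python sketch of the recursion above, under stated assumptions: observations are sampled from the predictive distribution N(θ_i, σ_i² + σ_0²) of Equation (11), `n_samples` plays the role of M_0, and `update_belief` is a hypothetical helper that applies Equations (14) and (15) to all ads with a standard Gaussian-conditioning update of the covariances. None of these names come from the slides.

```python
import math
import random


def update_belief(theta, cov, s, x_s, noise_var):
    """Belief over all ads after showing ad s and observing payoff x_s."""
    n = len(theta)
    denom = cov[s][s] + noise_var
    gain = [cov[j][s] / denom for j in range(n)]
    new_theta = [theta[j] + gain[j] * (x_s - theta[s]) for j in range(n)]
    new_cov = [[cov[j][k] - gain[j] * cov[s][k] for k in range(n)] for j in range(n)]
    return new_theta, new_cov


def value_func(theta, cov, t, horizon, noise_var, n_samples, rng):
    """Return (estimated best cumulative payoff from step t, index of the best ad)."""
    values = []
    for i in range(len(theta)):
        v = theta[i]  # expected immediate reward
        if t < horizon:
            pred_sd = math.sqrt(cov[i][i] + noise_var)
            future = 0.0
            for _ in range(n_samples):
                x = rng.gauss(theta[i], pred_sd)  # sampled observation for ad i
                th2, cov2 = update_belief(theta, cov, i, x, noise_var)
                future += value_func(th2, cov2, t + 1, horizon,
                                     noise_var, n_samples, rng)[0]
            v += future / n_samples  # Monte Carlo average of future value
        values.append(v)
    best = max(range(len(values)), key=values.__getitem__)
    return values[best], best


# Hypothetical two-ad problem with a three-step horizon
rng = random.Random(0)
print(value_func([10.0, 5.0], [[4.0, 3.0], [3.0, 9.0]], 1, 3, 1.0, 4, rng))
```

The number of recursive calls grows as (N · n_samples)^(T − t), so this sketch is only practical for short horizons and few candidates.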
  • 15. Multi-armed bandit based approximation (cont.)
      The UCB1-NORMAL-COR algorithm:
        function PLAN(θ, Σ, Ψ(t))
          array V ← 0
          loop i ← 1 to N
            if t_i < 8 log t then                      // t_i is the number of times ad i gets selected.
              return i
            end if
          end loop
          [θ', Σ'] ← UPDATEBELIEF(θ, Σ, Ψ(t))          // New belief of all ads with all available information. Equations (13)-(15).
          loop i ← 1 to N
            V[i] ← θ'_i + sqrt(16 · (q_i − t_i θ'_i²) / (t_i − 1) · ln(t − 1) / t_i)   // Expected reward.
          end loop
          return [MAX(V), MAXINDEX(V)]
        end function
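The selection step can be sketched in Python as follows. Assumptions, none of which are stated on the slide: `theta` already holds the means after the correlated belief update, `sq_sums[i]` is q_i taken to be the sum of squared payoffs observed for ad i (as in the cited UCB1-NORMAL paper), the function returns only the chosen index, ads with fewer than two selections are forced so that t_i − 1 is positive, and the variance term is clamped at zero because a correlated posterior mean need not equal the sample mean.

```python
import math


def ucb1_normal_cor_plan(theta, counts, sq_sums, t):
    """Pick the next ad index.

    theta:   posterior means after the correlated update
    counts:  counts[i] is t_i, the number of times ad i was selected
    sq_sums: sq_sums[i] is q_i, the sum of squared payoffs of ad i (assumed)
    t:       current time step (1-based)
    """
    n = len(theta)
    # Forced exploration of under-sampled ads (the "< 2" guard is an assumption)
    for i in range(n):
        if counts[i] < 2 or counts[i] < 8 * math.log(t):
            return i
    scores = []
    for i in range(n):
        spread = max(sq_sums[i] - counts[i] * theta[i] ** 2, 0.0)  # clamp: assumption
        bonus = math.sqrt(16.0 * spread / (counts[i] - 1)
                          * math.log(t - 1) / counts[i])
        scores.append(theta[i] + bonus)
    return max(range(n), key=scores.__getitem__)


# Hypothetical state at step 100 with two well-explored ads
print(ucb1_normal_cor_plan([1.0, 1.0], [50, 49], [50.0, 98.0], 100))
```

With equal means, the ad whose observed payoffs are more spread out receives the larger exploration bonus and is selected.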
  • 16. Experiment datasets
      [Diagram labels: advertisers, ad network/exchange, publishers, Google AdWords Traffic Estimator service, INTRANET.]
        • publishers gain 68% of advertisers’ spending (2003);
        • data was collected from 12/2011 to 05/2012;
        • 512 different keywords, 310 with non-zero mean payoff, 8 categories;
        • 20% for training and 80% for testing;
        • we consider each keyword to be an ad.
  • 17. Competing algorithms
      We compare the following algorithms:
        • RANDOM policy, which selects candidates uniformly at random;
        • MYOPIC policy, based on the expected immediate reward;
        • UCB1 policy, which assumes independence between arms and makes no assumption about the reward distribution;
        • UCB1-NORMAL policy, which assumes independence between arms and Gaussian-distributed rewards;
        • VI-COR policy, which solves value iteration using Monte Carlo sampling; and
        • UCB1-NORMAL-COR policy, which considers the dependencies between candidates.
  • 18. Results
      Datasets      MYOPIC  RANDOM  UCB1   UCB1-N  VI-COR  UCB1-N-COR
      Education     21.9    23.0    30.9   30.9    41.2*   27.6
      Finance-1     38.5    27.8    40.9   26.4    44.5    27.4
      Finance-2     22.1    16.5    30.6   22.8    38.0*   22.9
      Information   14.1    12.9    27.8   15.9    29.4    15.9
      P&O           41.6    30.4    50.5   31.4    72.9*   63.3
      Shopping-1    17.4    10.6    42.3   16.1    40.2    16.4
      Shopping-2    29.9    14.5    34.3   75.3    52.9    79.2*
      Shopping-3     9.7     4.3    21.9   18.3    27.3    19.4
      P&S           24.7    26.0    47.2   57.1    67.9*   59.9
      Medical       30.5    19.6    52.7   32.2    58.0*   33.5
      Table: The cumulative payoffs are averaged over 8 chunks, then normalized w.r.t. the GOLDEN policy for a better representation. The one with the highest cumulative payoff is in bold, and marked with * if the difference from the second best is significant by the Wilcoxon signed-rank test. P&O is “People & organisations” and P&S is “Products & services”.
  • 19. Results (cont.)
      [Plot legend: VI-COR, UCB1-Normal-COR, UCB1-Normal, UCB1, Golden, Myopic, Random.]
      Figure: Cumulative payoff on “People & organization” category, 5 candidates.
  • 20. Results (cont.)
      [Bar chart. y-axis: normalized cumulative payoff; x-axis: Edu, F-1, F-2, Info, P&O, S-1, S-2, S-3, P&S, Med; legend: Myopic, VI-Cor, UCB1-Normal, UCB1-Normal-Cor.]
      Figure: Comparison of accumulated payoffs on the 10 datasets. VI-COR always performed better than MYOPIC and UCB1-NORMAL-COR always performed better than UCB1-NORMAL across all datasets.
  • 21. Results (cont.)
      [Plot. x-axis: day; y-axis: daily payoff; legend: "best phones", "term insurance".]
      Figure: Special case: the daily payoff of two candidates with a sudden change.
  • 22. Results (cont.)
      [Plot. x-axis: noise factor $\sigma_0^2$ (log scale, $10^{-2}$ to $10^{4}$); y-axis: cumulative payoff ($\times 10^4$); legend: Golden, Myopic, VI-COR, UCB1-Normal-COR.]
      Figure: The impact of the noise factor $\sigma_0^2$ for the situation in the previous figure.
      \[
      \theta_{s(t+1)}(t+1) = \theta_{s(t+1)}(t) + \sigma_{s(t),s(t+1)}\, \frac{x_{s(t)}(t) - \theta_{s(t)}(t)}{\sigma_{s(t)}^2(t) + \sigma_0^2}
      \]
  • 23. Future works
        • correlated update: if ad a1 on webpage w1 was shown to user u1 and we observed its performance, what is the belief on the performance of ad a2 on webpage w2 when shown to user u2, with correlations known?
        • multiple ads with diversification (another exploration and exploitation dilemma);
        • a better solution for our continuous POMDP problem.