This document summarizes a paper presentation on price optimization for fresh produce at Alibaba-owned supermarkets. The paper combines machine learning, causal inference, Markov decision processes (MDPs), and the Bellman equation to predict demand and optimize prices over multiple periods, which was previously not possible with machine learning alone. This led to a sales increase of over 20% across 170 stores. The techniques include predicting base sales and price elasticities, framing multi-period pricing as an MDP, and solving the Bellman equation online to determine optimal pricing policies.
18. Two-Stage Structure
1. 4.4 Counterfactual Demand Prediction
a. 4.2 Basic Sales = Intercept Prediction (β for Items)
b. 4.3 Slope Prediction (α for Categories)
2. 5.2 Two-stage Algorithm (Dynamic Programming)
a. 5.2.1 Update by Greedy Policy by Bellman Equation
b. 5.2.2 Joint Optimization of Q function
[Diagram labels: β, α, B, d/Q; Demand, Discount]
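As a rough illustration of the second stage (5.2), the sketch below runs backward induction over a small finite-horizon pricing MDP and reads a greedy policy off the Bellman equation. The state (remaining inventory), the discount grid, the toy demand function, and every name in the code are assumptions made for illustration; they are not taken from the paper.

```python
from functools import lru_cache

DISCOUNTS = (1.0, 0.8, 0.6)  # hypothetical price multipliers (1.0 = regular price)
BASE_PRICE = 10.0            # hypothetical regular price
HORIZON = 3                  # hypothetical number of pricing periods


def demand(discount: float) -> int:
    """Toy deterministic demand: deeper discounts sell more units (assumption)."""
    return round(2 / discount ** 2)


@lru_cache(maxsize=None)
def value(t: int, inventory: int) -> float:
    """Bellman equation: V_t(s) = max over d of Q_t(s, d), with V = 0 at the end."""
    if t == HORIZON or inventory == 0:
        return 0.0
    return max(q_value(t, inventory, d) for d in DISCOUNTS)


def q_value(t: int, inventory: int, discount: float) -> float:
    """Q_t(s, d): immediate revenue plus the value of the resulting state."""
    sold = min(demand(discount), inventory)
    revenue = sold * BASE_PRICE * discount
    return revenue + value(t + 1, inventory - sold)


def greedy_discount(t: int, inventory: int) -> float:
    """Greedy policy: the discount that maximizes Q_t(s, d)."""
    return max(DISCOUNTS, key=lambda d: q_value(t, inventory, d))
```

With scarce inventory the greedy policy in this toy setup keeps the regular price, since every unit can be sold without a markdown; discounts only pay off when stock would otherwise be left over at the end of the horizon.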
19. Demand(Y) Prediction
Predict Y/Y_normal from features x, L
Target variable Y
● Y/Y_normal = ratio of sales at the discounted price to sales at the regular price (> 1)
● Y_i: Y of product i
Explanatory variables
● x: set of all features ∈ R^n ⊂ {historical sales of products, shops, holidays...}
● L_i: 3-hot product category vector
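To make the target and inputs on this slide concrete, here is a minimal sketch of computing the ratio Y/Y_normal and building a 3-hot category vector L_i. It assumes that "3-hot" means one active entry for each of three category levels, concatenated into a single vector; the level sizes and function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical sizes of three category levels (e.g. department, class, subclass).
LEVEL_SIZES = (3, 4, 5)


def three_hot(level_ids: Sequence[int],
              level_sizes: Sequence[int] = LEVEL_SIZES) -> List[int]:
    """Concatenate one one-hot block per category level (three 1s in total)."""
    if len(level_ids) != len(level_sizes):
        raise ValueError("need exactly one category id per level")
    vector: List[int] = []
    for idx, size in zip(level_ids, level_sizes):
        if not 0 <= idx < size:
            raise ValueError(f"category id {idx} out of range for size {size}")
        block = [0] * size
        block[idx] = 1
        vector.extend(block)
    return vector


def sales_ratio(y_discounted: float, y_normal: float) -> float:
    """Target Y / Y_normal: discounted-price sales over regular-price sales."""
    if y_normal <= 0:
        raise ValueError("y_normal must be positive")
    return y_discounted / y_normal
```

Normalizing by Y_normal puts items with very different sales volumes on a common scale, which is presumably why the ratio rather than raw sales is used as the target.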
20. Demand(Y): Base Sales Prediction (Boosting Tree)
d_0: note that this is d_0, not d. Average of historical discounts
x_i: set of all features ∈ R^n ⊂ {historical sales of products, shops, holidays...}
21. Demand(Y): Base Sales Prediction ???
For now it is predicting each item's average sales (β),
so the historical sales (Y_i_normal_t) in x_i
seem to be required.
I asked about it →
“h doesn’t learn the relationship between
price and sales.”
In other words, the slope (α) is not learned from x.
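The split between intercept and slope can be sketched as follows. The code assumes a log-linear form, ln(Y/Y_normal) = β_i + α_c (ln d - ln d_0), where d is the price as a fraction of the regular price, β_i is the item-level base prediction made at the average historical discount d_0, and α_c is a category-level elasticity. The functional form, parameter names, and any numbers used with it are assumptions for illustration; the slides only state that β is predicted per item and α per category.

```python
import math


def predict_sales_ratio(beta_item: float, alpha_category: float,
                        discount: float, base_discount: float) -> float:
    """Return Y / Y_normal under the assumed log-linear demand model.

    beta_item      -- base log sales ratio predicted at base_discount (d_0)
    alpha_category -- price elasticity shared by the item's category
    discount       -- candidate price as a fraction of the regular price (d)
    base_discount  -- average historical discount of the item (d_0)
    """
    if discount <= 0 or base_discount <= 0:
        raise ValueError("discounts must be positive price fractions")
    # The slope only acts on the deviation from the historical discount d_0.
    log_ratio = beta_item + alpha_category * (
        math.log(discount) - math.log(base_discount)
    )
    return math.exp(log_ratio)
```

At d = d_0 the slope term vanishes and the prediction reduces to the base-sales model alone, so under this assumed form the base model never needs to learn how sales respond to price.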