
Reinforcement Learning Study Group 6: Slides

  • 1.
  • 2.
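  • 3. MDP: $M = \{S, A, p_T, p_0, g\}$, with
       $\Pr\{S_{t+1} = s' \mid A_t = a, S_t = s, \ldots\} = \Pr\{S_{t+1} = s' \mid A_t = a, S_t = s\} =: p_T(s' \mid s, a)$ and $\Pr(S_0 = s) =: p_0(s)$.
       Policy $\pi \in \Pi_M$: $\Pr(A_t = a \mid S_t = s, \ldots) = \Pr(A_t = a \mid S_t = s) =: \pi(a \mid s)$.
       Value functions ($V^*$, $V^\pi$): $V^\pi(s) := \mathbb{E}^\pi[C_0 \mid S_0 = s]$, where $C_t := \sum_{i=0}^{\infty} \gamma^i g(S_{t+i}, A_{t+i})$, $\gamma \in [0, 1)$.
       Objective $f(\pi)$: $f(\pi) := \sum_{s \in S} p_0(s) V^\pi(s)$; optimize $f(\pi)$ over $\pi \in \Pi_M$ for the given $M$.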
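  • 4. $V^* = \max_{\pi \in \Pi_M} V^\pi = \max_{a \in A} \big( g(\cdot, a) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a) V^*(s') \big) = B^*(V^*)$
       $\Rightarrow$ given $V^*$, an optimal policy $\pi^*$ is the deterministic policy $\pi^{*d} = \arg\max_{a \in A} \big( g(\cdot, a) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a) V^*(s') \big)$.
       $B^*$ is a contraction $\Leftrightarrow \|B^*(v) - B^*(u)\| \le \gamma \|v - u\|$.
       $v_{k+1} = B^*(v_k)$, $v_0 \in \mathbb{R}^n$ $\Rightarrow$ $v_k \to V^*$ as $k \to \infty$.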
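The iteration $v_{k+1} = B^*(v_k)$ on slide 4 can be sketched in a few lines. This is a minimal illustration, not the slides' own code: the two-state, two-action MDP (`P`, `G`, `GAMMA`) and the helper names are assumptions made up for the example.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.5
# P[s][a][s2] = p_T(s2 | s, a)
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
# G[s][a] = g(s, a)
G = [[0.0, 1.0],
     [2.0, 0.5]]


def bellman_optimality(v):
    """Apply B* once: max over actions of reward plus discounted expected value."""
    return [
        max(
            G[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
            for a in range(len(G[s]))
        )
        for s in range(len(G))
    ]


def sup_norm(u, v):
    """Maximum absolute componentwise difference."""
    return max(abs(x - y) for x, y in zip(u, v))


def value_iteration(v0, tol=1e-12, max_iter=10_000):
    """Iterate v <- B*(v) until successive iterates differ by less than tol."""
    v = list(v0)
    for _ in range(max_iter):
        v_next = bellman_optimality(v)
        if sup_norm(v_next, v) < tol:
            return v_next
        v = v_next
    return v


v_star = value_iteration([0.0, 0.0])
```

For this particular MDP the fixed point can be checked by hand: solving the two linear equations for the policy "action 1 in state 0, action 0 in state 1" gives $V^* = (62/23, 82/23)$, which is what the iteration approaches.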
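  • 5. $B^\pi V^\pi(s) := \sum_{a \in A} \pi(a \mid s) \big[ g(s, a) + \gamma \sum_{s' \in S} p_T(s' \mid s, a) V^\pi(s') \big] = \mathbb{E}^\pi \big[ g(S_t, A_t) + \gamma V^\pi(S_{t+1}) \mid S_t = s \big]$
       $B^* V^*(s) := \max_{a \in A} \big( g(s, a) + \gamma \sum_{s' \in S} p_T(s' \mid s, a) V^*(s') \big) = \max_{\pi \in \Pi_M} \mathbb{E}^\pi \big[ g(S_t, A_t) + \gamma V^*(S_{t+1}) \mid S_t = s \big]$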
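As a rough sketch of how the operator $B^\pi$ on slide 5 is used, the snippet below evaluates a fixed policy by repeated application of $B^\pi$ and then computes $f(\pi) = \sum_s p_0(s) V^\pi(s)$. The MDP, the uniform policy, and the initial distribution `p0` are hypothetical values chosen for illustration.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.5
P = [[[0.9, 0.1], [0.2, 0.8]],   # P[s][a][s2] = p_T(s2 | s, a)
     [[0.5, 0.5], [0.0, 1.0]]]
G = [[0.0, 1.0],                 # G[s][a] = g(s, a)
     [2.0, 0.5]]


def bellman_policy(v, pi):
    """Apply B^pi once, where pi[s][a] is the probability of action a in state s."""
    return [
        sum(
            pi[s][a] * (G[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a])))
            for a in range(len(G[s]))
        )
        for s in range(len(G))
    ]


def evaluate_policy(pi, n_iter=200):
    """Approximate V^pi by iterating v <- B^pi(v) from zero."""
    v = [0.0] * len(G)
    for _ in range(n_iter):
        v = bellman_policy(v, pi)
    return v


uniform = [[0.5, 0.5], [0.5, 0.5]]
v_pi = evaluate_policy(uniform)

# Objective f(pi) under an assumed uniform initial distribution p0.
p0 = [0.5, 0.5]
f_pi = sum(p0[s] * v_pi[s] for s in range(len(G)))
```

A fixed iteration count is enough here because the error shrinks by a factor of `GAMMA` per step; a tolerance-based stopping rule would work equally well.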
  • 6.
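  • 7. Return $C_t$: $C_t := \sum_{i=0}^{\infty} \gamma^i g(S_{t+i}, A_{t+i})$, $\gamma \in [0, 1)$.
       State value $V^\pi$: $V^\pi(s) := \mathbb{E}^\pi[C_0 \mid S_0 = s]$.
       Optimal state value $V^*$: $V^*(s) := \max_{\pi \in \Pi_M} \mathbb{E}^\pi[C_0 \mid S_0 = s]$.
       Action value $Q^\pi$: $Q^\pi(s, a) := \mathbb{E}^\pi[C_0 \mid S_0 = s, A_0 = a]$.
       Optimal action value $Q^*$: $Q^*(s, a) := \max_{\pi \in \Pi_M} \mathbb{E}^\pi[C_0 \mid S_0 = s, A_0 = a]$.
       $V^\pi(s) = \sum_{a \in A} Q^\pi(s, a) \pi(a \mid s)$, $V^*(s) = \max_{a \in A} Q^*(s, a)$, $\pi^{*d} = \arg\max_{a \in A} Q^*(\cdot, a)$.
       $\Upsilon^\pi Q^\pi(s, a) := \mathbb{E}^\pi \big[ g(S_t, A_t) + \gamma Q^\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a \big] = g(s, a) + \gamma \sum_{(s', a') \in S \times A} p_T(s' \mid s, a) \pi(a' \mid s') Q^\pi(s', a')$
       $\Upsilon^* Q^*(s, a) := \mathbb{E} \big[ g(S_t, A_t) + \gamma \max_{a' \in A} Q^*(S_{t+1}, a') \mid S_t = s, A_t = a \big] = g(s, a) + \gamma \sum_{s' \in S} p_T(s' \mid s, a) \max_{a' \in A} Q^*(s', a')$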
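  • 8. $\Upsilon^\pi(q) = g(\cdot) + \gamma \sum_{(s', a') \in S \times A} p_T(s' \mid \cdot) \pi(a' \mid s') q(s', a')$
       $\Upsilon^*(q) = g(\cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot) \max_{a' \in A} q(s', a')$
       For $q, q' : S \times A \to \mathbb{R}$: $q \le q' \Leftrightarrow q(s, a) \le q'(s, a)$, $\forall (s, a) \in S \times A$, and $\|q - q'\| := \max_{(s, a) \in S \times A} |q(s, a) - q'(s, a)|$.
       For $\Upsilon \in \{\Upsilon^\pi, \Upsilon^*\}$: $q \le q' \Rightarrow \Upsilon(q) \le \Upsilon(q')$ and $\Upsilon(q + c) = \Upsilon(q) + \gamma c$, $\forall c \in \mathbb{R}$.
       $\Upsilon$ is a contraction $\Leftrightarrow \|\Upsilon(q) - \Upsilon(q')\| \le \gamma \|q - q'\|$.
       $q_{k+1} = \Upsilon^*(q_k)$, $q_0 \in \mathbb{R}^{n \times m}$ $\Rightarrow$ $q_k \to Q^*$ as $k \to \infty$, and $\pi^{*d} = \arg\max_{a \in A} Q^*(\cdot, a)$.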
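The iteration $q_{k+1} = \Upsilon^*(q_k)$ and the greedy policy $\pi^{*d} = \arg\max_a Q^*(\cdot, a)$ from slide 8 can be sketched as follows. The MDP and all function names are hypothetical; the operator is written in the standard form, with the maximum over $a'$ taken inside the expectation over $s'$.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.5
P = [[[0.9, 0.1], [0.2, 0.8]],   # P[s][a][s2] = p_T(s2 | s, a)
     [[0.5, 0.5], [0.0, 1.0]]]
G = [[0.0, 1.0],                 # G[s][a] = g(s, a)
     [2.0, 0.5]]
N_STATES, N_ACTIONS = len(G), len(G[0])


def q_bellman_optimality(q):
    """Apply Upsilon* once: g(s,a) + gamma * sum_s2 p_T(s2|s,a) * max_a2 q(s2,a2)."""
    return [
        [
            G[s][a] + GAMMA * sum(p * max(q[s2]) for s2, p in enumerate(P[s][a]))
            for a in range(N_ACTIONS)
        ]
        for s in range(N_STATES)
    ]


def q_sup_norm(q1, q2):
    """Sup norm over all (s, a) pairs."""
    return max(abs(q1[s][a] - q2[s][a]) for s in range(N_STATES) for a in range(N_ACTIONS))


def q_iteration(n_iter=200):
    """Iterate q <- Upsilon*(q) from zero for a fixed number of steps."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_iter):
        q = q_bellman_optimality(q)
    return q


q_star = q_iteration()
# Greedy deterministic policy: one maximizing action per state.
greedy = [max(range(N_ACTIONS), key=lambda a: q_star[s][a]) for s in range(N_STATES)]
```

Working with $q$ instead of $v$ has the practical advantage that the greedy policy is read off directly from the table, without another pass through $p_T$.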
  • 9.
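  • 10. $H^\pi_t := \{S_0, A_0, R_0, \ldots, S_{t-1}, A_{t-1}, R_{t-1}, S_t \mid M(\pi)\}$, with realization $h^\pi_t := \{s_0, a_0, r_0, \ldots, s_{t-1}, a_{t-1}, r_{t-1}, s_t \mid M(\pi)\}$.
       $\hat\Upsilon^\pi(q; h^\pi_T)(s, a) := \begin{cases} \dfrac{\sum_{t=0}^{T-1} \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}} \big( r_t + \gamma q(s_{t+1}, a_{t+1}) \big)}{\sum_{t=0}^{T-1} \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}}}, & \sum_{t=0}^{T-1} \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}} > 0 \\ q(s, a), & \text{otherwise} \end{cases}$
       $\hat\Upsilon^*(q; h^\pi_T)(s, a) := \begin{cases} \dfrac{\sum_{t=0}^{T-1} \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}} \big( r_t + \gamma \max_{a' \in A} q(s_{t+1}, a') \big)}{\sum_{t=0}^{T-1} \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}}}, & \sum_{t=0}^{T-1} \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}} > 0 \\ q(s, a), & \text{otherwise} \end{cases}$
       For comparison, the exact operators: $\Upsilon^\pi Q^\pi(s, a) := \mathbb{E}^\pi \big[ g(S_t, A_t) + \gamma Q^\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a \big]$, $\Upsilon^* Q^*(s, a) := \mathbb{E} \big[ g(S_t, A_t) + \gamma \max_{a' \in A} Q^*(S_{t+1}, a') \mid S_t = s, A_t = a \big]$.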
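The empirical operator $\hat\Upsilon^*(q; h_T)$ on slide 10 is a per-pair sample average of one-step targets. Here is a minimal sketch, assuming the history is stored as a list of `(s, a, r, s_next)` tuples; that representation and the function name are choices made for this example.

```python
def empirical_bellman_optimality(q, transitions, gamma):
    """Empirical Upsilon*: average r + gamma * max_a2 q(s_next, a2) over visits to (s, a).

    Pairs (s, a) that never occur in `transitions` keep their old value q[s][a].
    """
    n_states, n_actions = len(q), len(q[0])
    target_sum = [[0.0] * n_actions for _ in range(n_states)]
    count = [[0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in transitions:
        target_sum[s][a] += r + gamma * max(q[s_next])
        count[s][a] += 1
    return [
        [
            target_sum[s][a] / count[s][a] if count[s][a] > 0 else q[s][a]
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]


# Hand-made history: (0, 1) is visited twice, (1, 0) once, the other pairs never.
q = [[0.0, 1.0], [2.0, 3.0]]
history = [(0, 1, 1.0, 1), (0, 1, 1.0, 0), (1, 0, 2.0, 1)]
q_new = empirical_bellman_optimality(q, history, gamma=0.5)
# q_new == [[0.0, 2.0], [3.5, 3.0]]
```

For the pair `(0, 1)` the two targets are `1 + 0.5 * 3 = 2.5` and `1 + 0.5 * 1 = 1.5`, whose average is `2.0`; unvisited pairs are left unchanged, matching the second branch of the definition.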
  • 11. If $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \Pr(S_t = s, A_t = a \mid M(\pi)) > 0$, $\forall (s, a) \in S \times A$, then $\hat\Upsilon^\pi(\cdot; h_T) \to \Upsilon^\pi$ and $\hat\Upsilon^*(\cdot; h_T) \to \Upsilon^*$ as $T \to \infty$.
       $q \le q' \Rightarrow \hat\Upsilon(q) \le \hat\Upsilon(q')$ and $\hat\Upsilon(q + c) = \hat\Upsilon(q) + \gamma c$, $\forall c \in \mathbb{R}$.
       $\hat\Upsilon$ is a contraction $\Leftrightarrow \|\hat\Upsilon(q) - \hat\Upsilon(q')\| \le \gamma \|q - q'\|$.
       $q_{k+1} = \hat\Upsilon^*(q_k)$, $q_0 \in \mathbb{R}^{n \times m}$ $\Rightarrow$ $q_k \to \hat Q^*$ as $k \to \infty$, and $\hat\pi^{*d} = \arg\max_{a \in A} \hat Q^*(\cdot, a)$.
       The definitions of $\hat\Upsilon^\pi(q; h^\pi_T)$ and $\hat\Upsilon^*(q; h^\pi_T)$ are repeated from slide 10.
  • 12.
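  • 13. $q_{k+1} = \hat\Upsilon^*(q_k; h^\pi_\infty)$, $q_0 \in \mathbb{R}^{n \times m}$ $\Rightarrow$ $q_k \to Q^*$ as $k \to \infty$.
       $q_{t+1} = (1 - \alpha_t) q_t + \alpha_t \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\})$, with
       $\mathbb{E}[\|q_0\|] \le \text{const}$,
       $\alpha_t \ge 0$, $\forall t \in \mathbb{Z}_{\ge 0}$,
       $\sum_{t \in \mathbb{Z}_{\ge 0}} \alpha_t \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}} = \infty$, $\forall (s, a) \in S \times A$,
       $\sum_{t \in \mathbb{Z}_{\ge 0}} \alpha_t^2 \mathbb{I}_{\{s = s_t\}} \mathbb{I}_{\{a = a_t\}} < \infty$, $\forall (s, a) \in S \times A$.
       Then $\lim_{t \to \infty} \mathbb{E}[\|q_t - Q^*\|^2] = 0$.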
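  • 14. $q_{k+1} = \hat\Upsilon^*(q_k; h^\pi_\infty)$, $q_0 \in \mathbb{R}^{n \times m}$ $\Rightarrow$ $q_k \to Q^*$ as $k \to \infty$.
       $q_{t+1} = (1 - \alpha_t) q_t + \alpha_t \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\})$, $\mathbb{E}[\|q_0\|] \le \text{const}$.
       Sampling: $a_t \sim \pi(\cdot \mid s_t)$; $(r_t, s_{t+1}) \sim \big( g(s_t, a_t),\, p_T(\cdot \mid s_t, a_t) \big)$.
       Update: $\hat q_{t+1}(s_t, a_t) = \hat q_t(s_t, a_t) + \alpha_t \big( r_t + \gamma \max_{a' \in A} \hat q_t(s_{t+1}, a') - \hat q_t(s_t, a_t) \big)$.
       Policy: $\pi^{*d} = \arg\max_{a \in A} \hat q_\infty(\cdot, a)$.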
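The sampled update on slide 14 is tabular Q-learning. The sketch below is illustrative only: the MDP, the uniform behaviour policy, and the step size $\alpha_t = N_t(s, a)^{-0.7}$ (one choice that satisfies the two summability conditions on slide 13) are all assumptions made for this example.

```python
import random

# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.5
P = [[[0.9, 0.1], [0.2, 0.8]],   # P[s][a][s2] = p_T(s2 | s, a)
     [[0.5, 0.5], [0.0, 1.0]]]
G = [[0.0, 1.0],                 # G[s][a] = g(s, a), deterministic reward
     [2.0, 0.5]]


def step(s, a, rng):
    """Sample one transition: reward g(s, a) and next state s2 ~ p_T(. | s, a)."""
    s_next = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
    return G[s][a], s_next


def q_learning(n_steps=50_000, seed=0):
    """Tabular Q-learning along one trajectory under a uniform behaviour policy."""
    rng = random.Random(seed)
    n_states, n_actions = len(G), len(G[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(n_actions)      # a_t ~ pi(. | s_t), here uniform
        r, s_next = step(s, a, rng)
        visits[s][a] += 1
        alpha = visits[s][a] ** -0.7      # sum alpha = inf, sum alpha^2 < inf
        # Move q(s_t, a_t) toward the one-sample target r_t + gamma * max_a2 q(s_{t+1}, a2).
        q[s][a] += alpha * (r + GAMMA * max(q[s_next]) - q[s][a])
        s = s_next
    return q


q_hat = q_learning()
greedy = [max(range(len(row)), key=lambda a: row[a]) for row in q_hat]
```

Only the visited entry changes at each step, which is the incremental form of applying $\hat\Upsilon^*$ to a single transition. The uniform behaviour policy keeps every state-action pair visited infinitely often, which the convergence conditions require.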
  • 15.
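  • 16. $v_{k+1} = B^*(v_k)$, $v_0 \in \mathbb{R}^n$ $\Rightarrow$ $v_k \to V^*$ as $k \to \infty$.
       $q_{t+1} = (1 - \alpha_t) q_t + \alpha_t \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\})$.
       General form: $x_{t+1} = f_t(x_t)$, with $x^*$ such that $f_t(x^*) = 0$; convergence means $\lim_{t \to \infty} \|x_t - x^*\| = 0$.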
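  • 17. $v_{k+1} = B^*(v_k)$, $v_0 \in \mathbb{R}^n$ $\Rightarrow$ $v_k \to V^*$ as $k \to \infty$.
       $q_{t+1} = (1 - \alpha_t) q_t + \alpha_t \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\})$.
       Stochastic form: $x_{t+1} = f_t(x_t, \omega)$, with $x^*$ such that $f_t(x^*, \omega) = 0$, $\forall \omega \in \Omega$; convergence means $\lim_{t \to \infty} \mathbb{E}[\|x_t - x^*\|^2] = 0$.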
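  • 18. $q_{t+1} = (1 - \alpha_t) q_t + \alpha_t \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\}) = (1 - \alpha_t) q_t + \alpha_t \big( \Upsilon^*(q_t) + X_t \big)$,
       where $X_t := \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\}) - \Upsilon^*(q_t)$, with $\mathbb{E}[X_t] = 0$ and $\mathbb{E}[\|X_t\|^2] \le \text{const}$.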
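One way to see the zero-mean claim on slide 18 is to look at the component of $X_t$ for the pair that was actually visited. The derivation below is a sketch under two assumptions that the slides do not state explicitly: $q$ is held fixed (or depends only on the past), and the reward satisfies $\mathbb{E}[R_t \mid S_t = s, A_t = a] = g(s, a)$.

```latex
\begin{align*}
\mathbb{E}\big[\hat\Upsilon^*(q; \{S_t, A_t, R_t, S_{t+1}\})(s, a) \,\big|\, S_t = s, A_t = a\big]
  &= \mathbb{E}\big[R_t + \gamma \max_{a' \in A} q(S_{t+1}, a') \,\big|\, S_t = s, A_t = a\big] \\
  &= g(s, a) + \gamma \sum_{s' \in S} p_T(s' \mid s, a) \max_{a' \in A} q(s', a') \\
  &= \Upsilon^*(q)(s, a).
\end{align*}
```

So, conditionally on visiting $(s, a)$, the $(s, a)$ component of $X_t$ has mean zero. When rewards and $q$ are bounded, the same component is bounded, which is consistent with the second-moment bound $\mathbb{E}[\|X_t\|^2] \le \text{const}$.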