Detailed Description
on Cross Entropy Loss Function
ICSL Seminar
김범준
2019. 01. 03
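Cross Entropy Loss
- Widely used as the loss function for classification problems
- Computes the cross entropy between the prediction and the label
- Goal: examine the concrete theoretical justification for this loss and interpret its intuitive meaning
$$H(P, Q) = -\sum_{i=1}^{c} p_i \log(q_i)$$
where P is the label distribution, Q is the predicted distribution, and c is the number of classes.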
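As a quick numerical illustration, the formula above can be written directly in Python. This is a minimal sketch that assumes P and Q are given as plain lists of equal length c, uses the natural logarithm, and follows the common convention 0 · log(q) = 0 (terms with p_i = 0 are skipped); the function name `cross_entropy` is our own.

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i * log(q_i), using the natural log.

    Terms with p_i == 0 contribute nothing (convention 0 * log q = 0),
    so a zero q_i only causes an error where p_i > 0.
    """
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of classes")
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# One-hot label vs. a confident, correct prediction
label = [1.0, 0.0, 0.0]
prediction = [0.9, 0.05, 0.05]
print(cross_entropy(label, prediction))  # equals -log(0.9), roughly 0.105
```

With a one-hot label, only the predicted probability of the true class enters the sum.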
• Theoretical Derivation
- Binary Classification Problem
- Multiclass Classification Problem
• Intuitive understanding
- Relation to the KL-Divergence
• Theoretical Derivation
- Binary Classification Problem
Image classifier: a neural network (NN) with parameters θ maps an input image x_i to a prediction h_θ(x_i), which is compared with the label y_i.
- x_1 → prediction h_θ(x_1) = 0.1, label y_1 = 0
- x_2 → prediction h_θ(x_2) = 0.95, label y_2 = 1
Training dataset: inputs x_1, …, x_m with labels y_1, …, y_m (e.g. [0, 0, 0, 1, 1, 1])
Notation: $p(Y = y_i \mid X = x_i) = p(y_i \mid x_i)$ (X: input image, Y: label to be predicted)
Likelihood: $L(\theta) = p(y_1, \dots, y_m \mid x_1, \dots, x_m; \theta)$
: how likely it is that the model with parameters θ produces the predictions [0, 0, 0, 1, 1, 1]
Maximum Likelihood: $\hat{\theta} = \arg\max_\theta L(\theta)$
: choose the θ under which the predictions [0, 0, 0, 1, 1, 1] are most likely
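To make the argmax concrete, here is a toy sketch. It assumes a deliberately trivial, hypothetical model that ignores the image and outputs the same probability h_θ(x) = θ for every input, with the labels [0, 0, 0, 1, 1, 1]. Under that assumption each label is 1 with probability θ and 0 with probability 1 − θ, so the likelihood is θ³(1 − θ)³ and a grid search should find its maximum at θ = 0.5.

```python
def likelihood(theta, labels):
    """L(theta) = prod_i p(y_i | x_i; theta) for the toy model h_theta(x) = theta."""
    result = 1.0
    for y in labels:
        # p(y = 1) = theta, p(y = 0) = 1 - theta
        result *= theta if y == 1 else 1.0 - theta
    return result

labels = [0, 0, 0, 1, 1, 1]
grid = [k / 100 for k in range(1, 100)]  # candidate theta values 0.01 .. 0.99
theta_hat = max(grid, key=lambda t: likelihood(t, labels))
print(theta_hat)  # 0.5
```

A real classifier has many parameters and is not fitted by grid search; the point here is only that maximum likelihood picks the θ under which the observed labels are most probable.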
The network output is interpreted as the probability that the label is 1:
$$p(y_i = 1 \mid x_i; \theta) = h_\theta(x_i)$$
and therefore
$$p(y_i = 0 \mid x_i; \theta) = 1 - h_\theta(x_i)$$
For example, h_θ(x_2) = 0.95 assigns probability 0.95 to the correct label y_2 = 1, and h_θ(x_1) = 0.1 assigns probability 0.9 to the correct label y_1 = 0.
That is, both cases combine into a single expression, a Bernoulli distribution:
$$p(y_i \mid x_i; \theta) = h_\theta(x_i)^{y_i} \left(1 - h_\theta(x_i)\right)^{1 - y_i}$$
Assuming the samples are i.i.d. (independent and identically distributed), the likelihood factorizes:
$$L(\theta) = p(y_1, \dots, y_m \mid x_1, \dots, x_m; \theta) = \prod_{i=1}^{m} p(y_i \mid x_i; \theta) = \prod_{i=1}^{m} h_\theta(x_i)^{y_i} \left(1 - h_\theta(x_i)\right)^{1 - y_i}$$
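The Bernoulli form and the i.i.d. product can be checked numerically on the two examples from the slides (h_θ(x_1) = 0.1 with y_1 = 0, h_θ(x_2) = 0.95 with y_2 = 1). This is a minimal sketch: the predictions are taken as given numbers rather than produced by an actual network, and the helper names are our own.

```python
import math

def bernoulli_prob(h, y):
    """p(y | x; theta) = h^y * (1 - h)^(1 - y) for a label y in {0, 1}."""
    return h ** y * (1.0 - h) ** (1 - y)

def likelihood(predictions, labels):
    """L(theta) = prod_i p(y_i | x_i; theta) under the i.i.d. assumption."""
    return math.prod(bernoulli_prob(h, y) for h, y in zip(predictions, labels))

predictions = [0.1, 0.95]  # h_theta(x_1), h_theta(x_2)
labels = [0, 1]            # y_1, y_2
print(bernoulli_prob(0.1, 0))           # 0.9
print(likelihood(predictions, labels))  # 0.9 * 0.95, roughly 0.855
```

`math.prod` requires Python 3.8 or later.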
Maximum likelihood estimation then becomes a minimization problem:
$$\hat{\theta} = \arg\max_\theta L(\theta) = \arg\min_\theta \left(-\log L(\theta)\right)$$
(since log is a monotonically increasing function)
$$= \arg\min_\theta \sum_{i=1}^{m} \left[-y_i \log h_\theta(x_i) - (1 - y_i) \log\left(1 - h_\theta(x_i)\right)\right]$$
(by the properties of log: the log of a product is the sum of the logs, and log a^b = b log a)
$$= \arg\min_\theta \sum_{i=1}^{m} H\left(y_i, h_\theta(x_i)\right)$$
where $H(y_i, h_\theta(x_i)) = -y_i \log h_\theta(x_i) - (1 - y_i) \log\left(1 - h_\theta(x_i)\right)$ : Binary Cross Entropy
Here h_θ(x_i) and y_i are both probability values in [0, 1] (y_i being 0 or 1), so H is the cross entropy between two Bernoulli distributions.
Maximize Likelihood ⟺ Minimize Binary Cross Entropy (Binary Classification Problem)
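A short numerical check that summing the binary cross entropy over the dataset gives exactly −log L(θ), using the example predictions h_θ(x_1) = 0.1 and h_θ(x_2) = 0.95 from the slides. The sketch clips h into [eps, 1 − eps] before taking logs; that is a common safeguard against log(0) and our own addition, not part of the derivation.

```python
import math

def binary_cross_entropy(y, h, eps=1e-12):
    """H(y, h) = -y*log(h) - (1 - y)*log(1 - h), with h clipped to avoid log(0)."""
    h = min(max(h, eps), 1.0 - eps)
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

predictions = [0.1, 0.95]
labels = [0, 1]

total_bce = sum(binary_cross_entropy(y, h) for y, h in zip(labels, predictions))
# -log L(theta), where L is the Bernoulli product 0.9 * 0.95
neg_log_likelihood = -math.log(0.9 * 0.95)

print(total_bce, neg_log_likelihood)  # the two values agree up to rounding
```

Minimizing `total_bce` over θ is therefore the same problem as maximizing the likelihood.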
Multiclass image classifier (3 classes): prediction vs. label
- x_1 → prediction h_θ(x_1) = [0.9, 0.05, 0.05], label y_1 = [1, 0, 0]
- x_2 → prediction h_θ(x_2) = [0.03, 0.95, 0.02], label y_2 = [0, 1, 0]
- x_3 → prediction h_θ(x_3) = [0.01, 0.01, 0.98], label y_3 = [0, 0, 1]
Assuming one-hot encoded labels, the event y_i = [1, 0, 0] is the event that component 0 of y_i equals 1, and the network's probability for it is component 0 of the prediction (superscript (k) denotes the k-th component of a vector):
$$p(y_i = [1, 0, 0] \mid x_i; \theta) = p(y_i^{(0)} = 1 \mid x_i; \theta) = h_\theta(x_i)^{(0)}$$
In the same way,
$$p(y_i = [0, 1, 0] \mid x_i; \theta) = h_\theta(x_i)^{(1)}, \qquad p(y_i = [0, 0, 1] \mid x_i; \theta) = h_\theta(x_i)^{(2)}$$
That is, since exactly one component of a one-hot y_i equals 1:
$$p(y_i \mid x_i; \theta) = \left(h_\theta(x_i)^{(0)}\right)^{y_i^{(0)}} \left(h_\theta(x_i)^{(1)}\right)^{y_i^{(1)}} \left(h_\theta(x_i)^{(2)}\right)^{y_i^{(2)}}$$
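The product form above can be sketched in a few lines: raising each predicted component to the power of the matching one-hot label component leaves only the probability of the true class. This sketch uses the three example predictions from the slides; `categorical_prob` is a name of our own choosing.

```python
import math

def categorical_prob(h, y):
    """p(y | x; theta) = prod_k h[k] ** y[k] for a one-hot label y."""
    return math.prod(hk ** yk for hk, yk in zip(h, y))

predictions = [
    [0.9, 0.05, 0.05],   # h_theta(x_1)
    [0.03, 0.95, 0.02],  # h_theta(x_2)
    [0.01, 0.01, 0.98],  # h_theta(x_3)
]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for h, y in zip(predictions, labels):
    # Components with y[k] == 0 contribute h[k] ** 0 == 1, so the product
    # reduces to the predicted probability of the true class.
    print(categorical_prob(h, y), h[y.index(1)])
```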
Maximum likelihood again, using the i.i.d. assumption and the properties of log:
$$\hat{\theta} = \arg\max_\theta L(\theta) = \arg\min_\theta \left(-\log L(\theta)\right)$$
$$= \arg\min_\theta \sum_{i=1}^{m} \left[-y_i^{(0)} \log h_\theta(x_i)^{(0)} - y_i^{(1)} \log h_\theta(x_i)^{(1)} - y_i^{(2)} \log h_\theta(x_i)^{(2)}\right]$$
$$= \arg\min_\theta \sum_{i=1}^{m} H\left(y_i, h_\theta(x_i)\right)$$
where $H(P, Q) = -\sum_{i=1}^{c} p_i \log(q_i)$ : Cross Entropy
Here h_θ(x_i) and y_i are both probability distributions over the c classes.
Maximize Likelihood ⟺ Minimize Cross Entropy (Multiclass Classification Problem)
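As in the binary case, the summed cross entropy should equal −log L(θ). A minimal check on the three example predictions, assuming the natural log and skipping terms with p_k = 0:

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_k p_k * log(q_k); terms with p_k == 0 are skipped."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)

predictions = [[0.9, 0.05, 0.05], [0.03, 0.95, 0.02], [0.01, 0.01, 0.98]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

total_ce = sum(cross_entropy(y, h) for y, h in zip(labels, predictions))
# -log L(theta): L is the product of the true-class probabilities 0.9, 0.95, 0.98
neg_log_likelihood = -math.log(0.9 * 0.95 * 0.98)

print(total_ce, neg_log_likelihood)  # equal up to floating-point rounding
```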
• Intuitive understanding
- Relation to the KL-Divergence
$$H(P, Q) = \sum_{i=1}^{c} p_i \log\frac{1}{q_i} = \sum_{i=1}^{c} \left(p_i \log\frac{p_i}{q_i} + p_i \log\frac{1}{p_i}\right) = KL(P \| Q) + H(P)$$
(the middle step uses log(1/q_i) = log(p_i/q_i) + log(1/p_i))
Cross-entropy = KL-Divergence + H(P), the entropy of P itself
* KL-Divergence: Kullback–Leibler divergence
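The decomposition can be verified numerically. This sketch uses two arbitrary example distributions of our own choosing (P is deliberately not one-hot, so H(P) is nonzero), the natural log, and the convention that terms with p_i = 0 are skipped.

```python
import math

def entropy(p):
    """H(P) = sum_i p_i * log(1 / p_i), skipping p_i == 0."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = sum_i p_i * log(1 / q_i), skipping p_i == 0."""
    return sum(pi * math.log(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), skipping p_i == 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # example distribution P (not one-hot)
q = [0.5, 0.3, 0.2]  # example distribution Q

print(cross_entropy(p, q), kl_divergence(p, q) + entropy(p))  # equal up to rounding
```

Since KL(P ‖ Q) ≥ 0, with equality only when Q = P, the cross entropy is never smaller than H(P).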
Applying this decomposition to the multiclass result:
$$\hat{\theta} = \arg\max_\theta L(\theta) = \arg\min_\theta \sum_{i=1}^{m} H\left(y_i, h_\theta(x_i)\right)$$
$$= \arg\min_\theta \sum_{i=1}^{m} \left(KL\left(y_i \| h_\theta(x_i)\right) + H(y_i)\right)$$
(since H(P, Q) = KL(P ‖ Q) + H(P))
$$= \arg\min_\theta \sum_{i=1}^{m} KL\left(y_i \| h_\theta(x_i)\right)$$
(since the entropy of a one-hot encoded label is 0; in any case H(y_i) does not depend on θ, so it does not change the argmin)
Maximize Likelihood ⟺ Minimize Cross Entropy ⟺ Minimize KL-Divergence (Multiclass Classification Problem)
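For a one-hot label the entropy term vanishes, so cross entropy and KL-divergence coincide exactly. A minimal check on the second example (y_2 = [0, 1, 0], h_θ(x_2) = [0.03, 0.95, 0.02]), with the same helper definitions as a self-contained sketch:

```python
import math

def entropy(p):
    """H(P) = sum_i p_i * log(1 / p_i), skipping p_i == 0."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = sum_i p_i * log(1 / q_i), skipping p_i == 0."""
    return sum(pi * math.log(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), skipping p_i == 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

y = [0, 1, 0]           # one-hot label y_2
h = [0.03, 0.95, 0.02]  # prediction h_theta(x_2)

print(entropy(y))                                # 0.0: a one-hot label has zero entropy
print(cross_entropy(y, h), kl_divergence(y, h))  # so H(y, h) and KL(y || h) coincide
```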
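From an information-theory perspective, the KL-divergence can be understood intuitively as a "degree of surprise":
$$KL(P \| Q) = \sum_{i=1}^{c} p_i \log\frac{p_i}{q_i}$$
(Example) Semifinal teams: LG Twins, Hanwha Eagles, NC Dinos, Samsung Lions
- Prediction model 1): Q = [0.9, 0.03, 0.03, 0.04]
- Prediction model 2): Q = [0.3, 0.6, 0.05, 0.05]
- Game result: y = P = [1, 0, 0, 0]
- Model 2) produces the larger surprise: with the natural log, KL(P ‖ Q) = log(1/0.9) ≈ 0.105 for model 1) but log(1/0.3) ≈ 1.204 for model 2)
- Minimizing the degree of surprise → Q approaches P → the two probability distributions become similar → accurate prediction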
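The two numbers in the example can be reproduced directly. This sketch assumes the natural log and the convention that terms with p_i = 0 are skipped; the variable names are our own.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i) (natural log), skipping p_i == 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

result = [1, 0, 0, 0]              # P: the observed result, all mass on the first team
model_1 = [0.9, 0.03, 0.03, 0.04]  # Q from prediction model 1)
model_2 = [0.3, 0.6, 0.05, 0.05]   # Q from prediction model 2)

print(round(kl_divergence(result, model_1), 3))  # 0.105
print(round(kl_divergence(result, model_2), 3))  # 1.204
```

Because P is one-hot, each KL value reduces to −log of the probability the model gave to the team in the observed result.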
Maximize Likelihood ⟺ Minimize Cross Entropy ⟺ Minimize KL-Divergence (Multiclass Classification Problem)
→ Minimize surprisal → Approximate the prediction to the label → Better classification performance in general