SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
Road to
Winning at Horse Racing
with Data Science
2018/06/27(Wed)
AlphaImpact
NUKUI Shun
How to win?
2
Profile
● Name: NUKUI Shun
● Speciality: Machine Learning
● Experience of horse racing: 11 years
● Bio
○ The president of horse racing club in Tokyo Tech ~2017/03
○ Participated in 電脳賞(春) 2016/03
○ AlphaImpact (development of horse racing AI) 2016/06~
● Favorites: Watching training progress of LightGBM, Kaggle(Home Credit Default Risk)
3
AlphaImpact Project
● Developing horse racing AI
● Members
○ NUKUI Shun : Machine Learning, Horse racing domain knowledge
○ OMOTO Tsukasa : Machine Learning, System Architect, One of committers of LightGBM
○ HARA Tomonori : Horse racing hacker
● Activities
○ HP: https://alphaimpact.jp/
○ Published AI scores for free (~2018/06/24)
○ Sell predictions on netkeiba ウマい馬券
■ http://yoso.netkeiba.com/
4
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
5
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
6
What is the Purpose of Horse Racing Prediction?
● Hit?
7
What is the Purpose of Horse Racing Prediction?
● Hit?
● MAKE A PROFIT!!
8
Which should we bet on?
1 2 3
win proba. (勝率) 70% 20% 10%
best choice?
9
Which should we bet on?
1 2 3
win proba. (勝率) 70% 20% 10%
x1.1 x5.2 x8.0odds (オッズ)
exp. (期待値) 1.1 * 0.70 = 0.77 5.2 * 0.20 = 1.04 8.0 * 0.10 = 0.8
best choice!!
10
House Edge (控除率)
House edge
total sales
Refund as payoff 70%~80% = mean of return ratio
11
Betting Type
● Win (単勝)
● Place (複勝)
● Bracket Quinella (枠連)
● Quinella (馬連)
● Quinella Place (ワイド)
● Exacta (馬単)
● Trio (3連複)
● Trifecta (3連単)
20.0%
22.5%
25.0%
27.5%
house edge
12
Not Impossible to Win
● Mr. 卍 (Manji) made a profit of 140 million in a 3 years
● His theory and analytical skill is great, but we believe it is
not impossible to exceed him by machine learning
● Not hitting tickets are acknowledged as expenses under
certain conditions
13
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
14
What should we solve?
● classification
○ imbalance problem
● regression
○ better choice
○ important to design smoothed objective
● ranking
○ seems to be a natural choice too
○ but not better performance than regression
15
● Strength
○ place of order (着順)
○ standardized time (標準化走破タイム)
○ standardized velocity (標準化平均速度)
○ prize (賞金)
○ speed index (スピード指数)
● Profit
○ place (=within top3) payoff (複勝払い戻し)
■ 1st 120 yen < 3rd 540 yen
○ dark score (business secret)
■ transform strength score to be high correlated with return ratio
Objective of Regression
16
Two types of Objective vs Hit Ratio
standardized velocity dark score
over-popular
17
all turf races in 2017
Two types of Objective vs Return Ratio
standardized velocity dark score 18
all turf races in 2017
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
19
Feature Engineering in Horse Racing
● Most important and most time-consuming part
● Necessary to collect data by ourselves, unlike Kaggle
● Difficult to handle complicated structured data
● Requires deep domain knowledge to horse racing
20
Horse Table (馬柱)
race info
horse attribute
jockey
odds
race histories
21
Many Types of History
horse
jockey
trainer
22
Automated Achievement Features
horse
jockey
trainer
owner
sire (父馬)
…
track type
course
length
course×length
weather
field condition
pace
…
count
win hit ratio
place hit ratio
win return ratio
place return ratio
subject race condition statistic
● We made more than 1500 achievement features
23require domain knowledge
Smoothing Ratio Features
● Eliminate noise of unpopular subject×condition
count=2
hit ratio=0.5
count=100
hit ratio=0.4
A B
hit ratio=0.2
average
unconfident
where α=0.1 hit ratio=0.254 hit ratio=0.399
24
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
25
● One of the most popular frameworks in Kaggle
● Very strong to structured data
● Not affected by scales of features
● Can handle the missing value
● Robust to meaningless features
● We have a committer of LightGBM
LightGBM
26
Training Architecture
Race1
Race2
RaceN
・・・
Race
Race
Race
split into k-folds
by the race
(not by the horse)
LightGBM
LightGBM
LightGBM
cross validation
(the valid set is used for
early stopping)
valid score
hyperopt
score
feedback
update params
average
27
shuffle
Notice of LightGBM
● Early stopping is important to avoid overfitting
● Dummied features are better than categorical ones
○ Depends on dataset
● Sensitive to random_state
○ May result from subsample/colsample_bytree?
● pred_contrib is useful for feature analysis
28
Feature Analysis with LightGBM
= predict(data, pred_contrib=True)# of horses
# of features
feature contribution
matrix
29
Feature Analysis with LightGBM
30
Road to winning
1. Introduction to the system of horse racing
2. Definition of objective
3. Feature engineering
4. Prediction model
5. Evaluation
31
nDCG
● Used in ranking evaluation
● Higher relevant score (reli
) should be positioned at higher rank
non-negative, “the higher, the better”
32
The Relevant Score of nDCG
● inverse of order
○ 1/1, 1/2, …, 1/N
○ consider the whole ranking
● prize
○ 15000, 6000, 3800, 2300, 1500, 0, 0, ..., 0
○ consider only top 5
● prize@3
○ 15000, 6000, 3800, 0, 0, 0, 0, ..., 0
● place payoff
○ emphasize the dark horses
● win betting share (単勝支持率)
○ how close to popularity
33
The Relevant Score of nDCG
strength model1 (standardized velocity)
34
strength model2 (standardized velocity)
● model1 is more accurate than model2
● model2 is closer to win betting share
● model1 is able to hit the horses that the public cannot predict
all turf races in 2017
Evaluation of Top-N Box Betting
35
storength model (standardized velocity) profit model (dark score)
the lower hit,
the higher return
all turf races in 2017
Summary
● The purpose of horse racing prediction is making a profit
● Design of objective is critical to performance
● Feature engineering requires domain knowledge too
● LightGBM is cool
● nDCG is useful for model evaluation
36
Future work
● Calculate expectation with predicted probability and real-time odds
● Auto buying system with investment strategies
37
Enjoy horse racing life!!
38

Contenu connexe

Tendances

時系列分析による異常検知入門
時系列分析による異常検知入門時系列分析による異常検知入門
時系列分析による異常検知入門
Yohei Sato
 
オープンデータのためのスクレイピング
オープンデータのためのスクレイピングオープンデータのためのスクレイピング
オープンデータのためのスクレイピング
直之 伊藤
 

Tendances (20)

協調フィルタリング入門
協調フィルタリング入門協調フィルタリング入門
協調フィルタリング入門
 
深層学習時代の自然言語処理
深層学習時代の自然言語処理深層学習時代の自然言語処理
深層学習時代の自然言語処理
 
レコメンドアルゴリズムの基本と周辺知識と実装方法
レコメンドアルゴリズムの基本と周辺知識と実装方法レコメンドアルゴリズムの基本と周辺知識と実装方法
レコメンドアルゴリズムの基本と周辺知識と実装方法
 
[DL Hacks]Variational Approaches For Auto-Encoding Generative Adversarial Ne...
[DL Hacks]Variational Approaches For Auto-Encoding  Generative Adversarial Ne...[DL Hacks]Variational Approaches For Auto-Encoding  Generative Adversarial Ne...
[DL Hacks]Variational Approaches For Auto-Encoding Generative Adversarial Ne...
 
逆求人自己紹介プレゼン(平木場)
逆求人自己紹介プレゼン(平木場)逆求人自己紹介プレゼン(平木場)
逆求人自己紹介プレゼン(平木場)
 
バンディットアルゴリズム入門と実践
バンディットアルゴリズム入門と実践バンディットアルゴリズム入門と実践
バンディットアルゴリズム入門と実践
 
機械学習モデルのサービングとは?
機械学習モデルのサービングとは?機械学習モデルのサービングとは?
機械学習モデルのサービングとは?
 
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
 
機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門
 
Graphic Notes on Linear Algebra and Data Science
Graphic Notes on Linear Algebra and Data ScienceGraphic Notes on Linear Algebra and Data Science
Graphic Notes on Linear Algebra and Data Science
 
時系列分析による異常検知入門
時系列分析による異常検知入門時系列分析による異常検知入門
時系列分析による異常検知入門
 
情報推薦システム入門:講義スライド
情報推薦システム入門:講義スライド情報推薦システム入門:講義スライド
情報推薦システム入門:講義スライド
 
実務と論文で学ぶジョブレコメンデーション最前線2022
実務と論文で学ぶジョブレコメンデーション最前線2022実務と論文で学ぶジョブレコメンデーション最前線2022
実務と論文で学ぶジョブレコメンデーション最前線2022
 
機械学習システムのアーキテクチャアラカルト
機械学習システムのアーキテクチャアラカルト機械学習システムのアーキテクチャアラカルト
機械学習システムのアーキテクチャアラカルト
 
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
最新の多様な深層強化学習モデルとその応用(第40回強化学習アーキテクチャ講演資料)
 
MLOps入門
MLOps入門MLOps入門
MLOps入門
 
#IkaLog によるスプラトゥーンの画像解析と機械学習
#IkaLog によるスプラトゥーンの画像解析と機械学習#IkaLog によるスプラトゥーンの画像解析と機械学習
#IkaLog によるスプラトゥーンの画像解析と機械学習
 
オープンデータのためのスクレイピング
オープンデータのためのスクレイピングオープンデータのためのスクレイピング
オープンデータのためのスクレイピング
 
LINE のUI自動テスト事例
LINE のUI自動テスト事例LINE のUI自動テスト事例
LINE のUI自動テスト事例
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 

Similaire à Road to Winning at Horse Racing with Data Science

Final Report HeroMoto
Final Report HeroMotoFinal Report HeroMoto
Final Report HeroMoto
Vivek Pabla
 
Formula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by StudentsFormula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by Students
Speck&Tech
 

Similaire à Road to Winning at Horse Racing with Data Science (20)

Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
 
Implementation and analysis of search algorithms in single player connect fou...
Implementation and analysis of search algorithms in single player connect fou...Implementation and analysis of search algorithms in single player connect fou...
Implementation and analysis of search algorithms in single player connect fou...
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
 
Improving the Performance of MCTS-Based μRTS Agents Through Move Pruning
Improving the Performance of MCTS-Based μRTS Agents Through Move PruningImproving the Performance of MCTS-Based μRTS Agents Through Move Pruning
Improving the Performance of MCTS-Based μRTS Agents Through Move Pruning
 
Click prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real lifeClick prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real life
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Predicting Winning Price in Real Time Bidding with Censored Data
Predicting Winning Price in Real Time Bidding with Censored DataPredicting Winning Price in Real Time Bidding with Censored Data
Predicting Winning Price in Real Time Bidding with Censored Data
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Final Report HeroMoto
Final Report HeroMotoFinal Report HeroMoto
Final Report HeroMoto
 
Formula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by StudentsFormula SAE: Racing Cars Built by Students
Formula SAE: Racing Cars Built by Students
 
Machine learning presentation
Machine learning presentationMachine learning presentation
Machine learning presentation
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
 
Carvana auctioned kicked car prediction
Carvana auctioned kicked car predictionCarvana auctioned kicked car prediction
Carvana auctioned kicked car prediction
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoML
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
 
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the C...
 
Ad Placement Challenge
Ad Placement ChallengeAd Placement Challenge
Ad Placement Challenge
 

Dernier

Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 

Dernier (20)

Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 

Road to Winning at Horse Racing with Data Science

  • 1. Road to Winning at Horse Racing with Data Science 2018/06/27(Wed) AlphaImpact NUKUI Shun
  • 3. Profile ● Name: NUKUI Shun ● Speciality: Machine Learning ● Experience of horse racing: 11 years ● Bio ○ The president of horse racing club in Tokyo Tech ~2017/03 ○ Participated in 電脳賞(春) 2016/03 ○ AlphaImpact (development of horse racing AI) 2016/06~ ● Favorites: Watching training progress of LightGBM, Kaggle(Home Credit Default Risk) 3
  • 4. AlphaImpact Project ● Developing horse racing AI ● Members ○ NUKUI Shun : Machine Learning, Horse racing domain knowledge ○ OMOTO Tsukasa : Machine Learning, System Architect, One of committers of LightGBM ○ HARA Tomonori : Horse racing hacker ● Activities ○ HP: https://alphaimpact.jp/ ○ Published AI scores for free (~2018/06/24) ○ Sell predictions on netkeiba ウマい馬券 ■ http://yoso.netkeiba.com/ 4
  • 5. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 5
  • 6. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 6
  • 7. What is the Purpose of Horse Racing Prediction? ● Hit? 7
  • 8. What is the Purpose of Horse Racing Prediction? ● Hit? ● MAKE A PROFIT!! 8
  • 9. Which should we bet on? 1 2 3 win proba. (勝率) 70% 20% 10% best choice? 9
  • 10. Which should we bet on? 1 2 3 win proba. (勝率) 70% 20% 10% x1.1 x5.2 x8.0odds (オッズ) exp. (期待値) 1.1 * 0.70 = 0.77 5.2 * 0.20 = 1.04 8.0 * 0.10 = 0.8 best choice!! 10
  • 11. House Edge (控除率) House edge total sales Refund as payoff 70%~80% = mean of return ratio 11
  • 12. Betting Type ● Win (単勝) ● Place (複勝) ● Bracket Quinella (枠連) ● Quinella (馬連) ● Quinella Place (ワイド) ● Exacta (馬単) ● Trio (3連複) ● Trifecta (3連単) 20.0% 22.5% 25.0% 27.5% house edge 12
  • 13. Not Impossible to Win ● Mr. 卍 (Manji) made a profit of 140 million in a 3 years ● His theory and analytical skill is great, but we believe it is not impossible to exceed him by machine learning ● Not hitting tickets are acknowledged as expenses under certain conditions 13
  • 14. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 14
  • 15. What should we solve? ● classification ○ imbalance problem ● regression ○ better choice ○ important to design smoothed objective ● ranking ○ seems to be a natural choice too ○ but not better performance than regression 15
  • 16. ● Strength ○ place of order (着順) ○ standardized time (標準化走破タイム) ○ standardized velocity (標準化平均速度) ○ prize (賞金) ○ speed index (スピード指数) ● Profit ○ place (=within top3) payoff (複勝払い戻し) ■ 1st 120 yen < 3rd 540 yen ○ dark score (business secret) ■ transform strength score to be high correlated with return ratio Objective of Regression 16
  • 17. Two types of Objective vs Hit Ratio standardized velocity dark score over-popular 17 all turf races in 2017
  • 18. Two types of Objective vs Return Ratio standardized velocity dark score 18 all turf races in 2017
  • 19. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 19
  • 20. Feature Engineering in Horse Racing ● Most important and most time-consuming part ● Necessary to collect data by ourselves, unlike Kaggle ● Difficult to handle complicated structured data ● Requires deep domain knowledge to horse racing 20
  • 21. Horse Table (馬柱) race info horse attribute jockey odds race histories 21
  • 22. Many Types of History horse jockey trainer 22
  • 23. Automated Achievement Features horse jockey trainer owner sire (父馬) … track type course length course×length weather field condition pace … count win hit ratio place hit ratio win return ratio place return ratio subject race condition statistic ● We made more than 1500 achievement features 23require domain knowledge
  • 24. Smoothing Ratio Features ● Eliminate noise of unpopular subject×condition count=2 hit ratio=0.5 count=100 hit ratio=0.4 A B hit ratio=0.2 average unconfident where α=0.1 hit ratio=0.254 hit ratio=0.399 24
  • 25. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 25
  • 26. ● One of the most popular frameworks in Kaggle ● Very strong to structured data ● Not affected by scales of features ● Can handle the missing value ● Robust to meaningless features ● We have a committer of LightGBM LightGBM 26
  • 27. Training Architecture Race1 Race2 RaceN ・・・ Race Race Race split into k-folds by the race (not by the horse) LightGBM LightGBM LightGBM cross validation (the valid set is used for early stopping) valid score hyperopt score feedback update params average 27 shuffle
  • 28. Notice of LightGBM ● Early stopping is important to avoid overfitting ● Dummied features are better than categorical ones ○ Depends on dataset ● Sensitive to random_state ○ May result from subsample/colsample_bytree? ● pred_contrib is useful for feature analysis 28
  • 29. Feature Analysis with LightGBM = predict(data, pred_contrib=True)# of horses # of features feature contribution matrix 29
  • 30. Feature Analysis with LightGBM 30
  • 31. Road to winning 1. Introduction to the system of horse racing 2. Definition of objective 3. Feature engineering 4. Prediction model 5. Evaluation 31
  • 32. nDCG ● Used in ranking evaluation ● Higher relevant score (reli ) should be positioned at higher rank non-negative, “the higher, the better” 32
  • 33. The Relevant Score of nDCG ● inverse of order ○ 1/1, 1/2, …, 1/N ○ consider the whole ranking ● prize ○ 15000, 6000, 3800, 2300, 1500, 0, 0, ..., 0 ○ consider only top 5 ● prize@3 ○ 15000, 6000, 3800, 0, 0, 0, 0, ..., 0 ● place payoff ○ emphasize the dark horses ● win betting share (単勝支持率) ○ how close to popularity 33
  • 34. The Relevant Score of nDCG strength model1 (standardized velocity) 34 strength model2 (standardized velocity) ● model1 is more accurate than model2 ● model2 is closer to win betting share ● model1 is able to hit the horses that the public cannot predict all turf races in 2017
  • 35. Evaluation of Top-N Box Betting 35 storength model (standardized velocity) profit model (dark score) the lower hit, the higher return all turf races in 2017
  • 36. Summary ● The purpose of horse racing prediction is making a profit ● Design of objective is critical to performance ● Feature engineering requires domain knowledge too ● LightGBM is cool ● nDCG is useful for model evaluation 36
  • 37. Future work ● Calculate expectation with predicted probability and real-time odds ● Auto buying system with investment strategies 37
  • 38. Enjoy horse racing life!! 38