1. 2017 IEEE CIG
Game Data Mining Competition (GDMC)
(https://cilab.sejong.ac.kr/gdmc2017/)
KyungJoong Kim, Dumim Yoon and Jihoon Jeon
(Cognition & Intelligence Lab, Sejong University)
Sung-il Yang and SangKwang Lee
(Electronics and Telecommunications Research Institute)
EunJo Lee and Yoonjae Jang
(NCSOFT)
2. Game Data Mining
• Understanding game players’ behaviors from data
• In particular, predicting players’ churn/retention or purchase behaviors from game log data
• Few public datasets are available to researchers, which limits the growth of the field
3. Game Data Mining Competition
• Access to large-scale game log data (about 100 GB) from Blade & Soul, a commercially successful MMORPG by NCSOFT, one of the biggest game companies in South Korea
• Predict game players’ churn (a binary classification problem) and survival time (a regression problem) from the massive game log data
5. Competition Tracks
Track 1: Churn Prediction
In this track, participants will predict players’ churn or retention
on the test datasets. The winner will be determined based on
the average F1-Measure.
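As a rough illustration of the Track 1 metric, the sketch below computes the F1-Measure with churn as the positive class and then averages it over several test sets. The slide does not say how the average is taken, so averaging per test set is an assumption, and the function names `f1_score` and `average_f1` are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one test set; `positive` marks the churn class (assumed to be 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(per_set):
    """Mean F1 over (y_true, y_pred) pairs, one pair per test set (assumption)."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in per_set]
    return sum(scores) / len(scores)
```

Because F1 ignores true negatives, a model that labels every player as retained scores zero here, even though about 70% of the players are not churners.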
Track 2: Survival Analysis
In this track, participants will predict the survival time (the
number of days) of game players on the test datasets. The
winner will be determined based on the average Root Mean
Squared Logarithmic Error (RMSLE).
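For the Track 2 metric, a minimal sketch of RMSLE in its common form is shown below. Whether the organizers used exactly this variant (natural logarithm of value plus one) is an assumption, and the function name `rmsle` is hypothetical.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error over survival times in days.

    Common definition (assumed here):
        sqrt( mean( (log(pred + 1) - log(true + 1))^2 ) )
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    )
    return math.sqrt(total / len(y_true))
```

The logarithm makes the error relative rather than absolute: being off by five days matters more for a player who survives ten days than for one who survives sixty.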
6. GDMC 2017 Homepage
• Important Dates
• Problem Description
• Tutorial (with R)
• Data Description
• Rules
https://cilab.sejong.ac.kr/gdmc2017/
7. GDMC 2017 Google Groups
https://groups.google.com/d/forum/gdmc2017
• Announcement
• Sample Log
• Log Schema
• Log Data Download
• Training Data
• Test Data without Label
• Question/Answer
[Figure: number of group members (# of Members) by month: March 0, April 76, May 106, June 206, July 255, August 264]
8. Test Server
http://web_cilab.sejong.ac.kr/gdmcServer/
• Test your predictions before the deadline
• 10% of the test data is used for this test server (not used in final rankings)
• For security reasons, submissions are limited to a maximum of 48 per day (30-minute waiting time after the last submission)
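The two limits are consistent with each other: 48 submissions spaced 30 minutes apart fill exactly 24 hours. A participant-side check along these lines could avoid rejected uploads. This is only a sketch: how the server counts a "day" is not stated on the slide (calendar days are assumed here), and `next_allowed_submission` is a hypothetical helper, not part of the test server.

```python
from datetime import datetime, time, timedelta

MIN_WAIT = timedelta(minutes=30)  # waiting time after the last submission
MAX_PER_DAY = 48                  # daily submission cap


def next_allowed_submission(previous, now):
    """Return the earliest time a new submission should be accepted.

    `previous` is a list of datetimes of earlier submissions.
    Assumption: the daily cap is counted per calendar day.
    """
    if not previous:
        return now
    earliest = max(now, max(previous) + MIN_WAIT)
    used_that_day = sum(1 for t in previous if t.date() == earliest.date())
    if used_that_day >= MAX_PER_DAY:
        # Cap reached: wait until the start of the next calendar day.
        earliest = datetime.combine(earliest.date() + timedelta(days=1), time.min)
    return earliest
```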
11. Predictions about 3 Weeks from Now
[Figure: timeline along the Time axis: Two Months of User Data, followed by Three Weeks, then Churn/Retention]
12. Churn/Retention
• Churn is defined as a long-term inactive state
• How many weeks for the churn decision? Five weeks
• Retention: logged in to the game more than once during the five weeks
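The labeling rule above can be sketched as follows. The start of the five-week window is passed in explicitly because the slide does not fix it, and "more than once" is read literally as at least two logins. Both choices are assumptions, and `label_player` and `min_logins` are hypothetical names.

```python
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(weeks=5)  # five-week decision window from the slide


def label_player(login_times, window_start, min_logins=2):
    """Label a player as 'retention' or 'churn'.

    A player is retained when at least `min_logins` logins fall inside
    [window_start, window_start + 5 weeks). The default of 2 is a literal
    reading of "more than once"; pass 1 if a single login should count.
    """
    window_end = window_start + CHURN_WINDOW
    logins_in_window = sum(
        1 for t in login_times if window_start <= t < window_end
    )
    return "retention" if logins_in_window >= min_logins else "churn"
```

For example, with a window starting on 2016-06-01, a player who logs in on June 2 and June 20 is labeled "retention", while a player whose only login is on July 10 falls outside the window and is labeled "churn".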
14. Data Description
Data Set | Time Period | Weeks | Number of Gamers | Data Size*
Training | APR-1-2016 ~ MAY-11-2016 | 6 | 4000 (30% churn) | 48 GB (175m events)
Test Set 1 | JULY-27-2016 ~ SEP-21-2016 | 8 | 3000 (30% churn) | 30 GB
Test Set 2 | DEC-14-2016 ~ FEB-08-2017 | 8 | 3000 (30% churn) | 30 GB
* Uncompressed Size
15. Log Data Sample
Time | Event Type | Details (up to 72 columns)
2016-05-04 6:38:32 PM | Enter World | Login Type, Actor Data, …
2016-05-04 6:39:16 PM | Enter Zone | Enter Zone Reason, Zone Type, …
2016-05-04 6:39:36 PM | Lose Item | Item Type, Item Count, …
2016-05-04 6:39:36 PM | Get Item | Item Type, Item Count, …
2016-05-04 6:39:40 PM | Get Item | Item Type, Item Count, …
⋮ | ⋮ | ⋮
82 Event Types
(World, Zone, Item, Party, Quest, Guild)
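To show how rows like the sample above might be turned into simple features, the sketch below parses the timestamps and counts events by type. The real file layout (delimiter, column order, player identifier) is not shown on the slide, so the rows are assumed to be already split into (time, event type) pairs. `SAMPLE_ROWS` and `summarize` are hypothetical names.

```python
from collections import Counter
from datetime import datetime

# 12-hour clock with AM/PM marker, as in the sample rows.
TIME_FORMAT = "%Y-%m-%d %I:%M:%S %p"

# The five sample rows from the slide, reduced to (time, event type).
SAMPLE_ROWS = [
    ("2016-05-04 6:38:32 PM", "Enter World"),
    ("2016-05-04 6:39:16 PM", "Enter Zone"),
    ("2016-05-04 6:39:36 PM", "Lose Item"),
    ("2016-05-04 6:39:36 PM", "Get Item"),
    ("2016-05-04 6:39:40 PM", "Get Item"),
]


def summarize(rows):
    """Return first/last event time, elapsed seconds, and per-type counts."""
    times = [datetime.strptime(ts, TIME_FORMAT) for ts, _ in rows]
    first, last = min(times), max(times)
    return {
        "first": first,
        "last": last,
        "span_seconds": (last - first).total_seconds(),
        "event_counts": Counter(event for _, event in rows),
    }
```

Counts per event type and the time between first and last event are typical starting points for churn features. With 82 event types, such a summary produces a fixed-length vector per player regardless of how many log rows the player generated.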
17. Participants (13 Teams)
Team name | Team members | Affiliation | Type | Country
GoAlone | 1 | Yonsei University | Academia | South Korea
DTND | 3 | DTND | ? | South Korea
goedle.io | 2 | goedle.io GmbH | Industry | Germany
IISLABSKKU | 3 | Sungkyunkwan University | Academia | South Korea
leessang | 2 | Yonsei University | Academia | South Korea
TheCowKing | 2 | KAIST | Academia | South Korea
TripleS | 3 | - | ? | South Korea
UTU | 4 | University of Turku | Academia | Finland
YD | 6 | Silicon Studio | Industry | Japan
YK | 1 | Yonsei University | Academia | South Korea
suya | 1 | Yonsei University | Academia | South Korea
NoJam | 3 | Yonsei University | Academia | South Korea
MNDS | 3 | Yonsei University | Academia | South Korea
19. YD (Winner)
• Silicon Studio, Japan
• Team Members: Paul Bertens, Pei Pei Chen, Kexin Chen, Anna
Guitart, Sovann Lay, Africa Perianez
• Selected features whose distributions are similar between the training set and the test sets
• Test Set 1: LSTM + DNN (implemented with Keras)
• Test Set 2: Extra Trees Classifier (# of trees = 50)
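The slide does not say how the team measured whether a feature has a similar distribution in the training and test sets. One common way to do it, used here purely as an illustration and not as the team's method, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cumulative distribution functions. The `threshold` value and the function names are assumptions.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    largest_gap = 0.0
    # The gap between two step functions peaks at one of the sample points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def stable_features(train, test, threshold=0.1):
    """Keep feature names whose train/test KS statistic is at most `threshold`.

    `train` and `test` map feature name -> list of values. The cutoff of 0.1
    is an arbitrary illustrative choice.
    """
    return [
        name
        for name in train
        if name in test and ks_statistic(train[name], test[name]) <= threshold
    ]
```

A filter like this matters in this competition because the test sets were collected months after the training set, so features that drift over time can hurt a model that was fitted on the earlier period.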
23. Participants (5 Teams)
Team name | Team members | Affiliation | Country
DTND | 3 | DTND | South Korea
IISLABSKKU | 3 | Sungkyunkwan University | South Korea
TripleS | 3 | - | South Korea
UTU | 4 | University of Turku | Finland
YD | 6 | Silicon Studio | Japan
25.
Rank | Team | Techniques
1 | YD | Ensemble of Conditional Inference Trees (# of Trees = 900)
2 | IISLABSKKU | Tree Boosting
3 | UTU | Linear Regression
4 | TripleS | Ensemble Tree Method
5 | DTND | Generalized Linear Model
[Legend: Neural Net, Tree Approach, Linear Models]
26. Future Data Use
• Data Download Deadline
  • Active until the end of August; an extension of the deadline is under discussion
• Data Use for Academic Research
  • No restriction on use of the data for academic research (please acknowledge this competition and NCSOFT)
• Test Data Label
  • The test data labels will be released soon.