The document summarizes the author's experience competing in their first Kaggle competition to predict Parkinson's disease severity scores. Some key points:
- The author placed in the top 6.94% without any prior Kaggle experience, submitting over 90 solutions and creating over 50 models.
- Effective solutions can be simple, like averaging two simple models. The winning solution ignored blood test features.
- Competitive data science requires extensive time and effort exploring many hypotheses through experiment tracking and logging.
- Asking questions of others outside data science can provide new insights. Teams are valuable for sharing experiences.
- Public leaderboards may not reflect the true state, and debugging code competition submissions is challenging. Learning from the winners' solutions is part of the competition.
5. Metric
SMAPE (+1): Symmetric Mean Absolute Percentage Error
UPDRS: Unified Parkinson's Disease Rating Scale
The goal is to predict UPDRS scores that
measure the severity of Parkinson's disease:
• UPDRS_1 - Mentation, Behavior, and Mood
• UPDRS_2 - Activities of Daily Living
• UPDRS_3 - Body Motor Functions
• UPDRS_4 - Complications of Therapy
The higher the value, the higher the severity
Predict values for the current month and for
6, 12, and 24 months later.
So, for one visit we need to predict 16 values
(4 UPDRS scores × 4 time points).
6. Results
• Team Experience on Kaggle: none
• Notebooks created: 242
• Models created: 53
• Submissions: 91
• Score result: TOP 6.94%
• PB (private leaderboard) result: TOP 15% (262nd place)
• Winning team score by competition
metric (SMAPE): 60.042
• Average score in PB: 72.278
• Team score: 69.759
• Bronze Score: 69.743
• Silver Score: 69.738
• Gold Score: 60.936
7. 1st place solution
The final solution is a simple average of two models: LightGBM (LGB) and a neural network (NN).
Both models were trained on the same features:
• Visit month
• Forecast horizon
• Target prediction month
• Indicator whether blood was taken during the visit
• Indicators whether a patient visit occurred in the 6th, 18th, and 48th
months
• Count of previous “non-annual” visits (6th or 18th)
• Index of the target (pivot the dataset to have a single target column)
The winning solution fully ignores the results of the blood tests. The
team tried hard to find any signal in this crucial piece of the data, but
unfortunately concluded that none of their approaches or models could
extract a benefit from the blood test features large enough to
distinguish it from random variation.
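The final averaging step is easy to reproduce; a minimal sketch, assuming each model's per-row predictions are already available as arrays (the numbers below are made up for illustration, not taken from the winning code):

```python
import numpy as np

# Hypothetical UPDRS forecasts from the two models for three rows of the
# pivoted dataset (single target column, one row per visit/horizon/target).
pred_lgb = np.array([10.2, 33.0, 4.8])  # LightGBM predictions
pred_nn = np.array([11.0, 31.4, 5.0])   # neural-network predictions

# The winning solution's final prediction is a plain, unweighted mean.
final_pred = (pred_lgb + pred_nn) / 2.0
print(final_pred)
```

An unweighted mean of a tree model and a neural network often helps because the two model families tend to make decorrelated errors.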
10. Lesson 3
Be prepared for the tree of
hypotheses and options to
grow indefinitely.
A system for tracking
experiment results and
logging changes will be
needed very soon.
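Even a flat CSV file is enough to start with; a minimal sketch of such an experiment log (the file name and fields are illustrative, not prescribed by Kaggle):

```python
import csv
import datetime

def log_experiment(path, model_name, features, cv_score, note=""):
    """Append one row per run: timestamp, model, feature set, CV score, note."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.datetime.now().isoformat(timespec="seconds"),
            model_name,
            ";".join(features),
            cv_score,
            note,
        ])

# One line per experiment keeps hypotheses and scores comparable weeks later.
log_experiment("experiments.csv", "lgb_v3",
               ["visit_month", "horizon"], 71.2, "baseline features")
```

A plain file beats memory: after a few dozen notebooks, the log is the only reliable record of which hypothesis produced which score.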
11. Lesson 4
You will probably
spend a lot of time
on ideas that will not
work….
But it will be an
invaluable
experience.
12. Lesson 5
Search for similar
competitions in the
past. Learn winning
techniques. Apply them.
13. Lesson 6
Don’t rely on other
people’s EDA and
automated data
analysis packages
14. Lesson 7
Ask all kinds of
questions, even the
wildest ones, about the
data and the topic of
the competition. Find
your answers. Consult
experts in the field and
relevant publications.
How many
times do I have to
lose at Kaggle to win ?
15. Lesson 8
Explain your mission
and approaches in a
Kaggle competition to
people far from data
science (your rubber
ducks). Simple questions
and explanations often
reveal valuable insights.
https://en.wikipedia.org/wiki/Rubber_duck_debugging
17. Lesson 10
If it is a ”Code
Competition” (solutions are
submitted through an API), be
prepared for a blind battle.
Getting a finished solution
to an accepted submission
via the API may take
longer than you think. ”Take a deep breath, step away from the code, sleep or go
for a walk, take your mind off it, then come back and examine
it with fresh eyes”
https://www.kaggle.com/code-competition-debugging
You’re getting an error in a code competition. Now what? Writing code that
works perfectly on unseen data is difficult, even for experts. Don't get
discouraged or feel that you're the only one stuck.
To prevent probing, Kaggle does not provide highly specific debugging
messages in code competitions (whereby Kaggle reruns your code on a
hidden dataset). Submissions that error also count towards your team’s
daily submission limit…
18. Lesson 11
Don’t give up. There will
be demotivation. Just
don’t give up and go all
the way.
19. Lesson 12
The Team is Great!
Sharing your suffering,
joys, and triumphs with
your teammates is
priceless.
20. Lesson 13
Perhaps not everyone in
your own social circle will
appreciate the level of
involvement in the
competition. It really takes
a lot of time and attention.
21. Lesson 14
The competition is not
over until you understand
the winners’ solutions.
Me starting the Kaggle
competition
Me reading winners' solutions
22. Lesson 15
Competition is first about
learning and experience,
then about winning over
yourself, and only then
about winning over
others.
23. Send your complaints, suggestions and job offers
https://www.linkedin.com/in/samvelkoch/
samvelkoch@gmail.com
Medals
Login streak
Impulse and inspiration from the previous meetup
A bit overexcited
I may be the least experienced DS in this room, but highly likely one of the most enthusiastic
The total number of patients was in the range 380-390
About 230,000 rows
Math notation and code for the metric the winning team used in their solution
SMAPE is expressed as a percentage
SMAPE has some limitations, such as sensitivity to zero values
It penalizes large errors: larger errors produce higher percentage differences, leading to a greater penalty in the SMAPE calculation
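The metric fits in a few lines; a minimal sketch of SMAPE+1 (the shift by +1 is an assumption drawn from the ”(+1)” on the metric slide, used to avoid division by zero when both true and predicted values are 0):

```python
import numpy as np

def smape_plus_1(y_true, y_pred):
    """SMAPE computed on (value + 1): 100 * mean(2|p - t| / (|t| + |p|))."""
    t = np.asarray(y_true, dtype=float) + 1.0
    p = np.asarray(y_pred, dtype=float) + 1.0
    return 100.0 * np.mean(2.0 * np.abs(p - t) / (np.abs(t) + np.abs(p)))

print(smape_plus_1([0, 10, 20], [0, 10, 20]))  # 0.0 for a perfect forecast
```

Because the absolute error sits in the numerator while the denominator grows with the magnitude of the values, an error of the same absolute size is penalized hardest on small true values.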
The final LB score was so dense that a difference of a few thousandths separated us from the medal zone. Which, by the way, is quite typical for most competitions on Kaggle
The further we progressed in our research, the more we felt that the goal of the contest, to build an effective model for predicting a patient's condition from proteins and peptides, would not be achieved. The problem is not a lack of linkage between proteins and Parkinson's, but rather the data itself and the design of the competition. First, the data lacked alpha-synuclein, which has been the subject of the most promising research in recent years. Second, the control group of healthy patients was represented by a very small sample, so the data suffered from the curse of dimensionality. Third, the organizers of the competition did not make the use of proteins and peptides mandatory for participants' solutions. All three factors were borderline foul. I'm sure the organizers had compelling reasons to make such a dataset available to the community. I sincerely hope that the community's solutions have provided researchers with answers to their questions, and that these answers have brought humanity one step closer to understanding Parkinson's disease and finding effective approaches to its prevention and treatment.
Count of previous “non-annual” visits (6th or 18th):
a simple feature that was noticed by only 18 teams
It is priceless to learn how to eliminate redundant data and to find answers in your own notebooks and in public ones