Tweet Sentiment Extraction
1st place solution
by Artsem Zhyvalkouski from Dark of the Moon
Agenda
1. Background
2. Competition overview
3. Solution summary
4. The “Magic”
5. Other solutions
6. Conclusion
Background
Self-introduction ● Artsem Zhyvalkouski
● From: Minsk, Belarus 🇧🇾
● CS student @ Tokyo City University 🇯🇵
● Did R&D in CV / NLP at a startup
● Love reading NLP papers
● Enjoy learning languages
○ Fluent in 🇧🇾🇷🇺🇬🇧🇯🇵
○ Interested in 🇫🇷🇰🇷
Kaggle profile
Motivation Why did I start working on this competition?
● To familiarize myself with current SOTA in NLP:
Transformer-based models
● To learn PyTorch🔥 and HuggingFace🤗 libraries
● The dataset is rather small, so there was no need for powerful
machines: Colab Pro was enough
● The task seemed fun / unusual and I had time for it
Teammates ● Théo
○ Student in applied Mathematics & Machine Learning
○ 10th in Google Quest Q&A
● Anton
○ Works in IT
○ 10th in Google Quest Q&A
● Hikkiiii
○ NLP / QA background
○ 10th in TensorFlow 2.0 Question Answering
Competition
overview
Task & data ● Task: for a given tweet, predict what word or phrase best
supports the labeled sentiment
● Application example: some businesses may want to know
exactly why people think something about their product
● Data
○ Train: 27k tweets
○ Public / private test: 4k / 8k tweets
text (given) | sentiment (given) | selected_text (target)
I really really like the song Love Story by Taylor Swift | positive | like
i need to get my computer fixed | neutral | i need to get my computer fixed
Sooo SAD I will miss you here in San Diego!!! | negative | Sooo SAD
Evaluation ● Metric: word-level Jaccard score
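The score is computed per tweet between the predicted and the true selected_text and averaged over the test set. A minimal sketch of the word-level Jaccard (the function name and the example call are illustrative):

```python
def jaccard(str1: str, str2: str) -> float:
    """Word-level Jaccard similarity: lower-case, split on whitespace, compare word sets."""
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    c = a & b
    return float(len(c)) / (len(a) + len(b) - len(c))

# Example from the table above: predicting the whole tweet instead of "Sooo SAD"
print(jaccard("Sooo SAD", "Sooo SAD I will miss you here in San Diego!!!"))  # = 0.2
```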
Problem with
labels
● Some labels are noisy
text (given) | sentiment (given) | selected_text (target)
hey mia! totally adore your music. when will your cd be out? | positive | y adore
I know It was worth a shot, though! | positive | as wort
the exact one i was thinking of the bestttt. | positive | e bestttt
Solution
summary
Transformers ● Transformers like BERT, RoBERTa, BART etc. have
become the default in SOTA NLP
● Pretrained on a huge amount of texts
● Somewhat heavy and long to train
● Can be used in either NER or QA setup for this task
● QA worked best
Leveraging the
QA setup
● Question: sentiment
● Answer: support phrase
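Concretely, the sentiment plays the role of the question and the tweet is the context, and the model predicts the start/end of the answer span. A minimal sketch with a HuggingFace fast tokenizer; the checkpoint name and the sample tweet are assumptions, and the winning code's exact preprocessing may differ:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

tweet = "Sooo SAD I will miss you here in San Diego!!!"
sentiment = "negative"

# Question = sentiment, context = tweet; the offsets let us map predicted
# start/end tokens back to character positions in the original text.
enc = tokenizer(sentiment, tweet, return_offsets_mapping=True)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
```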
My models:
summary
● RoBERTa-base-squad2, RoBERTa-large-squad2,
DistilRoBERTa-base, XLNet-base-cased
● Pretraining on SQuAD 2.0
○ Task pretraining works
● Avg / Max of layers w/o embedding layer
● Multi Sample Dropout
● AdamW with linear warmup schedule
● Custom loss: Jaccard-based Soft Labels
● Best single model: RoBERTa-base-squad2, 5 fold
stratified CV: 0.715
My models: architecture
● Input: <s> Sentiment </s> </s> Sentence Tokens </s>
● Transformer: Embeddings → Layer 1 → ... → Layer n
● Head: AvgPool / MaxPool over the layer outputs (embedding layer excluded) → MSD + Dense
● Output: probabilities of each token being the start / end of the selected text
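A rough PyTorch sketch of such a head, assuming the transformer returns all hidden states; the pooling shown is a plain average over layers (the deck mentions both Avg and Max), and the hidden size and dropout rate are assumptions. MSD itself is shown on the next slide.

```python
import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Pool the per-layer hidden states (embedding layer excluded) and predict
    start / end logits for every token. Sizes and dropout rate are illustrative."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(hidden_size, 2)   # 2 outputs per token: start logit, end logit

    def forward(self, all_hidden_states):
        # all_hidden_states: tuple of (batch, seq_len, hidden) tensors,
        # index 0 is the embedding output, 1..n are the transformer layers
        stacked = torch.stack(all_hidden_states[1:], dim=0)
        pooled = stacked.mean(dim=0)                    # average over layers
        logits = self.fc(self.dropout(pooled))          # (batch, seq_len, 2)
        start_logits, end_logits = logits.unbind(dim=-1)
        return start_logits, end_logits
```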
Multi Sample
Dropout
● Multi-Sample Dropout for Accelerated Training and
Better Generalization
(https://arxiv.org/pdf/1905.09788.pdf)
*Image from Jigsaw Unintended Bias in Toxicity Classification 8th place solution
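A minimal sketch of Multi-Sample Dropout as typically used in such heads: the same linear layer is applied after several independent dropout masks and the outputs are averaged. The number of samples and the dropout rate below are assumptions, not the team's values.

```python
import torch
import torch.nn as nn

class MultiSampleDropout(nn.Module):
    """Average one shared linear layer over several dropout masks (arXiv:1905.09788)."""
    def __init__(self, in_features: int, out_features: int, n_samples: int = 5, p: float = 0.5):
        super().__init__()
        self.dropouts = nn.ModuleList([nn.Dropout(p) for _ in range(n_samples)])
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        # At train time each sample sees a different mask; at eval time dropout is a no-op.
        return torch.stack([self.fc(d(x)) for d in self.dropouts], dim=0).mean(dim=0)
```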
Optimizer and
schedule
● AdamW optimizer: Decoupled Weight Decay
Regularization
(https://arxiv.org/pdf/1711.05101.pdf)
● Linear warmup schedule
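A minimal sketch of this setup with PyTorch and the transformers scheduler helper; the stand-in model, learning rate, warmup fraction and step counts are illustrative assumptions, not the team's exact values.

```python
import torch.nn as nn
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

model = nn.Linear(768, 2)                 # stand-in for the transformer + head
steps_per_epoch, num_epochs = 1000, 3     # illustrative values

optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
total_steps = steps_per_epoch * num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),  # linear warmup to the base LR...
    num_training_steps=total_steps,           # ...then linear decay to zero
)
# In the training loop: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```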
My models:
custom loss
● Jaccard-based Soft Labels
● Modified version
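The deck does not spell out the formula, so the snippet below is only a hypothetical illustration of the idea: instead of one-hot start/end targets, each candidate position is weighted by the Jaccard overlap its span would have with the true span, and the loss is cross-entropy against that soft distribution. The team's actual "Jaccard-based Soft Labels" (and their modified version) may differ; the function names and the smoothing factor are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against a soft (non one-hot) target distribution."""
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def jaccard_soft_start_targets(seq_len, true_start, true_end, smooth=0.3):
    """Hypothetical soft targets for the start position: a candidate start i
    (keeping the true end) is weighted by the token-level Jaccard of
    [i, true_end] with the true span, then mixed with the one-hot target."""
    jac = torch.zeros(seq_len)
    for i in range(seq_len):
        if i <= true_end:
            inter = true_end - max(i, true_start) + 1
            union = true_end - min(i, true_start) + 1
            jac[i] = max(inter, 0) / union
    one_hot = F.one_hot(torch.tensor(true_start), seq_len).float()
    return (1 - smooth) * one_hot + smooth * jac / jac.sum()
```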
Théo’s models ● Transformers
○ BERT-base-uncased
○ BERT-large-uncased-wwm
○ DistilBERT
○ ALBERT-large-v2
● Architecture
○ MSD on the concatenation of the last 8 hidden states
● Training
○ Smoothed categorical cross-entropy
○ Discriminative learning rate
(https://arxiv.org/pdf/1801.06146.pdf); see the sketch after this list
○ Sequence bucketing to speed up the training
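A minimal sketch of a discriminative (layer-wise) learning rate for a BERT-like encoder: lower layers get smaller learning rates than the top layers and the head. The base learning rate and decay factor are assumptions, not the team's values.

```python
from torch.optim import AdamW
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
base_lr, decay = 3e-5, 0.95
num_layers = model.config.num_hidden_layers

param_groups = []
for name, param in model.named_parameters():
    if name.startswith("embeddings"):
        depth = 0                                  # lowest learning rate
    elif name.startswith("encoder.layer."):
        depth = int(name.split(".")[2]) + 1        # encoder layers 0..n-1
    else:
        depth = num_layers + 1                     # pooler / task head: highest LR
    lr = base_lr * decay ** (num_layers + 1 - depth)
    param_groups.append({"params": [param], "lr": lr})

optimizer = AdamW(param_groups, weight_decay=0.01)
```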
Anton’s models
● Transformers
○ RoBERTa-base
○ BERTweet: A pre-trained language model for English
Tweets (https://arxiv.org/pdf/2005.10200.pdf)
● Architecture
○ Same as Théo’s
● Training
○ Smoothed categorical cross-entropy
○ Discriminative learning rate
○ Custom merges.txt file for RoBERTa
Hikkiiii’s models
● Transformers
○ RoBERTa-base
○ RoBERTa-large
● Architecture
○ Append sentiment token to the end of the text
○ CNN + Linear layer on the concatenation of the last
3 hidden states
● Training
○ Standard cross-entropy loss
Problems
● Are we done? 🤗
○ Transformers are token-level, hence we can't capture the noisy pattern
○ No obvious way to make transformers character-level
○ Character-level RNNs are nowhere near as powerful as transformers
○ We can't simply blend models with different tokenizations
Stacking
● Solution: stacking to the rescue!
○ Convert token probabilities from transformers to char-level by assigning each char the probability of its token (as sketched below)
○ Feed OOF char-level probabilities from several transformers into a char-level NN (stacking)
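A minimal sketch of the token-to-character conversion, assuming a fast tokenizer that returns (start, end) character offsets for each token; the function name is illustrative.

```python
import numpy as np

def token_to_char_probas(text, offsets, token_probas):
    """Give every character the start (or end) probability of the token covering it."""
    char_probas = np.zeros(len(text))
    for (start, end), p in zip(offsets, token_probas):
        char_probas[start:end] = p   # special tokens have (0, 0) offsets and are skipped
    return char_probas
```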
Stacking: pipeline
● Each of the n 1st-level transformers takes <sos> Sentiment <sep> Tokens <sep> as input and is trained to predict start / end tokens, producing token-level start & end probabilities
● Using the token offsets, these are converted to char-level start & end probabilities
● The char-level probabilities from all n models are concatenated with the sentiment and the start & end features, and fed to a Char NN trained on start / end characters, which outputs the final char-level start & end probabilities
Char-level NN: RNN
● Inputs: character embedding + sentiment embedding
● Bidirectional LSTM
● Bidirectional LSTM x2 with skip connection
● MSD + Linear
● Softmax → start & end probas
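A rough PyTorch sketch of this char-level RNN stacker; the embedding sizes, hidden size, dropout rate and number of MSD samples are assumptions, not the team's hyper-parameters.

```python
import torch
import torch.nn as nn

class CharLevelRNN(nn.Module):
    """Char-level 2nd-level model: per-character start/end probabilities from the
    n 1st-level transformers, plus character and sentiment embeddings, go through
    bidirectional LSTMs with a skip connection and an MSD + Linear head."""
    def __init__(self, n_models, n_chars, n_sentiments, char_dim=16, sent_dim=16, hidden=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.sent_emb = nn.Embedding(n_sentiments, sent_dim)
        in_dim = 2 * n_models + char_dim + sent_dim        # start & end proba per model
        self.lstm1 = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(2 * hidden, hidden, num_layers=2, bidirectional=True, batch_first=True)
        self.dropouts = nn.ModuleList([nn.Dropout(0.3) for _ in range(5)])  # MSD
        self.fc = nn.Linear(2 * hidden, 2)                 # start & end logits per char

    def forward(self, probas, chars, sentiment):
        # probas: (B, L, 2 * n_models), chars: (B, L), sentiment: (B,)
        sent = self.sent_emb(sentiment).unsqueeze(1).expand(-1, chars.size(1), -1)
        x = torch.cat([probas, self.char_emb(chars), sent], dim=-1)
        h1, _ = self.lstm1(x)
        h2, _ = self.lstm2(h1)
        h = h1 + h2                                        # skip connection
        logits = torch.stack([self.fc(d(h)) for d in self.dropouts]).mean(0)
        start_logits, end_logits = logits.unbind(-1)       # (B, L) each
        return start_logits, end_logits
```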
Char-level NN: CNN
● Inputs: character embedding + sentiment embedding
● Conv1D + BatchNorm
● (Conv1D + BatchNorm) x4
● MSD + Linear
● Softmax → start & end probas
Char-level NN: WaveNet
● Inputs: character embedding + sentiment embedding
● Conv1D + BatchNorm
● (WaveBlock + BatchNorm) x3
● MSD + Linear
● Softmax → start & end probas
Char-level NNs:
details ● Adam optimizer
● Linear learning rate decay without warmup
● Smoothed Cross Entropy Loss
● Stochastic Weight Averaging (SWA): Averaging Weights Leads
to Wider Optima and Better Generalization
(https://arxiv.org/pdf/1803.05407.pdf); see the sketch after this list
● Select the whole text if predicted start_idx > end_idx
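A minimal sketch of weight averaging with PyTorch's built-in SWA helper; the stand-in model and the epoch at which averaging starts are assumptions.

```python
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel

model = nn.Linear(10, 2)                  # stand-in for one of the char-level NNs
swa_model = AveragedModel(model)          # keeps a running average of the weights

num_epochs, swa_start = 10, 5
for epoch in range(num_epochs):
    # ... train `model` for one epoch with Adam + linear LR decay ...
    if epoch >= swa_start:                # average checkpoints from the later epochs
        swa_model.update_parameters(model)

# `swa_model` (the averaged weights) is then used for validation / inference
```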
An obvious step ● So now we have a lot of different 1st level models
and different 2nd level architectures
● If you participated in a tabular data competition, an
obvious next step is...
1st submission
Public LB: 0.734 (#3) | Private LB: 0.736 (#1)
● 1st level (Transformers): RoBERTa-large (CV 0.715), XLNet-base-cased (CV 0.707), RoBERTa-base (CV 0.715), DistilRoBERTa-base (CV 0.713), DistilBERT-base-uncased (CV 0.705), BERT-base-uncased (CV 0.710), BERT-large-uncased-wwm (CV 0.710), ALBERT-large-v2 (CV 0.711), BERTweet (CV 0.711), RoBERTa-base (CV 0.715), RoBERTa-base (CV 0.712), RoBERTa-large (CV 0.714)
● 2nd level (Char-level NNs): CNN (CV 0.7342), CNN (CV 0.7335), WaveNet (CV 0.7347), WaveNet (CV 0.7330)
● Average CV: 0.7363
2nd submission
Public LB: 0.734 (#3) | Private LB: 0.735 (#1)
● 1st level (Transformers): the same 12 models as in the 1st submission (CV 0.705 to 0.715)
● 2nd level (Char-level NNs): CNN (CV 0.7342), WaveNet (CV 0.7337), RNN (CV 0.7343), WaveNet (CV 0.7335)
● Average CV: 0.7365
Pseudo-labeling ● We used one of our CV 0.7354 blends to pseudo-label the public
test data
● Approach from
the Google Quest Q&A 1st place
solution: “leakless” pseudo-labels
● Confidence score: (start_probas.max() + end_probas.max()) / 2
● Threshold = 0.35 to cut off low-confidence samples (see the sketch below)
● This gave a pretty robust boost
of 0.001-0.002 for many models
*Image from https://datawhatnow.com/pseudo-labeling-semi-supervised-learning/
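A minimal sketch of the confidence filtering; the confidence score and the 0.35 threshold come from the slide, while the function name and array shapes are assumptions.

```python
import numpy as np

def filter_pseudo_labels(start_probas, end_probas, threshold=0.35):
    """start_probas / end_probas: (n_samples, seq_len) predicted probabilities on
    the public test set. Returns a boolean mask of samples confident enough to keep."""
    confidence = (start_probas.max(axis=1) + end_probas.max(axis=1)) / 2
    return confidence >= threshold
```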
Final standings
Things that
didn’t work
● NER approach
● BCE + SoftIOU loss like in image segmentation
● Sample weighting
● T5, BART, GPT-2, ELECTRA, XLM-RoBERTa
● Char-level transformer
● Pre-trained embeddings: FastText, Flair
● Statistical features: num_words, num_spaces etc.
● XGBoost as 2nd level
● 3rd level models
● Pre/post-processing
The “Magic”
Finding the
“Magic”
● The noise in the labels comes from consecutive spaces
Selected text:
onna
Original text:
is _ back _ home _ now _ _ _ _ _ _ gonna _ miss _ every _ one
● We assumed they were removed during annotation
Text with spaces cleaned:
is _ back _ home _ now _ gonna _ miss _ every _ one
Annotation, on the cleaned text:
is _ back _ home _ now _ gonna _ miss _ every _ one
➔ Stores the start and end indices (?)
● Which results in problems when retrieving the labels on the original text
Retrieved label, on the original text:
is _ back _ home _ now _ _ _ _ _ _ gonna _ miss _ every _ one
➔ The 5 removed spaces offset the label by 5 characters
Using the
“Magic”
● Then, we can post-process our predictions to retrieve the noise
Selected text:
onna
Original text:
is _ back _ home _ now _ _ _ _ _ _ gonna _ miss _ every _ one
● We use a transformer to get the start/end tokens
We use it on the cleaned text:
Assuming the model perfectly predicts "miss", perfect start and end predictions would look like this:
is _ back _ home _ now _ gonna _ miss _ every _ one
0 … 0 1 1 1 1 0 ... 0
0 … 0 1 1 1 1 0 ... 0
Because transformers work at the token level, the whole word gets selected
● Finally, we can align those predictions with the original text
Prediction on the noisy data:
is _ back _ home _ now _ _ _ _ _ _ gonna _ miss _ every _ one
0 … 0 1 1 1 1 0 ... 0
0 … 0 1 1 1 1 0 ... 0
● Which matches the noisy label!
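A hypothetical sketch of that alignment: the model's character indices are computed on the cleaned text, and slicing the original (noisy) text with the same indices reproduces the shifted, noisy label. The team ultimately did not ship this as explicit post-processing (next slide), so this is only an illustration of the mechanism.

```python
import re

original = "is back home now      gonna miss every one"  # 6 consecutive spaces before "gonna"
cleaned = re.sub(r" +", " ", original)                    # consecutive spaces collapsed

# Perfect prediction of "miss" as character indices on the CLEANED text
start = cleaned.find("miss")
end = start + len("miss")

# Slicing the ORIGINAL text with those same indices reproduces the noisy label
print(original[start:end])   # -> "onna": shifted left by the 5 removed spaces
```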
Why we didn’t
use the “Magic” ● We found the pattern pretty late and didn’t have
enough time to leverage it directly
● Eventually, our 2nd level models learnt the pattern
even better than simple pre/post-processing
Other
solutions
2nd place
solution
● Pre/post-process following the “Magic”
● Sample tweets equally according to their sentiment to
mitigate the imbalance within batches
● Reranking model
1. Store top n candidates from the base model and
assign a “step_1_score” accordingly
2. Train a RoBERTa to predict Jaccard for the
candidates: “step_2_score”
3. Choose the best one by:
step_2_score + step_1_score * 0.5
3rd place solution: 1st model
● Training / Inference (diagrams)
● Results in k*k combinations
● Argmax over multiplications
3rd place solution: 2nd model
Conclusion
Conclusion
● Transformers are perfect for QA
● PyTorch🔥 & HuggingFace🤗 are awesome
● Transformers can be extended to char-level
● Diversity rules
● Annotation process is crucial
● Teaming up is rewarding
● Kaggle is good for learning but impractical sometimes
● Kaggle community is amazing
My links
● Follow me on
○ LinkedIn: https://www.linkedin.com/in/zhyvalkouski
○ Kaggle: https://www.kaggle.com/aruchomu
○ Twitter: https://twitter.com/artem_aruchomu
○ GitHub: https://github.com/heartkilla
● I’m open to new opportunities!
Thanks for listening, stay safe and happy kaggling!