Hierarchical Transformers for User Semantic Similarity
Marco Di Giovanni
MARCO BRAMBILLA
marco.brambilla@polimi.it
@marcobrambi
M. Di Giovanni, M. Brambilla. Hierarchical Transformers for User Semantic Similarity. ICWE 2023
Agenda
1. Motivation
2. Model: hierarchical configuration of BERT text transformers
3. Evaluation
4. Conclusions
Context and Motivation
Analysis of users’ behaviour and profiling of social media users
► customization of the overall personal experience
► recommendations
► detection of duplicates
► social threats
Sources:
► textual content shared by users
► the social graphs involving users
– Friendship / followship
– Mentions, likes, …
► shared resources (links, media, content)
Objectives
► RQ1: What is the best model to compute semantic user similarity?
► RQ2: Can a Transformer-based model be used?
► RQ3: Do the embeddings reflect our idea of similarity? Can we use them for further tasks?
► Aim: a fully reproducible approach, without influencing the results through biased selections of small sets of users
Contributions
► Large dataset of Twitter users, with an automatic labelling approach
► Training of a Hierarchical Language Model to compute accurate user similarity
► Optimization of hyper-parameters to obtain the best configuration of the model
► Test of the accuracy of the embeddings when applied to other tasks
POLITECNICO DI MILANO
Data
Dataset and preprocessing
► Twitter
► Assumption: a retweet somehow represents agreement, interest, or perceived importance
► Data from Archive Team Twitter [*]
► Only the textual content shared by users: no demographics, no screen names
► We select English tweets (filtered according to the “lang” field) posted in November and December 2020. They amount to about 27GB of compressed data.
[*] https://archive.org/details/twitterstream
Preprocessing
► We remove texts shorter than 20 characters (29M texts tweeted by 10M unique users)
► We set the maximum number of tweets per user to 60 and the minimum to 5 (1M users)
► We clean the connections between users: from the pairs of IDs of users retweeting each other we remove self-retweets (a user retweeting one of their own tweets), duplicate pairs, links to excluded users, and users with more than 50 connections (1.9M connections between 950k unique users)
► The benchmark compares a user with 30 other candidate users: 5 of them are considered similar to that user, since they share at least one retweet connection, and 25 of them are considered not similar
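The connection-cleaning step described on this slide could be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `clean_connections`, the treatment of pairs as undirected, and the single-pass degree filter are assumptions made for the example.

```python
from collections import Counter


def clean_connections(pairs, kept_users, max_connections=50):
    """Clean (retweeter_id, author_id) pairs.

    Assumptions for this sketch: pairs are treated as undirected, and the
    degree threshold is applied once, after deduplication.
    """
    unique = set()
    for a, b in pairs:
        if a == b:
            continue  # self-retweet
        if a not in kept_users or b not in kept_users:
            continue  # link to an excluded user
        unique.add((min(a, b), max(a, b)))  # deduplicate, ignoring direction

    # Count how many distinct connections each user has.
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1

    # Drop every pair that involves a user above the threshold.
    return sorted(
        (a, b)
        for a, b in unique
        if degree[a] <= max_connections and degree[b] <= max_connections
    )
```

For example, `clean_connections([(1, 1), (1, 2), (2, 1), (1, 3), (3, 9)], {1, 2, 3})` keeps only the pairs `(1, 2)` and `(1, 3)`.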
Models
Language Model
Down Memory Lane
Encoders and Decoders
Hierarchical Transformer Model
Tweet Embedding
► Obtain embedding of tweets using one of the following four
Transformer-based models that share the same architecture but are
pretrained with different approaches and datasets:
– RoBERTa (https://huggingface.co/roberta-base),
– BERTweet (https://huggingface.co/vinai/bertweet-base),
– Sentence BERT (https://huggingface.co/sentence-transformers/stsb-roberta-base-v2),
– Twitter4SSE (https://huggingface.co/digio/Twitter4SSE).
► We test them by freezing and unfreezing their weights during the training step.
► BERTweet and Twitter4SSE models, being pretrained on texts from Twitter, are able to successfully deal with the intrinsic noise of data from social media, thus no further special cleaning is required (such as dealing with hashtags, abbreviations, and typos).
User Embedding
We test three techniques to process tweet embeddings to generate accurate user embeddings:
► MEAN: the user embedding is the mean of the tweet embeddings. With the weights of the Stage-1 model frozen, no training is performed for this variant. We also test this approach with the Stage-1 weights unfrozen; in that case we limit the number of tweets per user, also for a fair comparison with the other variants;
► Recurrence over BERT (RoBERT): the embeddings of tweets are used as input of a Recurrent Model. We select a 2-layer LSTM model with hidden size of 768. We use the last output as the user embedding. We test this approach both freezing and unfreezing the weights of the Stage-1 model;
► Transformer over BERT (ToBERT): the embeddings of tweets are used as input of a Transformer Model with 2 encoding layers (EL) and 2 decoding layers (DL), 16 heads, and 0.1 dropout. We also experimented with a model with 1 encoding and 1 decoding layer, and without dropout. We test this approach both freezing and unfreezing the weights of the Stage-1 model.
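As a rough illustration of the simplest Stage-2 variant (MEAN), the sketch below averages a user's tweet embeddings and compares two users. The helper names `mean_pool` and `cosine`, and the use of cosine similarity as the comparison function, are assumptions made for this example; the slides do not state how similarity between user embeddings is computed.

```python
import math


def mean_pool(tweet_embeddings):
    """Average a list of equal-length tweet vectors into one user vector."""
    n = len(tweet_embeddings)
    dim = len(tweet_embeddings[0])
    return [sum(vec[i] for vec in tweet_embeddings) / n for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-dimensional "tweet embeddings" for two hypothetical users.
user_a = mean_pool([[1.0, 0.0], [0.0, 1.0]])
user_b = mean_pool([[1.0, 1.0], [3.0, 3.0]])
similarity = cosine(user_a, user_b)
```

The RoBERT and ToBERT variants replace the plain average with a trained sequence model over the same tweet embeddings.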
Evaluation
Evaluation
► Evaluation set of 5K users
► The benchmark compares a user with 30 other candidate users
► 5 of them are considered similar and 25 of them are considered not similar
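One way to assemble such a candidate set for a single user is sketched below. The slides do not say how the 25 non-similar candidates are chosen, so the uniform random sampling, the fixed seed, and the name `build_candidates` are all assumptions for illustration only.

```python
import random


def build_candidates(user, connections, all_users,
                     n_similar=5, n_total=30, seed=0):
    """Return a shuffled list of (candidate_id, label) pairs for `user`.

    label 1: shares a retweet connection with `user`; label 0: does not.
    Sampling strategy is an assumption of this sketch.
    """
    rng = random.Random(seed)
    neighbours = sorted(connections.get(user, set()))
    if len(neighbours) < n_similar:
        raise ValueError("user has too few connections for the benchmark")
    similar = rng.sample(neighbours, n_similar)

    # Every other user without a connection to `user` is a possible negative.
    pool = [u for u in all_users if u != user and u not in neighbours]
    dissimilar = rng.sample(pool, n_total - n_similar)

    candidates = [(u, 1) for u in similar] + [(u, 0) for u in dissimilar]
    rng.shuffle(candidates)
    return candidates
```

A model is then scored on how well its similarities rank the 5 connected candidates above the 25 unconnected ones.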
Optimization
► We select Multiple Negative Loss (MNLoss) as our loss function
► In a batch of n pairs of connected users, we assume that a user did not retweet posts from any of the other n − 1 users in the batch, so they act as negatives. This assumption is valid for small batches due to the large total number of users and the approach selected to collect data.
► We use the AdamW optimizer with learning rate 2×10⁻⁵ and a linear scheduler with 10% warmup steps, on a single GPU (NVIDIA Tesla P100).
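The in-batch-negatives idea behind this loss can be sketched in plain Python as a cross-entropy over scaled cosine similarities, where the i-th positive is the correct "class" for the i-th anchor. The scaling factor of 20 and the use of cosine similarity are assumptions borrowed from common practice, not values given in the slides, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def multiple_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[j], j != i, are negatives for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the whole batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of the true partner of anchor i.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)
```

The loss is close to zero when each anchor is most similar to its own partner and grows when another user in the batch is ranked higher.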
Model Evaluation Results
Results discussion
► Naive approaches underperform Hierarchical approaches, confirming an advantage to encoding single tweets independently.
► The hierarchical approach with a Stage-1 Twitter4SSE model and a Stage-2 Transformer model outperforms the other alternatives.
Evaluation on the task
► 20 tweets per user, thus 124k pairs of users in the training set.
We evaluate the models by comparing three metrics:
► Mean Average Precision (MAP) between the binary labels (connected or not connected by retweets) and the similarities.
► Mean Reciprocal Rank (MRR) @10, a ranking quality measure defined as the reciprocal of the rank of the first relevant element.
► normalized Discounted Cumulative Gain (nDCG).
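For a single query user, the three metrics can be computed from the candidates' binary labels once they are sorted by predicted similarity. The sketch below uses the standard textbook definitions with binary relevance; the function names are illustrative, and averaging each value over all query users gives MAP, MRR@10, and mean nDCG.

```python
import math


def rank_labels(similarities, labels):
    """Return the labels ordered by decreasing similarity."""
    order = sorted(range(len(labels)), key=lambda i: similarities[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked):
    """Mean of precision@k taken at each position k holding a relevant item."""
    hits, total = 0, 0.0
    for position, label in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def reciprocal_rank(ranked, k=10):
    """1 / rank of the first relevant item within the top k, else 0."""
    for position, label in enumerate(ranked[:k], start=1):
        if label:
            return 1.0 / position
    return 0.0


def ndcg(ranked):
    """DCG of the ranking divided by the DCG of the ideal ranking."""
    def dcg(labels):
        return sum(l / math.log2(pos + 1) for pos, l in enumerate(labels, start=1))
    ideal = dcg(sorted(ranked, reverse=True))
    return dcg(ranked) / ideal if ideal else 0.0


ranked = rank_labels([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1])
```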
Details on evaluation
► Stage-1 Model Comparison. Firstly we investigate the best initialization model. For each experiment, we keep the same hyper-parameters and the same Stage-2 model is trained on top of it: ToBERT with 2 encoding layers (EL) and 2 decoding layers (DL), 0.1 dropout, and MEAN pooling. We test RoBERTa, BERTweet, S-RoBERTa, and Twitter4SSE. Table 1 shows that Twitter4SSE is the best initialization. As expected, this model, trained to generate accurate tweet embeddings, outperforms both the model trained on Tweets using only MLM (BERTweet) and the model trained to generate accurate sentence embeddings on formal data (S-RoBERTa).
► MEAN Stage-2 Models Comparison. We test the MEAN Stage-2 approach on the four Stage-1 models with and without freezing their weights. Table 2 shows that unfreezing the weights leads to better results, even if the batch size has to be reduced to 10 and the number of tokens per tweet is reduced to 32 to fit in memory. We confirm that the best Stage-1 model is Twitter4SSE for these configurations too.
► ToBERT Hyperparameter Comparison. We investigate the best hyperparameter configuration of the Stage-2 Transformer model (ToBERT). We investigate with 1 and 2 encoding and decoding layers (EL-DL), with and without dropout. We fix Twitter4SSE as initial model. Table 3 shows that 2 EL and 2 DL without dropout is the best overall configuration.
► Full Comparison. We compare the performance of the models with a Random baseline and with the two best approaches from related work.
► As expected, a greater number of tweets per user results in a better model, when the number of pairs of training users is fixed.
► However, a greater number of tweets per user (n) implies a lower number of users, since we have a limited collection of tweets.
► We investigate the best trade-off between the number of users and the number of tweets per user.
► We train models while changing the number of tweets per user, each time including every available user, and the performance varies accordingly.
► The best trade-off is a peak around 20 tweets per user.
► Note: this number is highly dependent on our collection, since the number of downloaded tweets is high but finite (2 complete months).
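The trade-off comes from the fact that requiring more tweets per user shrinks the set of usable users. A small sketch of that count, with a hypothetical `tweet_counts` mapping from user id to number of collected tweets (toy values, not the authors' data):

```python
def users_available(tweet_counts, n_values):
    """For each required number of tweets n, count users with at least n tweets."""
    return {
        n: sum(1 for count in tweet_counts.values() if count >= n)
        for n in n_values
    }


# Toy example: three users with 5, 20, and 60 collected tweets.
availability = users_available({"a": 5, "b": 20, "c": 60}, [5, 20, 40])
```

Here raising the requirement from 5 to 40 tweets reduces the usable users from 3 to 1, which is the effect the slide weighs against the benefit of richer per-user input.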
Other tasks
► Community analysis
► Polarization detection
► Outlier detection
► Fixed model: a hierarchical model with a frozen Stage-1 Twitter4SSE model and a Stage-2 ToBERT model with 2 layers,
0.1 dropout rate, MEAN pooling, trained using 20 tweets for each user for one epoch.
Outliers
► We run the Local Outlier Factor (LOF) algorithm on the embeddings of three lists of users and manually inspect the results.
► On embeddings of the technology list
– Outlier: an account about videogames
► On embeddings of the chefs list
– Outlier: a cook talking about other topics
► On embeddings of the charity-ngo list
– Outlier: the account of Charlize Theron
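To make the LOF idea concrete, here is a compact pure-Python sketch of the score on toy 2-dimensional points. It is a simplified version, assuming Euclidean distance, exactly k neighbours per point (ties ignored), and no duplicate points; the slides do not say which implementation or settings the authors used. scikit-learn provides a full implementation as `sklearn.neighbors.LocalOutlierFactor`.

```python
import math


def local_outlier_factor(points, k=2):
    """Return one LOF score per point; values well above 1 suggest outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Four points on a unit square plus one far-away point.
scores = local_outlier_factor([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2)
```

In this toy example the four square corners score about 1, while the distant point receives a much larger score and would be flagged for manual inspection.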
Concluding
► Large unbiased dataset ready for user similarity analysis
► Selection and optimization of a hierarchical language model
► Validation of models on similarity
► Application to related problems
► Future and ongoing work: Impact of time and topic drift
Hierarchical Transformers for User Semantic Similarity
THANKS!
Marco Di Giovanni
MARCO BRAMBILLA
http://datascience.deib.polimi.it/
https://marco-brambilla.com/
marco.brambilla@polimi.it
@marcobrambi

More Related Content

Similar to Hierarchical Transformers for User Semantic Similarity - ICWE 2023

IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONIMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
adeij1
 
14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdf14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdf
MehwishKanwal14
 
2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-Box2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-Box
Marc Smith
 
I want to answer, who has a
I want to answer, who has aI want to answer, who has a
I want to answer, who has a
chenbojyh
 
Finding bursty topics from microblogs
Finding bursty topics from microblogsFinding bursty topics from microblogs
Finding bursty topics from microblogs
moresmile
 
Multilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptx
Multilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptxMultilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptx
Multilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptx
SAMIMAKTAR9
 

Similar to Hierarchical Transformers for User Semantic Similarity - ICWE 2023 (20)

Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in StocksApplying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
 
Fine grained irony classification through transfer learning approach
Fine grained irony classification through transfer learning approachFine grained irony classification through transfer learning approach
Fine grained irony classification through transfer learning approach
 
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONIMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
 
14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdf14420-Article Text-17938-1-2-20201228.pdf
14420-Article Text-17938-1-2-20201228.pdf
 
Latent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite NetworksLatent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite Networks
 
Fundamentals of Deep Recommender Systems
 Fundamentals of Deep Recommender Systems Fundamentals of Deep Recommender Systems
Fundamentals of Deep Recommender Systems
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
MITA visualization paper.pdf
MITA visualization paper.pdfMITA visualization paper.pdf
MITA visualization paper.pdf
 
Jeffrey xu yu large graph processing
Jeffrey xu yu large graph processingJeffrey xu yu large graph processing
Jeffrey xu yu large graph processing
 
2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-Box2011 IEEE Social Computing Nodexl: Group-In-A-Box
2011 IEEE Social Computing Nodexl: Group-In-A-Box
 
On Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in MicroblogsOn Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in Microblogs
 
Data-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationData-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and Simulation
 
I want to answer, who has a
I want to answer, who has aI want to answer, who has a
I want to answer, who has a
 
Next directions in Mahout's recommenders
Next directions in Mahout's recommendersNext directions in Mahout's recommenders
Next directions in Mahout's recommenders
 
Hate speech detection
Hate speech detectionHate speech detection
Hate speech detection
 
9517cnc03
9517cnc039517cnc03
9517cnc03
 
Finding bursty topics from microblogs
Finding bursty topics from microblogsFinding bursty topics from microblogs
Finding bursty topics from microblogs
 
Multilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptx
Multilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptxMultilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptx
Multilingual Tweet Intimacy Analysis using Bidirectional LSTM.pptx
 
IRJET - Twitter Spam Detection using Cobweb
IRJET - Twitter Spam Detection using CobwebIRJET - Twitter Spam Detection using Cobweb
IRJET - Twitter Spam Detection using Cobweb
 
New books jun 2014
New books jun 2014New books jun 2014
New books jun 2014
 

More from Marco Brambilla

Exploring the Bi-verse. A trip across the digital and physical ecospheres
Exploring the Bi-verse.A trip across the digital and physical ecospheresExploring the Bi-verse.A trip across the digital and physical ecospheres
Exploring the Bi-verse. A trip across the digital and physical ecospheres
Marco Brambilla
 
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...
Marco Brambilla
 
Community analysis using graph representation learning on social networks
Community analysis using graph representation learning on social networksCommunity analysis using graph representation learning on social networks
Community analysis using graph representation learning on social networks
Marco Brambilla
 
Data Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extractionData Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extraction
Marco Brambilla
 
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
Marco Brambilla
 
Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...
Marco Brambilla
 
Web Science. An introduction
Web Science. An introductionWeb Science. An introduction
Web Science. An introduction
Marco Brambilla
 
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
Marco Brambilla
 

More from Marco Brambilla (20)

M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
 
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
 
Exploring the Bi-verse. A trip across the digital and physical ecospheres
Exploring the Bi-verse.A trip across the digital and physical ecospheresExploring the Bi-verse.A trip across the digital and physical ecospheres
Exploring the Bi-verse. A trip across the digital and physical ecospheres
 
Conversation graphs in Online Social Media
Conversation graphs in Online Social MediaConversation graphs in Online Social Media
Conversation graphs in Online Social Media
 
Trigger.eu: Cocteau game for policy making - introduction and demo
Trigger.eu: Cocteau game for policy making - introduction and demoTrigger.eu: Cocteau game for policy making - introduction and demo
Trigger.eu: Cocteau game for policy making - introduction and demo
 
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and ...
 
Analyzing rich club behavior in open source projects
Analyzing rich club behavior in open source projectsAnalyzing rich club behavior in open source projects
Analyzing rich club behavior in open source projects
 
Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit C...
Analysis of On-line Debate on Long-Running Political Phenomena.The Brexit C...Analysis of On-line Debate on Long-Running Political Phenomena.The Brexit C...
Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit C...
 
Community analysis using graph representation learning on social networks
Community analysis using graph representation learning on social networksCommunity analysis using graph representation learning on social networks
Community analysis using graph representation learning on social networks
 
Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals
 
Data Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extractionData Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extraction
 
Iterative knowledge extraction from social networks. The Web Conference 2018
Iterative knowledge extraction from social networks. The Web Conference 2018Iterative knowledge extraction from social networks. The Web Conference 2018
Iterative knowledge extraction from social networks. The Web Conference 2018
 
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
 
Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...
 
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
 
Model-driven Development of User Interfaces for IoT via Domain-specific Comp...
Model-driven Development of  User Interfaces for IoT via Domain-specific Comp...Model-driven Development of  User Interfaces for IoT via Domain-specific Comp...
Model-driven Development of User Interfaces for IoT via Domain-specific Comp...
 
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf.
A Model-Based Method for  Seamless Web and Mobile Experience. Splash 2016 conf.A Model-Based Method for  Seamless Web and Mobile Experience. Splash 2016 conf.
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf.
 
Big Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di MilanoBig Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di Milano
 
Web Science. An introduction
Web Science. An introductionWeb Science. An introduction
Web Science. An introduction
 
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
 

Recently uploaded

Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
ZurliaSoop
 
Capstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdfCapstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdf
eliklein8
 
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdfSociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
SocioCosmos
 
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
Cara Menggugurkan Kandungan 087776558899
 
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
JUAL PILL CYTOTEC PALOPO SULAWESI 087776558899 OBAT PENGGUGUR KANDUNGAN PALOP...
Cara Menggugurkan Kandungan 087776558899
 
Capstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdfCapstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdf
eliklein8
 
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Meet Incall & Out Escort Service in D -9634446618 | #escort Service in GTB Na...
Heena Escort Service
 
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
ZurliaSoop
 

Recently uploaded (20)

Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
Jual Obat Aborsi Kudus ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cy...
 
Capstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdfCapstone slidedeck for my capstone project part 2.pdf
Capstone slidedeck for my capstone project part 2.pdf
 
Capstone slide deck on the TikTok revolution
Capstone slide deck on the TikTok revolutionCapstone slide deck on the TikTok revolution
Capstone slide deck on the TikTok revolution
 
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIRBVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
BVG BEACH CLEANING PROJECTS- ORISSA , ANDAMAN, PORT BLAIR
 
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdfSociocosmos empowers you to go trendy on social media with a few clicks..pdf
Sociocosmos empowers you to go trendy on social media with a few clicks..pdf
 
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdfSEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
 
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
💊💊 OBAT PENGGUGUR KANDUNGAN SEMARANG 087776-558899 ABORSI KLINIK SEMARANG
 
Sri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Sri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsSri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Sri Ganganagar Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
Enhancing Consumer Trust Through Strategic Content Marketing
Enhancing Consumer Trust Through Strategic Content MarketingEnhancing Consumer Trust Through Strategic Content Marketing
Enhancing Consumer Trust Through Strategic Content Marketing
 
Coorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Coorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsCoorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Coorg Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
Kayamkulam Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Kayamkulam Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsKayamkulam Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls

Hierarchical Transformers for User Semantic Similarity - ICWE 2023

  • 1. Hierarchical Transformers for User Semantic Similarity. Marco Di Giovanni, Marco Brambilla (marco.brambilla@polimi.it, @marcobrambi)
  • 2. Agenda (M. Di Giovanni, M. Brambilla. Hierarchical Transformers for User Semantic Similarity. ICWE 2023)
    1. Motivation
    2. Model: hierarchical configuration of BERT text transformers
    3. Evaluation
    4. Conclusions
  • 3. Context and Motivation
    Analysis of users’ behaviour and profiling of social media users:
    ► customization of the overall personal experience
    ► recommendations
    ► detection of duplicates
    ► social threats
    Sources:
    ► textual content shared by users
    ► the social graphs involving users
      – Friendship / followship
      – Mentions, likes, …
    ► shared resources (links, media, content)
  • 4. Objectives
    ► RQ1: What is the best model to compute semantic user similarity?
    ► RQ2: Can a Transformer-based model be used?
    ► RQ3: Do the embeddings reflect our idea of similarity? Can we use them for further tasks?
    ► We aim at a fully reproducible approach, without influencing the results through biased selections of small sets of users
  • 5. Contributions
    ► Large dataset of Twitter users, with an automatic labelling approach
    ► Training of a Hierarchical Language Model to compute accurate user similarity
    ► Optimization of hyper-parameters to obtain the best configuration of the model
    ► Test of the accuracy of the embeddings when applied to other tasks
  • 7. Dataset and preprocessing
    ► Twitter
    ► Assumption: a retweet somehow represents agreement, interest, or perceived importance
    ► Data from Archive Team Twitter [*]
    ► Only the textual content shared by users. No demographics, no screen names
    ► We select English tweets, filtered according to the “lang” field, posted in November and December 2020. They amount to about 27GB of compressed data.
    [*] https://archive.org/details/twitterstream
  • 8. Preprocessing
    ► We remove texts shorter than 20 characters (29M texts tweeted by 10M unique users)
    ► We keep at most 60 tweets per user and require at least 5 tweets per user (1M users)
    ► We clean the connections between users (pairs of ids of users retweeting each other), removing auto-retweets (when a user retweets one of its own tweets), duplicate pairs, links to excluded users, and users with more than 50 connections (1.9M connections between 950k unique users)
    ► The benchmark consists of comparing a user with 30 other candidate users: 5 of them are considered similar to it, since they share at least one retweet connection, and 25 of them are considered not similar
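The filtering steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the `(user_id, text)` and `(retweeter_id, author_id)` input shapes, and the choice to treat connections as undirected pairs are all hypothetical. Only the thresholds (20 characters, 5 to 60 tweets, 50 connections) come from the slide.

```python
from collections import defaultdict

MIN_CHARS = 20        # texts shorter than this are dropped
MIN_TWEETS = 5        # users need at least this many tweets
MAX_TWEETS = 60       # at most this many tweets are kept per user
MAX_CONNECTIONS = 50  # users with more connections are dropped


def filter_users(tweets):
    """tweets: iterable of (user_id, text). Returns {user_id: [texts]}."""
    by_user = defaultdict(list)
    for user_id, text in tweets:
        if len(text) >= MIN_CHARS:
            by_user[user_id].append(text)
    return {user: texts[:MAX_TWEETS]
            for user, texts in by_user.items()
            if len(texts) >= MIN_TWEETS}


def clean_connections(retweet_pairs, kept_users):
    """retweet_pairs: iterable of (retweeter_id, author_id)."""
    edges = set()
    for a, b in retweet_pairs:
        if a == b:
            continue  # auto-retweet
        if a not in kept_users or b not in kept_users:
            continue  # link to an excluded user
        # Assumption: connections are undirected, so (a, b) and (b, a)
        # collapse into one pair and duplicates disappear.
        edges.add(frozenset((a, b)))
    degree = defaultdict(int)
    for edge in edges:
        for user in edge:
            degree[user] += 1
    too_connected = {u for u, d in degree.items() if d > MAX_CONNECTIONS}
    return {edge for edge in edges if not (edge & too_connected)}
```

Calling `filter_users` first and passing its result as `kept_users` reproduces the order of the slide: tweets are filtered, users are selected, and only then are the retweet connections cleaned.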
  • 10. Language Model
  • 11. Down Memory Lane
  • 12. Encoders and Decoders
  • 13. Hierarchical Transformer Model
  • 14. Tweet Embedding
    ► We obtain the embeddings of tweets using one of the following four Transformer-based models, which share the same architecture but are pretrained with different approaches and datasets:
      – RoBERTa (https://huggingface.co/roberta-base)
      – BERTweet (https://huggingface.co/vinai/bertweet-base)
      – Sentence BERT (https://huggingface.co/sentence-transformers/stsb-roberta-base-v2)
      – Twitter4SSE (https://huggingface.co/digio/Twitter4SSE)
    ► We test them by freezing and unfreezing their weights during the training step.
    ► The BERTweet and Twitter4SSE models, being pretrained on texts from Twitter, are able to deal with the intrinsic noise of social media data, so no further special cleaning is required (such as dealing with hashtags, abbreviations, and typos).
  • 15. User Embedding
    We test three techniques to process tweet embeddings and generate accurate user embeddings:
    ► MEAN: the user embedding is the average of the tweet embeddings. When the weights of the Stage-1 model are frozen, no training is performed for this variant. We also test this approach unfreezing the weights of the Stage-1 model; in that case we limit the number of tweets per user, also for a fair comparison with the other variants;
    ► Recurrence over BERT (RoBERT): the embeddings of tweets are used as input of a Recurrent Model. We select a 2-layer LSTM model with a hidden size of 768 and use the last output as the user embedding. We test this approach both freezing and unfreezing the weights of the Stage-1 model;
    ► Transformer over BERT (ToBERT): the embeddings of tweets are used as input of a Transformer Model with 2 encoding layers (EL) and 2 decoding layers (DL), 16 heads, and 0.1 dropout. We also experimented with a model with 1 encoding and 1 decoding layer, and without dropout. We test this approach both freezing and unfreezing the weights of the Stage-1 model.
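As a small illustration of the simplest Stage-2 variant, the sketch below averages a user's tweet embeddings and compares two users. It assumes tweet embeddings are already available as plain lists of floats, and it assumes cosine similarity as the comparison function; the slides do not name the similarity measure, and the function names are hypothetical.

```python
import math


def mean_pool(tweet_embeddings):
    """Average a user's tweet embeddings (MEAN variant) into one vector."""
    count = len(tweet_embeddings)
    dim = len(tweet_embeddings[0])
    return [sum(vec[i] for vec in tweet_embeddings) / count
            for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity between two user embeddings (an assumed choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

The RoBERT and ToBERT variants replace `mean_pool` with a trained sequence model over the same list of tweet embeddings; the comparison step stays the same.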
  • 17. Evaluation
    ► Evaluation set of 5K users
    ► The benchmark consists of comparing a user with 30 other candidate users
    ► 5 of them are considered similar and 25 of them are considered not similar
  • 18. Optimization
    ► We select Multiple Negative Loss (MNLoss) as our loss function
    ► Given a batch of n pairs of connected users, we assume that a user did not retweet posts from any of the other n − 1 users in the batch. This assumption is valid for small batches due to the large total number of users and the approach selected to collect data.
    ► We use the AdamW optimizer, a learning rate of 2×10⁻⁵, and a linear scheduler with 10% warmup steps, on a single GPU (NVIDIA Tesla P100).
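The in-batch negatives idea behind this loss can be sketched as follows: for each anchor user, the paired user is the positive and the other n − 1 users in the batch act as negatives, so the loss is a softmax cross-entropy over scaled similarities. This is a minimal pure-Python sketch, not the training code used for the paper; the cosine scoring and the `scale=20.0` default are assumptions, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def multiple_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positives[j] in the batch is treated as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        top = max(scores)  # subtract the max for a stable log-sum-exp
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_norm - scores[i])
    return sum(losses) / len(losses)
```

With well-separated embeddings the loss is close to zero; if the positives are shuffled so that each anchor's true partner scores lowest, the loss grows to roughly the value of `scale`.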
  • 19. Model Evaluation Results
  • 20. Results discussion
    ► Naive approaches underperform hierarchical approaches, confirming the advantage of encoding single tweets independently.
    ► The hierarchical approach with a Stage-1 Twitter4SSE model and a Stage-2 Transformer model outperforms the other alternatives.
  • 21. Evaluation on the task
    ► 20 tweets per user, thus 124k pairs of users in the training set.
    We evaluate the models by comparing three metrics:
    ► Mean Average Precision (MAP) between the binary labels (connected or not connected by retweets) and the similarities
    ► Mean Reciprocal Rank (MRR) @10, a ranking quality measure defined as the reciprocal of the rank of the first relevant element
    ► normalized Discounted Cumulative Gain (nDCG)
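For one query user, the three metrics can be computed from the candidates' binary labels once they are sorted by decreasing similarity. The sketch below uses the standard textbook definitions with binary gains; it is an illustration with hypothetical function names, not the evaluation script behind the reported tables.

```python
import math


def rank_by_similarity(similarities, labels):
    """Return the binary labels ordered by decreasing similarity."""
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked_labels):
    """Mean of precision@k over the ranks k that hold a relevant item."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(ranked_labels, k=10):
    """1 / rank of the first relevant item within the top k, else 0."""
    for rank, relevant in enumerate(ranked_labels[:k], start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def ndcg(ranked_labels):
    """Discounted cumulative gain divided by the gain of an ideal ranking."""
    def dcg(labels):
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(labels, start=1))
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal else 0.0
```

MAP and MRR are then the averages of `average_precision` and `reciprocal_rank` over all query users in the evaluation set.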
  • 22. Details on evaluation
    ► Stage-1 Model Comparison. First, we investigate the best initialization model. For each experiment, we keep the same hyper-parameters and train the same Stage-2 model on top of it: ToBERT with 2 encoding layers (EL) and 2 decoding layers (DL), 0.1 dropout, and MEAN pooling. We test RoBERTa, BERTweet, S-RoBERTa, and Twitter4SSE. Table 1 shows that Twitter4SSE is the best initialization. As expected, this model, trained to generate accurate tweet embeddings, outperforms both the model trained on tweets using only MLM (BERTweet) and the model trained to generate accurate sentence embeddings on formal data (S-RoBERTa).
    ► MEAN Stage-2 Models Comparison. We test the MEAN Stage-2 approach on the four Stage-1 models, with and without freezing their weights. Table 2 shows that unfreezing the weights leads to better results, even though the batch size has to be reduced to 10 and the number of tokens per tweet to 32 to fit in memory. We confirm that Twitter4SSE is the best Stage-1 model for these configurations too.
    ► ToBERT Hyperparameter Comparison. We investigate the best hyperparameter configuration of the Stage-2 Transformer model (ToBERT), testing 1 and 2 encoding and decoding layers (EL-DL), with and without dropout. We fix Twitter4SSE as the initial model. Table 3 shows that 2 EL and 2 DL without dropout is the best overall configuration.
    ► Full Comparison. We compare the performance of the models with a Random baseline and with the two best approaches from related work.
  • 24. Number of tweets per user
    ► As expected, a greater number n of tweets per user results in a better model, when the number of pairs of training users is fixed.
    ► However, a greater n implies a lower number of users, since we have a limited collection of tweets.
    ► We investigate the best trade-off between the number of users and the number of tweets per user.
    ► The performance of models trained with different numbers of tweets per user, including every available user, varies.
    ► The best trade-off is a peak around 20 tweets.
    ► Note: this number is highly dependent on our collection, since the number of downloaded tweets is high but finite (2 complete months).
  • 26. Other tasks
    ► Community analysis
    ► Polarization detection
    ► Outlier detection
    ► Fixed model: a hierarchical model with a frozen Stage-1 Twitter4SSE model and a Stage-2 ToBERT model with 2 layers, 0.1 dropout rate, and MEAN pooling, trained using 20 tweets for each user for one epoch.
  • 27. Other tasks
  • 29. Outliers
    ► We run the Local Outlier Factor (LOF) algorithm on three lists of users and manually inspect the results.
    ► On embeddings of the technology list
      – Outlier: an account about videogames
    ► On embeddings of the chefs list
      – Outlier: a cook talking about other topics
    ► On embeddings of the charity-ngo list
      – Outlier: the account of Charlize Theron
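To make the outlier step concrete, here is a compact pure-Python version of the standard Local Outlier Factor definition. It is a sketch under assumptions: Euclidean distance, `k=2` neighbours, and no handling of tied distances or duplicate points are illustrative choices, and the slides do not state which implementation or parameters were used on the user embeddings.

```python
import math


def local_outlier_factor(points, k=2):
    """Return one LOF score per point; scores well above 1 suggest outliers.
    Assumes more than k points and no group of k + 1 identical points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbors.append(order[:k])                # k nearest neighbours
        k_distance.append(dist[i][order[k - 1]])   # distance to the k-th one

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbours relative to the point's own.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]
```

Points inside a uniformly dense group score about 1, while an isolated point scores much higher, which is the signal that was then inspected manually for each list.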
  • 30. Concluding
    ► Large unbiased dataset ready for user similarity analysis
    ► Selection and optimization of a hierarchical LLM
    ► Validation of models on similarity
    ► Application to related problems
    ► Future and ongoing work: impact of time and topic drift
  • 31. Hierarchical Transformers for User Semantic Similarity. THANKS! Marco Di Giovanni, Marco Brambilla
    http://datascience.deib.polimi.it/
    https://marco-brambilla.com/
    marco.brambilla@polimi.it, @marcobrambi