In this lecture, we will look at why emoji are important and the reasons behind their increase in popularity, how emoji meanings are generated/assigned, how to calculate emoji similarity, and how to disambiguate emoji meanings.
1. Analyzing Emoji in Text
SANJAYA WIJERATNE
Research Scientist, Holler.io, San Mateo, CA.
sanjaya@holler.io | http://sanjw.org/ | @sanjrockz
BAX-423 Big Data Analytics
GUEST LECTURE AT THE GRADUATE SCHOOL OF MANAGEMENT OF THE UNIVERSITY OF CALIFORNIA, DAVIS, 24TH/25TH APRIL, 2020.
2. Meet Your Instructor
► Research Scientist at Holler.io
► Work on NLP
► Academic Background
► Education - Ph.D. in Computer Science and Engineering
► Research Interest - Emoji/Text Processing, NLU
► My Journey So Far
► I’m from Sri Lanka -> B.Sc. in IT (University of Moratuwa, Sri Lanka) -> ~2 years as a Software Engineer -> 7.5 years as a GRA/TA at Wright State University
4/19/2020 BAX-423 Big Data Analytics, UC Davis
3. How I Started Working with Emoji
Emoji Chain | Gang Usage | Non-Gang Usage
[emoji images not recovered] | 32.25% | 1.14%
[emoji images not recovered] | 53% | 1.71%
Anthropology 189:001, UC Berkeley
Image Source – https://arxiv.org/pdf/1610.09516.pdf
5. Emoji = Picture Character
► Introduced by Shigetaka Kurita in 1999
► Unicode started supporting the emoji character set in 2010
► Emoji are not emoticons (e.g., :-) and :-( are emoticons)
6. Why Has Emoji Usage Increased?
8. A Few Open Emoji Research Problems Related to Text Processing
► Challenges in interpreting the meaning of an emoji in a message context
► Emoji similarity
► Emoji sense disambiguation
► Emoji prediction
► Emoji-based retrieval and search
11. Emoji Semantics
► Emoji are inherently designed with no rigid semantics
► Emoji do not have a grammar; thus, emoji cannot be used as a language on their own
► How are emoji meanings assigned?
► Initially, by the emoji creators
► Later, by the users
12. How Do Emoji Get Their Meanings?
► Emoji creators submit possible emoji meanings in their proposals
► Once accepted, these meanings become available in the Unicode Common Locale Data Repository (CLDR) at https://www.unicode.org/cldr/charts/latest/annotations/other.html
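As a rough sketch of how such annotations might be read programmatically, the snippet below parses a small inline sample written in the style of a CLDR annotations XML file. The sample entries and the helper name load_keywords are illustrative assumptions, not content copied from the repository.

```python
import xml.etree.ElementTree as ET

GRIN = "\U0001F600"  # grinning face
EYE = "\U0001F441"   # eye

# Illustrative sample in the style of a CLDR annotations file (assumed content).
SAMPLE = f"""<ldml><annotations>
<annotation cp="{GRIN}">face | grin | grinning face</annotation>
<annotation cp="{GRIN}" type="tts">grinning face</annotation>
<annotation cp="{EYE}">body | eye</annotation>
<annotation cp="{EYE}" type="tts">eye</annotation>
</annotations></ldml>"""


def load_keywords(xml_text):
    """Map each emoji to its keyword list, skipping the text-to-speech name entries."""
    keywords = {}
    for node in ET.fromstring(xml_text).iter("annotation"):
        if node.get("type") == "tts":
            continue  # the short spoken name, not the keyword list
        keywords[node.get("cp")] = [k.strip() for k in node.text.split("|")]
    return keywords


keywords = load_keywords(SAMPLE)
print(keywords[EYE])
```

The keyword lists are the creator-assigned starting point; meanings that users add later are not captured here.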
13. How Do Emoji Get Their Meanings?
► When people replace words with emoji (logographic use)
► Homonymy relations in languages (e.g., the eye emoji used for both "eye" and "I")
Image Source – https://goo.gl/rjS1hX
*Actual social media content
14. Getting the Emoji Meanings
Image Source – http://emojinet.knoesis.org
15. EmojiNet
Image Source – https://arxiv.org/pdf/1707.04652.pdf
17. Emoji Similarity Problem
► “Measuring the semantic similarity of emoji such that the measure reflects the likeness of their meaning, interpretation or intended use.” [Wijeratne et al., 2017]
18. Notion of Emoji Similarity
► The notion of emoji similarity is broad
► Pixel-based Emoji Similarity
► Meaning-based Emoji Similarity
20. Distributional Semantics
► Finds semantic properties of linguistic items (words) based on their distribution in a large corpus
► Based on Distributional Hypothesis (Harris, 1954)
► Words that are used and occur in the same contexts tend to purport similar meanings
► We use large text corpora with emoji to learn distributional semantics of emoji, which reveals relationships among emoji
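The distributional idea can be illustrated with a toy example: represent each token by counts of the tokens it shares a message with, then compare tokens by cosine similarity. The corpus below is made up for illustration, and raw co-occurrence counts stand in for the trained embeddings used in practice.

```python
import math
from collections import Counter, defaultdict

HEART_EYES = "\U0001F60D"
RED_HEART = "\u2764"
ANGRY = "\U0001F621"

# Toy corpus (hypothetical messages); real work would use a very large tweet corpus.
corpus = [
    f"i love pizza {HEART_EYES}",
    f"i love pasta {HEART_EYES}",
    f"i love pizza {RED_HEART}",
    f"i love pasta {RED_HEART}",
    f"i hate traffic {ANGRY}",
    f"i hate mondays {ANGRY}",
]


def cooccurrence_vectors(sentences):
    """Count, for every token, the other tokens that appear in the same message."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[target][context] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = cooccurrence_vectors(corpus)
same_context = cosine(vectors[HEART_EYES], vectors[RED_HEART])
different_context = cosine(vectors[HEART_EYES], vectors[ANGRY])
print(same_context > different_context)
```

The two emoji that appear with "love" end up closer to each other than to the emoji that appears with "hate", which is the relationship the hypothesis predicts.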
21. Learning Emoji Embeddings
► Learn distributional semantics of words as word embeddings using two corpora (Tweets and Google News)
► Convert the words in each emoji's meanings to vectors using the word embeddings, which yields the emoji embeddings
► Evaluate the similarity (distance) of emoji in the embedding space using EmoSim508, a new dataset with 508 emoji pairs
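A minimal sketch of the second and third steps, assuming that an emoji embedding is formed by averaging the word vectors of the words in its meaning (the slides do not spell out the combination step). The 3-dimensional word vectors and the sense word lists are hypothetical placeholders.

```python
import math

# Hypothetical 3-dimensional word vectors; a real model would be trained on tweets or news.
word_vectors = {
    "face": [0.9, 0.1, 0.0],
    "smile": [0.8, 0.3, 0.1],
    "happy": [0.7, 0.4, 0.0],
    "laugh": [0.8, 0.4, 0.1],
    "cry": [0.1, 0.9, 0.2],
    "sad": [0.0, 0.8, 0.3],
}

# Hypothetical words taken from each emoji's meaning.
sense_words = {
    "grinning_face": ["face", "smile", "happy"],
    "face_with_tears_of_joy": ["face", "laugh", "happy"],
    "crying_face": ["face", "cry", "sad"],
}


def emoji_embedding(words, vectors):
    """Average the vectors of the words that have an embedding (assumed combination step)."""
    known = [vectors[w] for w in words if w in vectors]
    return [sum(component) / len(known) for component in zip(*known)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


emoji_vectors = {name: emoji_embedding(words, word_vectors) for name, words in sense_words.items()}
sim_close = cosine_similarity(emoji_vectors["grinning_face"], emoji_vectors["face_with_tears_of_joy"])
sim_far = cosine_similarity(emoji_vectors["grinning_face"], emoji_vectors["crying_face"])
print(sim_close > sim_far)
```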
23. Ground Truth Data Creation
► Most frequently occurring emoji pairs from a Twitter dataset of 110M tweets with emoji
► Each emoji pair was evaluated for similarity and relatedness by 10 human users
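One plausible way to surface frequent emoji pairs is to count how often two emoji appear in the same tweet. The sketch below assumes that "pair" means co-occurrence within a tweet, uses a small illustrative emoji set, and handles only single-code-point emoji.

```python
from collections import Counter
from itertools import combinations

JOY = "\U0001F602"
HEART_EYES = "\U0001F60D"
SOB = "\U0001F62D"
RED_HEART = "\u2764"

# Illustrative subset; a real pipeline would use the full Unicode emoji list.
EMOJI = {JOY, HEART_EYES, SOB, RED_HEART}

# Hypothetical tweets.
tweets = [
    f"so funny {JOY}{SOB}",
    f"{JOY}{SOB} cannot stop",
    f"love you {HEART_EYES}{RED_HEART}",
    f"{JOY} ok",
]


def top_emoji_pairs(tweets, k):
    """Return the k most frequent unordered emoji pairs that share a tweet."""
    counts = Counter()
    for tweet in tweets:
        present = sorted({ch for ch in tweet if ch in EMOJI})
        counts.update(combinations(present, 2))
    return counts.most_common(k)


top_pairs = top_emoji_pairs(tweets, 1)
print(top_pairs)
```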
24. Intrinsic Evaluation
► Using four different emoji definitions (Sense_Desc., Sense_Label, Sense_Def., Sense_All) and two corpora (Twitter and Google News), we trained eight emoji embedding models for each emoji
► We calculated emoji similarity of the 508 emoji pairs using each embedding model
25. Intrinsic Evaluation Cont.
► Using Spearman’s Rank Correlation Coefficient (Spearman’s ρ), we compared the similarity rankings of each model with ground truth data
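For reference, Spearman’s ρ is the Pearson correlation computed on ranks. The sketch below implements it with the standard library only; the human ratings and model similarities are made-up numbers, not values from EmoSim508.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five emoji pairs.
human_ratings = [3.9, 3.1, 2.0, 0.5, 1.2]
model_similarities = [0.91, 0.62, 0.70, 0.10, 0.35]
rho = spearman_rho(human_ratings, model_similarities)
print(round(rho, 3))
```

A ρ close to 1 means the model orders the emoji pairs almost the same way the human raters do, regardless of the raw similarity scale.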
26. Extrinsic Evaluation
► We tested our emoji embedding models using a sentiment analysis baseline
► Our baseline had 12,920 English tweets, and 2,295 of them had emoji
► All words in the tweets were replaced with their corresponding word embeddings and emoji were replaced with emoji embeddings learned
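The replacement step can be sketched as follows. The averaging of token vectors into one tweet vector and the nearest-centroid classifier are assumptions chosen to keep the example small; the slides do not say which classifier was used. All vectors and tweets are hypothetical.

```python
HEART_EYES = "\U0001F60D"
ANGRY = "\U0001F621"

# Hypothetical 2-dimensional embeddings.
word_vectors = {"great": [0.9, 0.1], "love": [0.8, 0.2], "awful": [0.1, 0.9], "hate": [0.2, 0.8]}
emoji_vectors = {HEART_EYES: [0.9, 0.0], ANGRY: [0.0, 0.9]}


def tweet_vector(tweet):
    """Replace words and emoji with their embeddings, then average (assumed pooling)."""
    lookup = {**word_vectors, **emoji_vectors}
    found = [lookup[token] for token in tweet.split() if token in lookup]
    if not found:
        return [0.0, 0.0]
    return [sum(component) / len(found) for component in zip(*found)]


def train_centroids(labeled_tweets):
    """Average the tweet vectors of each class (stand-in for a real classifier)."""
    grouped = {}
    for tweet, label in labeled_tweets:
        grouped.setdefault(label, []).append(tweet_vector(tweet))
    return {label: [sum(c) / len(vecs) for c in zip(*vecs)] for label, vecs in grouped.items()}


def predict(tweet, centroids):
    vec = tweet_vector(tweet)
    return min(centroids, key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])))


train = [
    (f"love this {HEART_EYES}", "positive"),
    ("great day", "positive"),
    (f"hate this {ANGRY}", "negative"),
    ("awful traffic", "negative"),
]
centroids = train_centroids(train)
print(predict(f"so {HEART_EYES}", centroids))
```

In the last call the only token with an embedding is the emoji, so the prediction depends entirely on the emoji embedding, which is the case the extrinsic evaluation is meant to probe.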
28. Key Takeaways
► Combining emoji sense knowledge with distributional semantics could improve the emoji embedding models
► Longer sense definitions are not suitable for emoji similarity experiments
30. Emoji Sense Disambiguation Problem
► “The ability to identify the meaning of an emoji in the context of a message in a computational manner” [Wijeratne et al., 2017].
Image Source – https://goo.gl/rjS1hX
Figure labels: I, Look
*Actual social media content
31. Emoji Sense Disambiguation
► Currently, no labeled datasets are available for solving emoji sense disambiguation in a supervised setting
32. Emoji Sense Disambiguation Cont.
► We selected the 25 most commonly misunderstood emoji and selected 50 tweets for each emoji
► Used the Simplified Lesk algorithm for disambiguation
► Context words were learned for each emoji sense definition using Twitter and Google News-based word embedding models
► Twitter-based embeddings outperform others
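Simplified Lesk picks the sense whose associated words overlap most with the words of the message. In the sketch below the sense labels, the context word sets, and the stopword list are hand-written assumptions; in the lecture's setup the context words come from the word embedding models.

```python
JOY = "\U0001F602"

STOPWORDS = {"i", "a", "the", "so", "this", "is", "at", "my", "was", "that"}

# Hypothetical senses of one emoji, each with a set of context words.
sense_context = {
    "laugh": {"funny", "joke", "hilarious", "lol", "laughing"},
    "cry": {"sad", "tears", "miss", "funeral", "heartbroken"},
}


def simplified_lesk(message, senses):
    """Return the sense with the largest word overlap with the message.

    Ties, including zero overlap, fall back to the first sense listed.
    """
    context = {token for token in message.lower().split() if token not in STOPWORDS}
    best_sense, best_overlap = None, -1
    for sense, words in senses.items():
        overlap = len(context & words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk(f"that joke was hilarious {JOY}", sense_context))
print(simplified_lesk(f"i miss my dog so sad {JOY}", sense_context))
```

Because the decision rests on word overlap, a larger set of context words per sense gives the algorithm more chances to match the message.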
33. Results and Takeaways
► Tools designed for processing well-formed text will not work well on ill-formed text
► Sense disambiguation accuracy increases as the number of context words used increases
35. Recap
► We looked at
► Why it is important to do emoji analysis
► How emoji get their meanings
► How to calculate emoji similarity
► How to disambiguate the meaning of an emoji
37. References
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. A Semantics-Based Measure of Emoji Similarity. In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (Web Intelligence 2017). Leipzig, Germany; 2017.
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: An Open Service and API for Emoji Sense Discovery. In 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017.
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: Building a Machine Readable Sense Inventory for Emoji. In 8th International Conference on Social Informatics (SocInfo 2016). Bellevue, WA, USA; 2016.
► Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, Amit Sheth. Finding Street Gang Members on Twitter. In The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016). San Francisco, CA, USA; 2016.