EmojiNet: Building a Machine Readable
Sense Inventory for Emoji
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University, Dayton, OH, USA
Presented at the 8th International Conference on Social Informatics (SocInfo 2016)
Bellevue, WA, USA, 14th – 17th November, 2016
Lakshika Balasuriya
lakshika@knoesis.org
Sanjaya Wijeratne
sanjaya@knoesis.org
Derek Doran
derek@knoesis.org
Amit Sheth
amit@knoesis.org
What are Emoji
Emoji are pictographs
Invented by Shigetaka Kurita in the late 1990s
Emoji are extremely popular
6B messages exchanged per day contain emoji [1]
Face with Tears of Joy was the Oxford Dictionaries Word of the Year in 2015
Eggplant emoji was the most notable emoji in 2015
Businesses extensively use emoji in their applications
777% increase of emoji use in marketing campaigns [2]
20% month-over-month increase in 2016 [2]
SocInfo 2016 Wijeratne, Sanjaya et al. EmojiNet: Building a Machine Readable Sense Inventory for Emoji
[1] SwiftKey Report – http://bit.ly/2c5biPU
[2] Appboy Blog – https://www.appboy.com/blog/emojis-used-in-777-more-campaigns/
Emoji Usage in Social Media
People use emoji to:
Add color and whimsy to their messages
Maintain conversational connections in a playful
manner [Kelly, 2015]
Replace emoticons [Pavalanathan, 2016]
Emoji are being used as a new language
Emoji were defined with no rigid semantics,
so people assign their own meanings to them
Celebration hands are often used as prayer hands
Punching hand is often used to fist bump someone
Ambiguity in Emoji
Ambiguity in emoji arises for two reasons:
Differences in rendering platforms [Miller, 2016]
People have assigned different meanings to emoji
Image Source – [Miller, 2016]
Disambiguating Emoji Senses
Emoji Sense Disambiguation requires:
A machine readable dictionary of emoji meanings
Algorithms for emoji sense disambiguation
Our contributions:
EmojiNet: A machine readable emoji sense inventory
Integrates four emoji resources on the web
Assigns sense definitions to emoji
Provides a web resource that is openly available at
http://emojinet.knoesis.org
Building EmojiNet
Representing an emoji (e_i)
u_i – Unicode character
c_i – Short code name
d_i – Emoji definition
K_i – Set of keywords
I_i – Set of images
R_i – Set of related emoji
H_i – Set of categories
S_i – Set of senses with definitions
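The eight-part representation above maps naturally onto a record type. The following is a minimal sketch, not EmojiNet's actual schema: the class name, the field names, the `:joy:` short code, and the placeholder definition text are all illustrative assumptions. The keywords are the ones shown later in the deck for the face with tears of joy emoji.

```python
from dataclasses import dataclass, field


@dataclass
class EmojiEntry:
    """One emoji e_i. Field names are hypothetical, chosen to mirror the slide."""
    unicode_char: str                             # u_i
    short_code: str                               # c_i
    definition: str                               # d_i
    keywords: set = field(default_factory=set)    # K_i
    images: set = field(default_factory=set)      # I_i
    related: set = field(default_factory=set)     # R_i
    categories: set = field(default_factory=set)  # H_i
    senses: dict = field(default_factory=dict)    # S_i: sense label -> definition


# Illustrative entry; the short code and sense text are assumptions.
entry = EmojiEntry(
    unicode_char="\U0001F602",
    short_code=":joy:",
    definition="Face with tears of joy",
)
entry.keywords.update({"face", "joy", "laugh", "tear", "cry", "happy"})
entry.senses["laugh(N)"] = "placeholder definition text"
```

Using `default_factory` gives each entry its own empty sets, so filling K_i for one emoji never leaks into another.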
Building EmojiNet Cont.
Different emoji resources on the web carry
valuable information and complement each
other when they are combined
Unicode.org and The Emoji Dictionary are
integrated based on the images of the emoji, and
the rest are integrated based on the Unicode representation of the emoji
Building EmojiNet Cont.
[Figure: Steps involved in building EmojiNet]
Integrating The Emoji Dictionary
A nearest neighbor-based image processing
algorithm was used to integrate Unicode.org with
The Emoji Dictionary
Two image sets were used:
13,387 images downloaded from Unicode.org
representing 1,791 emoji
1,074 images downloaded from The Emoji Dictionary
representing 1,074 emoji
We use color intensities of each image to compute
similarities between the images
Integrating The Emoji Dictionary Cont.
Image processing algorithm in simple steps:
Resize each image to 300 × 300 pixels and divide each image into
25 non-overlapping regions of size 25 × 25 pixels
Find the average color intensity of each region by averaging the R, G
and B pixel color values
Compare the color intensities of corresponding image regions
and calculate the dissimilarity between the images using L2
distance
Select the least dissimilar image as the match
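The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: images are taken to be already resized and are represented as square lists of rows of (R, G, B) tuples, each region is reduced to a single scalar intensity, and all function names are hypothetical. The slide's region count and region size do not both tile a 300 × 300 image, so the grid size is left as a parameter.

```python
import math


def region_means(image, grid):
    """Average intensity of each cell in a grid x grid partition.

    `image` is a square list of rows of (R, G, B) tuples whose side is a
    multiple of `grid`. Resizing is assumed to have happened already.
    """
    cell = len(image) // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            total = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    r, g, b = image[y][x]
                    total += (r + g + b) / 3.0  # average the three channels
            means.append(total / (cell * cell))
    return means


def dissimilarity(a, b, grid):
    """L2 distance between the region-mean vectors of two images."""
    ma, mb = region_means(a, grid), region_means(b, grid)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ma, mb)))


def best_match(query, candidates, grid):
    """Key of the candidate image that is least dissimilar to `query`."""
    return min(candidates, key=lambda k: dissimilarity(query, candidates[k], grid))


def solid(color, side=4):
    """Tiny single-color test image (hypothetical helper)."""
    return [[color] * side for _ in range(side)]


dark = solid((30, 30, 30))
light = solid((220, 220, 220))
candidates = {"dark": dark, "light": light}
query = solid((200, 210, 230))
print(best_match(query, candidates, grid=2))  # prints "light"
```

Because each region collapses to one averaged intensity, two images with different content but similar brightness per region end up close together, which is consistent with the failure cases the evaluation reports.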
Evaluation – Image Processing
The image processing algorithm achieved
98.42% accuracy when evaluated manually
Only 17 of the 1,074 images we checked were
labeled incorrectly
Error analysis revealed that the algorithm fails when
the two compared images show different
objects that are similar in color
E.g. – Clocks with hands displaying different times,
flags with slight changes
Extracting Sense Labels
[Figure: Extracting sense labels from The Emoji Dictionary]
face, joy, laugh, tear, cry, happy
Joy(N), laugh(N), tear(N), cry(V), happy(A), funny(V), funny(A)
Joy(N), laugh(N), tear(N), cry(V), happy(A), funny(A)
Assigning BabelNet Sense IDs
A sense label can have multiple BabelNet sense
definitions
E.g. – Laugh(Noun) has 6 BabelNet senses
We use the Manually Annotated Sub-Corpus
(MASC) to assign the correct sense
Words in MASC are already sense disambiguated
We use a MASC-based Most Frequent Sense (MFS)
baseline to assign senses to sense labels
When MFS fails, we use a Most Popular Sense
(MPS) baseline based on BabelNet
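The MFS-then-MPS fallback can be sketched as a two-stage lookup. This is a hedged illustration: the function name and the two score tables are hypothetical, and how the popularity score is computed from BabelNet is not specified here, so it is treated as an opaque number. The sense IDs and the counts in parentheses come from the deck's own Laugh(N) and Gun(N) example; which table each belongs to is an assumption.

```python
def assign_sense(label, masc_counts, popularity):
    """Pick a sense ID for `label`: MFS from MASC counts, else MPS fallback.

    Both tables map a sense label to {sense_id: score}. MFS "fails" here
    when the label has no entry in `masc_counts` (an assumption).
    """
    for method, table in (("MFS", masc_counts), ("MPS", popularity)):
        scores = table.get(label)
        if scores:
            # Highest-scoring sense wins; ties go to the first one listed.
            return max(scores, key=scores.get), method
    return None, None


# Assumed placement of the slide's example numbers.
masc_counts = {"laugh(N)": {"bn:00050198n": 5, "bn:00050199n": 3}}
popularity = {"gun(N)": {"bn:00042221n": 6, "bn:02379114n": 1}}

print(assign_sense("laugh(N)", masc_counts, popularity))
print(assign_sense("gun(N)", masc_counts, popularity))
```

Returning the method name alongside the sense ID makes it easy to report MFS and MPS accuracy separately, as the evaluation does.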
Assigning BabelNet Sense IDs Cont.
[Figure: Assigning BabelNet sense IDs to sense labels extracted from The Emoji Dictionary]
Laugh(N): bn:00050198n (5), bn:00050199n (3); result: Laugh(N) = bn:00050198n
Gun(N): bn:00042221n (6), bn:02379114n (1); result: Gun(N) = bn:00042221n
Decision boxes: "Is Laugh(N) in EmojiNet?", "Is Gun(N) in EmojiNet?"
Evaluation – Word Sense Disambiguation
Most Frequent Sense Baseline Results
# of total sense labels                              3,206
# of disambiguated sense labels using MFS method     2,293
# of correctly disambiguated sense labels            2,031
# of incorrectly disambiguated sense labels            262
Accuracy of MFS-baseline method                     88.57%
Most Popular Sense Baseline Results
# of total sense labels                              3,206
# of disambiguated sense labels using MPS method       913
# of correctly disambiguated sense labels              700
# of incorrectly disambiguated sense labels            213
Accuracy of MPS-baseline method                     76.67%
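The reported accuracies can be recomputed from the counts on this slide. The short check below is only a worked arithmetic example; the variable and function names are illustrative.

```python
# Counts taken from the two result tables.
mfs_total, mfs_correct = 2293, 2031
mps_total, mps_correct = 913, 700


def accuracy(correct, total):
    """Accuracy as a percentage."""
    return 100.0 * correct / total


mfs_acc = accuracy(mfs_correct, mfs_total)
mps_acc = accuracy(mps_correct, mps_total)
combined_acc = accuracy(mfs_correct + mps_correct, mfs_total + mps_total)

print(round(mfs_acc, 2))       # 88.57
print(round(mps_acc, 2))       # 76.67
print(round(combined_acc, 2))  # 85.18
```

The two methods together cover 2,293 + 913 = 3,206 sense labels, which is the full set, and the combined figure matches the 85.18% total in the aggregated statistics.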
Evaluation – Word Sense Disambiguation Cont.
             Correct          Incorrect       Total
Nouns        1,271 (83.28%)   255 (16.71%)    1,526
Verbs        735 (84.00%)     140 (16.00%)    875
Adjectives   725 (90.06%)     80 (9.93%)      805
Total        2,731 (85.18%)   475 (14.81%)    3,206
Aggregated Word Sense Disambiguation Statistics
EmojiNet Statistics
EmojiNet data feature                   # of emoji with feature   Amount of data stored for the feature
u_i – Unicode character                 1,074                     1,074
c_i – Short code name                   845                       845
d_i – Emoji definition                  1,074                     1,074
K_i – Set of keywords                   1,074                     8,069
I_i – Set of images                     1,074                     28,370
R_i – Set of related emoji              1,074                     9,743
H_i – Set of categories                 705                       8
S_i – Set of senses with definitions    875                       3,206
EmojiNet at Work
Sense              Context words extracted from EmojiNet for each sense
Pray (verb)        worship, thanksgiving, saint, pray, higher, god, confession
High five (noun)   palm, high, hand, slide, celebrate, raise, person, head, five
T1 – Pray for my family God gained an angel today
T2 – Hard to win, but we did it man Lets celebrate!
We use the Simplified Lesk algorithm, which is based on the word
overlap between the words in the sense definitions and the tweets
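The word-overlap idea can be sketched with the two senses and two tweets from this slide. This is a simplified illustration, not the authors' code: the tokenizer (lowercase alphabetic runs, no stop-word removal or stemming) and the function names are assumptions.

```python
import re

# Context words per sense, as listed on the slide.
SENSE_CONTEXT = {
    "pray (verb)": {"worship", "thanksgiving", "saint", "pray",
                    "higher", "god", "confession"},
    "high five (noun)": {"palm", "high", "hand", "slide", "celebrate",
                         "raise", "person", "head", "five"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (an assumed tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def simplified_lesk(tweet, sense_context=SENSE_CONTEXT):
    """Return the sense with the largest word overlap, plus all overlap counts."""
    tokens = tokenize(tweet)
    overlaps = {sense: len(tokens & words) for sense, words in sense_context.items()}
    return max(overlaps, key=overlaps.get), overlaps


t1 = "Pray for my family God gained an angel today"
t2 = "Hard to win, but we did it man Lets celebrate!"
print(simplified_lesk(t1)[0])  # prints "pray (verb)"
print(simplified_lesk(t2)[0])  # prints "high five (noun)"
```

T1 shares "pray" and "god" with the first sense, and T2 shares only "celebrate" with the second, so a single overlapping word is enough to separate the two readings in this example.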
Challenges and Future Work
Extend EmojiNet sense definitions with words
extracted from tweets
Word embedding models trained on tweets with
emoji
Evaluate the usability of EmojiNet
Emoji similarity and emoji sense disambiguation tasks
Apply EmojiNet to real-world tasks
Sentiment analysis and emoji understanding
Image Source – http://i.ytimg.com/vi/dqyYvIqjuFI/maxresdefault.jpg
Connect with me
sanjaya@knoesis.org
@sanjrockz
http://bit.do/sanjaya
Image Source – http://www.pcb.its.dot.gov/standardstraining/mod08/ppt/m08ppt23.jpg
References
[Kelly, 2015] Kelly, R., Watts, L.: Characterizing the
inventive appropriation of emoji as relationally
meaningful in mediated close personal relationships.
Experiences of Technology Appropriation:
Unanticipated Users, Usage, Circumstances, and Design
(2015).
[Pavalanathan, 2016] Pavalanathan, Umashanthi, and
Jacob Eisenstein. Emoticons vs. emojis on Twitter: A
causal inference approach. arXiv preprint
arXiv:1510.08480 (2015).
References Cont.
[Miller, 2016] Miller, Hannah, Jacob Thebault-Spieker,
Shuo Chang, Isaac Johnson, Loren Terveen, and Brent
Hecht. “Blissfully happy” or “ready to fight”: Varying
Interpretations of Emoji. ICWSM’16 (2016).
[Wijeratne, 2016] Sanjaya Wijeratne, Lakshika Balasuriya,
Amit Sheth, Derek Doran. EmojiNet: Building a
Machine Readable Sense Inventory for Emoji. In 8th
International Conference on Social Informatics (SocInfo
2016). Bellevue, WA, USA; (2016).


