Analyzing Textual Sources Attributes of Comics
Based on Word Frequency and Meaning
Kansai University
◎Ryota Higuchi, Ryosuke Yamanishi, Mitsunori Matsushita
Abstract
• The purpose of this research
- Constructing a vocabulary set that characterizes comics.
• This paper
- Focused on two different textual sources : Explanations and Reviews
- Distinguished which vocabulary was common to both sources and which was different
Introduction
• A huge number of new comic books are published.
• How does a user choose comics?
- The user searches with web services.
- Typical queries = Meta-information (e.g., genre "action/adventure", title "ONE PIECE")
• A user cannot search based on a story (e.g., "Hero defeats the villain").
Meta-information is not sufficient for retrieving comics
based on user preferences.
Differences of Textual Sources
• Multiple sources of information on the same topic
Ex.) Texts on the web
- Explanations : overview of the work (the characters' features, significant episodes), e.g., from Wikipedia
- Reviews : readers' impressions and evaluations (what readers liked/disliked, feedback), e.g., from Amazon
- Outlines : overview of the work
- Q&A : sharing the knowledge
While these texts from different information sources represent the same content,
the textual details vary from source to source.
Selecting Information Sources is Difficult.
• We should conduct a study using suitable information sources.
Problem
• There are few cases in which
the sources are selected for quantitative reasons.
Differences between textual sources about comics have received little attention.
Selecting Information Sources
Are you selecting information sources
based on your experience?
- "We've found a new use for web text, using trendy AI!!!"
- "Well, the results are so-so... but is this the correct data for input???"
Information sources must be selected
by discussing quantitative reasons.
Purpose
• Distinguishing what vocabulary is common/different
between 2 textual sources
<Providing two advantages>
by combining different types of sources
1. The different attributes : selecting appropriate information resources
2. The common attributes : accessing a large amount of data
Analysis Method
1. Datasets Construction
- Two types of sources : Explanations and Reviews about comic books
2. Construction of Classification Dictionary
3. Classification of Words Semantically
- Calculating word frequencies by using the dictionary
Analyzing how frequently words with which meanings appear
in each textual source
1. Datasets Construction : Explanations
• Information sources : Internet encyclopedias
• Data size : 6,250 points, 2,067 comic characters
• Texts describing the comic characters in detail
Website : Features
- Wikipedia : Famous online encyclopedia
- Niconico Pedia, Pixiv encyclopedia : character dialogue and net slang
- Aniotawiki : some description rules
1. Datasets Construction : Reviews
• Information sources : review website "Sakuhin Database"
- Different from shopping websites like Amazon
• The purpose of this website
- Evaluating works and collecting information
• Texts about popular comics were included.
• Data size : 6,250 points
1. Datasets Construction : Data Cleaning
• Data cleaning
- Kept only common nouns
- Removed stop words based on Slothlib
- Removed low-frequency words
• The total number of distinct words
- Explanations : 7,136 words
- Reviews : 3,092 words
• Train data : 10,000 points
• Test data : 2,500 points
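The three cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each text has already been tokenized into (word, part-of-speech) pairs by some morphological analyzer, and the tag name `common_noun`, the tiny `STOP_WORDS` set (a stand-in for the Slothlib list), and the threshold `min_count` are all hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for the Slothlib stop-word list (not reproduced here).
STOP_WORDS = {"こと", "もの", "ため"}


def clean(documents, min_count=2):
    """Keep common nouns, drop stop words, then drop low-frequency words.

    documents: list of documents, each a list of (word, part_of_speech) pairs.
    min_count: hypothetical cut-off; the slides do not state the real one.
    """
    # Steps 1 and 2: keep only common nouns that are not stop words.
    filtered = [
        [word for word, pos in doc if pos == "common_noun" and word not in STOP_WORDS]
        for doc in documents
    ]
    # Step 3: count words over the whole corpus and remove the rare ones.
    counts = Counter(word for doc in filtered for word in doc)
    return [[word for word in doc if counts[word] >= min_count] for doc in filtered]


docs = [
    [("兄", "common_noun"), ("こと", "common_noun"), ("走る", "verb")],
    [("兄", "common_noun"), ("作品", "common_noun")],
]
cleaned = clean(docs)
# Number of distinct words that survive cleaning.
vocabulary_size = len({word for doc in cleaned for word in doc})
```

Counting after the noun and stop-word filters means the frequency cut-off only looks at words that could end up in the vocabulary.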
2. Construction of Classification Dictionary
• To analyze "what meanings of words exist,"
word class sets are obtained using word embedding and k-means clustering.
- The elbow method shows 63 classes.
- The average number of words per class : 118.8
Examples of words in a class
- hard battle (激戦), comrades in arms (戦友), first game (初戦)
- black (黒), white (白), brown (褐色), complexion (顔色)
- idol (アイドル), shortcoming (コンプレックス), gym (ジム), position (ポジション)
The resulting word class sets were used as
a class dictionary
for content analysis of comics.
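A small sketch may help show how clustering and the elbow method fit together. It is illustrative only: the slides do not name the embedding model or the k-means implementation, so this uses a plain-Python k-means with a deterministic farthest-point start, and the 2-D `toy_vectors` are invented stand-ins for real word embeddings.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=20):
    """Plain k-means; returns (labels, inertia)."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


# Hypothetical 2-D "embeddings"; real word vectors have many more dimensions.
toy_vectors = {
    "激戦": (0.0, 0.0), "戦友": (0.0, 1.0), "初戦": (1.0, 0.0),
    "黒": (10.0, 0.0), "白": (10.0, 1.0), "褐色": (11.0, 0.0),
    "アイドル": (0.0, 10.0), "ジム": (0.0, 11.0), "ポジション": (1.0, 10.0),
}
words = list(toy_vectors)
points = list(toy_vectors.values())

# Elbow method: inspect how the within-cluster sum of squares falls as k grows.
inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}

# Build a class dictionary (class id -> words) for the chosen k.
labels, _ = kmeans(points, 3)
class_dictionary = {}
for word, lab in zip(words, labels):
    class_dictionary.setdefault(lab, []).append(word)
```

On this toy data the inertia falls steeply up to k = 3 and only slightly afterwards, which is the kind of bend the elbow method looks for; on the real data the slides report the bend at 63 classes.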
3. Classification of Frequently Appearing Words
Classifying frequent words using the class dictionary
Input : test data
- $t_0$ = [vitality, bravery, male]
- $t_1$ = [impression, vitality, anime]
- …
- $t_{2499}$ = [smile, captain, sister]
63 classes dictionary
- Class 0 : vitality, bravery, vigor, smile
- Class 1 : male, female, sex, affinity
- …
- Class 62 : sister, brother, cousin, parent
Output : 63-bit vectors
Class        0   1   …   62
$b_0$        1   1   …   0
$b_1$        1   0   …   0
…
$b_{2499}$   1   0   …   1
One test data $t_0$ contains
the word "male".
Class 1 of the dictionary also contains
the word "male", so $t_0$ contains an element of class 1.
We put "1"
in the corresponding
location.
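The mapping from a word list to a bit vector can be sketched as below. Only the three classes shown on the slide are filled in; the function name `to_bit_vector` and the dictionary layout (class id mapped to a set of words) are choices made for this illustration, not something the slides specify.

```python
NUM_CLASSES = 63

# Only the classes shown on the slide; the other 60 are left out of this sketch.
class_dictionary = {
    0: {"vitality", "bravery", "vigor", "smile"},
    1: {"male", "female", "sex", "affinity"},
    62: {"sister", "brother", "cousin", "parent"},
}


def to_bit_vector(words, dictionary, num_classes=NUM_CLASSES):
    """Return a list of 0/1 flags, one per class.

    Bit c is 1 when at least one word of the test item belongs to class c.
    """
    bits = [0] * num_classes
    for class_id, members in dictionary.items():
        if any(word in members for word in words):
            bits[class_id] = 1
    return bits


t_0 = ["vitality", "bravery", "male"]
t_1 = ["impression", "vitality", "anime"]
t_2499 = ["smile", "captain", "sister"]

b_0 = to_bit_vector(t_0, class_dictionary)
b_1 = to_bit_vector(t_1, class_dictionary)
b_2499 = to_bit_vector(t_2499, class_dictionary)
```

Words that appear in no class, such as "impression" or "captain" in this reduced dictionary, simply leave every bit unchanged.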
Discussion Points and Evaluation
• Calculating the relative difference using the binary array
- It was defined as the absolute value
of the difference between the ratios in each source.
Ex.)
- If both sources have the same ratio... the relative difference : 0%
- If the ratio is biased to one side... the relative difference : 100%
• Discussion Points
- Which classes the frequent words correspond to in each source
- What meanings the words have in the classes
with a big difference between the two sources
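One way to compute such a relative difference is sketched below. The slides give only the two end points (0% for equal ratios, 100% for a fully one-sided ratio), so the normalization by the sum of the two ratios is an assumption chosen to match those end points; the authors' exact formula may differ. The function names are hypothetical.

```python
def class_ratios(bit_vectors):
    """Share of test items whose bit is 1, computed per class (column)."""
    n = len(bit_vectors)
    return [sum(column) / n for column in zip(*bit_vectors)]


def relative_difference(ratio_a, ratio_b):
    """Absolute difference of two ratios, as a percentage of their sum.

    Assumed normalization: equal ratios give 0, a fully one-sided pair gives 100.
    """
    total = ratio_a + ratio_b
    if total == 0:
        return 0.0  # the class never appears in either source
    return abs(ratio_a - ratio_b) / total * 100.0


# Tiny made-up example with two classes and two test items per source.
explanation_bits = [[1, 1], [1, 0]]
review_bits = [[1, 0], [1, 0]]

ratios_e = class_ratios(explanation_bits)
ratios_r = class_ratios(review_bits)
differences = [relative_difference(e, r) for e, r in zip(ratios_e, ratios_r)]
```

In the made-up example the first class appears equally often in both sources and the second appears only in the explanations, so the two values land on the two end points named on the slide.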
Results : Frequently Appearing Classes in Explanations
Class words | Relative difference (%)
body (身), body length (身長), familiar (身近) | 74.2
parent (親), brother (兄), sister (姉) | 63.7
• In Wikipedia for Tanjiro Kamado,
"His body length is 165 cm."
• In Pixiv encyclopedia for Sabo,
"He is Luffy's brother."
There are many words that
describe a character's features and the content of works.
Application examples :
constructing a vocabulary set
that characterizes comics
Results : Frequently Appearing Classes in Reviews
Class words | Relative difference (%)
comic (漫画), movie (映画), illustration (イラスト) | 35.5
work (作品), cartoonist (作家), masterpiece (傑作) | 19.8
• "I love the illustrations this cartoonist draws!"
• "This cartoonist's style will have
a great influence on future generations."
There are many words that
represent meta-information about comic works.
Application examples : research on genre analysis, topic classification
Result : Common Attribute
• Most of the test data corresponded to classes containing many foreign words.
- 73% of the total
- hairstyle (ヘアスタイル), check (チェック), plastic model (プラモ),
character (キャラ), animation (アニメ), diet (ダイエット), etc.
• Japanese language
- 3 kinds of characters in Japanese : "Hiragana", "Katakana", "Kanji"
- Katakana is used to describe something from foreign countries.
• The words in this class are written in Katakana.
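A quick way to check whether a class is dominated by Katakana words is sketched below. It is not part of the slides: the helper names are hypothetical, and the check relies only on the Unicode Katakana block (U+30A0 to U+30FF), which also contains the long-vowel mark "ー".

```python
def is_katakana(word):
    """True when every character lies in the Unicode Katakana block."""
    return bool(word) and all("\u30a0" <= ch <= "\u30ff" for ch in word)


def katakana_share(words):
    """Fraction of words written entirely in Katakana (0.0 for an empty list)."""
    if not words:
        return 0.0
    return sum(is_katakana(word) for word in words) / len(words)


# Example words from the slide, plus one Kanji word for contrast.
foreign_class = ["チェック", "ヘアスタイル", "プラモ", "アニメ", "キャラ", "ダイエット"]
share_foreign = katakana_share(foreign_class)
share_mixed = katakana_share(["アニメ", "作品"])
```

A share close to 1 for a class would suggest that its words were grouped by script rather than by meaning.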
Result : Common Attribute
• The class is not a semantic set.
- hairstyle (ヘアスタイル), check (チェック), plastic model (プラモ),
character (キャラ), animation (アニメ), diet (ダイエット), etc.
• The reason for this result
- There are new or unknown words
in explanations and reviews.
• Improvement Plan
- Reconsidering word embedding models and training corpus
Summary
• Background : the same content, but the textual details vary from source to source
• Problems : Differences between sources have received little attention
• Purpose : distinguishing what vocabulary was common/different
between 2 textual sources
• Method : Semantic classification of frequently appearing words
• Conclusion :
- Explanations : describe the content of comics
- Reviews : represent meta-information about works
Thank you for your attention.

RyotaHiguchi_Manpu2022.pdf
