Natural language processing (NLP) tools are commonly used in many day-to-day applications such as Siri and Google, but the effectiveness of these technologies is not thoroughly understood. I will present joint work with colleagues from the Vrije Universiteit Amsterdam in which we perform a thorough evaluation of four different name recognition tools on 40 popular novels (including A Game of Thrones). I will highlight why literary texts are so difficult for NLP tools as well as solutions for improving their performance.
Why language technology can’t handle Game of Thrones (yet)
1. Why Language Technology Can’t Handle Game of Thrones (yet)
Marieke van Erp (@merpeltje)
Joint work with:
Niels Dekker & Tobias Kuhn
2. This talk
• NLP 101
• Recognising named entities in fiction
• Digital Humanities @KNAW HuC
DIGITAL HUMANITIES LAB
6. NLP 101: What is Text Mining?
• Extracting knowledge and information from texts in natural language:
  • Metadata for a text: author, publisher, time of publication, topic, its language, URL, URLs to and from a web text
  • People mentioned in text, but also companies, organisations, places, dates → links to Wikipedia, Wikification of text
  • Amounts: prices, age, size, distance, weight
  • Facts (statements), concepts (terms) and relations between concepts
  • Sentiment (positive/negative), opinions
  • Emotions, purpose, intention, humour, sarcasm, irony, threats, style (formal, informal), genre (blog, news, science, tax form)
7. Types of Knowledge to Extract
• Conceptual relations: define possible relations between concepts in an ontology, e.g. what things have weight, size, age, get born, eat, drink, get an education, work, marry, do sports, live and die.
• Factual relations: actual instantiations of concepts and relations that are the case in some world (time and place), e.g. Barack Obama was born on August 4, 1961, in Honolulu, Hawaii.
• Factual relations need to fit the ontological model, but the ontology does not predict actual facts, only the possible facts!
• Opinions: epistemic and modal relations (believe, wish, hope, fear, expect) between source and target expressed as a private state of the source, e.g. I am a fan of Barack Obama, I believe Barack Obama will help people.
10. Text Mining pipeline
• Analysis starts at token level
• Moves up to phrases, sentences and documents
• Performance goes down as the analysis becomes deeper
• Statistical methods are mostly used, but hybrid methods are a promising research topic
Input text → Tokenisation → Lexical Analysis → Syntactic Analysis → Semantic Analysis → Pragmatic Analysis → Speaker's intended meaning
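To make the first stage of the pipeline concrete, here is a minimal sketch of sentence splitting and tokenisation in Python. The function names and the regular expressions are illustrative assumptions, not the tokeniser used by any of the tools discussed in this talk; real tokenisers handle abbreviations, quotations and many other cases that this sketch ignores.

```python
import re

def split_sentences(text):
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenise(sentence):
    # A token is either a word (optionally with internal apostrophes)
    # or a single punctuation character.
    return re.findall(r"\w+(?:['’]\w+)*|[^\w\s]", sentence)

if __name__ == "__main__":
    for sentence in split_sentences("Winter is coming. Jon Snow can't stay."):
        print(tokenise(sentence))
```

Every later stage (lexical, syntactic, semantic analysis) builds on these tokens, which is one reason why early mistakes propagate upwards.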
11. Companies want text mining
• From click logs they can see what people looked at on their site
• To know what they think about it they need to mine reviews, tweets etc.: text mining
• To stay ahead of their competitors, they need to obfuscate their patents, and find relevant patents from competitors: text mining
• To aid their information departments, they need access to relevant information: text mining
12. Humanities researchers want text mining
• To evaluate gender bias in large corpora: http://literaryquality.huygens.knaw.nl/
• To trace concepts through time: https://www.esciencecenter.nl/project/evidence
• To detect and model populist movements on social media: https://www.meertens.knaw.nl/cms/en/research/projects/259-het-dagelijks-leven/145541-populisme-social-media-en-religie
• To analyse church registers, letters, ship journals etc.
13. State-of-the-art
• POS tagging: 97%
• Sentiment Analysis: 95% (document level) / 54% (fine-grained sentence level)
• Named Entity Recognition: 90%
• Temporal information extraction: 77%
Note: this holds for English and on standardised datasets
16. Recognising named entities in fiction
17. Background
• Characters and relations are the backbone of stories
• Computational methods allow for scaling up network extraction and analysis
• Relies on named entity recognition
• Most work thus far focuses on 19th and early 20th century novels
• Research question: how do these tools perform on modern science fiction/fantasy novels?
18. Experimental setup
• Collect 20 ‘old’ and 20 ‘new’ novels
• Annotate first chapters for entities and relationships between entities (gold standard)
• Run 4 named entity recognisers on the sets of ‘old’ and ‘new’ novels
• Compare system outputs to gold standard annotations
• Bonus: compare network structures
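The comparison step can be sketched as follows. This is a minimal illustration, assuming that each character mention is represented as a (sentence index, name) pair and that a system mention counts as correct only on an exact match with the gold standard; the matching scheme actually used in the study may differ, and the example data below is made up.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted mentions against gold-standard mentions (exact match)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical annotations: (sentence index, character name).
    gold = {(0, "Alice"), (1, "Alice"), (1, "Rabbit"), (2, "Dinah")}
    system = {(0, "Alice"), (1, "Rabbit"), (2, "Dodo")}
    print(precision_recall_f1(gold, system))
```

Precision is the share of system mentions that are in the gold standard, recall is the share of gold mentions that the system found, and F1 is their harmonic mean.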
19. 19th and early 20th century novels, based on The Guardian’s Top 100 Classic novels + availability through Project Gutenberg + used in earlier studies
21. Data preprocessing
• All books converted to plain text format
• Ensure all texts have the same character encoding
  • Pro tip: check that there are no odd or inconsistent quotation marks in your documents
• Appendices, glossaries and reviews were removed manually
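A small sketch of the encoding and quotation-mark check, assuming the books are read as raw bytes and should end up as Unicode text with plain ASCII quotes. The function name and the set of quote characters in the mapping are illustrative choices, not the preprocessing script used in the study.

```python
import unicodedata

# Map typographic quotation marks to their plain ASCII counterparts.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u201b": "'",
    "\u201c": '"', "\u201d": '"', "\u201e": '"',
    "\u00ab": '"', "\u00bb": '"',
})

def normalise_text(raw_bytes, encoding="utf-8"):
    # Decode with a known encoding; undecodable bytes become U+FFFD
    # so that they are easy to spot afterwards.
    text = raw_bytes.decode(encoding, errors="replace")
    # Use one canonical Unicode form for accented characters.
    text = unicodedata.normalize("NFC", text)
    return text.translate(QUOTE_MAP)

if __name__ == "__main__":
    raw = "“Don’t,” said Alice.".encode("utf-8")
    print(normalise_text(raw))
```

Counting the U+FFFD replacement characters in the result is a quick way to find files that were saved in a different encoding.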
22. Gold standard annotations
• Chapter lengths varied from 84 to 1,442 sentences
• Per book, an average of 300 sentences was selected, ending close to a chapter boundary
  • e.g. the third chapter in Alice in Wonderland ended after sentence 315, so for that book the first three chapters were annotated
• 2 annotators (not the authors of the study)
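One way to read the selection rule is: annotate as many whole chapters as needed to end up closest to 300 sentences. The sketch below implements that reading; the "closest cumulative count" criterion is an assumption, and the chapter lengths in the example are hypothetical numbers chosen only so that the third chapter ends at sentence 315.

```python
from itertools import accumulate

def chapters_to_annotate(chapter_lengths, target=300):
    """Return (number of chapters, total sentences) closest to the target."""
    if not chapter_lengths:
        raise ValueError("need at least one chapter")
    # Running total of sentences at the end of each chapter.
    cumulative = list(accumulate(chapter_lengths))
    best = min(range(len(cumulative)), key=lambda i: abs(cumulative[i] - target))
    return best + 1, cumulative[best]

if __name__ == "__main__":
    # Hypothetical sentence counts per chapter.
    print(chapters_to_annotate([100, 105, 110, 120]))
```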
23. Annotation Instructions
• For each sentence:
  • Identify all characters in it
  • Identify anaphoric references (e.g. she refers to Alice)
• To speed up the process, annotators were provided with a list of characters derived automatically
  • Missing characters could be added to the list
• Ignore generic pronouns, exclamations, generic noun phrases, non-human named characters (e.g. Buckbeak)
24. Named Entity Recognisers: BookNLP
• NLP pipeline modified to deal with books
• POS tagging, dependency parsing, NER, character name clustering, quotation speaker identification, pronominal coreference resolution, supersense tagging
• NER module based on Stanford NER, with some modifications
• We focus on the NER, character name clustering and pronominal coreference resolution modules in our evaluation
• https://github.com/dbamman/book-nlp
25. Named Entity Recognisers: Stanford NER
• State-of-the-art CRF NER system
• Trained on CoNLL 2003 data (Reuters newswire articles from 1996-08-20 to 1997-08-19)
• Cited 2,720 times
• F1 = 86.31 on CoNLL 2003 test set
• https://nlp.stanford.edu/software/CRF-NER.html
26. Named Entity Recognisers: Illinois Tagger
• Perceptron-based classifier
• Includes contextual information
• 10,146 downloads
• F1 = 90.57 on CoNLL 2003 test set
• https://cogcomp.org/page/software_view/NETagger
27. Named Entity Recognisers: IXA-Pipe-NERC
• Perceptron model
• Additional background information from Brown clusters
• F1 = 91.36 on CoNLL 2003 test set
• https://github.com/ixa-ehu/ixa-pipe-nerc
34. [Figure: network graph of character names recognised in A Game of Thrones (e.g. Ned Stark, Arya Stark, Daenerys Targaryen, Lannister Tyrion); node labels omitted]
35. [Figure: subset of the A Game of Thrones character network (e.g. Daenerys Targaryen, Drogo Khal, Viserys Targaryen, Jorah Ser Mormont); node labels omitted]
39. Discussion
• No difference between ‘old’ and ‘new’ books
• Within categories, great variety in entity distributions and results
  • If a central entity is missed, the performance suffers greatly (e.g. Brave New World)
• Coreference resolution particularly difficult in this domain
40. Why is fiction hard for NLP?
• Fiction writers don’t have to abide by conventions: they can use language more creatively than newspaper journalists
  • mix languages
  • make up languages
  • use nicknames
• Narratives written from first-person perspective confuse the software
41. [Figure: network graph of character names recognised in The Three Musketeers; node labels omitted]
The Three Musketeers: F1 32 - 48
42. [Figure: network graph of character names recognised in The Three Musketeers, now including nodes such as M. Dartagnan; node labels omitted]
The Three Musketeers after rewriting d’Artagnan to Dartagnan
44. Performance fixes
• Replace names made up of ordinary words with generic names
• Remove apostrophes from names
• But:
  • Requires manual intervention
  • Doesn’t scale
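The apostrophe fix can be sketched as a simple rewrite step before running the recogniser. The function name is hypothetical, and the list of names to rewrite has to be supplied by hand, which is exactly the manual intervention mentioned above. Capitalising only the first letter is a simplifying assumption that matches the d’Artagnan to Dartagnan rewrite but will not suit every name.

```python
import re

# Straight and typographic apostrophes.
APOSTROPHES = "'\u2019"

def rewrite_names(text, names):
    """Replace each listed name by a version without apostrophes."""
    for name in names:
        plain = re.sub(f"[{APOSTROPHES}]", "", name)
        # "dArtagnan" -> "Dartagnan": first letter upper case, rest lower case.
        text = text.replace(name, plain.capitalize())
    return text

if __name__ == "__main__":
    sentence = "Athos looked at d’Artagnan and smiled."
    print(rewrite_names(sentence, ["d’Artagnan", "d'Artagnan"]))
```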
45. Where to go from here?
• More robust NLP tools are necessary to better understand novels (and other non-newspaper texts)
• Background knowledge can help (e.g. the GoT Wiki lists all of Daenerys’ nicknames)
  • But: not all books are that popular
  • Also: different names are used in different contexts, so you may not want to collapse them!
• Always: don’t just assume it works, look into your data!
• Full paper at: http://peerj.com/articles/cs-189
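Background knowledge could be used as an alias table that links each recognised name to a canonical character. The sketch below is only an illustration: the table is a hypothetical, hand-made example rather than data taken from a wiki, and the function keeps the original surface form next to the canonical name so that names used in different contexts are not collapsed.

```python
# Hypothetical alias table; in practice this might be derived from a fan wiki.
ALIASES = {
    "Dany": "Daenerys Targaryen",
    "Khaleesi": "Daenerys Targaryen",
    "Daenerys Targaryen": "Daenerys Targaryen",
    "Littlefinger": "Petyr Baelish",
}

def link_mentions(mentions, aliases=ALIASES):
    """Pair every mention with its canonical name, or None if unknown."""
    return [(mention, aliases.get(mention)) for mention in mentions]

if __name__ == "__main__":
    for surface, canonical in link_mentions(["Dany", "Khaleesi", "Hodor"]):
        print(surface, "->", canonical)
```

Mentions that map to None are the ones worth inspecting by hand, in line with the advice to look into your data.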
49. Cultural Artificial Intelligence
Making AI culturally aware
Appreciate the user
Being contextually appropriate
Understand the issues
What do you get when you invert
“Digital Humanities”?
Slide by Antal van den Bosch
50. Applications of Cultural AI: Filters and flags
• Toxicity
• Protective filters (like spam filters and ad blockers)
• Gender
• Linguistic filters and helpers
• Fake news
• Meme detectors, explanations
Slide by Antal van den Bosch
51. Theory of Cultural AI: Understanding & nuance
• Understanding concepts
• Changes over time
• Perspectives
• Evolution
• Knowing the origins of digital stories
• Understanding viral potential
• Language is “social and cultural data” (Nguyen, 2017)
Slide by Antal van den Bosch
52. Some DHLab projects
• Food culture via newspaper recipes (Meertens and IISH)
• Analysing online debates: refugee vs migrant (with EUR)
• Amsterdam Time Machine (with many partners)
• Tracing 18th century career trajectories (with HuC-DI & Huygens Institute)
• Analysing the concept ‘violence’ through time (with NLeSc, OU & NIOD)
53. Debates on the refugee crisis
• From 2015 on, wider use of both ‘European refugee crisis’ and ‘European migrant crisis’ in the news and social media
• “Framing labels” (Knoll, Redlawsk, & Sanborn, 2011) imply two different frames:
  • ‘Refugee’ – people fleeing conflict or persecution
  • ‘Migrant’ – people seeking to improve their economic situation
• Mixed usage and mislabeling have implications for refugees, e.g., negative influence on perceptions of host countries
55. DHLab@HuC:
Advancing the humanities through digital methods
• @DHLabHuC / @adinanerghes / @melvinwevers / @merpeltje
• https://dhlab.nl (under construction)
Melvin Wevers, Adina Nerghes, Marieke van Erp