This document discusses various ways in which language changes over time. It provides examples of phonological changes like sound shifts as well as grammaticalization where words change categories. Semantic changes are also covered, including narrowing and widening of meaning, metaphor, metonymy, and other types of semantic shifts. Social and cultural factors that influence language change are examined. Two main types of language change are identified: cultural shift driven by conceptual changes in society and linguistic drift involving regular sound changes. The document analyzes the phenomenon of "concept creep" where concepts relating to harm broaden and weaken over time through processes like vertical and horizontal expansion. It uses corpus methods and distributional semantic models to study how concepts like addiction,
Diachronic Analysis of Language Change and Concept Creep
1. Diachronic Analysis of Language Change
Ekaterina Vylomova
Vylomova, Ekaterina Diachronic Analysis of Language Change 1 / 63
2. Language changes in many ways!
Phonologically: The First Germanic Sound Shift
bh > b > p > ɸ
dh > d > t > θ
gh > g > k > x
3. Language changes in many ways!
Grammaticalization: Synchronic variation and diachronic change
Old English “willan” (to want) → Modern English auxiliary “will”
4. Language changes in many ways!
Semantically (Bloomfield, Language, 1933):
Narrowing as in Old English “mete” (food) > "meat" (edible flesh)
Widening as in Middle English "bridde" (young birdling) > "bird"
Metaphor: Germanic "biting" > "bitter" (harsh of taste)
Metonymy, when a thing is referred to by the name of something closely associated with it:
Old French "joue" (cheek) > "jaw"
5. Language changes in many ways!
Semantically (Bloomfield, Language, 1933):
Synecdoche, when the meanings are related as part and whole: Proto-Germanic "*tūną" (fence) > "town"
Hyperbole, i.e. from stronger to weaker meaning: Pre-French "*extonare" (to strike with thunder) > "astonish"
Litotes, i.e. from weaker to stronger meaning: pre-English "*kwalljan" (to torment) > Old English "cwellan" (to kill)
Degeneration: Old English "cnafa" (boy, servant) > "knave"
Elevation: Old English "cniht" (boy, servant) > "knight"
6. What causes language to change?
Social evolution and language change:
Interpersonal communication → the efficiency of communication serves as a driving force in
such processes
Sociolinguistic research suggests that much of language change and variation can be attributed
to social structure
Kutuzov (2018) proposes that the causes of the processes leading to semantic shifts vary and comprise linguistic, social, sociolinguistic, cultural, and psychological factors
7. Two types of Change
Cultural shift and Linguistic Drift
Slow and regular changes
Only affect the word’s nearest neighbors!
9. Cultural Shifts
Culturomics (Michel et al., 2011): the study of cultural and historical changes using large corpora
10. Concept Creep (Haslam, 2016)
Harm-related concepts become broader and milder over time
11. Concept Creep (Haslam, 2016)
Vertical expansion: a concept’s meaning becomes less stringent, extending to quantitatively milder variants of the phenomenon
Horizontal expansion: a concept extends to a qualitatively new class of phenomena
12. Concept Creep. Pinker, 2011: Rate of deaths in genocides, 1900-2008
13. Concept Creep. Pinker, 2011: Uxoricide & Mariticide, 1976-2005
14. Concept Creep. Pinker, 2011: Child Abuse, 1990-2006
15. Concept Creep. Addiction
Physiological dependency on an ingested substance → psychological compulsion to engage in
non-ingestive behaviors such as gambling or shopping
16. Concept Creep. Bullying
Peer aggression between children that was repeated, intentional, and perpetrated in the context of a power imbalance → adult workplace behavior, relaxing the repetition, intentionality, and power imbalance criteria
17. Concept Creep. Harassment
Inappropriate sexual approaches → nonsexual forms of unwanted attention
18. Concept Creep. Prejudice
Overt animosity towards ethnic or racial outgroups → non-racial groups, allowing for covert or
non-conscious prejudice
19. Concept Creep. Trauma
Life-threatening events that are outside the realm of normal experience → vicarious or indirect
experiences of stressful events, including those that are relatively prevalent
20. Concept Creep. Corpus of Psychology Journals (Vylomova et al., 2019)
Timeline: 1930 – 2017
PubMed
E-Research
871,340 abstracts from 875 journals, resulting in 133,082,240 tokens
[Figure: plot over time; x-axis ticks 1950, 1975, 2000; y-axis ticks 0, 20000, 40000]
21. Concept Creep. Concept Frequency Analysis
Unigram frequency distribution over time, smoothed with a “moving average” of window size 1 (one year on each side), e.g. $\hat{f}_{1972} = (f_{1971} + f_{1972} + f_{1973})/3$.
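The smoothing step can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `moving_average`, the example frequencies, and the handling of edge years (averaging only the years that exist) are all assumptions.

```python
def moving_average(freqs, window=1):
    """Smooth a {year: frequency} mapping with a centred moving average.

    `window` is the number of years taken on each side, so window=1
    averages three consecutive years.
    """
    smoothed = {}
    for year in freqs:
        # Years missing from the data (e.g. at the edges) are skipped.
        # This edge handling is an assumption; the slides do not specify it.
        values = [freqs[y] for y in range(year - window, year + window + 1) if y in freqs]
        smoothed[year] = sum(values) / len(values)
    return smoothed


# Made-up relative frequencies, for illustration only.
example = {1971: 0.010, 1972: 0.020, 1973: 0.030}
smoothed = moving_average(example)
```

With `window=1`, the value for 1972 is the mean of the 1971, 1972, and 1973 frequencies, as in the formula on the slide.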
[Figure: smoothed unigram frequency by year (x-axis ticks 1970 to 2020; y-axis ticks 0.00 to 0.04) for harassment, trauma, addiction, bullying, prejudice]
22. Concept Creep. Vector-Space Methods
LSA-based
Neural
Also: recall research on various types of bias in word embeddings (WEs): they may reflect many stereotypes
24. Concept Creep. LSA-based (Sagi et al., 2009)
Step 1 (embeddings over all periods):
40,000 most frequent terms
TF-IDF matrix with logarithmic smoothing
factorize by SVD to 200 dimensions
Step 2 (diachronic embeddings (1980-2017)):
sample 50 sentential occurrences for each period T
extract contextual words within window size = 7
Average embeddings (BoW)
Repeat 10 times
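The two steps above can be sketched with NumPy on a toy corpus. This is an illustrative sketch under several assumptions, not the pipeline from the paper: the term-by-document layout of the matrix, the exact TF-IDF variant, and the names `lsa_embeddings` and `context_vector` are all hypothetical, and the toy dimension is 2 rather than 200.

```python
import numpy as np


def lsa_embeddings(docs, dim=200):
    """Step 1: term embeddings from a log-smoothed TF-IDF matrix factorized by SVD.

    docs: list of token lists. Returns (token -> row index, embedding matrix).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for d, doc in enumerate(docs):
        for tok in doc:
            counts[index[tok], d] += 1
    tf = np.log1p(counts)                          # logarithmic smoothing of raw counts
    df = (counts > 0).sum(axis=1)                  # document frequency per term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1   # one common smoothed IDF variant
    tfidf = tf * idf[:, None]
    U, S, _ = np.linalg.svd(tfidf, full_matrices=False)
    k = min(dim, len(S))
    return index, U[:, :k] * S[:k]


def context_vector(tokens, position, index, emb, window=7):
    """Step 2: bag-of-words average of the embeddings around one occurrence."""
    lo, hi = max(0, position - window), position + window + 1
    ctx = [index[t] for i, t in enumerate(tokens[lo:hi], start=lo)
           if i != position and t in index]
    return emb[ctx].mean(axis=0) if ctx else np.zeros(emb.shape[1])


# Toy corpus, for illustration only.
docs = [["a", "b", "c"], ["a", "c", "d"], ["b", "d", "e"]]
index, emb = lsa_embeddings(docs, dim=2)
vec = context_vector(docs[0], 1, index, emb)
```

In the setting on the slide, `context_vector` would be called on 50 sampled occurrences of a target term per period, and the sampling repeated 10 times.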
25. Concept Creep. Semantic Breadth as Cosine Similarity
[Figure: average cosine similarity (y-axis ticks 0.10 to 0.20) by decade (198x, 199x, 200x, 201x) for harassment, trauma, addiction, bullying, prejudice]
Average cosine similarities over five decades.
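One way to turn sampled context vectors into a breadth score is to average their pairwise cosine similarities. The sketch below assumes that a lower average similarity indicates broader usage; that reading, the function names, and the default sample size and repeat count (taken from the LSA slide) are assumptions rather than the paper's exact procedure. All vectors are assumed to be non-zero.

```python
import numpy as np


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of (non-zero) vectors."""
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    upper = np.triu_indices(len(X), k=1)   # each unordered pair once, no self-pairs
    return float(sims[upper].mean())


def breadth_estimate(context_vectors, sample_size=50, repeats=10, seed=0):
    """Repeatedly sample context vectors and average the pairwise-cosine score."""
    rng = np.random.default_rng(seed)
    X = np.asarray(context_vectors, dtype=float)
    n = min(sample_size, len(X))
    scores = []
    for _ in range(repeats):
        sample = X[rng.choice(len(X), size=n, replace=False)]
        scores.append(mean_pairwise_cosine(sample))
    return float(np.mean(scores))
```

Under this reading, a concept whose contexts point in many different directions scores lower than one whose contexts cluster tightly.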
26. Concept Creep. Diachronic Embeddings
Train separate epoch-specific models and align
Train word2vec models for each decade
Project embeddings into shared space (e.g., using Procrustes)
27. Concept Creep. Diachronic Embeddings
Sequentially train for each epoch
Train embeddings for epoch t
Initialize embeddings for epoch t + 1 with embeddings for epoch t and train
28. Concept Creep. Diachronic Embeddings
Pre-train globally and then train per epoch
Pre-train on the whole corpus
Initialize epoch-specific embeddings with global pre-trained embeddings and train for each
epoch
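29. Concept Creep. Used in the paper: Diachronic embeddings from Hamilton et al. (2016)
train word2vec CBoW for each decade:
$$J = \frac{1}{T}\sum_{i=1}^{T} \log \frac{\exp\left(w_i^{\top} \sum_{j\in[-c,+c],\, j\neq 0} \tilde{w}_{i+j}\right)}{\sum_{k=1}^{V} \exp\left(w_k^{\top} \sum_{j\in[-c,+c],\, j\neq 0} \tilde{w}_{i+j}\right)}$$
align embeddings using orthogonal Procrustes: $\arg\min_{\theta^{\top}\theta = I} \lVert \theta W^{t} - W^{t+1} \rVert_F$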
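The orthogonal Procrustes step has a closed-form solution via the SVD. The sketch below uses NumPy and stores one word per row, so the matrix `Q` it returns corresponds to the transpose of θ in the notation above. The function name `procrustes_align` and the random toy matrices are assumptions for illustration.

```python
import numpy as np


def procrustes_align(W_t, W_next):
    """Return the orthogonal Q minimizing ||W_t @ Q - W_next||_F.

    Both matrices hold one word per row, with rows in the same vocabulary order.
    """
    M = W_t.T @ W_next
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt


# Toy check: W_t is W_next under a hidden rotation, which alignment should undo.
rng = np.random.default_rng(0)
W_next = rng.normal(size=(50, 8))
R_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # a random orthogonal matrix
W_t = W_next @ R_true.T
Q = procrustes_align(W_t, W_next)
aligned = W_t @ Q
```

Because `Q` is orthogonal, alignment preserves cosine similarities within each decade; only the coordinate system changes.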
36. Concept Creep. Conclusions
Since the 1990s Addiction, Bullying, Harassment have broadened, as the theory of concept
creep would suggest, but the breadth of Trauma has been relatively static and Prejudice
has somewhat narrowed
The analysis of pairwise similarities demonstrated changing patterns of co-occurrence for
each concept that clarified how its meanings have shifted and expanded over four decades
Some concepts have acquired entirely new associations (e.g., cyber-harassment), some have added new semantic domains (e.g., Addiction incorporating non-ingestive behaviors such as gaming and smartphone use), and others have shifted emphasis (e.g., Trauma becoming associated less with physical injury and more with psychological stress)
37. Dehumanization
Dehumanization: the act of perceiving or treating people as less
than human
negative evaluations of a target group
denial of agency
moral disgust
likening members of a target group to non-human entities (such as vermin; via metaphors)
38. Dehumanization
A computational linguistic framework for analyzing dehumanizing language, with a focus on lexical signals of dehumanization
negative evaluations of a target group
denial of agency
moral disgust
likening members of a target group to non-human entities (such as vermin; via metaphors)
40. Dehumanization
Data
“New York Times”: from 1986 – 2015, collected by Fast and Horvitz (2016)
Extracted paragraphs containing any word from the predetermined list (+American(s) as a
control variable) → LGBTQ corpus
The LGBTQ corpus consists of 93,977 paragraphs and 7.36 million tokens
41. Dehumanization
Word Embeddings
Contain biases but might be useful to study social stereotypes (Garg et al., 2018)
Used Word2vec skip-gram (100 dimensions): 1) Train on the whole corpus; 2) use the
resulting vectors to initialize word2vec models for each year of data; 3) zero-center and
normalize all embeddings to alleviate the hubness problem (Dinu et al., 2015)
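The post-processing in step 3 can be sketched as follows. This is a minimal NumPy illustration; the function name `center_and_normalize` is hypothetical, and leaving all-zero rows untouched is an assumption.

```python
import numpy as np


def center_and_normalize(embeddings):
    """Subtract the mean vector, then scale every row to unit length."""
    E = np.asarray(embeddings, dtype=float)
    centered = E - E.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Rows that become all-zero after centering are left as zeros (an assumption).
    return centered / np.where(norms == 0, 1.0, norms)


# Toy 2-dimensional embeddings, for illustration only.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
processed = center_and_normalize(toy)
```

After this step, dot products between rows are cosine similarities of the centered vectors.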
44. Dehumanization
Negative evaluations of a target group: valence
Valence: negative/unpleasant to positive/pleasant
Data: the NRC VAD lexicon, with valence from 0 (negative; e.g. toxic, nightmare) to 1 (positive; e.g. love, happy), for 20,000 English words
Paragraph-level scores: the average valence score over all words (from NRC VAD) in the
paragraph
Biases in NRC VAD: transsexual (0.264), homosexual (0.333), lesbian (0.385), gay (0.388) vs. heterosexual (0.561), person (0.646), human (0.767), man (0.688), woman (0.865) (these terms were excluded)
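The paragraph-level score described above amounts to a lexicon lookup and a mean. In the sketch below, the function name `paragraph_valence`, the tokenized input, and the valence values for "love" and "nightmare" are made up for illustration; only the value for "gay" comes from the slide.

```python
def paragraph_valence(tokens, lexicon, excluded=frozenset()):
    """Average valence over the tokens found in the lexicon, skipping excluded terms.

    Returns None when no token of the paragraph is covered by the lexicon.
    """
    scores = [lexicon[t] for t in tokens if t in lexicon and t not in excluded]
    return sum(scores) / len(scores) if scores else None


# Toy lexicon: "love" and "nightmare" values are made up; "gay" is from the slide.
toy_lexicon = {"love": 0.9, "nightmare": 0.1, "gay": 0.388}
score = paragraph_valence(
    ["love", "nightmare", "gay", "the"],
    toy_lexicon,
    excluded={"gay"},   # group labels are excluded, as noted above
)
```

Excluding the group labels keeps the lexicon's own bias toward those labels from leaking into the paragraph score.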
46. Dehumanization
Negative evaluations of a target group: valence
Homosexual more negative than gay
LGBTQ groups have become increasingly positively evaluated
47. Dehumanization
Negative evaluations of a target group: connotation frames
Understand the sentiment directed towards the target group: in “A violently attacked B”, the frame assigns the subject “A”, the attacker, a score of -0.6 (strongly negative) and the object “B” a score of 0.23 (weakly positive).
Extract SVO tuples (using dependency parser); apply connotation frames
Data: Connotation Frames from (Rashkin et al., 2016); 900 English verbs
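Once SVO tuples are available, applying the frames is a dictionary lookup. The sketch below assumes the tuples have already been extracted by a dependency parser and that each verb maps to a (subject score, object score) pair; the function name `group_perspective` and the averaging over mentions are assumptions, while the scores for "attacked" are the ones quoted above.

```python
def group_perspective(svo_tuples, frames, group_terms):
    """Average connotation-frame score directed at mentions of a target group.

    svo_tuples: iterable of (subject, verb, object) strings.
    frames: verb -> (subject_score, object_score).
    """
    scores = []
    for subj, verb, obj in svo_tuples:
        if verb not in frames:
            continue
        subj_score, obj_score = frames[verb]
        if subj in group_terms:
            scores.append(subj_score)
        if obj in group_terms:
            scores.append(obj_score)
    return sum(scores) / len(scores) if scores else None


# Frame values for "attacked" are taken from the example on the slide.
toy_frames = {"attacked": (-0.6, 0.23)}
tuples = [("A", "attacked", "B")]
as_object = group_perspective(tuples, toy_frames, {"B"})
as_subject = group_perspective(tuples, toy_frames, {"A"})
```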
49. Dehumanization
Negative evaluations of a target group: word embeddings
Average valence scores of the 1000 nearest neighbors to the vector representations of gay, homosexual, all LGBTQ terms, and American for each year
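The neighbor-based score can be sketched with NumPy: rank the vocabulary by cosine similarity to the label's vector, keep the top k, and average their lexicon scores. The function name `neighbor_average`, the toy vocabulary, and all toy vectors and scores are made up for illustration; neighbors missing from the lexicon are skipped, which is an assumption.

```python
import numpy as np


def neighbor_average(query, vocab, embeddings, lexicon, k=1000):
    """Mean lexicon score of the k nearest neighbors (by cosine) of `query`."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit rows: dot product = cosine
    sims = E @ E[vocab.index(query)]
    ranked = [vocab[i] for i in np.argsort(-sims) if vocab[i] != query]
    scores = [lexicon[w] for w in ranked[:k] if w in lexicon]
    return sum(scores) / len(scores) if scores else None


# Toy data, for illustration only.
toy_vocab = ["target", "happy", "sad", "table"]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
toy_scores = {"happy": 0.9, "sad": 0.1}
nearest_one = neighbor_average("target", toy_vocab, toy_vectors, toy_scores, k=1)
nearest_three = neighbor_average("target", toy_vocab, toy_vectors, toy_scores, k=3)
```

The same function works with any lexicon that assigns a score per word.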
51. Dehumanization
Denial of agency
Denial of agency means not attributing to a target group member the ability to control their own actions or decisions (Tipler and Ruscher, 2014).
Connotation frames of agency
Word embedding neighbor dominance
52. Dehumanization
Denial of agency: connotation frames
Let’s quantify the amount of agency attributed to a target group
"X searched for Y" and "X waited for Y": the verb "searched" gives X high agency and "waited" gives X low agency (binary)
Data: Connotation Frames for agency (Sap et al., 2017); 2000 transitive and intransitive verbs
Extract head verbs and their corresponding subject NPs containing any LGBTQ terms
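One simple way to summarize binary agency labels is the share of high-agency verbs among the verbs whose subject mentions the group. The sketch below assumes subject-verb pairs have already been extracted; the function name `agency_share` and this particular aggregation are assumptions, not necessarily the measure used in the study. The labels for "searched" and "waited" follow the example above.

```python
def agency_share(subject_verb_pairs, agency_frames, group_terms):
    """Fraction of high-agency verbs among verbs whose subject is a group term.

    agency_frames: verb -> "high" or "low". Verbs outside the lexicon are skipped.
    """
    labels = [agency_frames[verb] for subj, verb in subject_verb_pairs
              if subj in group_terms and verb in agency_frames]
    if not labels:
        return None
    return sum(1 for label in labels if label == "high") / len(labels)


# Labels for the two verbs follow the example on the slide.
toy_agency = {"searched": "high", "waited": "low"}
pairs = [("X", "searched"), ("X", "waited"), ("X", "waited"), ("Z", "searched")]
share = agency_share(pairs, toy_agency, {"X"})
```

A lower share for one group than for a control label would indicate greater denial of agency under this measure.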
54. Dehumanization
Denial of agency: connotation frames
LGBTQ groups experience greater denial of agency than "American"
Denial of agency increased over time for all LGBTQ groups
55. Dehumanization
Denial of agency: dominance
Quantify the amount of dominance: dominance lexicon primarily captures power, which is
distinct from but closely related to agency
Data: NRC VAD lexicon’s dominance annotations (Mohammad et al., 2018), 0–1, 20,000
English words
Highest dominance words: powerful, leadership, success, and govern; lowest dominance words: weak, frail, empty, and penniless.
Calculate the average dominance score of the 1000 nearest neighbors to each group label
vector representation
57. Dehumanization
Moral disgust
Vector similarity to “disgust”
Moral Foundations Theory postulates five dimensions of moral intuitions: care, fairness/proportionality, loyalty/in-group, authority/respect, and sanctity/purity (Haidt and Graham, 2007).
Moral Disgust: the negative end of the sanctity/purity dimension (dictionary from Graham et al., 2009: disgust*, sin, pervert)
“Disgust” concept vector: average of the vectors of the terms from the “Moral Disgust” dictionary, weighted by word frequency
Calculate the cosine similarity between the group label’s vector and the Moral Disgust concept vector
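The concept-vector construction can be sketched with NumPy: a frequency-weighted average of the dictionary terms' vectors, followed by a cosine similarity. The function names `concept_vector` and `cosine` and every vector and frequency below are made up for illustration.

```python
import numpy as np


def concept_vector(words, vectors, freq):
    """Frequency-weighted average of the vectors of the dictionary words present."""
    present = [w for w in words if w in vectors]
    weights = np.array([freq.get(w, 0) for w in present], dtype=float)
    stacked = np.stack([np.asarray(vectors[w], dtype=float) for w in present])
    return np.average(stacked, axis=0, weights=weights)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy vectors and frequencies, for illustration only.
toy_vectors = {"disgust": [1.0, 0.0], "sin": [0.0, 1.0], "label": [1.0, 0.0]}
toy_freq = {"disgust": 3, "sin": 1}
disgust_vec = concept_vector(["disgust", "sin", "pervert"], toy_vectors, toy_freq)
similarity = cosine(toy_vectors["label"], disgust_vec)
```

Dictionary words without a vector (here "pervert") are simply skipped, which is an assumption of this sketch.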
59. Dehumanization
Moral Disgust
All LGBTQ group labels are more closely associated with Moral Disgust than "American"
These associations weaken over time which suggests increased humanization
60. Dehumanization
Vermin as a Dehumanizing Metaphor
"Vermin" concept: the average of the following vectors, weighted by frequency: vermin, rodent(s), rat(s), mice, cockroaches, termite(s), bedbug(s), fleas
Calculate cosine similarity between each social group label and the Vermin concept vector
62. Dehumanization
Conclusions
Increasingly humanizing descriptions of LGBTQ people over time & LGBTQ people have
become more associated with positive emotional language
The labels “gay” and “homosexual” exhibit strikingly different patterns
Future: contextualized word embeddings
63. Diachronic Language Change: Further reading
Some nice overview papers
Diachronic word embeddings and semantic shifts: a survey
Survey of Computational Approaches to Lexical Semantic Change