Jahna Otterbacher – Describing him, describing her: Linguistic biases in crowdsourced metadata for images of people (Talk at the 2nd Annual KNOWeSCAPE Scientific Meeting, http://knowescape.org/knowescape2014-2/)
4. Linguistic biases in metadata?
• The manner in which we use language plays a key role in the transmission of social stereotypes [Maass et al., 1989; Rubin et al., 2013]
• Linguistic bias: a systematic asymmetry in the way one uses language, as a function of the social group of the person(s) being described [Beukeboom, 2013]
• RQ: Do we observe linguistic biases in image labels with respect to gender?
  – Use of adjectives [Fiedler & Semin, 1988]
  – Use of strongly subjective adjectives [Wilson et al., 2005]
  – Use of labels that describe:
    • Physical appearance
    • Disposition or character
    • Occupation
5. Linguistic biases
Abstract / positive: "He is intelligent, successful, helpful."
Concrete / neutral: "She is studying, listening, thinking."
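The contrast above follows the Linguistic Category Model [Fiedler & Semin, 1988]: adjectives are the most abstract category (they imply stable, generalizable traits), while action verbs are the most concrete (they describe a single observable behavior). A minimal sketch of that distinction — the word-to-category mapping is a hand-written toy for this example, not output of a real tagger:

```python
# Toy illustration of the abstraction contrast in the Linguistic
# Category Model: adjectives (abstract traits) vs. action verbs
# (concrete behaviors). The mapping is hand-written for this example.
CATEGORY = {
    "intelligent": "adjective",   # abstract: implies a stable trait
    "successful": "adjective",
    "helpful": "adjective",
    "studying": "action verb",    # concrete: one observable act
    "listening": "action verb",
    "thinking": "action verb",
}

def abstraction(word):
    """Classify a description as abstract (trait) or concrete (behavior)."""
    return "abstract" if CATEGORY.get(word) == "adjective" else "concrete"

print(abstraction("intelligent"))  # abstract
print(abstraction("studying"))     # concrete
```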
6. Analysis
STEP 1: Find images of men and women
  – ESP Game Dataset (100k images)
  – LIWC categories: "humans, friends, family"
  – Use hyponyms of "man"/"male" and "woman"/"female" to label gender
STEP 2: Find labels that are adjectives
  – Part-of-speech tagging with CLAWS C5
  – Manual error analysis: adjectives (1.45%)
  – Finding: women are more often described with adjectives
STEP 3: Find subjective adjectives
  – Subjectivity Lexicon (Wilson et al., 2005)
  – Finding: women are more often described with subjective adjectives
STEP 4: Manual analysis
  – Identify images with labels concerning 6 occupations
  – For each label/image: does it describe appearance, disposition, or occupation?
  – Finding: women are associated with more labels concerning appearance, and fewer concerning occupation
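The first three steps of the pipeline can be sketched in a few lines of Python. This is a toy reconstruction, not the author's code: the small word sets below are hand-written stand-ins for the WordNet hyponyms, the CLAWS C5 part-of-speech tagger, and the Wilson et al. (2005) subjectivity lexicon used in the actual study.

```python
# Toy sketch of pipeline steps 1-3. The word sets are illustrative
# stand-ins for WordNet hyponyms (step 1), a POS tagger (step 2),
# and the Wilson et al. (2005) subjectivity lexicon (step 3).
MALE_TERMS = {"man", "male", "guy", "gentleman", "boy"}
FEMALE_TERMS = {"woman", "female", "lady", "girl"}
ADJECTIVES = {"happy", "sexy", "ugly", "sad", "cute", "tall"}
STRONGLY_SUBJECTIVE = {"happy", "sexy", "ugly", "sad", "cute"}

def analyze(labels):
    """Return (gender, strongly subjective adjectives) for one image's labels."""
    words = {w.lower() for w in labels}
    # Step 1: label gender only when the evidence is unambiguous.
    if words & MALE_TERMS and not words & FEMALE_TERMS:
        gender = "man"
    elif words & FEMALE_TERMS and not words & MALE_TERMS:
        gender = "woman"
    else:
        gender = None  # mixed or no person terms: excluded
    # Step 2: keep only labels that are adjectives.
    adjs = words & ADJECTIVES
    # Step 3: keep only strongly subjective adjectives.
    return gender, adjs & STRONGLY_SUBJECTIVE

print(analyze(["woman", "sexy", "beach", "tall"]))  # ('woman', {'sexy'})
```

In the real study, step 1 uses all WordNet hyponyms of "man"/"male" and "woman"/"female" rather than a fixed list, and step 2 tags each label with CLAWS C5 instead of checking set membership.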
7. Strongly subjective adjectives
Men (N = 18,916) | Women (N = 14,628)
Happy (385)      | Sexy (2,425)
Ugly (225)       | Happy (549)
Sad (201)        | Ugly (254)
Angry (132)      | Sad (241)
Drunk (124)      | Cute (117)
Scary (107)      | Beautiful (84)
Funny (103)      | Fun (67)
Cute (88)        | Drunk (58)
Mad (74)         | Scary (51)
Fun (63)         | Little (40)
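Raw counts are hard to compare across groups of different sizes; normalizing by the number of images per group makes the asymmetry explicit. A back-of-the-envelope calculation from the counts above:

```python
# Per-image rates for top strongly subjective adjectives, computed
# from the counts in the table above.
men_n, women_n = 18916, 14628

men = {"happy": 385, "ugly": 225, "sad": 201}
women = {"sexy": 2425, "happy": 549, "ugly": 254}

def rate(count, n):
    """Occurrences per 100 images, rounded to one decimal."""
    return round(100 * count / n, 1)

print(rate(women["sexy"], women_n))  # "sexy" per 100 images of women: 16.6
print(rate(men["happy"], men_n))     # "happy" per 100 images of men: 2.0
```

So "sexy" appears roughly 16.6 times per 100 images of women, far above the most frequent strongly subjective adjective for men ("happy", about 2.0 per 100 images).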
8. Implications & future work
• Exposing these biases raises issues for
  – Designers of systems
  – Those who train algorithms
• Future work: a controlled experiment varying
  – Stimulus
  – Social cues
and measuring the output (linguistic biases)