Locky's Asialex 2015
1.
by Locky Law
PhD Candidate
The Hong Kong Polytechnic University
Email: Lx3h@yahoo.com
The 9th International Conference of ASIALEX
Words, Dictionaries and Corpora: Innovation in reference science
25-27 June 2015 | Hong Kong
2. Television drama, despite its enormous popularity across the globe, has rarely received
attention from linguists. The dearth of research into television drama dialogue is all the
more conspicuous given the thriving contributions from various other fields, such as philosophy,
psychology, cultural studies and media studies.
This paper seeks to promote research interest in this unique mediated text by taking the
renowned medical dramedy House M.D. as its research subject and comparing a 927,922-word
House M.D. pure dialogue corpus (HMDC) with both the 450-million-word Corpus of
Contemporary American English (COCA) and its 95-million-word spoken subcorpus (COCA
Spoken), using an adaptation of Bednarek’s (2011) ranked frequency list method.
Using WordSmith Tools to calculate n-grams (n = 1, 2, 3) at the word/cluster level, the study
finds that HMDC is more interpersonal than COCA and resembles COCA Spoken more closely
than COCA. HMDC also contains 3.4 times as much negativity as COCA Spoken and 2.8 times
as much as COCA. Viewers are thus presented with English that is far more interpersonal, and
involves significantly more disagreement, than what they would encounter in the real world.
This study not only shows similarities and differences between House M.D. and contemporary
American English, but also offers a preview of the huge potential of research on television drama.
4. A spin-off from my PhD project, titled “… and Creativity: a Corpus Linguistic Systemic
Functional Multimodal Discourse Analysis Approach”
There is no lack of interest in, or research on, literary texts, and films are
gaining popularity, but
TV drama has attracted little attention from linguists,
to the point that it is “marked
equally as popular” as it is “devalued” (Bignell and Lacey
2005:3).
an “urgent need … for a treatment of fictional cinema
and television from various linguistic perspectives.”
Piazza, Bednarek and Rossi (2011, p. 2)
5. Interested in how language in drama differs
from/resembles language in “reality”
Bearing in mind that the “reality” represented by any
corpus is always limited by the scope of its data,
whether by genre, region, gender, race or time.
Therefore, no corpus can ever completely represent
any actual reality.
6. FOX: 8 years, 8 seasons, 177 episodes
David Shore -- winner of the Primetime Emmy Award for
Outstanding Writing for a Drama Series
Bryan Singer -- executive producer (film director of
Valkyrie and X-Men)
Hugh Laurie -- twice winner of the Golden Globe
Best Performance by an Actor in a Television
Series – Drama
It has received an 8.9/10 rating from 237,068
users on IMDb.com (as of November 2014)
7. In 2008, it was one of the top-ten rated shows in
the United States and the most-watched television
program in the world
By 2011, it had been viewed by a spectacular 81.8
million people in 66 countries
Since 2011, Hugh Laurie has been listed in the
Guinness Book of Records as the world’s
Most-Watched Leading Man on Television
Ranked 2nd on Forbes’s 2012 list of the Highest-Paid
TV Actors, at $400,000 (£247,230) per episode
8. 1. How do dialogues in House M.D. differ
from/resemble contemporary American
English?
2. What differences/similarities can be drawn
from a comparison between COCA and
COCA Spoken with respect to the House
M.D. dialogue corpus?
3. What can be unveiled about House M.D.
through this corpus linguistic approach?
10. Construct a House M.D. Corpus (927,922 words)
from fan scripts
Remove all non-dialogue elements, such as fade-ins,
scene headings, action sequences, scene
transitions, mood brackets, parentheticals,
commercial tags and character name tags
Repeated manual check against internet sources
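The cleaning step above might be scripted along these lines. This is a minimal sketch, assuming (hypothetically) that each fan script puts dialogue on lines of the form `NAME: speech`, with mood brackets in `[...]` and parentheticals in `(...)`. The slides do not describe the actual script layout or tooling, so the regular expressions, the `TRANSITIONS` set and the `extract_dialogue` helper are illustrative only, and the repeated manual check would still be needed.

```python
import re

# Hypothetical layout: dialogue lines look like "HOUSE: (dryly) Everybody lies."
SPEAKER_LINE = re.compile(r"^\s*([A-Z][A-Za-z .'-]*):\s*(.*)$")
# Mood brackets such as [sighs] and parentheticals such as (sarcastically)
BRACKETED = re.compile(r"\[[^\]]*\]|\([^)]*\)")
# Hypothetical transition labels that also end in a colon
TRANSITIONS = {"FADE IN", "FADE OUT", "CUT TO", "DISSOLVE TO"}


def extract_dialogue(script: str) -> list[str]:
    """Return only the spoken text, dropping non-dialogue elements."""
    utterances = []
    for line in script.splitlines():
        match = SPEAKER_LINE.match(line)
        if not match:
            # Scene headings, action lines and commercial tags carry no speaker tag
            continue
        name, speech = match.groups()
        if name.strip().upper() in TRANSITIONS:
            continue
        # Only group 2 is kept, so the character name tag is discarded;
        # bracketed notes are removed and leftover whitespace collapsed
        speech = " ".join(BRACKETED.sub(" ", speech).split())
        if speech:
            utterances.append(speech)
    return utterances


if __name__ == "__main__":
    sample = (
        "FADE IN:\n"
        "INT. PRINCETON-PLAINSBORO - DAY\n"
        "House limps into the office.\n"
        "HOUSE: (sarcastically) Everybody lies.\n"
        "[COMMERCIAL BREAK]\n"
        "CUDDY: [sighs] Go to clinic duty, House.\n"
        "CUT TO: The clinic"
    )
    print(extract_dialogue(sample))
```

Lines that happen to contain a colon but are not dialogue (e.g. an action line starting "Note:") would slip through this filter, which is one reason a manual pass remains necessary.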
12. COCA contains more than 450 million words in
189,431 texts, divided equally among 5 genres:
spoken, fiction, popular magazines, newspapers
and academic journals
“the largest freely-available corpus of English,
and the only large and balanced corpus of
American English” (Davies, 2008)
13. The spoken part of COCA (hereafter COCA Spoken)
contains 95 million words (95,385,672) of transcripts
of unscripted conversation from more than 150
different TV and radio programs, such as All Things
Considered (NPR), Newshour (PBS), Good Morning
America (ABC), Today Show (NBC), 60 Minutes (CBS),
Hannity and Colmes (Fox) and Jerry Springer
(Davies, 2008).
14. The PhD project on creative language requires a
large, balanced and up-to-date corpus of
American English
Spoken corpora larger than 1 million words are rare,
e.g. the Santa Barbara Corpus of Spoken American
English (SBCSAE) has 249k words
The “reality” concerned does not serve any
specialized purpose, but has to include casual
conversation as well as some medical topics
(i.e. a “reality” that includes a doctor’s daily
spoken English, both social and medical discourse)
15. N-gram comparison & rank frequency lists
1. HMDC 1-grams, 2-grams and 3-grams (hereafter
1-to-3-grams) with respect to COCA’s 1-to-3-grams,
and vice versa
2. HMDC 1-to-3-grams with respect to COCA
Spoken’s 1-to-3-grams, and vice versa
3. Negativity/positivity of HMDC 3-grams with respect to
COCA and COCA Spoken 3-grams, and vice versa
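The rank frequency comparison listed above can be approximated in a few lines. This is a hedged sketch, not WordSmith Tools' algorithm or the exact adaptation of Bednarek's (2011) method used here. It assumes the text is already tokenized and breaks frequency ties alphabetically, and the helper names (`rank_list`, `with_respect_to`, `rank_differences`) are hypothetical. "With respect to ... and vice versa" is read as looking up one corpus's top n-grams in the other corpus's ranked list, then swapping the two.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """All contiguous n-grams, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_list(tokens: list[str], n: int) -> dict[str, int]:
    """Rank n-grams by descending frequency (rank 1 = most frequent).

    Ties are broken alphabetically, a simplifying assumption of this sketch.
    """
    counts = Counter(ngrams(tokens, n))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ordered, start=1)}


def with_respect_to(ranks_a: dict[str, int], ranks_b: dict[str, int], top: int = 20):
    """Pair each of A's top-`top` n-grams with its rank in B (None if absent from B)."""
    top_a = sorted(ranks_a, key=ranks_a.get)[:top]
    return [(gram, ranks_a[gram], ranks_b.get(gram)) for gram in top_a]


def rank_differences(ranks_a: dict[str, int], ranks_b: dict[str, int]) -> dict[str, int]:
    """Rank in A minus rank in B, for n-grams present in both lists."""
    shared = ranks_a.keys() & ranks_b.keys()
    return {gram: ranks_a[gram] - ranks_b[gram] for gram in sorted(shared)}


if __name__ == "__main__":
    drama = "you are not i am not you are".split()
    reference = "i am i am you".split()
    r_drama, r_ref = rank_list(drama, 1), rank_list(reference, 1)
    print(with_respect_to(r_drama, r_ref, top=3))  # drama w.r.t. reference
    print(with_respect_to(r_ref, r_drama, top=2))  # and vice versa
    print(rank_differences(r_drama, r_ref))
```

Under this sketch, a "double-digit rank difference" for a shared n-gram would simply be an entry in `rank_differences` whose absolute value is 10 or more.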
16. 1. Rank sum/difference forms the basis of the
Mann-Whitney U test
2. Does not assume a normal distribution
3. Works well with small observed frequencies
as well as large ones
4. Does not exaggerate differences at low frequency counts
5. Simplifies large numbers and reveals underlying
patterns
6. Indicates how different two sets of data are
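To show how a rank sum underlies the Mann-Whitney U test, here is a minimal from-scratch sketch. The slides do not say whether the test itself was run on the n-gram lists, or how. This only illustrates the rank-sum arithmetic, with tied values receiving the average of their ranks. In practice, `scipy.stats.mannwhitneyu` computes the same statistic together with a p-value.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values in ascending order (1-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return (U_a, U_b) derived from the rank sum of sample A in the pooled data."""
    pooled = list(sample_a) + list(sample_b)
    ranks = average_ranks(pooled)
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a  # U_a + U_b always equals n_a * n_b
    return u_a, u_b


if __name__ == "__main__":
    print(mann_whitney_u([1, 2], [2, 3]))
```

For a two-sided test, the smaller of the two U values is the one compared against critical values or converted to a p-value.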
18.
Among the thirteen shared 1-grams:
double-digit rank differences
House M.D. appears to be
•more interpersonal
•more focused on 1st and 2nd person singular
than the norm in general written and spoken American English, but
longer (higher-order) n-grams must also be considered.
20.
Issue 1: Contraction alternatives:
I am (rank 170), You are, is not, etc.
Issue 2: The 2-grams There ’s and That ’s (with ’s as a separate token)
are not found, but
the 2-grams There’s there and That’s
that/I/you/Mr/it are found.
Bug in algorithm?
Divergence from COCA
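One possible source of Issue 2 is tokenization rather than a bug. If one tool keeps There’s as a single token while the reference list splits the clitic off as a separate ’s token, the 2-gram There ’s can only exist in the second list. The slides do not say how either tool tokenizes, so both schemes below are assumptions. The sketch also addresses Issue 1 by expanding a few contractions, so that isn’t and is not are counted together. The `EXPANSIONS` table is hypothetical and deliberately incomplete.

```python
import re

WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")
CLITIC = re.compile(r"(?:n't|'s|'re|'m|'ll|'ve|'d)$")
# Hypothetical, incomplete expansion table for Issue 1
EXPANSIONS = {
    "isn't": "is not", "aren't": "are not", "i'm": "i am",
    "you're": "you are", "it's": "it is", "there's": "there is", "that's": "that is",
}


def _words(text: str) -> list[str]:
    # Lower-case and map curly apostrophes to straight ones first
    return WORD.findall(text.lower().replace("\u2019", "'"))


def tokenize_whole(text: str) -> list[str]:
    """Keep contractions as single tokens: "there's" stays one token."""
    return _words(text)


def tokenize_clitic(text: str) -> list[str]:
    """Split clitics off: "there's" -> "there", "'s"; "isn't" -> "is", "n't"."""
    tokens = []
    for word in _words(text):
        match = CLITIC.search(word)
        if match and match.start() > 0:
            tokens.extend([word[:match.start()], match.group()])
        else:
            tokens.append(word)
    return tokens


def expand_contractions(tokens: list[str]) -> list[str]:
    """Replace listed contractions with their full forms."""
    return [part for tok in tokens for part in EXPANSIONS.get(tok, tok).split()]


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    return list(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    line = "There’s no way. That’s it."
    print(bigrams(tokenize_whole(line)))   # has ("there's", "no"), never ("there", "'s")
    print(bigrams(tokenize_clitic(line)))  # has ("there", "'s") and ("that", "'s")
    print(expand_contractions(tokenize_whole("It isn’t")))
```

Whichever scheme is chosen, applying the same one to both corpora before ranking is what keeps the 2-gram lists comparable.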
21.
Issue of contraction alternatives continues:
•It isn’t (rank 532)
•I am not (rank 638)
•It is a (rank 64)
•You are not (rank 2,663)
•You aren’t (rank 6,009)
•There is no (rank 52)
HMDC: 17 negativity, 18 contractions; COCA: 6 negativity, 6 contractions
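Counts like these depend on how a 3-gram is coded as negative or as containing a contraction, and the slides do not spell that coding out. The sketch below shows one possible, purely hypothetical scheme. A 3-gram counts as negative if any token is in a small negation list or ends in n’t. It counts as containing a contraction if it has an apostrophe, which would also catch possessives, so manual checking would still be needed. The toy input is illustrative and is not data from the study.

```python
# Hypothetical negation markers; not the study's actual coding scheme
NEGATION_MARKERS = {"no", "not", "never", "nothing", "nobody", "none",
                    "neither", "nor", "nowhere"}


def _normalise(gram: str) -> str:
    # Lower-case and turn curly apostrophes into straight ones
    return gram.lower().replace("\u2019", "'")


def is_negative(gram: str) -> bool:
    """True if any token is a negation marker or ends in n't (e.g. don't, or a split n't)."""
    return any(tok in NEGATION_MARKERS or tok.endswith("n't")
               for tok in _normalise(gram).split())


def has_contraction(gram: str) -> bool:
    """True if the 3-gram contains an apostrophe (also matches possessives)."""
    return "'" in _normalise(gram)


def tally(grams: list[str]) -> tuple[int, int]:
    """Return (number of negative 3-grams, number containing a contraction)."""
    return sum(map(is_negative, grams)), sum(map(has_contraction, grams))


if __name__ == "__main__":
    toy = ["I don’t know", "it is a", "there is no", "you're not going"]
    print(tally(toy))  # (3, 2)
```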
22. Contraction alternatives and a (possible) bug affect
the results
HMDC is not well reflected by COCA
Should try comparing with COCA Spoken
25.
Lower rank difference and lower negativity in COCA
Spoken
Issue of contraction alternatives and
acronyms
26. This decrease in the frequency of negativity in COCA
Spoken relative to COCA results from an increase
in positivity in spoken American English.
Therefore, considering the top twenty 3-grams, COCA
Spoken contains 5% more positivity than COCA.
HMDC contains 3.4 times as much negativity as COCA
Spoken and 2.8 times as much as COCA.
In a way, House M.D. has brutally intervened in viewers’
perception of the norm of American English.
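The ratios above can be checked against the counts on slide 21 with simple arithmetic. This sketch relies on three assumptions. First, that 17 and 6 on slide 21 are the negative 3-grams among the top twenty for HMDC and COCA respectively (17 / 6 ≈ 2.8 is consistent with this). Second, that every top-twenty 3-gram is coded as either negative or positive. Third, that COCA Spoken has 5 negative 3-grams. The slides do not state this last count; it is inferred from the reported 3.4 ratio (17 / 3.4 = 5).

```python
TOP_N = 20               # top twenty 3-grams, as on slide 26
HMDC_NEG = 17            # slide 21 (assumed to be HMDC's count)
COCA_NEG = 6             # slide 21 (assumed to be COCA's count)
COCA_SPOKEN_NEG = 5      # inferred from the reported 3.4 ratio, not given in the slides

ratio_vs_coca = HMDC_NEG / COCA_NEG            # about 2.83
ratio_vs_spoken = HMDC_NEG / COCA_SPOKEN_NEG   # 3.4

# Assumption: positivity = top-twenty 3-grams that are not negative
positivity_coca = (TOP_N - COCA_NEG) / TOP_N            # 0.70
positivity_spoken = (TOP_N - COCA_SPOKEN_NEG) / TOP_N   # 0.75
positivity_gain = positivity_spoken - positivity_coca   # 0.05, i.e. 5 percentage points

print(f"HMDC vs COCA:        {ratio_vs_coca:.1f}x")
print(f"HMDC vs COCA Spoken: {ratio_vs_spoken:.1f}x")
print(f"COCA Spoken positivity gain: {positivity_gain:.0%}")
```

Read this way, the "5% more positivity" on this slide is a difference of 5 percentage points (one 3-gram out of twenty) rather than a 5% relative increase.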
27. This study has discussed how the language used in House M.D. is
related to contemporary spoken American English
Has listed differences and similarities drawn from a comparison
between COCA and COCA Spoken with respect to HMDC
Has shown how House M.D. can be identified as a dramedy whose dialogue is far more
interpersonal, more focused on 1st- and 2nd-person address, and more full of disagreement
than what one would encounter in “reality”
In addition to the original research questions, has demonstrated the
strengths and weaknesses of using 1-to-3-gram rank differences to
compare HMDC with COCA and COCA Spoken
Has addressed the potential methodological issue of contraction
alternatives and acronyms affecting n-gram ranks and rank differences
29. Judging by the results of this simple
analysis, further studies of
television drama are worthy of researchers’
attention, interest and devotion.
31. Allen, R. C. (2004). Frequently asked questions: A general introduction to the reader. In R. C. Allen, & A. Hill (Eds.), The Television Studies Reader (pp. 1-26). New York: Routledge.
Androutsopoulos, J. (2012). Introduction: Language and society in cinematic discourse. Multilingua, 31, 139-154.
Bednarek, M. (2011). The language of fictional television: A case study of the ‘dramedy’ Gilmore Girls. English Text Construction, 4(1), 54-83.
Bednarek, M. (2010). The Language of Fictional Television: Drama and Identity. London: Continuum International Publishing Group.
Biber, D. (2009). A corpus-driven approach to formulaic language in English. International Journal of Corpus Linguistics, 14(3), 275-311.
Bignell, J., & Lacey, S. (Eds.). (2005). Popular television drama: Critical perspectives. Manchester: Manchester University Press.
Brock, A. (2004). Analyzing scripts in humorous communication. Humor: International Journal of Humor Research, 17(4), 353-360.
Bubel, C. (2006). The linguistic construction of character relations in TV drama: Doing friendship in Sex and the City. Retrieved April 4, 2013, from SciDok-Datenbank: http://scidok.sulb.uni-saarland.de/volltexte/2006/598/pdf/Diss_Bubel_publ.pdf
Chamber, S. A. (2003). Language and Structure in The West Wing. In P. C. Rollins, & J. E. O'Connor (Eds.), The West Wing: The American Presidency As Television Drama (pp. 83-100). New York: Syracuse University Press.
Chua, B. H. (2008). Structure of identification and distancing in watching East Asian television drama. In B. H. Chua, & K. Iwabuchi (Eds.), East Asian Pop Culture: Analysing the Korean Wave (pp. 73-90). Hong Kong: Hong Kong University Press.
Cover, R. (2004). From Butler to Buffy: Notes towards a strategy for identity analysis in contemporary television narrative. Reconstruction: Studies in Contemporary Culture, 4(2).
Davies, M. (2014, December 1). CoRD | The Corpus of Contemporary American English (COCA). Retrieved March 21, 2011, from VARIENG: CoRD | The Corpus of Contemporary American English (COCA)
Davies, M. (2011). N-grams data from the Corpus of Contemporary American English (COCA). Retrieved November 20, 2014, from http://www.ngrams.info
Goodier, B. C., & Arrington, M. I. (2007). Physicians, patients, and medical dialogue in the NYPD Blue prostate cancer story. Journal of Medical Humanities, 28(1), 45-58.
IMDb. (n.d.).
Jacoby, H., & Irwin, W. (Eds.). (2008). House and Philosophy: Everybody Lies. John Wiley & Sons.
Jamieson, D. (2011, September). Does TV accurately portray psychology? Retrieved April 20, 2013, from American Psychological Association: http://www.apa.org/gradpsych/2011/09/psychology-shows.aspx
Munt, S. R. (2006). A queer undertaking: Anxiety and reparation in the HBO television drama series Six Feet Under. Feminist Media Studies, 263-279.
O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From Corpus to Classroom: Language Use and Language Teaching. Cambridge: CUP.
Piazza, R., Bednarek, M., & Rossi, F. (Eds.). (2011). Telecinematic Discourse: Approaches to the Language of Films and Television Series. Philadelphia: John Benjamins B.V.
Quaglio, P. (2008). Television dialogue and natural conversation: Linguistic similarities and functional differences. In A. Ädel, & R. Reppen (Eds.), Corpora and Discourse: The challenges of different settings (Vol. vi, pp. 189-210). John Benjamins.
Richardson, K. (2010). Television Dramatic Dialogue: A Sociolinguistic Study. Oxford: Oxford University Press.
Scott, M. (2014). WordSmith Tools Manual. Retrieved May 20, 2014, from Lexical Analysis Software: http://www.lexically.net/downloads/version6/HTML/index.html
Wild, D. K. (2005a, October 24). Constructing House: An Interview with House, M.D. writer Lawrence Kaplow. Retrieved September 4, 2014, from Blogcritics: http://blogcritics.org/constructing-house-an-interview-with-house/
Woznicki, K. (2005, September 27). A doctor/writer in the 'House'. Retrieved September 4, 2014, from CNN.com: http://edition.cnn.com/2005/HEALTH/09/27/profile.writer.foster/index.html