A presentation I gave at IIT Bombay on the paper 'A Pendulum Swung Too Far' by Kenneth Church, as part of preparation for the MTech Seminar.
Get the paper on which this presentation is based here: http://languagelog.ldc.upenn.edu/myl/ldc/swung-too-far.pdf
2. Roadmap
● Introduction
● History of NLP
● Objections to Empiricism
○ Chomsky
○ Minsky
○ Pierce
● Reasons for the Problem and Solutions
4. Introduction
● The paper examines how NLP has oscillated
between theory-driven and data-driven
approaches over its history, and why.
● Specifically, it predicts a surge in rationalism
in the 2010s and explains why and how
researchers need to prepare for it.
5. Rationalism vs. Empiricism
Rationalism
1. Emphasizes theory
2. Assumes an “innate language faculty”
3. Aims at discovering the language of the human mind (linguistic competence)
4. Assigns categories to language units
5. Major advocates: Chomsky, Minsky
Empiricism
1. Emphasizes data
2. Assumes all knowledge is gathered only through the senses
3. Aims at analysing language as it actually occurs (linguistic performance)
4. Assigns probabilities to language units
5. Major advocates: Shannon, Norvig
12. 1950s: Empiricism
● Empiricism dominated across several fields
● Words were classified on the basis of their
co-occurrence with other words (“You shall
know a word by the company it keeps”, Firth, 1957)
13. 1970s: Rationalism
● Several authors, including Chomsky and
Minsky, criticized the empirical approach
● Failures of the empirical approach led to
funding cutbacks (“winters”)
○ 1966: Machine Translation Failure
○ 1970: The abandonment of connectionism
○ 1971-75: Speech Recognition Failure
14. 1990s: Empiricism
● Large amounts of data became available
● Several specialized problems could be
solved by statistical frameworks, without
addressing the general problems
15. 2010s: Rationalism?
● Most of the low-hanging fruit has been
picked
● But the original criticisms of the empirical
approach remain just as valid
17. Objections to Empiricism
● Several common empirical frameworks were
opposed by rationalists in the 70s, including:
○ Linear Separators (Machine Learning)
○ Vector Space Model (Information Retrieval)
○ n-grams (Language Modeling)
○ HMMs (Speech Recognition)
● Many of these are mere approximations of
complex phenomena
19. Chomsky’s Objections:
n-gram Language Modeling
● Chomsky showed that n-grams cannot learn
long-distance dependencies (dependencies
spanning more than n words)
● For practical purposes ‘n’ needs to be a
small value (3 or 5)
● However, such small values fail to capture
several interesting facts
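The limitation above can be illustrated with a minimal sketch. The toy corpus, the function names `train_ngram` and `prob`, and the choice of n = 3 versus n = 6 are all assumptions made for illustration; they are not taken from the paper. The subject ("dog" or "dogs") sits five words before the verb, so a trigram model cannot see it when predicting the verb.

```python
from collections import Counter

def train_ngram(sentences, n):
    """Count n-grams and their (n-1)-word contexts over whitespace-tokenized sentences."""
    ngrams, contexts = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] * (n - 1) + sent.split()
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            ngrams[context + (tokens[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts

def prob(ngrams, contexts, history, word, n):
    """Maximum-likelihood estimate of P(word | last n-1 words of history)."""
    padded = ["<s>"] * (n - 1) + history.split()
    context = tuple(padded[len(padded) - (n - 1):])
    if contexts[context] == 0:
        return 0.0
    return ngrams[context + (word,)] / contexts[context]

# Hypothetical toy corpus: the verb must agree with a subject five words back.
corpus = [
    "the dog near the old trees barks",
    "the dogs near the old trees bark",
]

tri = train_ngram(corpus, 3)
six = train_ngram(corpus, 6)

# The trigram context is ("old", "trees") in both cases, so the subject is invisible.
p3_singular = prob(*tri, "the dog near the old trees", "barks", 3)
p3_plural = prob(*tri, "the dogs near the old trees", "barks", 3)

# A 6-gram context reaches back to the subject and separates the two cases.
p6_singular = prob(*six, "the dog near the old trees", "barks", 6)
p6_plural = prob(*six, "the dogs near the old trees", "barks", 6)

print(p3_singular, p3_plural, p6_singular, p6_plural)
```

The larger n resolves this particular sentence, but only because the exact six-word sequence appeared in the training data, which is why small values of n are used in practice.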
20. Chomsky’s Objections:
Finite State Methods
● Examples of Finite State Methods include
○ Hidden Markov Models (HMMs)
○ Conditional Random Fields (CRFs)
● Finite State Methods can capture
dependencies beyond n words
● However, they may require infinite memory
to process certain sentences
21. Chomsky’s Objections:
Center Embedded Grammars
● A center embedded grammar is of the form:
○ A -> x A y
● Chomsky proved that a center embedded
grammar will require infinite memory and
thus cannot be handled by finite state
methods
● Center embedding is common in English, for
example:
○ A man that a woman that a child that a bird that I
heard saw knows loves
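As a rough sketch of the memory argument above, consider the language x^n y^n generated by A -> x A y, assuming a base case A -> x y (the slide does not state one). The two function names below are hypothetical. The first recognizer keeps an unbounded counter; the second caps the counter at `max_depth`, which is what a machine with a fixed number of states amounts to here.

```python
def accepts_with_counter(s):
    """Recognize x^n y^n (n >= 1) using an unbounded nesting counter."""
    depth = 0
    seen_y = False
    for ch in s:
        if ch == "x":
            if seen_y:
                return False  # an x after a y breaks the nesting
            depth += 1
        elif ch == "y":
            seen_y = True
            depth -= 1
            if depth < 0:
                return False  # more y's than x's
        else:
            return False
    return seen_y and depth == 0

def accepts_finite_state(s, max_depth):
    """Same recognizer, but with only max_depth + 1 depth states available."""
    depth = 0
    seen_y = False
    for ch in s:
        if ch == "x":
            if seen_y or depth == max_depth:
                return False  # no state left to remember a deeper nesting
            depth += 1
        elif ch == "y":
            if depth == 0:
                return False
            seen_y = True
            depth -= 1
        else:
            return False
    return seen_y and depth == 0

print(accepts_with_counter("x" * 50 + "y" * 50))
print(accepts_finite_state("xxxyyy", 3))
print(accepts_finite_state("xxxxyyyy", 3))
```

Whatever bound is chosen for `max_depth`, a sentence nested one level deeper is rejected, whereas the counter version needs memory that grows with the depth of embedding.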
23. Minsky’s Objections:
Perceptrons
● Minsky showed that perceptrons (and linear
separators in general) cannot learn functions
that are not linearly separable such as XOR.
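The XOR limitation can be sketched with a small single-layer perceptron. This is a minimal illustration, not the formulation used by Minsky; the function names, the integer update rule, and the 100-epoch limit are assumptions. AND is included as a linearly separable contrast.

```python
def train_perceptron(samples, epochs=100):
    """Train a two-input perceptron with the classic error-driven update rule."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - pred
            if delta != 0:
                w[0] += delta * x1
                w[1] += delta * x2
                b += delta
                errors += 1
        if errors == 0:
            break  # every sample classified correctly
    return w, b

def accuracy(samples, w, b):
    """Fraction of samples the trained perceptron classifies correctly."""
    correct = 0
    for (x1, x2), target in samples:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        correct += int(pred == target)
    return correct / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

acc_and = accuracy(AND, *train_perceptron(AND))
acc_xor = accuracy(XOR, *train_perceptron(XOR))
print(acc_and, acc_xor)
```

Training converges on AND, but on XOR the weights keep cycling, because no single line separates the two classes; more epochs would not help.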
24. Minsky’s Objections:
Perceptrons
● This has implications for several tasks
including:
○ Word Sense Disambiguation
○ Information Retrieval
○ Author Identification
○ Sentiment Analysis
● For instance, this is the reason why
sentiment analysis ignores loaded terms
25. Minsky’s Objections:
Sentiment Analysis
● Loaded terms can be either positive or
negative depending on whom they are
addressed to. This is an XOR dependency:
Loaded Term    Addressed to us    Sentiment
Positive       Y                  Positive
Positive       N                  Negative
Negative       Y                  Negative
Negative       N                  Positive
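The table above can be checked by brute force. The sketch below encodes each row as two binary features and searches a small grid of integer weights for a linear threshold that reproduces the sentiment column. The 0/1 encoding, the weight range of -5 to 5, and the names `LOADED_TERMS`, `TERM_ONLY` and `count_linear_separators` are assumptions for illustration; `TERM_ONLY` is a hypothetical contrast case in which sentiment follows the term regardless of the addressee.

```python
from itertools import product

# (term is positive, addressed to us) -> sentiment is positive
LOADED_TERMS = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 1}

# Hypothetical contrast: sentiment depends on the term alone.
TERM_ONLY = {(1, 1): 1, (1, 0): 1, (0, 1): 0, (0, 0): 0}

def count_linear_separators(table, weight_range=range(-5, 6)):
    """Count (w1, w2, b) triples on the grid whose threshold matches every row."""
    count = 0
    for w1, w2, b in product(weight_range, repeat=3):
        if all((w1 * term + w2 * to_us + b > 0) == bool(label)
               for (term, to_us), label in table.items()):
            count += 1
    return count

loaded_count = count_linear_separators(LOADED_TERMS)
term_only_count = count_linear_separators(TERM_ONLY)
print(loaded_count, term_only_count)
```

No weight setting on the grid fits the loaded-term table, while many fit the contrast case. The grid search only illustrates the point; the general result is that no linear separator exists for an XOR-style dependency.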
27. Pierce’s Objections:
Evaluation by Demos
● According to Pierce, evaluation of projects
should be based on scientific principles
rather than laboratory demos.
● Projects give good results in laboratory
conditions, but have much higher error rates
in real-world conditions.
28. Pierce’s Objections:
Pattern Matching
● Pierce stated that pattern matching is “artful
deception”, i.e. it is based on heuristics
rather than scientific theory.
● Examples:
○ The ELIZA effect
○ The Chinese Room argument
29. Pierce’s Objections:
Pattern Matching
● While pattern matching produces better
results in the short term, it does so only by
ignoring real scientific questions.
● While ambitious approaches may require
time to deliver, they are backed by hard
science.
31. Reason for the Oscillations:
Gaps in Teaching
● The “losing” side of the debate (currently
Rationalism) is never mentioned in
textbooks/courses
● Leads to “reinventing the wheel” by each
generation of NLP researchers
32. Reason for the Oscillations:
Gaps in Teaching
● Currently most courses concentrate on
statistical methods, ignoring linguistic and
scientific questions
● This prepares students only for the
“low-hanging fruit”, not for the real scientific
questions
33. Solution
● Introduce the following in NLP courses:
○ Syntax
○ Morphology
○ Phonology
○ Phonetics
○ Historical Linguistics
○ Language Universals
● Create parallels between computational
linguistics and formal linguistics
34. Solution
● Teach both sides of the rationalism vs.
empiricism debate
● Educate students about the challenges
that lie beyond the “low-hanging fruit”
36. Other References
● Papers in Linguistics 1934-1951 by J. R. Firth, 1957
● Syntactic Structures by Noam Chomsky, 1957
● Whither Speech Recognition by John Pierce, 1969
● ELIZA - A Computer Program for the Study of Natural
Language Communication between Man and Machine
by Joseph Weizenbaum, 1966
● Minds, Brains and Programs by John Searle, 1980