Natural Language Processing
+ Python
by Ann C. Tan-Pohlmann

February 22, 2014
Outline
• NLP Basics
• NLTK
– Text Processing

• Gensim (really, really short)
– Text Classification

2
Natural Language Processing
• computer science, artificial intelligence, and
linguistics
• human–computer interaction
• natural language understanding
• natural language generation
- Wikipedia

3
Star Trek's Universal Translator

http://www.youtube.com/watch?v=EaeSKUV2zp0
Spoken Dialog Systems

5
NLP Basics
• Morphology
– study of word formation
– how word forms vary in a sentence

• Syntax
– branch of grammar
– how words are arranged in a sentence to show
connections of meaning

• Semantics
– study of meaning of words, phrases and sentences
6
NLTK: Getting Started
• Natural Language Toolkit
– for symbolic and statistical NLP
– a teaching tool, a study tool, and a platform for prototyping

• Python 2.7 is a prerequisite
>>> import nltk
>>> nltk.download()

7
Some NLTK methods
• text.similar(str)
• text.concordance(str)
• len(text)
• len(set(text))
• lexical_diversity
– len(text)/len(set(text))

Frequency Distribution
• fd = FreqDist(text)
• fd.inc(str)
• fd[str]
• fd.N()
• fd.max()

• text.collocations()
– sequences of words that often occur together
MORPHOLOGY > Syntax > Semantics

8
Frequency Distribution
• fd = FreqDist(text)
• fd.inc(str) – increments the count for sample str
• fd[str] – returns the number of occurrences of sample str
• fd.N() – total number of samples
• fd.max() – sample with the greatest count
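
The frequency-distribution operations above can be approximated with the standard library alone. This is only a sketch: it uses collections.Counter rather than NLTK's FreqDist, and the toy token list is a hypothetical stand-in for an nltk.Text object.

```python
from collections import Counter

# Hypothetical toy token list standing in for an nltk.Text object.
tokens = "the cat sat on the mat and the dog sat too".split()

# Lexical diversity as defined on the earlier slide: tokens per distinct word.
lexical_diversity = len(tokens) / float(len(set(tokens)))

fd = Counter(tokens)   # roughly: fd = FreqDist(text)
fd["cat"] += 1         # roughly: fd.inc("cat")

count_the = fd["the"]                        # roughly: fd["the"]
total = sum(fd.values())                     # roughly: fd.N()
most_common_word = fd.most_common(1)[0][0]   # roughly: fd.max()
```

Counter is used here only because it needs no corpus download; the NLTK class adds plotting and other helpers on top of the same idea.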

9
Corpus
• large collection of raw or categorized text on
one or more domains
• Examples: Gutenberg, Brown, Reuters, Web &
Chat Text
>>> from nltk.corpus import brown
>>> brown.categories()
['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies',
'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance',
'science_fiction']
>>> adventure_text = brown.words(categories='adventure')

10
Corpora in Other Languages
>>> import nltk
>>> from nltk import FreqDist
>>> from nltk.corpus import udhr
>>> languages = nltk.corpus.udhr.fileids()
>>> languages.index('Filipino_Tagalog-Latin1')
>>> tagalog = nltk.corpus.udhr.raw('Filipino_Tagalog-Latin1')
>>> tagalog_words = nltk.corpus.udhr.words('Filipino_Tagalog-Latin1')
>>> tagalog_tokens = nltk.word_tokenize(tagalog)
>>> tagalog_text = nltk.Text(tagalog_tokens)
>>> fd = FreqDist(tagalog_text)
>>> for sample in fd:
...     print(sample)

11
Using Corpus from Palito
Corpus
– large collection of raw or categorized text
>>> import nltk
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_dir = '/Users/ann/Downloads'
>>> tagalog = PlaintextCorpusReader(corpus_dir,
...                                 'Tagalog_Literary_Text.txt')
>>> raw = tagalog.raw()
>>> sentences = tagalog.sents()
>>> words = tagalog.words()
>>> tokens = nltk.word_tokenize(raw)
>>> tagalog_text = nltk.Text(tokens)
12
Spoken Dialog Systems

MORPHOLOGY > Syntax > Semantics

13
Tokenization
Tokenization
– breaking up a string into words and punctuation

>>> tokens = nltk.word_tokenize(raw)
>>> tagalog_tokens = nltk.Text(tokens)
>>> tagalog_tokens = set(sample.lower() for sample in tagalog_tokens)
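
To make the idea of tokenization concrete without downloading any NLTK data, here is a minimal regex-based sketch. It is not how nltk.word_tokenize works internally; the function name simple_tokenize and the sample sentence are assumptions made for illustration, and Python 3 is assumed so that \w matches Unicode letters.

```python
import re

def simple_tokenize(text):
    # \w+ grabs a run of letters/digits; [^\w\s] grabs one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

raw = "Magandang umaga, mundo! Kumusta ka?"
tokens = simple_tokenize(raw)

# Same normalization step as on the slide: lower-case and de-duplicate.
vocabulary = set(token.lower() for token in tokens)
```

A real tokenizer also has to handle contractions, abbreviations, and numbers, which this pattern deliberately ignores.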

MORPHOLOGY > Syntax > Semantics

14
Stemming
Stemming
– normalizes a word to its base form; the result may not be the 'root' word
>>> def stem(word):
...     for suffix in ['ing', 'ly', 'ed', 'ious', 'ies', 'ive', 'es', 's', 'ment']:
...         if word.endswith(suffix):
...             return word[:-len(suffix)]
...     return word
...
>>> stem('reading')
'read'
>>> stem('moment')
'mo'

MORPHOLOGY > Syntax > Semantics

15
Lemmatization
Lemmatization
– uses a vocabulary list and morphological analysis (uses the POS of a word)
>>> def stem(word):
...     for suffix in ['ing', 'ly', 'ed', 'ious', 'ies', 'ive', 'es', 's', 'ment']:
...         if word.endswith(suffix) and word[:-len(suffix)] in brown.words():
...             return word[:-len(suffix)]
...     return word
...
>>> stem('reading')
'read'
>>> stem('moment')
'moment'
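
The vocabulary check above scans the whole word list on every call. A possible variation, sketched here under assumptions, builds a set once so each lookup is a constant-time membership test. The tiny VOCABULARY is hypothetical; in practice it might be built as set(brown.words()). The name stem_with_vocabulary is also an illustration, not an NLTK function.

```python
SUFFIXES = ['ing', 'ly', 'ed', 'ious', 'ies', 'ive', 'es', 's', 'ment']

# Hypothetical tiny vocabulary; a real one might be set(brown.words()).
VOCABULARY = {'read', 'quick', 'walk', 'moment'}

def stem_with_vocabulary(word, vocabulary=VOCABULARY):
    """Strip the first listed suffix whose removal leaves a known word."""
    for suffix in SUFFIXES:
        candidate = word[:-len(suffix)]
        if word.endswith(suffix) and candidate in vocabulary:
            return candidate
    # No suffix produced a known word, so leave the input unchanged.
    return word

examples = [stem_with_vocabulary(w) for w in ['reading', 'quickly', 'walks', 'moment']]
```

Because 'mo' is not in the vocabulary, 'moment' is returned unchanged, which is the behavior the slide is illustrating.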

MORPHOLOGY > Syntax > Semantics

16
NLTK Stemmers & Lemmatizer
• Porter Stemmer and Lancaster Stemmer
>>> porter = nltk.PorterStemmer()
>>> lancaster = nltk.LancasterStemmer()
>>> [porter.stem(w) for w in brown.words()[:100]]

• WordNet Lemmatizer
>>> wnl = nltk.WordNetLemmatizer()
>>> [wnl.lemmatize(w) for w in brown.words()[:100]]

• Comparison
>>> [wnl.lemmatize(w) for w in ['investigation', 'women']]
>>> [porter.stem(w) for w in ['investigation', 'women']]
>>> [lancaster.stem(w) for w in ['investigation', 'women']]

MORPHOLOGY > Syntax > Semantics

17
Using Regular Expression
Operator     Behavior
.            Wildcard, matches any character
^abc         Matches some pattern abc at the start of a string
abc$         Matches some pattern abc at the end of a string
[abc]        Matches one of a set of characters
[A-Z0-9]     Matches one of a range of characters
ed|ing|s     Matches one of the specified strings (disjunction)
*            Zero or more of previous item, e.g. a*, [a-z]* (also known as Kleene Closure)
+            One or more of previous item, e.g. a+, [a-z]+
?            Zero or one of the previous item (i.e. optional), e.g. a?, [a-z]?
{n}          Exactly n repeats where n is a non-negative integer
{n,}         At least n repeats
{,n}         No more than n repeats
{m,n}        At least m and no more than n repeats
a(b|c)+      Parentheses that indicate the scope of the operators
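
A few of the operators in the table can be tried out with Python's re module. This is a small sketch assuming Python 3 (for re.fullmatch); the word list and variable names are made up for illustration.

```python
import re

# Hypothetical sample words chosen to exercise several operators.
words = ["abc", "cab", "walked", "walking", "walks", "aaa", "abcbc"]

# ^abc : pattern anchored at the start of the string
starts_with_abc = [w for w in words if re.search(r"^abc", w)]

# (ed|ing|s)$ : disjunction of suffixes anchored at the end
has_suffix = [w for w in words if re.search(r"(ed|ing|s)$", w)]

# a{3} : exactly three repeats of the previous item
three_a = [w for w in words if re.fullmatch(r"a{3}", w)]

# a(b|c)+ : parentheses limit the scope of | and +
scoped = [w for w in words if re.fullmatch(r"a(b|c)+", w)]
```

re.search finds a match anywhere in the string, so the anchors ^ and $ matter there; re.fullmatch already requires the whole string to match.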

MORPHOLOGY > Syntax > Semantics

18
Using Regular Expression
>>> import re
>>> re.findall(r'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$', 'reading')
[('read', 'ing')]
>>> def stem(word):
...     regexp = r'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$'
...     stem, suffix = re.findall(regexp, word)[0]
...     return stem
...
>>> stem('reading')
'read'
>>> stem('moment')
'mo'

MORPHOLOGY > Syntax > Semantics

19
Spoken Dialog Systems

Morphology > SYNTAX > Semantics

20
Lexical Resources
• collection of words with associated information (annotation)
• Ex: stopwords – high-frequency words with little lexical content
>>> from nltk.corpus import stopwords
>>> stopwords.words('english')
>>> stopwords.words('german')
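
A typical use of a stopword list is to filter tokens before counting or classifying them. The sketch below assumes a deliberately tiny, hypothetical stopword set rather than the much longer list that stopwords.words('english') returns; remove_stopwords is an illustrative name, not an NLTK function.

```python
# Hypothetical, deliberately tiny stopword set; NLTK's list is much longer.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is"}

def remove_stopwords(tokens, stopwords=STOPWORDS):
    # Compare in lower case so "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in stopwords]

tokens = "The study of meaning is a branch of linguistics".split()
content_words = remove_stopwords(tokens)
```

With NLTK installed and the corpus downloaded, the same function could be called with set(stopwords.words('english')) as its second argument.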

MORPHOLOGY > Syntax > Semantics

21
Part-of-Speech (POS) Tagging
• the process of labeling each word with a
particular part of speech based on its
definition and context

Morphology > SYNTAX > Semantics

22
NLTK's POS Tag Sets* – 1/2
Tag    Meaning         Examples
ADJ    adjective       new, good, high, special, big, local
ADV    adverb          really, already, still, early, now
CNJ    conjunction     and, or, but, if, while, although
DET    determiner      the, a, some, most, every, no
EX     existential     there, there's
FW     foreign word    dolce, ersatz, esprit, quo, maitre
MOD    modal verb      will, can, would, may, must, should
N      noun            year, home, costs, time, education
NP     proper noun     Alison, Africa, April, Washington

*simplified
Morphology > SYNTAX > Semantics

23
NLTK's POS Tag Sets* – 2/2
Tag    Meaning               Examples
NUM    number                twenty-four, fourth, 1991, 14:24
PRO    pronoun               he, their, her, its, my, I, us
P      preposition           on, of, at, with, by, into, under
TO     the word to           to
UH     interjection          ah, bang, ha, whee, hmpf, oops
V      verb                  is, has, get, do, make, see, run
VD     past tense            said, took, told, made, asked
VG     present participle    making, going, playing, working
VN     past participle       given, taken, begun, sung
WH     wh determiner         who, which, when, what, where, how

*simplified
Morphology > SYNTAX > Semantics

24
NLTK POS Tagger (Brown)
>>> nltk.pos_tag(brown.words()[:30])
[('The', 'DT'), ('Fulton', 'NNP'), ('County', 'NNP'), ('Grand', 'NNP'),
('Jury', 'NNP'), ('said', 'VBD'), ('Friday', 'NNP'), ('an', 'DT'),
('investigation', 'NN'), ('of', 'IN'), ("Atlanta's", 'JJ'), ('recent', 'JJ'),
('primary', 'JJ'), ('election', 'NN'), ('produced', 'VBN'), ('``', '``'), ('no',
'DT'), ('evidence', 'NN'), ("''", "''"), ('that', 'WDT'), ('any', 'DT'),
('irregularities', 'NNS'), ('took', 'VBD'), ('place', 'NN'), ('.', '.'), ('The',
'DT'), ('jury', 'NN'), ('further', 'RB'), ('said', 'VBD'), ('in', 'IN')]
>>> brown.tagged_words(simplify_tags=True)
[('The', 'DET'), ('Fulton', 'NP'), ('County', 'N'), ...]

Morphology > SYNTAX > Semantics

25
NLTK POS Tagger (German)
>>> german = nltk.corpus.europarl_raw.german
>>> nltk.pos_tag(german.words()[:30])
[(u'Wiederaufnahme', 'NNP'), (u'der', 'NN'), (u'Sitzungsperiode', 'NNP'),
(u'Ich', 'NNP'), (u'erkl\xe4re', 'NNP'), (u'die', 'VB'), (u'am', 'NN'),
(u'Freitag', 'NNP'), (u',', ','), (u'dem', 'NN'), (u'17.', 'CD'),
(u'Dezember', 'NNP'), (u'unterbrochene', 'NN'), (u'Sitzungsperiode', 'NNP'),
(u'des', 'VBZ'), (u'Europ\xe4ischen', 'JJ'), (u'Parlaments', 'NNS'),
(u'f\xfcr', 'JJ'), (u'wiederaufgenommen', 'NNS'), (u',', ','),
(u'w\xfcnsche', 'NNP'), (u'Ihnen', 'NNP'), (u'nochmals', 'NNS'),
(u'alles', 'VBZ'), (u'Gute', 'NNP'), (u'zum', 'NN'),
(u'Jahreswechsel', 'NNP'), (u'und', 'NN'), (u'hoffe', 'NN'), (u',', ',')]

\xe4 = ä, \xfc = ü
!!! DOES NOT WORK FOR GERMAN (the default tagger is trained on English text)

Morphology > SYNTAX > Semantics

26
NLTK POS Dictionary
>>> pos = nltk.defaultdict(lambda:'N')
>>> pos['eat']
'N'
>>> pos.items()
[('eat', 'N')]
>>> for (word, tag) in brown.tagged_words(simplify_tags=True):
...     if word in pos:
...         if isinstance(pos[word], str):
...             new_list = [pos[word]]
...             pos[word] = new_list
...         if tag not in pos[word]:
...             pos[word].append(tag)
...     else:
...         pos[word] = [tag]
...
>>> pos['eat']
['N', 'V']
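
The same word-to-tags dictionary can be sketched more compactly with collections.defaultdict(list). This is a variation, not the slide's code: the short tagged_words list is a hypothetical stand-in for brown.tagged_words(simplify_tags=True), and unseen words get an empty list here rather than the default tag 'N'.

```python
from collections import defaultdict

# Hypothetical stand-in for brown.tagged_words(simplify_tags=True).
tagged_words = [
    ("eat", "V"), ("the", "DET"), ("run", "V"),
    ("run", "N"), ("eat", "V"), ("run", "V"),
]

pos = defaultdict(list)
for word, tag in tagged_words:
    # Record each tag once per word, in order of first appearance.
    if tag not in pos[word]:
        pos[word].append(tag)
```

Starting from a list for every word avoids the isinstance check that the string default makes necessary.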
Morphology > SYNTAX > Semantics

27
What else can you do with NLTK?
• Other Taggers
– Unigram Tagging
• nltk.UnigramTagger()
• train tagger using tagged sentence data

– N-gram Tagging

• Text classification using machine learning
techniques
– decision trees
– naïve Bayes classification (supervised)
– Markov Models
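
To illustrate what unigram tagging learns from tagged sentence data, here is a standard-library sketch. It is not the nltk.UnigramTagger implementation: the function names, the toy training sentences, and the choice of 'N' as the fallback tag are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def tag(words, model, default="N"):
    # Words never seen in training fall back to the default tag.
    return [(w, model.get(w, default)) for w in words]

# Hypothetical toy training data using the simplified tag set.
training = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("run", "N"), ("ended", "VD")],
    [("dogs", "N"), ("run", "V")],
    [("they", "PRO"), ("run", "V")],
]
model = train_unigram_tagger(training)
tagged = tag(["the", "cat", "run"], model)
```

An n-gram tagger extends this idea by conditioning the choice on the preceding tags as well as on the word itself.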
Morphology > SYNTAX > SEMANTICS

28
Gensim
• Tool that extracts the semantic structure of
documents by examining statistical word co-occurrence patterns within a corpus of
training documents.
• Algorithms:
1. Latent Semantic Analysis (LSA)
2. Latent Dirichlet Allocation (LDA) or Random
Projections
Morphology > Syntax > SEMANTICS

29
Gensim
• Features
– memory independent
– wrappers/converters for several data formats

• Vector
– representation of the document as an array of features or
question-answer pair
1. (word occurrence, count)
2. (paragraph, count)
3. (font, count)

• Model
– transformation from one vector to another
– learned from a training corpus without supervision
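
The vector idea above can be sketched as a bag-of-words representation, where each document becomes a list of (word id, count) pairs. This uses only the standard library and is not Gensim's own code; the three short documents, the word_ids mapping, and the to_bow helper are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical miniature corpus.
documents = [
    "human computer interaction",
    "computer system response time",
    "human system engineering",
]
texts = [doc.split() for doc in documents]

# Give each distinct word an integer id in order of first appearance.
word_ids = {}
for text in texts:
    for word in text:
        word_ids.setdefault(word, len(word_ids))

def to_bow(text):
    """Turn a token list into sorted (word_id, count) pairs, skipping unknown words."""
    counts = Counter(word_ids[w] for w in text if w in word_ids)
    return sorted(counts.items())

vectors = [to_bow(text) for text in texts]
```

A model in the sense of the slide would then transform these sparse count vectors into another vector space, for example a small number of topics.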
Morphology > Syntax > SEMANTICS

30
Wiki document classification

http://radimrehurek.com/gensim/wiki.html

31
Other NLP tools for Python
• TextBlob
– part-of-speech tagging, noun phrase extraction,
sentiment analysis, classification, translation
– https://pypi.python.org/pypi/textblob

• Pattern
– part-of-speech taggers, n-gram search, sentiment
analysis, WordNet, machine learning
– http://www.clips.ua.ac.be/pattern
32
Star Trek technology that became a reality

http://www.youtube.com/watch?v=sRZxwRIH9RI
Installation Guides
• NLTK
– http://www.nltk.org/install.html
– http://www.nltk.org/data.html

• Gensim
– http://radimrehurek.com/gensim/install.html

• Palito
– http://ccs.dlsu.edu.ph:8086/Palito/find_project.jsp
34
Using iPython
• http://ipython.org/install.html
>>> documents = ["Human machine interface for lab abc computer applications",
...              "A survey of user opinion of computer system response time",
...              "The EPS user interface management system",
...              "System and human system engineering testing of EPS",
...              "Relation of user perceived response time to error measurement",
...              "The generation of random binary unordered trees",
...              "The intersection graph of paths in trees",
...              "Graph minors IV Widths of trees and well quasi ordering",
...              "Graph minors A survey"]

35
References
• Natural Language Processing with Python By
Steven Bird, Ewan Klein, Edward Loper
• http://www.nltk.org/book/
• http://radimrehurek.com/gensim/tutorial.html

36
Thank You!
• For questions and comments:
- ann at auberonsolutions dot com

37

Contenu connexe

Tendances

Python and sysadmin I
Python and sysadmin IPython and sysadmin I
Python and sysadmin IGuixing Bai
 
Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational BiologyAtreyiB
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analyticsshanbady
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - IJaganadh Gopinadhan
 
Large scale nlp using python's nltk on azure
Large scale nlp using python's nltk on azureLarge scale nlp using python's nltk on azure
Large scale nlp using python's nltk on azurecloudbeatsch
 
Introducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with rIntroducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with rVivian S. Zhang
 
Migrating to Puppet 4.0
Migrating to Puppet 4.0Migrating to Puppet 4.0
Migrating to Puppet 4.0Puppet
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout source{d}
 
Python Workshop - Learn Python the Hard Way
Python Workshop - Learn Python the Hard WayPython Workshop - Learn Python the Hard Way
Python Workshop - Learn Python the Hard WayUtkarsh Sengar
 
An introduction to Python for absolute beginners
An introduction to Python for absolute beginnersAn introduction to Python for absolute beginners
An introduction to Python for absolute beginnersKálmán "KAMI" Szalai
 
The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196Mahmoud Samir Fayed
 
What we can learn from Rebol?
What we can learn from Rebol?What we can learn from Rebol?
What we can learn from Rebol?lichtkind
 
Designing with Groovy Traits - Gr8Conf India
Designing with Groovy Traits - Gr8Conf IndiaDesigning with Groovy Traits - Gr8Conf India
Designing with Groovy Traits - Gr8Conf IndiaNaresha K
 
Your Own Metric System
Your Own Metric SystemYour Own Metric System
Your Own Metric SystemErin Dees
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easyGopi Krishnan Nambiar
 
Streams, sockets and filters oh my!
Streams, sockets and filters oh my!Streams, sockets and filters oh my!
Streams, sockets and filters oh my!Elizabeth Smith
 

Tendances (20)

Python and sysadmin I
Python and sysadmin IPython and sysadmin I
Python and sysadmin I
 
Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational Biology
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analytics
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - I
 
Large scale nlp using python's nltk on azure
Large scale nlp using python's nltk on azureLarge scale nlp using python's nltk on azure
Large scale nlp using python's nltk on azure
 
Introducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with rIntroducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with r
 
Python for Penetration testers
Python for Penetration testersPython for Penetration testers
Python for Penetration testers
 
Python build your security tools.pdf
Python build your security tools.pdfPython build your security tools.pdf
Python build your security tools.pdf
 
Migrating to Puppet 4.0
Migrating to Puppet 4.0Migrating to Puppet 4.0
Migrating to Puppet 4.0
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 
IO Streams, Files and Directories
IO Streams, Files and DirectoriesIO Streams, Files and Directories
IO Streams, Files and Directories
 
Python Workshop - Learn Python the Hard Way
Python Workshop - Learn Python the Hard WayPython Workshop - Learn Python the Hard Way
Python Workshop - Learn Python the Hard Way
 
Biopython
BiopythonBiopython
Biopython
 
An introduction to Python for absolute beginners
An introduction to Python for absolute beginnersAn introduction to Python for absolute beginners
An introduction to Python for absolute beginners
 
The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196The Ring programming language version 1.7 book - Part 43 of 196
The Ring programming language version 1.7 book - Part 43 of 196
 
What we can learn from Rebol?
What we can learn from Rebol?What we can learn from Rebol?
What we can learn from Rebol?
 
Designing with Groovy Traits - Gr8Conf India
Designing with Groovy Traits - Gr8Conf IndiaDesigning with Groovy Traits - Gr8Conf India
Designing with Groovy Traits - Gr8Conf India
 
Your Own Metric System
Your Own Metric SystemYour Own Metric System
Your Own Metric System
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easy
 
Streams, sockets and filters oh my!
Streams, sockets and filters oh my!Streams, sockets and filters oh my!
Streams, sockets and filters oh my!
 

Similaire à Natural Language Processing and Python

한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남Eunjeong (Lucy) Park
 
Ejercicios de estilo en la programación
Ejercicios de estilo en la programaciónEjercicios de estilo en la programación
Ejercicios de estilo en la programaciónSoftware Guru
 
Declare Your Language: Type Checking
Declare Your Language: Type CheckingDeclare Your Language: Type Checking
Declare Your Language: Type CheckingEelco Visser
 
Language Technology Enhanced Learning
Language Technology Enhanced LearningLanguage Technology Enhanced Learning
Language Technology Enhanced Learningtelss09
 
CS4200 2019 | Lecture 3 | Parsing
CS4200 2019 | Lecture 3 | ParsingCS4200 2019 | Lecture 3 | Parsing
CS4200 2019 | Lecture 3 | ParsingEelco Visser
 
Perl 6 in Context
Perl 6 in ContextPerl 6 in Context
Perl 6 in Contextlichtkind
 
Separation of Concerns in Language Definition
Separation of Concerns in Language DefinitionSeparation of Concerns in Language Definition
Separation of Concerns in Language DefinitionEelco Visser
 
Compiler Construction | Lecture 5 | Transformation by Term Rewriting
Compiler Construction | Lecture 5 | Transformation by Term RewritingCompiler Construction | Lecture 5 | Transformation by Term Rewriting
Compiler Construction | Lecture 5 | Transformation by Term RewritingEelco Visser
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
Open course(programming languages) 20150225
Open course(programming languages) 20150225Open course(programming languages) 20150225
Open course(programming languages) 20150225JangChulho
 
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)codin9cafe
 
Declare Your Language (at DLS)
Declare Your Language (at DLS)Declare Your Language (at DLS)
Declare Your Language (at DLS)Eelco Visser
 
Basics of Python programming (part 2)
Basics of Python programming (part 2)Basics of Python programming (part 2)
Basics of Python programming (part 2)Pedro Rodrigues
 
Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014PyData
 

Similaire à Natural Language Processing and Python (20)

한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남
 
Term Rewriting
Term RewritingTerm Rewriting
Term Rewriting
 
Ejercicios de estilo en la programación
Ejercicios de estilo en la programaciónEjercicios de estilo en la programación
Ejercicios de estilo en la programación
 
Declare Your Language: Type Checking
Declare Your Language: Type CheckingDeclare Your Language: Type Checking
Declare Your Language: Type Checking
 
Language Technology Enhanced Learning
Language Technology Enhanced LearningLanguage Technology Enhanced Learning
Language Technology Enhanced Learning
 
CS4200 2019 | Lecture 3 | Parsing
CS4200 2019 | Lecture 3 | ParsingCS4200 2019 | Lecture 3 | Parsing
CS4200 2019 | Lecture 3 | Parsing
 
Music as data
Music as dataMusic as data
Music as data
 
Perl 6 in Context
Perl 6 in ContextPerl 6 in Context
Perl 6 in Context
 
Separation of Concerns in Language Definition
Separation of Concerns in Language DefinitionSeparation of Concerns in Language Definition
Separation of Concerns in Language Definition
 
Compiler Construction | Lecture 5 | Transformation by Term Rewriting
Compiler Construction | Lecture 5 | Transformation by Term RewritingCompiler Construction | Lecture 5 | Transformation by Term Rewriting
Compiler Construction | Lecture 5 | Transformation by Term Rewriting
 
Ch2
Ch2Ch2
Ch2
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
Open course(programming languages) 20150225
Open course(programming languages) 20150225Open course(programming languages) 20150225
Open course(programming languages) 20150225
 
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)
 
Declare Your Language (at DLS)
Declare Your Language (at DLS)Declare Your Language (at DLS)
Declare Your Language (at DLS)
 
Basics of Python programming (part 2)
Basics of Python programming (part 2)Basics of Python programming (part 2)
Basics of Python programming (part 2)
 
Alastair Butler - 2015 - Round trips with meaning stopovers
Alastair Butler - 2015 - Round trips with meaning stopoversAlastair Butler - 2015 - Round trips with meaning stopovers
Alastair Butler - 2015 - Round trips with meaning stopovers
 
Poetic APIs
Poetic APIsPoetic APIs
Poetic APIs
 
Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014
 
Quepy
QuepyQuepy
Quepy
 

Dernier

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Dernier (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf

Natural Language Processing and Python

  • 1. Natural Language Processing + Python by Ann C. Tan-Pohlmann February 22, 2014
  • 2. Outline
      • NLP Basics
      • NLTK
          – Text Processing
      • Gensim (really, really short)
          – Text Classification
  • 3. Natural Language Processing
      • computer science, artificial intelligence, and linguistics
      • human–computer interaction
      • natural language understanding
      • natural language generation
      - Wikipedia
  • 4. Star Trek's Universal Translator
      http://www.youtube.com/watch?v=EaeSKUV2zp0
  • 6. NLP Basics
      • Morphology
          – study of word formation
          – how word forms vary in a sentence
      • Syntax
          – branch of grammar
          – how words are arranged in a sentence to show connections of meaning
      • Semantics
          – study of the meaning of words, phrases and sentences
  • 7. NLTK: Getting Started
      • Natural Language Toolkit
          – for symbolic and statistical NLP
          – teaching tool, study tool and platform for prototyping
      • Python 2.7 is a prerequisite
      >>> import nltk
      >>> nltk.download()
  • 8. Some NLTK methods
      • text.similar(str)
      • concordance(str)
      • len(text)
      • len(set(text))
      • lexical_diversity
          – len(text) / len(set(text))
      • text.collocations()
          – sequences of words that occur together often
      Frequency Distribution
      • fd = FreqDist(text)
      • fd.inc(str)
      • fd[str]
      • fd.N()
      • fd.max()
      MORPHOLOGY > Syntax > Semantics
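The two text-level measures on this slide can be sketched without NLTK. This is a minimal illustration in Python 3 using only the standard library, not NLTK's implementation. The helper names and the sample sentence are made up for the example, and counting adjacent word pairs is only a crude stand-in for what text.collocations() reports.

```python
from collections import Counter

def lexical_diversity(tokens):
    # Ratio used on the slide: total tokens divided by distinct tokens.
    return float(len(tokens)) / len(set(tokens))

def most_common_bigrams(tokens, n=3):
    # Count adjacent word pairs and return the n most frequent ones.
    pairs = zip(tokens, tokens[1:])
    return Counter(pairs).most_common(n)

# Hypothetical sample text, already split into tokens.
tokens = "the cat sat on the mat and the cat slept".split()

diversity = lexical_diversity(tokens)
top_pair = most_common_bigrams(tokens, 1)
```

With this ratio, a higher value means each distinct word is reused more often on average.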
  • 9. Frequency Distribution
      • fd = FreqDist(text)
      • fd.inc(str) – increments the count for sample str
      • fd[str] – returns the number of occurrences of sample str
      • fd.N() – total number of samples
      • fd.max() – sample with the greatest count
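For readers without NLTK installed, the same bookkeeping can be approximated with collections.Counter from the standard library. The sketch below is an analogy rather than the FreqDist API itself; each line names the FreqDist call it mimics, and the sample sentence is invented.

```python
from collections import Counter

tokens = "to be or not to be".split()

fd = Counter(tokens)             # analogous to fd = FreqDist(text)
fd["be"] += 1                    # analogous to fd.inc("be")
count_be = fd["be"]              # analogous to fd["be"]
total = sum(fd.values())         # analogous to fd.N()
top = fd.most_common(1)[0][0]    # analogous to fd.max()
```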
  • 10. Corpus
      • large collection of raw or categorized text on one or more domains
      • Examples: Gutenberg, Brown, Reuters, Web & Chat Text
      >>> from nltk.corpus import brown
      >>> brown.categories()
      ['adventure', 'belles_lettres', 'editorial', 'fiction', 'government',
       'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion',
       'reviews', 'romance', 'science_fiction']
      >>> adventure_text = brown.words(categories='adventure')
  • 11. Corpora in Other Languages
      >>> import nltk
      >>> from nltk import FreqDist
      >>> from nltk.corpus import udhr
      >>> languages = udhr.fileids()
      >>> languages.index('Filipino_Tagalog-Latin1')
      >>> tagalog = udhr.raw('Filipino_Tagalog-Latin1')
      >>> tagalog_words = udhr.words('Filipino_Tagalog-Latin1')
      >>> tagalog_tokens = nltk.word_tokenize(tagalog)
      >>> tagalog_text = nltk.Text(tagalog_tokens)
      >>> fd = FreqDist(tagalog_text)
      >>> for sample in fd:
      ...     print(sample)
  • 12. Using Corpus from Palito
      Corpus – large collection of raw or categorized text
      >>> import nltk
      >>> from nltk.corpus import PlaintextCorpusReader
      >>> corpus_dir = '/Users/ann/Downloads'
      >>> tagalog = PlaintextCorpusReader(corpus_dir, 'Tagalog_Literary_Text.txt')
      >>> raw = tagalog.raw()
      >>> sentences = tagalog.sents()
      >>> words = tagalog.words()
      >>> tokens = nltk.word_tokenize(raw)
      >>> tagalog_text = nltk.Text(tokens)
  • 13. Spoken Dialog Systems
      MORPHOLOGY > Syntax > Semantics
  • 14. Tokenization
      Tokenization – breaking a string up into words and punctuation
      >>> tokens = nltk.word_tokenize(raw)
      >>> tagalog_tokens = nltk.Text(tokens)
      >>> tagalog_tokens = set(sample.lower() for sample in tagalog_tokens)
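To make the idea concrete without NLTK, here is a rough regular-expression tokenizer. It is a simplification and not what nltk.word_tokenize does internally. The function name and the sample string are assumptions made for the example.

```python
import re

def simple_tokenize(text):
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

raw = "Hello, world! Hello again."
tokens = simple_tokenize(raw)

# Lowercased set of distinct tokens, as in the last line of the slide.
vocabulary = set(t.lower() for t in tokens)
```

Punctuation marks come out as separate tokens, so they also appear in the vocabulary unless they are filtered out.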
  • 15. Stemming
      Stemming – normalizes a word to its base form; the result may not be the 'root' word
      >>> def stem(word):
      ...     for suffix in ['ing', 'ly', 'ed', 'ious', 'ies', 'ive', 'es', 's', 'ment']:
      ...         if word.endswith(suffix):
      ...             return word[:-len(suffix)]
      ...     return word
      ...
      >>> stem('reading')
      'read'
      >>> stem('moment')
      'mo'
  • 16. Lemmatization
      Lemmatization – uses a vocabulary list and morphological analysis (uses the POS of a word)
      >>> def stem(word):
      ...     for suffix in ['ing', 'ly', 'ed', 'ious', 'ies', 'ive', 'es', 's', 'ment']:
      ...         if word.endswith(suffix) and word[:-len(suffix)] in brown.words():
      ...             return word[:-len(suffix)]
      ...     return word
      ...
      >>> stem('reading')
      'read'
      >>> stem('moment')
      'moment'
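The vocabulary check can be tried without downloading the Brown corpus. The sketch below swaps brown.words() for a tiny hand-made vocabulary, which is an assumption for illustration only. It also stores the vocabulary in a set, so each lookup does not have to scan the whole word list. make_checked_stemmer is a hypothetical helper name.

```python
SUFFIXES = ['ing', 'ly', 'ed', 'ious', 'ies', 'ive', 'es', 's', 'ment']

def make_checked_stemmer(vocabulary):
    vocab = set(vocabulary)  # set membership avoids scanning a long word list

    def stem(word):
        for suffix in SUFFIXES:
            candidate = word[:-len(suffix)]
            # Strip the suffix only if what remains is a known word.
            if word.endswith(suffix) and candidate in vocab:
                return candidate
        return word

    return stem

# Hypothetical mini-vocabulary standing in for brown.words().
stem = make_checked_stemmer(['read', 'moment', 'walk'])
results = [stem(w) for w in ['reading', 'moment', 'walked']]
```

Because 'mo' is not in the vocabulary, 'moment' is returned unchanged, which is the behaviour the slide illustrates.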
  • 17. NLTK Stemmers & Lemmatizer
      • Porter Stemmer and Lancaster Stemmer
      >>> porter = nltk.PorterStemmer()
      >>> lancaster = nltk.LancasterStemmer()
      >>> [porter.stem(w) for w in brown.words()[:100]]
      • WordNet Lemmatizer
      >>> wnl = nltk.WordNetLemmatizer()
      >>> [wnl.lemmatize(w) for w in brown.words()[:100]]
      • Comparison
      >>> [wnl.lemmatize(w) for w in ['investigation', 'women']]
      >>> [porter.stem(w) for w in ['investigation', 'women']]
      >>> [lancaster.stem(w) for w in ['investigation', 'women']]
  • 18. Using Regular Expression
      Operator      Behavior
      .             Wildcard, matches any character
      ^abc          Matches some pattern abc at the start of a string
      abc$          Matches some pattern abc at the end of a string
      [abc]         Matches one of a set of characters
      [A-Z0-9]      Matches one of a range of characters
      ed|ing|s      Matches one of the specified strings (disjunction)
      *             Zero or more of previous item, e.g. a*, [a-z]* (also known as Kleene Closure)
      +             One or more of previous item, e.g. a+, [a-z]+
      ?             Zero or one of the previous item (i.e. optional), e.g. a?, [a-z]?
      {n}           Exactly n repeats where n is a non-negative integer
      {n,}          At least n repeats
      {,n}          No more than n repeats
      {m,n}         At least m and no more than n repeats
      a(b|c)+       Parentheses that indicate the scope of the operators
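A few of the operators in the table can be tried with Python's re module. The word list below is invented for the example. Each pattern is taken from the table or is a small variation on one of its rows.

```python
import re

words = ["abc", "abcd", "xabc", "walked", "walking", "walks", "abcbc", "a"]

# ^abc : pattern at the start of the string
starts_with_abc = [w for w in words if re.search(r"^abc", w)]
# abc$ : pattern at the end of the string
ends_with_abc = [w for w in words if re.search(r"abc$", w)]
# ed|ing|s : disjunction, anchored to the end of the word
has_suffix = [w for w in words if re.search(r"(ed|ing|s)$", w)]
# a(b|c)+ : parentheses set the scope of the + operator
grouped_repeat = [w for w in words if re.search(r"^a(b|c)+$", w)]
# {n} : exactly n repeats of the previous item
exactly_five = [w for w in words if re.search(r"^[a-z]{5}$", w)]
```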
  • 19. Using Regular Expression
      >>> import re
      >>> re.findall(r'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$', 'reading')
      [('read', 'ing')]
      >>> def stem(word):
      ...     regexp = r'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$'
      ...     stem, suffix = re.findall(regexp, word)[0]
      ...     return stem
      ...
      >>> stem('reading')
      'read'
      >>> stem('moment')  # 'ment' is in the suffix list, so it is stripped
      'mo'
  • 20. Spoken Dialog Systems
      Morphology > SYNTAX > Semantics
  • 21. Lexical Resources
      • collection of words with association information (annotation)
      • Ex: stopwords – high-frequency words with little lexical content
      >>> from nltk.corpus import stopwords
      >>> stopwords.words('english')
      >>> stopwords.words('german')
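A typical use of a stopword list is to filter tokens before counting or classifying them. The sketch below uses a very small hand-picked stopword set as a stand-in for stopwords.words('english'). The set, the helper names, and the sample sentence are all assumptions made for illustration.

```python
# Hypothetical mini stopword list; NLTK's English list is much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def content_words(tokens, stopwords=STOPWORDS):
    # Keep only tokens that are not stopwords (case-insensitive).
    return [t for t in tokens if t.lower() not in stopwords]

def content_fraction(tokens, stopwords=STOPWORDS):
    # Share of tokens that carry lexical content.
    return float(len(content_words(tokens, stopwords))) / len(tokens)

tokens = "The study of meaning is a branch of linguistics".split()
kept = content_words(tokens)
fraction = content_fraction(tokens)
```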
  • 22. Part-of-Speech (POS) Tagging
      • the process of labeling each word with a particular part of speech, based on its definition and context
  • 23. NLTK's POS Tag Sets* – 1/2
      Tag    Meaning        Examples
      ADJ    adjective      new, good, high, special, big, local
      ADV    adverb         really, already, still, early, now
      CNJ    conjunction    and, or, but, if, while, although
      DET    determiner     the, a, some, most, every, no
      EX     existential    there, there's
      FW     foreign word   dolce, ersatz, esprit, quo, maitre
      MOD    modal verb     will, can, would, may, must, should
      N      noun           year, home, costs, time, education
      NP     proper noun    Alison, Africa, April, Washington
      *simplified
  • 24. NLTK's POS Tag Sets* – 2/2
      Tag    Meaning              Examples
      NUM    number               twenty-four, fourth, 1991, 14:24
      PRO    pronoun              he, their, her, its, my, I, us
      P      preposition          on, of, at, with, by, into, under
      TO     the word to          to
      UH     interjection         ah, bang, ha, whee, hmpf, oops
      V      verb                 is, has, get, do, make, see, run
      VD     past tense           said, took, told, made, asked
      VG     present participle   making, going, playing, working
      VN     past participle      given, taken, begun, sung
      WH     wh determiner        who, which, when, what, where, how
      *simplified
  • 25. NLTK POS Tagger (Brown)
      >>> nltk.pos_tag(brown.words()[:30])
      [('The', 'DT'), ('Fulton', 'NNP'), ('County', 'NNP'), ('Grand', 'NNP'),
       ('Jury', 'NNP'), ('said', 'VBD'), ('Friday', 'NNP'), ('an', 'DT'),
       ('investigation', 'NN'), ('of', 'IN'), ("Atlanta's", 'JJ'), ('recent', 'JJ'),
       ('primary', 'JJ'), ('election', 'NN'), ('produced', 'VBN'), ('``', '``'),
       ('no', 'DT'), ('evidence', 'NN'), ("''", "''"), ('that', 'WDT'), ('any', 'DT'),
       ('irregularities', 'NNS'), ('took', 'VBD'), ('place', 'NN'), ('.', '.'),
       ('The', 'DT'), ('jury', 'NN'), ('further', 'RB'), ('said', 'VBD'), ('in', 'IN')]
      >>> brown.tagged_words(simplify_tags=True)
      [('The', 'DET'), ('Fulton', 'NP'), ('County', 'N'), ...]
  • 26. NLTK POS Tagger (German)
      >>> german = nltk.corpus.europarl_raw.german
      >>> nltk.pos_tag(german.words()[:30])
      [(u'Wiederaufnahme', 'NNP'), (u'der', 'NN'), (u'Sitzungsperiode', 'NNP'),
       (u'Ich', 'NNP'), (u'erkl\xe4re', 'NNP'), (u'die', 'VB'), (u'am', 'NN'),
       (u'Freitag', 'NNP'), (u',', ','), (u'dem', 'NN'), (u'17.', 'CD'),
       (u'Dezember', 'NNP'), (u'unterbrochene', 'NN'), (u'Sitzungsperiode', 'NNP'),
       (u'des', 'VBZ'), (u'Europ\xe4ischen', 'JJ'), (u'Parlaments', 'NNS'),
       (u'f\xfcr', 'JJ'), (u'wiederaufgenommen', 'NNS'), (u',', ','),
       (u'w\xfcnsche', 'NNP'), (u'Ihnen', 'NNP'), (u'nochmals', 'NNS'),
       (u'alles', 'VBZ'), (u'Gute', 'NNP'), (u'zum', 'NN'), (u'Jahreswechsel', 'NNP'),
       (u'und', 'NN'), (u'hoffe', 'NN'), (u',', ',')]
      \xe4 = ä, \xfc = ü
      !!! DOES NOT WORK FOR GERMAN
  • 27. NLTK POS Dictionary
      >>> pos = nltk.defaultdict(lambda: 'N')
      >>> pos['eat']
      'N'
      >>> pos.items()
      [('eat', 'N')]
      >>> for (word, tag) in brown.tagged_words(simplify_tags=True):
      ...     if word in pos:
      ...         if isinstance(pos[word], str):
      ...             new_list = [pos[word]]
      ...             pos[word] = new_list
      ...         if tag not in pos[word]:
      ...             pos[word].append(tag)
      ...     else:
      ...         pos[word] = [tag]
      ...
      >>> pos['eat']
      ['N', 'V']
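The same word-to-tags mapping can be built more compactly when every entry starts out as a list. The sketch below uses collections.defaultdict(list) and a tiny hand-made tagged word list in place of brown.tagged_words(); both are assumptions for illustration. Unlike the slide's version, unseen words get an empty list rather than a default 'N'.

```python
from collections import defaultdict

# Hypothetical (word, tag) pairs standing in for a tagged corpus.
tagged_words = [("eat", "V"), ("fish", "N"), ("fish", "V"),
                ("the", "DET"), ("fish", "N")]

pos = defaultdict(list)
for word, tag in tagged_words:
    # Record each tag once per word, in order of first appearance.
    if tag not in pos[word]:
        pos[word].append(tag)

fish_tags = pos["fish"]
eat_tags = pos["eat"]
```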
  • 28. What else can you do with NLTK?
      • Other Taggers
          – Unigram Tagging
              • nltk.UnigramTagger()
              • train tagger using tagged sentence data
          – N-gram Tagging
      • Text classification using machine learning techniques
          – decision trees
          – naïve Bayes classification (supervised)
          – Markov Models
      Morphology > SYNTAX > SEMANTICS
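The idea behind unigram tagging is to give each word the tag it carried most often in the training data. The sketch below is a from-scratch illustration of that idea, not the nltk.UnigramTagger implementation. The training sentences, the function name, and the fallback tag 'N' for unseen words are assumptions.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sents, default_tag="N"):
    # Count how often each word appears with each tag.
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    # Keep only the most frequent tag per word.
    best = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def tag(tokens):
        # Unseen words fall back to the default tag.
        return [(t, best.get(t.lower(), default_tag)) for t in tokens]

    return tag

# Hypothetical training data using the simplified tag set.
train = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("run", "N"), ("ended", "VD")],
    [("dogs", "N"), ("run", "V")],
    [("they", "PRO"), ("run", "V")],
]
tag = train_unigram_tagger(train)
result = tag(["The", "cat", "run"])
```

Here 'run' is tagged V because it was seen twice as a verb and once as a noun. An n-gram tagger would also look at the preceding tags before deciding.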
  • 29. Gensim
      • Tool that extracts the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents.
      • Algorithms:
          1. Latent Semantic Analysis (LSA)
          2. Latent Dirichlet Allocation (LDA) or Random Projections
      Morphology > Syntax > SEMANTICS
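To give a feel for what "co-occurrence patterns" means, the sketch below counts how many documents each pair of words shares. It only illustrates the raw statistic. It is not how Gensim, LSA or LDA are implemented, and the three toy documents are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tokenized documents.
docs = [
    ["human", "computer", "interface"],
    ["computer", "system", "response"],
    ["human", "system", "computer"],
]

cooccurrence = Counter()
for doc in docs:
    # Each unordered pair of distinct words counts once per document.
    for pair in combinations(sorted(set(doc)), 2):
        cooccurrence[pair] += 1

human_computer = cooccurrence[("computer", "human")]
```

Pairs that share many documents, such as 'computer' and 'human' here, are the kind of regularity that topic models try to summarize.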
  • 30. Gensim
      • Features
          – memory independent
          – wrappers/converters for several data formats
      • Vector
          – representation of the document as an array of features or question-answer pairs
              1. (word occurrence, count)
              2. (paragraph, count)
              3. (font, count)
      • Model
          – transformation from one vector to another
          – learned from a training corpus without supervision
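The (word occurrence, count) representation can be sketched in plain Python. The helpers below are hypothetical and written for this example only. They mimic the sparse bag-of-words idea of (feature id, count) pairs and are not Gensim's own API.

```python
def build_vocabulary(docs):
    # Assign each distinct word an integer id in order of first appearance.
    vocab = {}
    for doc in docs:
        for word in doc:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def to_bow(doc, vocab):
    # Represent a document as sorted (word id, count) pairs.
    counts = {}
    for word in doc:
        if word in vocab:
            counts[vocab[word]] = counts.get(vocab[word], 0) + 1
    return sorted(counts.items())

# Hypothetical tokenized documents.
docs = [["human", "computer", "interface"],
        ["computer", "system", "computer"]]
vocab = build_vocabulary(docs)
vectors = [to_bow(d, vocab) for d in docs]
```

Only words that occur in a document appear in its vector, which keeps the representation small for large vocabularies.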
  • 32. Other NLP tools for Python
      • TextBlob
          – part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation
          – https://pypi.python.org/pypi/textblob
      • Pattern
          – part-of-speech taggers, n-gram search, sentiment analysis, WordNet, machine learning
          – http://www.clips.ua.ac.be/pattern
  • 33. Star Trek technology that became a reality
      http://www.youtube.com/watch?v=sRZxwRIH9RI
  • 34. Installation Guides
      • NLTK
          – http://www.nltk.org/install.html
          – http://www.nltk.org/data.html
      • Gensim
          – http://radimrehurek.com/gensim/install.html
      • Palito
          – http://ccs.dlsu.edu.ph:8086/Palito/find_project.jsp
  • 35. Using IPython
      • http://ipython.org/install.html
      >>> documents = ["Human machine interface for lab abc computer applications",
      ...              "A survey of user opinion of computer system response time",
      ...              "The EPS user interface management system",
      ...              "System and human system engineering testing of EPS",
      ...              "Relation of user perceived response time to error measurement",
      ...              "The generation of random binary unordered trees",
      ...              "The intersection graph of paths in trees",
      ...              "Graph minors IV Widths of trees and well quasi ordering",
      ...              "Graph minors A survey"]
  • 36. References
      • Natural Language Processing with Python, by Steven Bird, Ewan Klein, Edward Loper
      • http://www.nltk.org/book/
      • http://radimrehurek.com/gensim/tutorial.html
  • 37. Thank You!
      • For questions and comments: ann at auberonsolutions dot com