Natural Language Processing (NLP) is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
2. www.decideo.fr/bruley
Natural Language Processing (NLP)
NLP is the branch of computer science focused on developing systems
that allow computers to communicate with people using everyday
language.
NLP is considered a subfield of artificial intelligence and has
significant overlap with the field of computational linguistics. It is
concerned with the interactions between computers and human (natural)
languages.
– Natural language generation systems convert information from
computer databases into readable human language.
– Natural language understanding systems convert human language
into representations that are easier for computer programs to
manipulate.
NLP encompasses both text and speech, but work on speech processing
has evolved into a separate field.
3. Where does it fit in the CS* taxonomy?
Computers
  Databases
  Artificial Intelligence
    Robotics
    Natural Language Processing
      Information Retrieval
      Machine Translation
      Language Analysis
        Semantics
        Parsing
    Search
  Algorithms
  Networking
* CS = Computer Science
4. Why Natural Language Processing?
Applications that process large amounts of text require NLP expertise:
– Classifying text into categories, and indexing and searching large texts: classifying documents by topic, language, or author; spam filtering; information retrieval (relevant, not relevant); sentiment classification (positive, negative)
– Extracting data from text: converting unstructured text into structured data
– Information extraction: discovering the names of people and the events they participate in from a document, …
– Automatic summarization: condensing 1 book into 1 page, …
– Speech processing, artificial voice: getting flight information or booking a hotel over the phone, …
– Question answering: finding answers to natural language questions in a text collection or database
– Spelling and grammar correction
– Plagiarism detection
– Automatic translation
– Etc.
5. The problem
When people see text, they understand its meaning (by and large)
According to research, it deosn’t mttaer in what oredr the ltteers in a
wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer are in
the rghit pclae. The rset can be a toatl mses and you can sitll raed it
wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by islelf
but the wrod as a wlohe.
When computers see text, they get only character strings (and perhaps
HTML tags)
We'd like computer agents to see meanings and be able to intelligently
process text
These desires have led to many proposals for structured, semantically
marked up formats
But often human beings still resolutely make use of text in human
languages
This problem isn’t likely to just go away
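The contrast on this slide can be tried out in a few lines of Python. This is a minimal sketch, not something from the original deck: the helper names `scramble_word` and `scramble_text` are made up for illustration. It shuffles the interior letters of each word while keeping the first and last letters fixed, then shows that a plain string comparison treats a scrambled phrase from the slide as a different string, even though a person reads both the same way.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters of a word; first and last stay in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle in very short words
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every alphabetic run; spaces and punctuation are untouched."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


original = "the rest can be a total mess and you can still read it"
scrambled = scramble_text(original)
print(scrambled)

# Exact comparison sees two different character strings,
# although a human reader recovers the same meaning from both.
print("wouthit a porbelm" == "without a problem")  # False
```

A program that is meant to cope with such input needs something beyond exact matching, which is the gap the rest of the deck is about.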
6. Example: Natural language understanding
Raw speech signal
• Speech recognition
Sequence of words spoken
• Syntactic analysis using knowledge of the grammar
Structure of the sentence
• Semantic analysis using info. about meaning of words
Partial representation of meaning of sentence
• Pragmatic analysis using info. about context
Final representation of meaning of sentence
Natural language understanding process – Prof. Carolina Ruiz
7. Example detail: Syntactic Analysis
• Syntactic analysis breaks a sentence into phrases arranged in a
hierarchical structure, so that its constituents can be studied.
• For example, the sentence “the big cat is drinking milk” can be broken
up into the following constituents:
Sentence: The big cat is drinking milk
  Noun Phrase: The big cat
    Determiner: The
    Adjective Phrase: big
    Noun: cat
  Verb Phrase: is drinking milk
    Auxiliary: is
    Verb: drinking
    Noun Phrase: milk
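The constituent structure of this sentence can also be written down as data. The sketch below is illustrative only: it hand-codes the tree from the slide as nested tuples rather than running a parser, and the names `TREE`, `leaves` and `show` are assumptions made for this example.

```python
# A constituency tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings. Labels follow the diagram on the slide.
TREE = (
    "Sentence",
    ("Noun Phrase",
        ("Determiner", "The"),
        ("Adjective Phrase", "big"),
        ("Noun", "cat")),
    ("Verb Phrase",
        ("Auxiliary", "is"),
        ("Verb", "drinking"),
        ("Noun Phrase", "milk")),
)


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def show(node, depth=0):
    """Print one constituent per line, indented by its depth in the tree."""
    if isinstance(node, str):
        return
    print("  " * depth + f"{node[0]}: {' '.join(leaves(node))}")
    for child in node[1:]:
        show(child, depth + 1)


show(TREE)
```

Once the sentence is held in this form, each constituent can be inspected on its own, for instance `leaves(TREE[2])` gives the words of the verb phrase.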
8. Why NLP is difficult
Language is flexible
– New words, new meanings
– Different meanings in different contexts
Language is subtle
– He arrived at the lecture
– He chuckled at the lecture
– He chuckled his way through the lecture
– **He arrived his way through the lecture
Language is complex!
9. Why NLP is difficult
MANY hidden variables
– Knowledge about the world
– Knowledge about the context
– Knowledge about human communication techniques
• Can you tell me the time?
Problem of scale
– Many (infinite?) possible words, meanings, contexts
Problem of sparsity
– Statistical analysis is very difficult because most things (words,
concepts) have never been seen before
Long range correlations
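The sparsity problem shows up even in a tiny sample. The following sketch is an illustration under assumptions, not a method from the deck: it uses a simple regular-expression tokenizer (a stand-in for real tokenization) on the short customer review quoted elsewhere in this deck, and counts how many distinct words occur exactly once.

```python
import re
from collections import Counter

# Sample text: the customer review quoted elsewhere in this deck.
text = (
    "I bought an iPad2 for my mom last week. She loves the weight, "
    "but doesn't like the color. She wishes it came in blue. She says "
    "if it came in blue, then she'd buy one for all her friends"
)

# Naive tokenizer (an assumption for this sketch): lowercase runs of
# letters, digits, and apostrophes.
tokens = re.findall(r"[a-z0-9']+", text.lower())
counts = Counter(tokens)

# Words seen exactly once give a model almost no statistical evidence.
hapaxes = [word for word, n in counts.items() if n == 1]

print(f"tokens: {len(tokens)}, distinct words: {len(counts)}")
print(f"words seen only once: {len(hapaxes)}")
print(f"share of vocabulary seen only once: {len(hapaxes) / len(counts):.0%}")
```

Most of the vocabulary in the sample appears only once, and any new review is likely to bring words that were not in the sample at all.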
10. Why NLP is difficult
Key problems:
– Representation of meaning
– Language presupposes knowledge about the world
– Language only reflects the surface of meaning
– Language presupposes communication between people
11. Patented Natural Language Processing (NLP) “Reads” Every Communication
– Each data feed is parsed through one or more of the 7 NLP engines
– …it is then deconstructed to provide context, subject, and other information regarding the customer (gender, name, etc.)
– Finally, each identified customer is matched back to the Discovery platform data to gain a full view
Natural language processing (NLP) is the study of the
interactions between computers and natural languages
(e.g., English, Polish). The crucial challenge that NLP
addresses is in deriving meaning from human or natural
language input and allowing consumers to analyze
parsed meanings in large volumes.
12. For Example…
I bought an iPad2 for my mom last week. She loves the weight, but doesn’t like the color. She
wishes it came in blue. She says if it came in blue, then she’d buy one for all her friends
Entities (brands, people, locations, times, products…)
Events and relationships (purchasing event, my mom…)
Sentiment (product specifications)
Suggestions (feature specifications)
Intent (to purchase, to leave)
Geo/Temporal
QUESTION: Why is this a big deal?
NLP takes a simple English statement, parses it into the categories above (and more),
and voilà: we get STRUCTURED DATA
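To make the idea of turning the review into structured data concrete, here is a deliberately naive sketch. It is not how a commercial NLP engine works: the category names and the hand-written regular expressions in `PATTERNS` are hypothetical and only fit this one review. It shows what the output rows could look like once each category has been pulled out of the text.

```python
import re

REVIEW = (
    "I bought an iPad2 for my mom last week. She loves the weight, "
    "but doesn't like the color. She wishes it came in blue."
)

# Hypothetical hand-written patterns, one per category of interest.
# Real systems rely on linguistic analysis rather than fixed phrases.
PATTERNS = {
    "purchase_event": r"\bbought an? (?P<value>\w+)",
    "recipient": r"\bfor my (?P<value>\w+)",
    "time": r"\b(?P<value>last (?:week|month|year))\b",
    "positive_sentiment": r"\bloves the (?P<value>\w+)",
    "negative_sentiment": r"\bdoesn't like the (?P<value>\w+)",
    "suggestion": r"\bwishes it came in (?P<value>\w+)",
}


def extract(text):
    """Return one {category, value} row per pattern match in the text."""
    rows = []
    for category, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            rows.append({"category": category, "value": match.group("value")})
    return rows


for row in extract(REVIEW):
    print(row)
```

Each row can be stored in a table and queried like any other structured data, which is the point of the slide. The fixed patterns would miss almost any rewording, and that brittleness is why real extraction needs actual language analysis.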
14. Aster + Attensity = Competitive Advantage
Attensity’s Semantic Annotation Server (ASAS) capabilities
– Entity Extraction: automatic detection and extraction of more than 35 entities, such as Name and Place
– Uses Attensity Triples to create context on entities and identify verbs, relationships, and actions
– Auto Classification: uses custom classification rules to classify articles by content, sort by relevance, and discover repeated information
– Exhaustive Extraction: applies linguistic principles to extract context, entities, and relationships, similar to how the human mind would
– Voice Tags: identify types of statements and auto-classify them (Question, Intent, Conditional)
– Creates a unique identifier for each entity for cross-reference
– This integration provides types, subtypes, and supertypes (“Savings”, “Checking”, “Investment”)
– Anaphora: connects references such as “He” or “Him” to a subject (George Harrison) without the full name being repeated
– Includes other languages besides English
16. How Triples are Extracted & Structured
Extract: extract relational facts and Triples from the Notes field.
Then fuse: populate a new table with attribute values and fuse it with the structured data.

Database Record from a Customer Survey
date      region  rec?  source
10-02-06  0006    4     telephone
Why would you recommend/not recommend?
The flight was delayed and flight attendant would not give us any new information.

New Table: Customer Reactions
Who/What  Behavior  Fact/Triple
flight    delay     flight : delay

Same Record with Relational Facts Extracted from Notes Field
(date, region, source, and rec? are the original structured data; who-what, Behavior, and Fact/Triple are the newly structured data)
date     region  source     rec?  who-what     Behavior     Fact/Triple
10-2-12  0006    telephone  4     flight       delay        flight : delay
10-2-12  0006    telephone  4     information  give [not]   information : give [not]
1-1-13   0007    e-mail     8     i            happy [not]  i : happy [not]
1-1-13   0007    e-mail     8     rep          rude         rep : rude
1-1-13   0007    e-mail     8     flight       cancel       flight : cancel
Provided by Attensity
Here’s an example of how this process works. In the upper right you can see some of the feedback captured by a call center agent taking a complaint call from a customer. From this sentence, the facts about the flight and the details of the customer’s opinions are extracted into the relational table below. The newly structured facts are fused with the available structured data (customer ID/segment, date, flight number, etc.) so that any of these facts can be analyzed along with the structured data.
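The fuse step can be sketched in a few lines. This is a hedged illustration, not Attensity's implementation: the `Triple` class and the `fuse` function are hypothetical names, and the triples are typed in by hand where a real system would extract them from the notes text. The sketch only shows how each extracted fact becomes a row that carries the record's structured fields with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact: who or what, the behavior, and whether it is negated."""
    who_what: str
    behavior: str
    negated: bool = False

    @property
    def behavior_label(self) -> str:
        return f"{self.behavior} [not]" if self.negated else self.behavior

    @property
    def fact(self) -> str:
        return f"{self.who_what} : {self.behavior_label}"


def fuse(record, triples):
    """Return one row per triple, each carrying the record's structured fields."""
    structured = {key: value for key, value in record.items() if key != "notes"}
    return [
        {**structured,
         "who_what": t.who_what,
         "behavior": t.behavior_label,
         "fact": t.fact}
        for t in triples
    ]


# Structured fields plus the free-text notes, using values from the slide.
record = {
    "date": "10-2-12", "region": "0006", "source": "telephone", "rec": 4,
    "notes": "The flight was delayed and flight attendant would "
             "not give us any new information.",
}

# Assumed output of the extraction step for the notes above.
triples = [Triple("flight", "delay"), Triple("information", "give", negated=True)]

rows = fuse(record, triples)
for row in rows:
    print(row)
```

Because every row repeats the structured fields, the facts can be filtered or grouped by region, source, or recommendation score with ordinary table operations.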