Big Data and Natural Language Processing

Michel Bruley, Marketing Director
www.decideo.fr/bruley
Natural Language Processing
June 2013
Natural Language Processing (NLP)
NLP is the branch of computer science focused on developing systems
that allow computers to communicate with people using everyday
language.
NLP is considered a subfield of artificial intelligence and overlaps
significantly with the field of computational linguistics. It is
concerned with the interactions between computers and human (natural)
languages.
– Natural language generation systems convert information from
computer databases into readable human language.
– Natural language understanding systems convert human language
into representations that are easier for computer programs to
manipulate.
NLP encompasses both text and speech, but work on speech processing
has evolved into a separate field.
Where does it fit in the CS* taxonomy?
Computers
– Artificial Intelligence
  – Robotics
  – Search
  – Natural Language Processing
    – Information Retrieval
    – Machine Translation
    – Language Analysis
      – Semantics
      – Parsing
– Algorithms
– Databases
– Networking
* CS = Computer Science
Why Natural Language Processing?
Applications that process large amounts of text require NLP expertise:
– Classifying text into categories, indexing and searching large texts: classifying documents by topic, language, or author; spam filtering; information retrieval (relevant, not relevant); sentiment classification (positive, negative)
– Extracting data from text: converting unstructured text into structured data
– Information extraction: discovering the names of people and the events they participate in from a document, …
– Automatic summarization: condensing 1 book into 1 page, …
– Speech processing, artificial voice: getting flight information or booking a hotel over the phone, …
– Question answering: finding answers to natural language questions in a text collection or database
– Spelling and grammar correction
– Plagiarism detection
– Automatic translation
– Etc.
The problem
When people see text, they understand its meaning (by and large)
According to research, it deosn’t mttaer in what oredr the ltteers in a
wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer are in
the rghit pclae. The rset can be a toatl mses and you can sitll raed it
wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by islelf
but the wrod as a wlohe.
When computers see text, they get only character strings (and perhaps
HTML tags).
We'd like computer agents to see meanings and be able to process text
intelligently.
These desires have led to many proposals for structured, semantically
marked-up formats.
But human beings still resolutely make use of text in human
languages.
This problem isn’t likely to just go away.
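The contrast between a human reader and a computer can be made concrete with a small sketch. It is only an illustration: the helper names `scramble_word` and `reads_the_same` are hypothetical, and the "human tolerance" rule (same first and last letter, same inner letters in any order) is a deliberately crude assumption, not a model of how people actually read.

```python
import random


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the inner letters of a word; first and last letters stay put."""
    if len(word) <= 3:
        return word  # nothing to shuffle in very short words
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def reads_the_same(a: str, b: str) -> bool:
    """Crude stand-in for a human reader (an assumption for illustration):
    same first and last letter, same inner letters in any order."""
    if len(a) != len(b) or not a:
        return a == b
    return a[0] == b[0] and a[-1] == b[-1] and sorted(a[1:-1]) == sorted(b[1:-1])


rng = random.Random(0)  # fixed seed so runs are repeatable
for word in ["important", "problem", "letter"]:
    jumbled = scramble_word(word, rng)
    # == is the exact character-by-character comparison a computer performs
    print(word, jumbled, jumbled == word, reads_the_same(jumbled, word))
```

To a plain string comparison, "porbelm" and "problem" are simply different strings; any tolerance for jumbled letters has to be programmed in explicitly.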
Example: Natural language understanding
Raw speech signal
• Speech recognition
Sequence of words spoken
• Syntactic analysis using knowledge of the grammar
Structure of the sentence
• Semantic analysis using info. about meaning of words
Partial representation of meaning of sentence
• Pragmatic analysis using info. about context
Final representation of meaning of sentence
Natural language understanding process – Prof. Carolina Ruiz
Example detail: Syntactic Analysis
• Syntactic analysis breaks a sentence into phrases arranged in a
hierarchical structure, allowing the study of its constituents.
• For example, the sentence “The big cat is drinking milk” can be broken
up into the following constituents:
Noun Phrase: The big cat
– Determiner: The
– Adjective Phrase: big
– Noun: cat
Verb Phrase: is drinking milk
– Auxiliary: is
– Verb: drinking
– Noun Phrase: milk
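As a rough sketch, the constituent structure of this sentence can be held in a nested data structure and walked programmatically. The tuple layout, the top-level "Sentence" label, and the helper names `leaves` and `constituents` are assumptions made for this illustration; the tree is written by hand here, not produced by a parser.

```python
# Each node is (label, child, child, ...); a leaf is (label, word).
# The tree is hand-written for illustration; no parser is involved.
tree = (
    "Sentence",
    ("Noun Phrase",
        ("Determiner", "The"),
        ("Adjective Phrase", "big"),
        ("Noun", "cat")),
    ("Verb Phrase",
        ("Auxiliary", "is"),
        ("Verb", "drinking"),
        ("Noun Phrase", "milk")),
)


def leaves(node):
    """Return the words covered by a node, left to right."""
    _label, *children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def constituents(node):
    """Yield (label, text) for the node and every node below it, top-down."""
    label, *children = node
    yield label, " ".join(leaves(node))
    for child in children:
        if not isinstance(child, str):
            yield from constituents(child)


for label, text in constituents(tree):
    print(f"{label}: {text}")
```

Once the sentence is stored this way, each phrase can be inspected on its own, which is what makes later semantic analysis tractable.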
Why NLP is difficult
Language is flexible
– New words, new meanings
– Different meanings in different contexts
Language is subtle
– He arrived at the lecture
– He chuckled at the lecture
– He chuckled his way through the lecture
– **He arrived his way through the lecture
Language is complex!
Why NLP is difficult
MANY hidden variables
– Knowledge about the world
– Knowledge about the context
– Knowledge about human communication techniques
• Can you tell me the time?
Problem of scale
– Many (infinite?) possible words, meanings, contexts
Problem of sparsity
– Statistical analysis is very difficult because most things (words,
concepts) have never been seen before
Long range correlations
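The sparsity problem shows up even on a toy scale. The sketch below counts words in three of the example sentences used earlier in this deck and then looks at a new sentence; the new sentence ("She laughed at the lecture") and the simple regular-expression tokenizer are assumptions added purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# "Training" text: three example sentences from this deck.
seen_text = (
    "He arrived at the lecture. He chuckled at the lecture. "
    "He chuckled his way through the lecture."
)
counts = Counter(tokenize(seen_text))

# Words observed exactly once give almost no statistical evidence.
singletons = sorted(word for word, n in counts.items() if n == 1)

# A new sentence (invented for this sketch) contains words never seen at all.
new_tokens = tokenize("She laughed at the lecture")
unseen = [word for word in new_tokens if word not in counts]

print("vocabulary size:", len(counts))
print("seen only once:", singletons)
print("never seen before:", unseen)
```

Even here, nearly half of the vocabulary occurs only once, and two of the five words in the new sentence were never observed.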
Why NLP is difficult
Key problems:
– Representation of meaning
– Language presupposes knowledge about the world
– Language only reflects the surface of meaning
– Language presupposes communication between people
Patented Natural Language Processing (NLP) “Reads” Every Communication
• Each data feed is parsed through one or more of the 7 NLP engines
• …it is then deconstructed to provide context, subject, and other information regarding the customer (gender, name, etc.)
• Finally, each identified customer is matched back to the Discovery platform data to gain a full view
Natural language processing (NLP) is the study of the
interactions between computers and natural languages
(e.g., English, Polish). The crucial challenge that NLP
addresses is deriving meaning from human (natural)
language input and allowing consumers to analyze
parsed meanings in large volumes.
For Example…
I bought an iPad2 for my mom last week. She loves the weight, but doesn’t like the color. She
wishes it came in blue. She says if it came in blue, then she’d buy one for all her friends
Entities (brands, people, locations, times, products…)
Events and relationships (purchasing event, my mom…)
Sentiment (product specifications)
Suggestions (feature specifications)
Intent (to purchase, to leave)
Geo/Temporal
QUESTION: Why is this a big deal?
NLP takes a simple English statement, parses it into the categories above (and more categories),
and VOILÀ… we get STRUCTURED DATA
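To give a feel for what "structured data" means here, the sketch below turns the example statement into category/value rows. It is not how a commercial NLP engine works: the `RULES` table is a set of hand-written regular expressions tuned to this one statement, and the category names are assumptions chosen to mirror the list above. Straight apostrophes are used in the text for simplicity.

```python
import re

text = (
    "I bought an iPad2 for my mom last week. She loves the weight, "
    "but doesn't like the color. She wishes it came in blue. She says if it "
    "came in blue, then she'd buy one for all her friends"
)

# Hypothetical hand-written rules: (category, pattern with one capture group).
RULES = [
    ("entity:product", r"\bbought an? (\w+)"),
    ("entity:person", r"\bfor (my \w+)"),
    ("time", r"\b(last (?:week|month|year))\b"),
    ("sentiment:positive", r"\bloves the (\w+)"),
    ("sentiment:negative", r"\bdoesn't like the (\w+)"),
    ("suggestion", r"\bwishes it came in (\w+)"),
    ("intent:purchase", r"\bthen she'd (buy \w+)"),
]


def extract(statement):
    """Apply every rule and return (category, value) rows in rule order."""
    rows = []
    for category, pattern in RULES:
        for match in re.finditer(pattern, statement):
            rows.append((category, match.group(1)))
    return rows


for category, value in extract(text):
    print(f"{category:20} {value}")
```

Each row could be stored as a database record, which is the point of the slide: free text goes in, queryable rows come out.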
Architecture
[Architecture diagram; recoverable labels:]
– Attensity Pipeline: real-time annotated social media data feed (150+ million social and online sources)
– Other unstructured data: emails, surveys, CRM notes, …
– Pipeline Connector; ASAS Wrapper (SQL MR); NLP; ETL
– Aster Discovery Platform: “now-structured” data; customers / sales / other data; Churn Score (SQL MR)
– Visualization (e.g., Tableau, MSTR); Predictive
Aster + Attensity = Competitive Advantage
• This integration provides types, subtypes, and supertypes (“Savings”, “Checking”, “Investment”)
• Anaphora resolution: linking a pronoun (“He”, “Him”) back to its subject (George Harrison) without repeating the full name
• Includes other languages besides English
• Attensity’s Semantic Annotation Server (ASAS) capabilities:
  – Entity Extraction: automatic detection and extraction of more than 35 entities, such as Name and Place
  – Uses Attensity Triples to create context on entities and identify verbs, relationships, and actions
  – Auto Classification: uses custom classification rules to classify articles by content, sort by relevance, and discover repeated information
  – Exhaustive Extraction: applies linguistic principles to extract context, entities, and relationships, similar to how the human mind would
  – Voice Tags: identify types of statements and auto-classify them (Question, Intent, Conditional)
  – Creates a unique identifier for each entity for cross-reference
Structuring Unstructured Data: Process Flow
The flight was delayed and flight attendant would not give us
any new information.
How Triples are Extracted & Structured

Database Record from a Customer Survey
date      region  rec?  source
10-02-06  0006    4     telephone
Why would you recommend/not recommend?
The flight was delayed and flight attendant would not give us any new information.

Extract: extract relational facts & triples from the Notes field
Who/What  Behavior  Fact/Triple
flight    delay     flight : delay

Then Fuse: populate a new table with attribute values and fuse it with the structured data.

New Table: Customer Reactions (same record with relational facts extracted from the Notes field)
date     region  source     rec?  who-what     Behavior     Fact/Triple
10-2-12  0006    telephone  4     flight       delay        flight : delay
10-2-12  0006    telephone  4     information  give [not]   information : give [not]
1-1-13   0007    e-mail     8     i            happy [not]  i : happy [not]
1-1-13   0007    e-mail     8     rep          rude         rep : rude
1-1-13   0007    e-mail     8     flight       cancel       flight : cancel
Columns date through rec? are the original structured data; columns who-what through Fact/Triple are the newly structured data.
Provided by Attensity
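The fuse step can be sketched as a simple join between each survey record and the triples extracted from its notes. This is a minimal sketch under stated assumptions: the `id` key, the field names, and the `fuse` helper are hypothetical, the triples are typed in by hand from the slide's table rather than extracted by Attensity's software, and the notes text of the second record is not shown in the source, so it is left out.

```python
# Original structured survey records (values taken from the slide's table).
# The "id" key is a hypothetical join key added for this sketch.
survey = [
    {"id": 1, "date": "10-2-12", "region": "0006", "source": "telephone", "rec": 4,
     "notes": "The flight was delayed and flight attendant would not give us "
              "any new information."},
    {"id": 2, "date": "1-1-13", "region": "0007", "source": "e-mail", "rec": 8},
]

# Triples assumed to have been extracted already: (who/what, behavior).
triples_by_id = {
    1: [("flight", "delay"), ("information", "give [not]")],
    2: [("i", "happy [not]"), ("rep", "rude"), ("flight", "cancel")],
}


def fuse(records, triples):
    """Emit one row per triple, carrying the record's structured fields along."""
    rows = []
    for record in records:
        structured = {k: v for k, v in record.items() if k != "notes"}
        for who_what, behavior in triples.get(record["id"], []):
            rows.append({
                **structured,
                "who_what": who_what,
                "behavior": behavior,
                "triple": f"{who_what} : {behavior}",
            })
    return rows


for row in fuse(survey, triples_by_id):
    print(row["date"], row["region"], row["source"], row["rec"], row["triple"])
```

Because every output row keeps the original columns, the new table can be filtered or aggregated by region, source, or score alongside the extracted facts.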
Team Power
Big Data and Natural Language Processing

  • 1. www.decideo.fr/bruley Natural Language ProcessingNatural Language Processing June 2013 Michel Bruley
  • 2. www.decideo.fr/bruley Natural Language Processing (NLP)Natural Language Processing (NLP) NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language NLP is considered as a sub-field of artificial intelligence and has significant overlap with the field of computational linguistics. It is concerned with the interactions between computers and human (natural) languages. – Natural language generation systems convert information from computer databases into readable human language – Natural language understanding systems convert human language into representations that are easier for computer programs to manipulate. NLP encompasses both text and speech, but work on speech processing has evolved into a separate field
  • 3. www.decideo.fr/bruley Where does it fit in the CS*Where does it fit in the CS* taxonomy?taxonomy? Computers Artificial Intelligence AlgorithmsDatabases Networking Robotics SearchNatural Language Processing Information Retrieval Machine Translation Language Analysis Semantics Parsing* CS = Computer Science
  • 4. www.decideo.fr/bruley Why Natural Language Processing?Why Natural Language Processing? Applications for processing large amounts of texts require NLP expertise Classify text into categories, index and search large texts: Classify documents by topics, language, author, spam filtering, information retrieval (relevant, not relevant), sentiment classification (positive, negative) Extracting data from text: converting unstructured text into structure data Information extraction: discover names of people and events they participate in, from a document, … Automatic summarization: Condense 1 book into 1 page, … Speech processing, artificial voice: get flight information or book a hotel over the phone, … Question answering: find answers to natural language questions in a text collection or database Spelling & Grammar Corrections Plagiarism detection Automatic translation Etc.
  • 5. www.decideo.fr/bruley The problemThe problem When people see text, they understand its meaning (by and large) According to research, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by islelf but the wrod as a wlohe. When computers see text, they get only character strings (and perhaps HTML tags) We'd like computer agents to see meanings and be able to intelligently process text These desires have led to many proposals for structured, semantically marked up formats But often human beings still resolutely make use of text in human languages This problem isn’t likely to just go away
  • 6. www.decideo.fr/bruley Example: Natural languageExample: Natural language understandingunderstanding Raw speech signal • Speech recognition Sequence of words spoken • Syntactic analysis using knowledge of the grammar Structure of the sentence • Semantic analysis using info. about meaning of words Partial representation of meaning of sentence • Pragmatic analysis using info. about context Final representation of meaning of sentence Natural language understanding process – Prof. Carolina Ruiz
  • 7. Example detail: Syntactic Analysis
      – Syntactic analysis involves breaking a sentence into phrases arranged in a hierarchical structure, allowing the study of its constituents.
      – For example, the sentence “the big cat is drinking milk” can be broken up into the following constituents:
        The big cat is drinking milk
          Noun Phrase: Determiner (The), Adjective Phrase (big), Noun (cat)
          Verb Phrase: Auxiliary (is), Verb (drinking), Noun Phrase (milk)
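The constituent breakdown on this slide can be written down as a small nested data structure. The sketch below is only an illustration: the `(label, content)` tuple layout, the root label `Sentence`, and the helper names `leaves` and `render` are assumptions made for this example, not something the slide specifies, and the tree is typed in by hand rather than produced by a parser.

```python
# Hypothetical representation: an inner node is (label, [children]),
# a leaf is (label, word). The tree is entered by hand from the slide;
# no parsing is performed here.
tree = ("Sentence", [  # root label "Sentence" is an assumption
    ("Noun Phrase", [
        ("Determiner", "The"),
        ("Adjective Phrase", "big"),
        ("Noun", "cat"),
    ]),
    ("Verb Phrase", [
        ("Auxiliary", "is"),
        ("Verb", "drinking"),
        ("Noun Phrase", "milk"),
    ]),
])


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    _label, content = node
    if isinstance(content, str):
        return [content]
    words = []
    for child in content:
        words.extend(leaves(child))
    return words


def render(node, depth=0):
    """Return one indented text line per node, two spaces per level."""
    label, content = node
    indent = "  " * depth
    if isinstance(content, str):
        return [f"{indent}{label}: {content}"]
    lines = [f"{indent}{label}"]
    for child in content:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(tree)))
    print(" ".join(leaves(tree)))
```

Reading the leaves back in order recovers the original sentence, which is a quick check that the hierarchy covers every word exactly once.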
  • 8. Why NLP is difficult
      Language is flexible
      – New words, new meanings
      – Different meanings in different contexts
      Language is subtle
      – He arrived at the lecture
      – He chuckled at the lecture
      – He chuckled his way through the lecture
      – **He arrived his way through the lecture
      Language is complex!
  • 9. Why NLP is difficult
      MANY hidden variables
      – Knowledge about the world
      – Knowledge about the context
      – Knowledge about human communication techniques
        • Can you tell me the time?
      Problem of scale
      – Many (infinite?) possible words, meanings, contexts
      Problem of sparsity
      – Statistical analysis is very difficult: most things (words, concepts) are never seen before
      Long-range correlations
  • 10. Why NLP is difficult
      Key problems:
      – Representation of meaning
      – Language presupposes knowledge about the world
      – Language only reflects the surface of meaning
      – Language presupposes communication between people
  • 11. Patented Natural Language Processing (NLP) “Reads” Every Communication
      – Each data feed is parsed through one or more of the 7 NLP engines
      – …it is then deconstructed to provide context, subject, and other information regarding the customer (gender, name, etc.)
      – Finally, each identified customer is matched back to the Discovery platform data to gain a full view
      Natural language processing (NLP) is the study of the interactions between computers and natural languages (e.g., English, Polish). The crucial challenge that NLP addresses is deriving meaning from human (natural) language input and allowing consumers to analyze the parsed meanings in large volumes.
  • 12. For Example…
      “I bought an iPad2 for my mom last week. She loves the weight, but doesn’t like the color. She wishes it came in blue. She says if it came in blue, then she’d buy one for all her friends.”
      – Entities (brands, people, locations, times, products…)
      – Events and relationships (purchasing event, my mom…)
      – Sentiment (product specifications)
      – Suggestions (feature specifications)
      – Intent (to purchase, to leave)
      – Geo/Temporal
      QUESTION: Why is this a big deal? NLP takes a simple English statement, parses it into the categories above (and more categories) and VOILÀ… we get STRUCTURED DATA
  • 13. Architecture
      [Architecture diagram. Labels: Attensity Pipeline (real-time annotated social media data feed: 150+ million social and online sources); Other Unstructured Data (emails, surveys, CRM notes…); Pipeline Connector; ASAS Wrapper (SQL MR); NLP; ETL; Aster Discovery Platform (“Now-structured” data; Customers / Sales / Other data); Churn Score (SQL MR); Visualization (e.g., Tableau, MSTR); Predictive]
  • 14. Aster + Attensity = Competitive Advantage
      – This integration provides types, subtypes, and supertypes (“Savings”, “Checking”, “Investment”)
      – Inclusion of anaphora: connecting references (“He”, “Him”) to a subject (George Harrison) without repeating the full name
      – Includes other languages besides English
      – Attensity’s Semantic Annotation Server (ASAS) capabilities
      – Entity Extraction: automatic detection and extraction of more than 35 entities, such as Name and Place
      – Uses Attensity Triples to create context on entities and identify verbs, relationships, and actions
      – Auto Classification: uses custom classification rules to classify articles by content, sort by relevance, and discover repeated information
      – Exhaustive Extraction: application of linguistic principles to extract context, entities, and relationships, similar to how the human mind would
      – Voice Tags: identify types of statements and auto-classify them (Question, Intent, Conditional)
      – Creates a unique identifier for each entity for cross-reference
  • 15. Structuring Unstructured Data: Process Flow
      “The flight was delayed and flight attendant would not give us any new information.”
  • 16. How Triples are Extracted & Structured
      Database record from a customer survey:
        date: 10-02-06
        region: 0006
        rec?: 4
        source: telephone
        Why would you recommend/not recommend? “The flight was delayed and flight attendant would not give us any new information.”
      Extract: extract relational facts and triples from the Notes field
        Who/What: flight
        Behavior: delay
        Fact/Triple: flight : delay
      Then fuse: populate the new table with attribute values and fuse with the structured data.
      New table: Customer Reactions (same record with relational facts extracted from the Notes field)
        Columns date, region, source, rec?: original structured data
        Columns who-what, Behavior, Fact/Triple: newly structured data provided by Attensity

        date     region  source     rec?  who-what     Behavior     Fact/Triple
        10-2-12  0006    telephone  4     flight       delay        flight : delay
        10-2-12  0006    telephone  4     information  give [not]   information : give [not]
        1-1-13   0007    e-mail     8     i            happy [not]  i : happy [not]
        1-1-13   0007    e-mail     8     rep          rude         rep : rude
        1-1-13   0007    e-mail     8     flight       cancel       flight : cancel
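The "fuse" step described on this slide, where each extracted fact becomes one row that repeats the record's structured fields, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Triple` class and the `fuse` function are hypothetical names invented here, the triples are typed in by hand from the slide's example, and the actual extraction performed by the NLP engine is not shown or reproduced.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    """Hypothetical container for one extracted fact (who/what + behavior)."""
    who_what: str
    behavior: str

    def as_fact(self) -> str:
        # Mirrors the "flight : delay" notation used on the slide.
        return f"{self.who_what} : {self.behavior}"


def fuse(record: dict, triples: list) -> list:
    """Return one row per triple, each carrying a copy of the structured fields."""
    rows = []
    for triple in triples:
        row = dict(record)  # copy, so the original record is not modified
        row["who-what"] = triple.who_what
        row["Behavior"] = triple.behavior
        row["Fact/Triple"] = triple.as_fact()
        rows.append(row)
    return rows


# Structured fields of the survey record, as shown in the slide's table.
record = {"date": "10-2-12", "region": "0006", "source": "telephone", "rec?": 4}

# Facts entered by hand from the slide (assumption: extraction already done).
triples = [Triple("flight", "delay"), Triple("information", "give [not]")]

rows = fuse(record, triples)

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Because every row keeps the original fields, the extracted facts can then be grouped or filtered by date, region, or source like any other structured column.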

Editor's notes

  1. Here’s an example of how this process works. In the upper right you can see some of the feedback captured by a call center agent taking a complaint call from a customer. From this sentence, the facts about the flight and the details of the customer’s opinions are extracted into the relational table below. The newly structured facts are FUSED with the available structured data (customer id/segment, date, flight number, etc.) so that any of these facts can be analyzed along with the structured data.