SlideShare a Scribd company logo
1 of 26
Natural Language Processing
             Rada Mihalcea




Fall 2011
Any Light at The End of The Tunnel?




•   Yahoo, Google, Microsoft  Information Retrieval
•   Monster.com, HotJobs.com (Job finders)  Information Extraction +
    Information Retrieval
•   Systran powers Babelfish  Machine Translation
•   Ask Jeeves  Question Answering
•   Myspace, Facebook, Blogspot  Processing of User-Generated Content
•   Tools for “business intelligence”
•   All “Big Guys” have (several) strong NLP research labs:
     – IBM, Microsoft, AT&T, Xerox, Sun, etc.
•   Academia: research in an university environment
Why Natural Language Processing ?
• Huge amounts of data        •   Classify text into categories
                              •   Index and search large texts
   – Internet = at least 20
                              •   Automatic translation
     billions pages
                              •   Speech understanding
   – Intranet                      – Understand phone conversations
• Applications for            • Information extraction
                                   – Extract useful information from resumes
  processing large
                              • Automatic summarization
  amounts of texts                 – Condense 1 book into 1 page
  require NLP expertise       • Question answering
                              • Knowledge acquisition
                              • Text generations / dialogues
Natural?
• Natural Language?
   – Refers to the language spoken by people, e.g. English,
     Japanese, Swahili, as opposed to artificial languages, like
     C++, Java, etc.
• Natural Language Processing
   – Applications that deal with natural language in a way or
     another
• [Computational Linguistics
   – Doing linguistics on computers
   – More on the linguistic side than NLP, but closely related ]
Why Natural Language Processing?
•   kJfmmfj mmmvvv nnnffn333
•   Uj iheale eleee mnster vensi credur
•   Baboi oi cestnitze
•   Coovoel2^ ekk; ldsllk lkdf vnnjfj?
•   Fgmflmllk mlfm kfre xnnn!
Computers Lack Knowledge!
• Computers “see” text in English the same you have
  seen the previous text!
• People have no trouble understanding language
   – Common sense knowledge
   – Reasoning capacity
   – Experience
• Computers have
   – No common sense knowledge
   – No reasoning capacity
Where does it fit in the CS taxonomy?
                        Computers

Databases     Artificial Intelligence       Algorithms      Networking


   Robotics       Natural Language Processing              Search


    Information          Machine               Language
     Retrieval          Translation             Analysis


                                        Semantics   Parsing
Linguistics Levels of Analysis
• Speech
• Written language
  –   Phonology: sounds / letters / pronunciation
  –   Morphology: the structure of words
  –   Syntax: how these sequences are structured
  –   Semantics: meaning of the strings
• Interaction between levels
Issues in Syntax
“the dog ate my homework” - Who did what?
1. Identify the part of speech (POS)
  Dog = noun ; ate = verb ; homework = noun
  English POS tagging: 95%


2. Identify collocations
    mother in law, hot dog
    Compositional versus non-compositional
    collocates
Issues in Syntax
• Shallow parsing:
  “the dog chased the bear”
  “the dog” “chased the bear”
  subject - predicate
  Identify basic structures
  NP-[the dog] VP-[chased the bear]
Issues in Syntax
• Full parsing: John loves Mary




  Help figuring out (automatically) questions like: Who did what
  and when?
More Issues in Syntax
• Anaphora Resolution:
“The dog entered my room. It scared me”

• Preposition Attachment
“I saw the man in the park with a telescope”
Issues in Semantics
•   Understand language! How?
•   “plant” = industrial plant
•   “plant” = living organism
•   Words are ambiguous
•   Importance of semantics?
    – Machine Translation: wrong translations
    – Information Retrieval: wrong information
    – Anaphora Resolution: wrong referents
Why Semantics?
• The sea is at the home for billions factories and
  animals

• The sea is home to million of plants and
  animals
• English  French [commercial MT system]
• Le mer est a la maison de billion des usines et
  des animaux
• French  English
Issues in Semantics
• How to learn the meaning of words?
• From dictionaries:
  plant, works, industrial plant -- (buildings for carrying on
  industrial labor; "they built a large plant to manufacture
  automobiles")
  plant, flora, plant life -- (a living organism lacking the power of
  locomotion)
They are producing about 1,000 automobiles in the new plant
The sea flora consists in 1,000 different plant species
The plant was close to the farm of animals.
Issues in Semantics
• Learn from annotated examples:
  – Assume 100 examples containing “plant”
    previously tagged by a human
  – Train a learning algorithm
  – How to choose the learning algorithm?
  – How to obtain the 100 tagged examples?
Issues in Information Extraction
• “There was a group of about 8-9 people close to
  the entrance on Highway 75”
• Who? “8-9 people”
• Where? “highway 75”

• Extract information
• Detect new patterns:
  – Detect hacking / hidden information / etc.
• Gov./mil. puts lots of money put into IE
  research
Issues in Information Retrieval
• General model:
   – A huge collection of texts
   – A query
• Task: find documents that are relevant to the given
  query
• How? Create an index, like the index in a book
• More …
   – Vector-space models
   – Boolean models
• Examples: Google, Yahoo, Altavista, etc.
Issues in Information Retrieval
•   Retrieve specific information
•   Question Answering
•   “What is the height of mount Everest?”
•   11,000 feet
Issues in Information Retrieval
• Find information across languages!
• Cross Language Information Retrieval
• “What is the minimum age requirement for car
  rental in Italy?”
• Search also Italian texts for “eta minima per
  noleggio macchine”
• Integrate large number of languages
• Integrate into performant IR engines
Issues in Machine Translations
• Text to Text Machine Translations
• Speech to Speech Machine Translations

• Most of the work has addressed pairs of widely
  spread languages like English-French, English-
  Chinese
Issues in Machine Translations
• How to translate text?
  – Learn from previously translated data
 Need parallel corpora
• French-English, Chinese-English have the
  Hansards
• Reasonable translations
• Chinese-Hindi – no such tools available today!
Even More
• Discourse
• Summarization
• Subjectivity and sentiment analysis
• Text generation, dialog [pass the Turing test
  for some million dollars] – Loebner prize
• Knowledge acquisition [how to get that
  common sense knowledge]
• Speech processing
What will we study this semester?
• Intro to Perl
   – Great great for text processing
   – Fast: one person can do the work of ten others
   – Easy to pick up
• Some linguistic basics
   – Structure of English
   – Parts of speech, phrases, parsing
• Morphology
• N-grams
   – Also multi-word expressions
• Part of speech tagging
• Syntactic parsing
• Semantics
   – Word sense disambiguation
   – Semantic relations
What will we study this semester?
•   Information Retrieval
•   Question answering
•   Text classification
•   Text summarization
•   Sentiment analysis

• Depending on time, we may touch on
    –   Speech recognition
    –   Dialogue
    –   Text generation
    –   Other topics of your interest
Administrivia
•   Instructor: Rada Mihalcea, F228, rada@cs.unt.edu
•   Class meetings: TTh 11-12:20pm
•   Office hours: TTh 4:00-5:00pm
•   TA: TBA
•   Textbook: Speech and Language Processing, by Jurafsky
    and Martin (2nd edition)
    – Recommended: Statistical Methods in NLP, by Manning and
      Schutze
• Other readings (papers) may be assigned throughout the
  semester
• Grading: Assignments, 2 exams, term project
    – Late submission policy for assignments: can submit up to three days late,
      with 10% penalty / day

More Related Content

What's hot

Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsshrey bhate
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingA Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingGuy De Pauw
 
Natural language processing
Natural language processingNatural language processing
Natural language processingYogendra Tamang
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingDavid Rostcheck
 
Dolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
Dolování dat z řeči pro bezpečnostní aplikace - Jan ČernockýDolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
Dolování dat z řeči pro bezpečnostní aplikace - Jan ČernockýSecurity Session
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalTony Russell-Rose
 
Natural language processing and its application in ai
Natural language processing and its application in aiNatural language processing and its application in ai
Natural language processing and its application in aiRam Kumar
 
Natural language processing
Natural language processingNatural language processing
Natural language processingAbash shah
 
COMPUTATIONAL LINGUISTICS
COMPUTATIONAL LINGUISTICSCOMPUTATIONAL LINGUISTICS
COMPUTATIONAL LINGUISTICSRahul Motipalle
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsAdnanBaloch15
 
Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games ResearchJose Zagal
 
自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭Yuya Unno
 

What's hot (19)

Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Nlp
NlpNlp
Nlp
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingA Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Dolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
Dolování dat z řeči pro bezpečnostní aplikace - Jan ČernockýDolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
Dolování dat z řeči pro bezpečnostní aplikace - Jan Černocký
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information Retrieval
 
Corpus Linguistics
Corpus LinguisticsCorpus Linguistics
Corpus Linguistics
 
Natural language processing and its application in ai
Natural language processing and its application in aiNatural language processing and its application in ai
Natural language processing and its application in ai
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
COMPUTATIONAL LINGUISTICS
COMPUTATIONAL LINGUISTICSCOMPUTATIONAL LINGUISTICS
COMPUTATIONAL LINGUISTICS
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games Research
 
自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭
 

Viewers also liked

LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)rchbeir
 
Boolean retrieval
Boolean retrievalBoolean retrieval
Boolean retrievalsaireya _
 
Relevancy Hacks for eCommerce
Relevancy Hacks for eCommerceRelevancy Hacks for eCommerce
Relevancy Hacks for eCommercelucenerevolution
 
Information Retrieval (for beginners)
Information Retrieval (for beginners)Information Retrieval (for beginners)
Information Retrieval (for beginners)James Melzer
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampKais Hassan, PhD
 
Bachelor in Math - Skills you will learn which are not written in the syllabus
Bachelor in Math - Skills you will learn which are not written in the syllabusBachelor in Math - Skills you will learn which are not written in the syllabus
Bachelor in Math - Skills you will learn which are not written in the syllabusNathan Levy
 
CommerceSearch: Moving from FAST to Solr on ATG
CommerceSearch: Moving from FAST to Solr on ATGCommerceSearch: Moving from FAST to Solr on ATG
CommerceSearch: Moving from FAST to Solr on ATGlucenerevolution
 
Information retrieval basics_v1.0
Information retrieval basics_v1.0Information retrieval basics_v1.0
Information retrieval basics_v1.0Sergey Chernov
 
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...Lucidworks
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comSimon Hughes
 
powerpoint
powerpointpowerpoint
powerpointbutest
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability MahoutJake Mannix
 
Information retrieval based on word sens 1
Information retrieval based on word sens 1Information retrieval based on word sens 1
Information retrieval based on word sens 1ATHMAN HAJ-HAMOU
 
Mining Product Synonyms - Slides
Mining Product Synonyms - SlidesMining Product Synonyms - Slides
Mining Product Synonyms - SlidesAnkush Jain
 
Concept search for e commerce with solr
Concept search for e commerce with solrConcept search for e commerce with solr
Concept search for e commerce with solrlucenerevolution
 
Information_retrieval_and_extraction_IIIT
Information_retrieval_and_extraction_IIITInformation_retrieval_and_extraction_IIIT
Information_retrieval_and_extraction_IIITAnkit Sharma
 

Viewers also liked (20)

Beginner avr
Beginner avrBeginner avr
Beginner avr
 
Nlp
NlpNlp
Nlp
 
Visual Search
Visual SearchVisual Search
Visual Search
 
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
 
Boolean retrieval
Boolean retrievalBoolean retrieval
Boolean retrieval
 
Relevancy Hacks for eCommerce
Relevancy Hacks for eCommerceRelevancy Hacks for eCommerce
Relevancy Hacks for eCommerce
 
Information Retrieval (for beginners)
Information Retrieval (for beginners)Information Retrieval (for beginners)
Information Retrieval (for beginners)
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
 
Bachelor in Math - Skills you will learn which are not written in the syllabus
Bachelor in Math - Skills you will learn which are not written in the syllabusBachelor in Math - Skills you will learn which are not written in the syllabus
Bachelor in Math - Skills you will learn which are not written in the syllabus
 
CommerceSearch: Moving from FAST to Solr on ATG
CommerceSearch: Moving from FAST to Solr on ATGCommerceSearch: Moving from FAST to Solr on ATG
CommerceSearch: Moving from FAST to Solr on ATG
 
Information retrieval basics_v1.0
Information retrieval basics_v1.0Information retrieval basics_v1.0
Information retrieval basics_v1.0
 
Julia text mining_inmobi
Julia text mining_inmobiJulia text mining_inmobi
Julia text mining_inmobi
 
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
powerpoint
powerpointpowerpoint
powerpoint
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
 
Information retrieval based on word sens 1
Information retrieval based on word sens 1Information retrieval based on word sens 1
Information retrieval based on word sens 1
 
Mining Product Synonyms - Slides
Mining Product Synonyms - SlidesMining Product Synonyms - Slides
Mining Product Synonyms - Slides
 
Concept search for e commerce with solr
Concept search for e commerce with solrConcept search for e commerce with solr
Concept search for e commerce with solr
 
Information_retrieval_and_extraction_IIIT
Information_retrieval_and_extraction_IIITInformation_retrieval_and_extraction_IIIT
Information_retrieval_and_extraction_IIIT
 

Similar to Intro

Intro 2 document
Intro 2 documentIntro 2 document
Intro 2 documentUma Kant
 
Natural_Language_Processing_1.ppt
Natural_Language_Processing_1.pptNatural_Language_Processing_1.ppt
Natural_Language_Processing_1.ppttestbest6
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptxbuivantan_uneti
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
introduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).pptintroduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).pptTemesgenTolcha2
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptOlusolaTop
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Jarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language ProcessingJarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language ProcessingMustafa Jarrar
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 

Similar to Intro (20)

Intro 2 document
Intro 2 documentIntro 2 document
Intro 2 document
 
Natural_Language_Processing_1.ppt
Natural_Language_Processing_1.pptNatural_Language_Processing_1.ppt
Natural_Language_Processing_1.ppt
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
introduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).pptintroduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).ppt
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Jarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language ProcessingJarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language Processing
 
Diachronic Analysis
Diachronic AnalysisDiachronic Analysis
Diachronic Analysis
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
intro.ppt
intro.pptintro.ppt
intro.ppt
 
1004-nlp.ppt
1004-nlp.ppt1004-nlp.ppt
1004-nlp.ppt
 
The Tipping Point
The Tipping PointThe Tipping Point
The Tipping Point
 

Recently uploaded

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Intro

  • 1. Natural Language Processing Rada Mihalcea Fall 2011
  • 2. Any Light at The End of The Tunnel? • Yahoo, Google, Microsoft  Information Retrieval • Monster.com, HotJobs.com (Job finders)  Information Extraction + Information Retrieval • Systran powers Babelfish  Machine Translation • Ask Jeeves  Question Answering • Myspace, Facebook, Blogspot  Processing of User-Generated Content • Tools for “business intelligence” • All “Big Guys” have (several) strong NLP research labs: – IBM, Microsoft, AT&T, Xerox, Sun, etc. • Academia: research in an university environment
  • 3. Why Natural Language Processing ? • Huge amounts of data • Classify text into categories • Index and search large texts – Internet = at least 20 • Automatic translation billions pages • Speech understanding – Intranet – Understand phone conversations • Applications for • Information extraction – Extract useful information from resumes processing large • Automatic summarization amounts of texts – Condense 1 book into 1 page require NLP expertise • Question answering • Knowledge acquisition • Text generations / dialogues
  • 4. Natural? • Natural Language? – Refers to the language spoken by people, e.g. English, Japanese, Swahili, as opposed to artificial languages, like C++, Java, etc. • Natural Language Processing – Applications that deal with natural language in a way or another • [Computational Linguistics – Doing linguistics on computers – More on the linguistic side than NLP, but closely related ]
  • 5. Why Natural Language Processing? • kJfmmfj mmmvvv nnnffn333 • Uj iheale eleee mnster vensi credur • Baboi oi cestnitze • Coovoel2^ ekk; ldsllk lkdf vnnjfj? • Fgmflmllk mlfm kfre xnnn!
  • 6. Computers Lack Knowledge! • Computers “see” text in English the same you have seen the previous text! • People have no trouble understanding language – Common sense knowledge – Reasoning capacity – Experience • Computers have – No common sense knowledge – No reasoning capacity
  • 7. Where does it fit in the CS taxonomy? Computers Databases Artificial Intelligence Algorithms Networking Robotics Natural Language Processing Search Information Machine Language Retrieval Translation Analysis Semantics Parsing
  • 8. Linguistics Levels of Analysis • Speech • Written language – Phonology: sounds / letters / pronunciation – Morphology: the structure of words – Syntax: how these sequences are structured – Semantics: meaning of the strings • Interaction between levels
  • 9. Issues in Syntax “the dog ate my homework” - Who did what? 1. Identify the part of speech (POS) Dog = noun ; ate = verb ; homework = noun English POS tagging: 95% 2. Identify collocations mother in law, hot dog Compositional versus non-compositional collocates
  • 10. Issues in Syntax • Shallow parsing: “the dog chased the bear” “the dog” “chased the bear” subject - predicate Identify basic structures NP-[the dog] VP-[chased the bear]
  • 11. Issues in Syntax • Full parsing: John loves Mary Help figuring out (automatically) questions like: Who did what and when?
  • 12. More Issues in Syntax • Anaphora Resolution: “The dog entered my room. It scared me” • Preposition Attachment “I saw the man in the park with a telescope”
  • 13. Issues in Semantics • Understand language! How? • “plant” = industrial plant • “plant” = living organism • Words are ambiguous • Importance of semantics? – Machine Translation: wrong translations – Information Retrieval: wrong information – Anaphora Resolution: wrong referents
  • 14. Why Semantics? • The sea is at the home for billions factories and animals • The sea is home to million of plants and animals • English  French [commercial MT system] • Le mer est a la maison de billion des usines et des animaux • French  English
  • 15. Issues in Semantics • How to learn the meaning of words? • From dictionaries: plant, works, industrial plant -- (buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles") plant, flora, plant life -- (a living organism lacking the power of locomotion) They are producing about 1,000 automobiles in the new plant The sea flora consists in 1,000 different plant species The plant was close to the farm of animals.
  • 16. Issues in Semantics • Learn from annotated examples: – Assume 100 examples containing “plant” previously tagged by a human – Train a learning algorithm – How to choose the learning algorithm? – How to obtain the 100 tagged examples?
  • 17. Issues in Information Extraction • “There was a group of about 8-9 people close to the entrance on Highway 75” • Who? “8-9 people” • Where? “highway 75” • Extract information • Detect new patterns: – Detect hacking / hidden information / etc. • Gov./mil. puts lots of money put into IE research
  • 18. Issues in Information Retrieval • General model: – A huge collection of texts – A query • Task: find documents that are relevant to the given query • How? Create an index, like the index in a book • More … – Vector-space models – Boolean models • Examples: Google, Yahoo, Altavista, etc.
  • 19. Issues in Information Retrieval • Retrieve specific information • Question Answering • “What is the height of mount Everest?” • 11,000 feet
  • 20. Issues in Information Retrieval • Find information across languages! • Cross Language Information Retrieval • “What is the minimum age requirement for car rental in Italy?” • Search also Italian texts for “eta minima per noleggio macchine” • Integrate large number of languages • Integrate into performant IR engines
  • 21. Issues in Machine Translations • Text to Text Machine Translations • Speech to Speech Machine Translations • Most of the work has addressed pairs of widely spread languages like English-French, English- Chinese
  • 22. Issues in Machine Translations • How to translate text? – Learn from previously translated data  Need parallel corpora • French-English, Chinese-English have the Hansards • Reasonable translations • Chinese-Hindi – no such tools available today!
  • 23. Even More • Discourse • Summarization • Subjectivity and sentiment analysis • Text generation, dialog [pass the Turing test for some million dollars] – Loebner prize • Knowledge acquisition [how to get that common sense knowledge] • Speech processing
  • 24. What will we study this semester? • Intro to Perl – Great great for text processing – Fast: one person can do the work of ten others – Easy to pick up • Some linguistic basics – Structure of English – Parts of speech, phrases, parsing • Morphology • N-grams – Also multi-word expressions • Part of speech tagging • Syntactic parsing • Semantics – Word sense disambiguation – Semantic relations
  • 25. What will we study this semester? • Information Retrieval • Question answering • Text classification • Text summarization • Sentiment analysis • Depending on time, we may touch on – Speech recognition – Dialogue – Text generation – Other topics of your interest
  • 26. Administrivia • Instructor: Rada Mihalcea, F228, rada@cs.unt.edu • Class meetings: TTh 11-12:20pm • Office hours: TTh 4:00-5:00pm • TA: TBA • Textbook: Speech and Language Processing, by Jurafsky and Martin (2nd edition) – Recommended: Statistical Methods in NLP, by Manning and Schutze • Other readings (papers) may be assigned throughout the semester • Grading: Assignments, 2 exams, term project – Late submission policy for assignments: can submit up to three days late, with 10% penalty / day