

  1. 1. Natural language processing (NLP) Presented By : Mohamed El-Serngawy
  2. 2. Agenda
     • Definition & Introduction
     • Steps in NLP
     • Statistical NLP
     • Real World Application
     • Demos with free NLP Application
  3. 3. Definition & Introduction
     • Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages.
     • Why natural language processing?
     • Huge amounts of data
       ◦ Internet: at least 20 billion pages
       ◦ Intranets
     • Applications that process large amounts of text require NLP expertise.
  4. 4. Definition & Introduction
     • We look at how we can exploit knowledge about the world, in combination with linguistic facts, to build computational natural language systems.
     • Natural language generation systems convert information from computer databases into readable human language.
     • Natural language understanding systems convert samples of human language into more formal representations, such as parse trees or first-order logic structures, that are easier for computer programs to manipulate.
  5. 5. Steps in NLP
     • Phonetics, Phonology: How words are pronounced in terms of sequences of sounds.
     • Morphological Analysis: Individual words are analyzed into their components, and non-word tokens such as punctuation are separated from the words.
     • Syntactic Analysis: Linear sequences of words are transformed into structures that show how the words relate to each other.
     • Semantic Analysis: The structures created by the syntactic analyzer are assigned meanings.
     • Discourse Integration: The meaning of an individual sentence may depend on the sentences that precede it and may influence the meanings of the sentences that follow it.
     • Pragmatic Analysis: The structure representing what was said is reinterpreted to determine what was actually meant.
  6. 6. Phonetics
     • Study of the physical sounds of human speech
       ◦ /i:/, /ɜ:/, /ɔ:/, /ɑ:/ and /u:/
       ◦ 'there' => /ðeə/
       ◦ 'there on the table' => /ðeər ɒn ðə teɪbl/
     • Transcription of sounds (IPA)
  7. 7. Phonetics
     • Articulatory phonetics: production
     • Auditory phonetics: speech perception (McGurk effect)
     • Acoustic phonetics: properties of sound waves (frequency and harmonics)
  8. 8. Morphological Analysis
     • Suppose we have an English interface to an operating system and the following sentence is typed:
       ◦ I want to print Bill’s .init file.
     • Morphological analysis must do the following things:
       ◦ Pull apart the word “Bill’s” into the proper noun “Bill” and the possessive suffix “’s”.
       ◦ Recognize the sequence “.init” as a file extension that is functioning as an adjective in the sentence.
     • This process will usually assign syntactic categories to all the words in the sentence.
     • Consider the word “prints”. This word is either a plural noun or a third person singular verb (he prints).
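The two morphological steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression, the tiny `LEXICON`, and the category labels (`N-PLURAL`, `V-3SG`, and so on) are hypothetical and do not come from the slides.

```python
import re

# Hypothetical token pattern: dotted names such as ".init", plain words,
# the possessive suffix ('s or ’s), and any other single punctuation mark.
TOKEN_RE = re.compile(r"\.\w+|\w+|['’]s|[^\w\s]")

# Hypothetical mini-lexicon; ambiguous words map to more than one category.
LEXICON = {
    "i": {"PRO"},
    "want": {"V"},
    "to": {"TO"},
    "print": {"V"},
    "prints": {"N-PLURAL", "V-3SG"},
    "bill": {"PN"},
    "file": {"N"},
}


def tokenize(sentence):
    """Split a sentence into tokens, separating "'s" and punctuation from words."""
    return TOKEN_RE.findall(sentence)


def categories(token):
    """Return the set of candidate syntactic categories for one token."""
    if token in ("'s", "’s"):
        return {"POSSESSIVE"}
    if token.startswith(".") and len(token) > 1:
        return {"ADJ"}  # a file extension functioning as an adjective
    if not any(ch.isalnum() for ch in token):
        return {"PUNCT"}
    return LEXICON.get(token.lower(), {"UNKNOWN"})


if __name__ == "__main__":
    for tok in tokenize("I want to print Bill's .init file."):
        print(tok, sorted(categories(tok)))
```

Returning a set of categories rather than a single tag keeps the ambiguity of a word like “prints” visible, so that a later stage can resolve it.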
  9. 9. Syntactic Analysis
     • Syntactic analysis must exploit the results of morphological analysis to build a structural description of the sentence.
     • The goal of this process, called parsing, is to convert the flat list of words that forms the sentence into a structure that defines the units that are represented by that flat list.
     • The important thing here is that a flat sentence has been converted into a hierarchical structure, and that the structure corresponds to meaning units when semantic analysis is performed.
     • Reference markers are shown in parentheses in the parse tree.
     • Each one corresponds to some entity that has been mentioned in the sentence.
  10. 10. Syntactic Analysis
     • Syntactic processing: almost all the systems that are actually used have two main components:
       ◦ A declarative representation, called a grammar, of the syntactic facts about the language.
       ◦ A procedure, called a parser, that compares the grammar against input sentences to produce parsed structures.
  11. 11. Syntactic Analysis
     • Grammars and parsers: the most common way to represent grammars is as a set of production rules.
     • A simple context-free phrase structure grammar for English:
       S → NP VP
       NP → the NP1
       NP → PRO
       NP → PN
       NP → NP1
       NP1 → ADJS N
       ADJS → ε | ADJ ADJS
       VP → V
       VP → V NP
       N → file | printer
       PN → Bill
       PRO → I
       ADJ → short | long | fast
       V → printed | created | want
     • The first rule can be read as “A sentence is composed of a noun phrase followed by a verb phrase”. The vertical bar means OR; ε represents the empty string.
     • Symbols that are further expanded by rules are called nonterminal symbols.
     • Symbols that correspond directly to strings that must be found in an input sentence are called terminal symbols.
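To make the grammar/parser split concrete, here is a minimal sketch of a backtracking top-down parser driven by the grammar on this slide. The grammar rules are taken from the slide; the parsing strategy, the function names, and the nested-tuple tree format are assumptions made for illustration. Input is lower-cased before matching, so the terminals "Bill" and "I" are stored in lower case.

```python
# The slide's grammar as a dictionary: nonterminal -> list of productions.
# An empty production [] stands for the empty string (epsilon).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "NP1"], ["PRO"], ["PN"], ["NP1"]],
    "NP1": [["ADJS", "N"]],
    "ADJS": [[], ["ADJ", "ADJS"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["file"], ["printer"]],
    "PN": [["bill"]],
    "PRO": [["i"]],
    "ADJ": [["short"], ["long"], ["fast"]],
    "V": [["printed"], ["created"], ["want"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` matches tokens from `pos`."""
    if symbol not in GRAMMAR:  # terminal symbol: must equal the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:
        for children, end in expand(production, tokens, pos):
            yield (symbol, children), end


def expand(production, tokens, pos):
    """Yield (list_of_subtrees, next_pos) for a sequence of grammar symbols."""
    if not production:
        yield [], pos
        return
    for tree, mid in parse(production[0], tokens, pos):
        for rest, end in expand(production[1:], tokens, mid):
            yield [tree] + rest, end


def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    tokens = sentence.lower().split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


if __name__ == "__main__":
    for tree in parse_sentence("Bill printed the file"):
        print(tree)
```

This simple strategy terminates only because the grammar has no left-recursive rules; a grammar with a rule such as NP → NP PP would need a different parsing method (for example, chart parsing).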
  12. 12. Syntactic Analysis
     • A parse tree for the sentence “Bill printed the file”:
       (S (NP (PN Bill))
          (VP (V printed)
              (NP the
                  (NP1 (ADJS ε)
                       (N file)))))
  13. 13. Syntactic Analysis
     • A parse tree: John ate the apple.
     • Rules:
       1. S -> NP VP
       2. VP -> V NP
       3. NP -> NAME
       4. NP -> ART N
       5. NAME -> John
       6. V -> ate
       7. ART -> the
       8. N -> apple
     • Tree:
       (S (NP (NAME John))
          (VP (V ate)
              (NP (ART the) (N apple))))
  14. 14. Semantic Analysis
     • Semantic analysis must do two important things:
       ◦ It must map individual words into appropriate objects in the knowledge base or database.
       ◦ It must create the correct structures to correspond to the way the meanings of the individual words combine with each other.
  15. 15. Semantic Analysis
     • Lexical processing:
     • The first step in any semantic processing system is to look up the individual words in a dictionary (or lexicon) and extract their meanings.
     • Many words have several meanings, and it may not be possible to choose the correct one just by looking at the word itself.
     • The process of determining the correct meaning of an individual word is called word sense disambiguation or lexical disambiguation.
     • It is done by associating, with each word in the lexicon, information about the contexts in which each of the word’s senses may appear.
     • Sometimes only very straightforward information about each word sense is necessary. For example, the baseball field interpretation of “diamond” could be marked as a LOCATION.
     • Some useful semantic markers are:
       ◦ PHYSICAL-OBJECT
       ◦ ANIMATE-OBJECT
       ◦ ABSTRACT-OBJECT
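As a rough sketch of lexical disambiguation with context information, the snippet below stores, for each sense of a word, a semantic marker and a set of context words, and picks the sense whose context words overlap most with the sentence. The lexicon entries, the context word lists, and the overlap scoring are all assumptions made for illustration; only the “diamond as LOCATION” example and the marker names come from the slide.

```python
# Hypothetical lexicon: each sense has a gloss, a semantic marker, and
# context words that suggest this sense when they appear nearby.
LEXICON = {
    "diamond": [
        {
            "gloss": "baseball field",
            "marker": "LOCATION",
            "context": {"baseball", "game", "pitcher", "played", "field"},
        },
        {
            "gloss": "gemstone",
            "marker": "PHYSICAL-OBJECT",
            "context": {"ring", "jewel", "carat", "bought", "necklace"},
        },
    ],
}


def disambiguate(word, sentence):
    """Pick the sense of `word` whose context words overlap most with the sentence.

    Ties (including zero overlap) fall back to the first listed sense.
    """
    words = set(sentence.lower().replace(".", " ").split())
    senses = LEXICON[word]
    return max(senses, key=lambda sense: len(sense["context"] & words))


if __name__ == "__main__":
    for text in ("The pitcher walked onto the diamond.",
                 "She bought a diamond ring."):
        sense = disambiguate("diamond", text)
        print(text, "->", sense["gloss"], sense["marker"])
```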
  16. 16. Semantic Analysis
     • WordNet (common-sense knowledge base): a database of lexical relations, inspired by current psycholinguistic theories of human lexical memory.
     • Synset: a set of synonyms, representing one underlying lexical concept.
       ◦ Example: fool {chump, fish, fool, gull, mark, patsy, fall guy, sucker, schlemiel, shlemiel, soft touch, mug}
     • Relations link the synsets: Hypernym, Has-Member, Member-Of, Antonym, etc.
  17. 17. Semantic Analysis
     • Example: $ wn bike -partn
       Part Meronyms of noun bike
       2 senses of bike
       Sense 1
       motorcycle, bike
         HAS PART: mudguard, splashguard
       Sense 2
       bicycle, bike, wheel
         HAS PART: bicycle seat, saddle
         HAS PART: bicycle wheel
         HAS PART: chain
         HAS PART: coaster brake
         HAS PART: handlebar
         HAS PART: mudguard, splashguard
         HAS PART: pedal, treadle, foot lever
         HAS PART: sprocket, sprocket wheel
     • Example: $ wn bike
       Information available for noun bike
         -hypen           Hypernyms
         -hypon, -treen   Hyponyms & Hyponym Tree
         -synsn           Synonyms (ordered by frequency)
         -partn           Has Part Meronyms
         -meron           All Meronyms
         -famln           Familiarity & Polysemy Count
         -coorn           Coordinate Sisters
         -simsn           Synonyms (grouped by similarity of meaning)
         -hmern           Hierarchical Meronyms
         -grepn           List of Compound Words
         -over            Overview of Senses
       Information available for verb bike
         -hypev           Hypernyms
         -hypov, -treev   Hyponyms & Hyponym Tree
         -synsv           Synonyms (ordered by frequency)
         -famlv           Familiarity & Polysemy Count
         -framv           Verb Frames
         -simsv           Synonyms (grouped by similarity of meaning)
         -grepv           List of Compound Words
         -over            Overview of Senses
  18. 18. Discourse Integration
     • Specifically, we do not know whom the pronoun “I” or the proper noun “Bill” refers to.
     • To pin down these references requires an appeal to a model of the current discourse context, from which we can learn that the current user is USER068 and that the only person named “Bill” about whom we could be talking is USER073.
     • Once the correct referent for Bill is known, we can also determine exactly which file is being referred to.
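The lookup described on this slide can be sketched as a small discourse-context model. The user identifiers USER068 and USER073 come from the slide; the `DiscourseContext` class, its fields, and the file path are hypothetical and exist only to show how resolving “Bill” first makes it possible to identify the file.

```python
from dataclasses import dataclass, field


@dataclass
class DiscourseContext:
    """A toy model of the current discourse context (hypothetical structure)."""
    current_user: str
    known_people: dict = field(default_factory=dict)  # name -> list of user ids
    files: dict = field(default_factory=dict)         # (owner id, extension) -> path

    def resolve_person(self, mention):
        """Map "I" to the current user and a name to its unique referent."""
        if mention == "I":
            return self.current_user
        candidates = self.known_people.get(mention, [])
        # Only resolve when exactly one candidate exists; otherwise give up.
        return candidates[0] if len(candidates) == 1 else None

    def resolve_file(self, owner_mention, extension):
        """Find the file once the owner's referent is known."""
        owner = self.resolve_person(owner_mention)
        return self.files.get((owner, extension))


if __name__ == "__main__":
    context = DiscourseContext(
        current_user="USER068",
        known_people={"Bill": ["USER073"]},
        # Hypothetical path, for illustration only.
        files={("USER073", ".init"): "/hypothetical/user073/stuff.init"},
    )
    print(context.resolve_person("I"))
    print(context.resolve_person("Bill"))
    print(context.resolve_file("Bill", ".init"))
```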
  19. 19. Pragmatic Analysis
     • The final step toward effective understanding is to decide what to do as a result.
     • One possible thing to do is to record what was said as a fact and be done with it.
     • For some sentences, whose intended effect is clearly declarative, that is precisely the correct thing to do.
     • But for other sentences, including this one, the intended effect is different.
     • We can discover this intended effect by applying a set of rules that characterize cooperative dialogues.
     • The final step in pragmatic processing is to translate from the knowledge-based representation to a command to be executed by the system.
     • The result of the understanding process is
  20. 20. Pragmatic Analysis
     • Knowledge about the kind of actions that speakers intend by their use of sentences:
       ◦ REQUEST: HAL, open the pod bay door.
       ◦ STATEMENT: HAL, the pod bay door is open.
       ◦ INFORMATION QUESTION: HAL, is the pod bay door open?
     • Speech act analysis (politeness, irony, greeting, apologizing...)
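The three speech acts on this slide can be told apart, for these simple examples, with a few surface rules. The sketch below is a deliberately naive heuristic, not a real speech act analyzer: the verb list and the rules (strip a one-word vocative, check for a question mark, check for a leading base-form verb) are assumptions chosen to fit the HAL sentences.

```python
# Hypothetical list of verbs that signal an imperative when sentence-initial.
IMPERATIVE_VERBS = {"open", "close", "print", "show"}


def classify_speech_act(utterance):
    """Classify an utterance as REQUEST, STATEMENT, or INFORMATION QUESTION."""
    text = utterance.strip()
    # Drop a leading one-word vocative such as "HAL,".
    if "," in text:
        head, rest = text.split(",", 1)
        if len(head.split()) == 1:
            text = rest.strip()
    if text.endswith("?"):
        return "INFORMATION QUESTION"
    words = text.split()
    if words and words[0].lower() in IMPERATIVE_VERBS:
        return "REQUEST"
    return "STATEMENT"


if __name__ == "__main__":
    for sentence in ("HAL, open the pod bay door.",
                     "HAL, the pod bay door is open.",
                     "HAL, is the pod bay door open?"):
        print(sentence, "->", classify_speech_act(sentence))
```

Surface rules like these break down quickly: “Can you open the pod bay door?” ends in a question mark but is normally meant as a request, which is exactly the gap between what was said and what was meant that pragmatic analysis addresses.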
  21. 21. Statistical NLP
     • Statistical NLP aims to perform statistical inference for the field of NLP.
     • Statistical inference consists of taking some data generated in accordance with some unknown probability distribution and making inferences about that distribution.
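One standard instance of this kind of inference is estimating a bigram language model from a corpus: the observed text is treated as data from an unknown distribution, and the probability of a word given the previous word is estimated as count(prev, word) / count(prev). The sketch below is an illustration chosen by the editor, not something taken from the slides; the toy corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions, and no smoothing is applied.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return a function prob(prev, word) using maximum-likelihood estimates."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])             # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent token pairs

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0  # unseen context; a real model would smooth instead
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


if __name__ == "__main__":
    corpus = [
        "Bill printed the file",
        "Bill created the file",
        "I want the printer",
    ]
    prob = train_bigram_model(corpus)
    print(prob("the", "file"))
    print(prob("bill", "printed"))
```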
  22. 22. Motivations for Statistical NLP
     • Cognitive modeling of human language processing has not reached a stage where we can have a complete mapping between the language signal and its information content.
     • A complete mapping is not always required.
     • A statistical approach provides the flexibility required to model a language more accurately.
  23. 23. Real World Application
     • Automatic summarization
     • Foreign language reading aid
     • Foreign language writing aid
     • Information extraction
     • Information retrieval (IR): IR is concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.
     • Machine translation: Automatically translating from one human language to another.
     • Named entity recognition (NER): Given a stream of text, determining which items in the text map to proper names, such as people or places. Although named entities in English are marked with capitalized words, many other languages do not use capitalization to distinguish named entities.
     • Natural language generation
     • Natural language search
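The remark about capitalization in English suggests a very crude baseline for named entity recognition: collect runs of capitalized words that do not start a sentence. The sketch below is such a baseline, written only to illustrate the idea; the function name and punctuation handling are assumptions. It misses entities at the start of a sentence, wrongly picks up the pronoun “I” in mid-sentence, and by construction cannot work for languages that do not capitalize names.

```python
PUNCTUATION = ".,;:!?\"'()"


def capitalized_entities(text):
    """Return runs of capitalized words that do not begin a sentence."""
    entities, current = [], []
    tokens = text.split()
    for i, raw in enumerate(tokens):
        word = raw.strip(PUNCTUATION)
        at_sentence_start = i == 0 or tokens[i - 1].endswith((".", "!", "?"))
        if word[:1].isupper() and not at_sentence_start:
            current.append(word)
            if raw.rstrip(PUNCTUATION) == raw:
                continue  # no trailing punctuation, so the run may continue
        # Either a non-entity token or trailing punctuation closes the run.
        if current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


if __name__ == "__main__":
    sample = "Yesterday the printer in New York failed. Then Bill called Ottawa, Canada."
    print(capitalized_entities(sample))
```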
  24. 24. Real World Application
     • Natural language understanding
     • Optical character recognition
     • Anaphora resolution
     • Query expansion
     • Question answering: Given a human-language question, the task of producing a human-language answer. The question may be closed-ended (such as "What is the capital of Canada?") or open-ended (such as "What is the meaning of life?").
     • Speech recognition: Given a sound clip of a person or people speaking, the task of producing a text dictation of the speaker(s). (The opposite of text-to-speech.)
     • Spoken dialogue system
     • Stemming
     • Text simplification
     • Text-to-speech
     • Text-proofing
  25. 25. Demos with free NLP Application DEMO
  26. 26. THANKS