
Information Extraction for Free Text



  1. Plain Text Information Extraction (based on Machine Learning)
     Chia-Hui Chang, Department of Computer Science & Information Engineering, National Central University, 9/24/2002
  2. Introduction
     - Plain Text Information Extraction
       - The task of locating specific pieces of data in a natural language document
       - Obtains useful structured information from unstructured text
       - Exemplified by DARPA's MUC program
     - The extraction rules are based on
       - a syntactic analyzer
       - a semantic tagger
  3. Related Work
     - Free-text documents
       - PALKA (MUC-5, 1993)
       - AutoSlog (AAAI-1993), E. Riloff
       - LIEP (IJCAI-1995), Huffman
       - CRYSTAL (IJCAI-1995, KDD-1997), Soderland
     - Online documents
       - SRV (AAAI-1998), D. Freitag
       - RAPIER (ACL-1997, AAAI-1999), M. E. Califf
       - WHISK (ML-1999), Soderland
  4. SRV: "Information Extraction from HTML: Application of a General Machine Learning Approach", Dayne Freitag, AAAI-98
  5. Introduction
     - SRV
       - A general-purpose relational learner
       - A top-down relational algorithm for IE
       - Relies on a set of token-oriented features
     - Extraction pattern
       - A first-order logic extraction pattern with predicates based on attribute-value tests
  6. Extraction as Text Classification
     - Identify the boundaries of field instances
     - Treat each fragment as a bag of words
     - Find relations from the surrounding context
  7. Relational Learning
     - Inductive Logic Programming (ILP)
     - Input: class-labeled instances
     - Output: a classifier for unlabeled instances
     - Typical covering algorithm
       - Attribute-value tests are added greedily to a rule
       - Each added test heuristically maximizes the number of positive examples covered while minimizing the number of negative examples covered
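The covering loop described above can be sketched in a few lines of Python. This is a minimal illustration of greedy covering, not SRV's actual implementation; `tests` stands in for the candidate predicates, and examples are opaque values the tests can be applied to.

```python
def learn_rules(positives, negatives, tests):
    """Greedy covering: grow rules until every positive example is covered.

    Each rule is a conjunction of tests; each test is chosen greedily to
    keep as many positives and drop as many negatives as possible.
    """
    rules, uncovered = [], list(positives)
    while uncovered:
        rule, pos, neg = [], list(uncovered), list(negatives)
        while neg and len(rule) < len(tests):
            # Score each candidate test by positives kept minus negatives kept.
            best = max(tests, key=lambda t: sum(map(t, pos)) - sum(map(t, neg)))
            rule.append(best)
            pos = [e for e in pos if best(e)]
            neg = [e for e in neg if best(e)]
        covered = [e for e in uncovered if all(t(e) for t in rule)]
        if not covered:          # no progress possible; stop
            break
        rules.append(rule)
        uncovered = [e for e in uncovered if e not in covered]
    return rules

# Toy run: learn "even" from integers using two candidate tests.
rules = learn_rules([2, 4, 6], [1, 3], [lambda x: x % 2 == 0, lambda x: x > 1])
```

On the toy data a single one-test rule suffices, which is exactly the behavior the heuristic aims for: maximal positive coverage with no negatives.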
  8. Simple Features
     - Features of an individual token
       - Length (e.g., single letter vs. multiple letters)
       - Character type (e.g., numeric or alphabetic)
       - Orthography (e.g., capitalized)
       - Part of speech (e.g., verb)
       - Lexical meaning (e.g., geographical_place)
  9. Individual Predicates
     - Length(=3): accepts only fragments containing three tokens
     - Some(?A [] capitalizedp true): the fragment contains some token that is capitalized
     - Every(numericp false): every token in the fragment is non-numeric
     - Position(?A fromfirst <2): the token bound to ?A is either first or second in the fragment
     - Relpos(?A ?B =1): the token bound to ?A immediately precedes the token bound to ?B
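These predicates are easy to mimic in Python. The sketch below uses a small illustrative subset of the token features (`capitalizedp` and `numericp` follow the slide; the helper names and the example fragment are assumptions):

```python
def tok_feats(token):
    # Illustrative subset of SRV's token-level features.
    return {"capitalizedp": token[:1].isupper(), "numericp": token.isdigit()}

def length_eq(fragment, n):                 # Length(=n)
    return len(fragment) == n

def some(fragment, feat, val):              # Some(?A [] feat val)
    return any(tok_feats(t)[feat] == val for t in fragment)

def every(fragment, feat, val):             # Every(feat val)
    return all(tok_feats(t)[feat] == val for t in fragment)

frag = ["National", "Central", "University"]   # a hypothetical 3-token fragment
ok = length_eq(frag, 3) and some(frag, "capitalizedp", True) and every(frag, "numericp", False)
```

A rule is then just a conjunction of such tests applied to every candidate fragment.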
  10. Relational Features
      - Adjacency (next_token)
      - Linguistic syntax (subject_verb)
  11. Example (figure slide)
  12. Search
      - Predicates are added greedily, attempting to cover as many positive and as few negative examples as possible.
      - At every step of rule construction, all documents in the training set are scanned and every text fragment of the appropriate size is counted.
      - Every legal predicate is assessed by the number of positive and negative examples it covers.
      - A position-predicate is legal only if a some-predicate is already part of the rule.
  13. Relational Paths
      - Relational features are used only in the Path argument of the some-predicate.
        - Some(?A [prev_token prev_token] capitalized true): the fragment contains some token preceded by a capitalized token two tokens back.
  14. Validation
      - Training phase
        - 2/3 of the data for learning
        - 1/3 for validation
      - Testing
        - Bayesian m-estimates: all rules matching a given fragment are used to assign a combined confidence score.
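The m-estimate used to score an individual rule on the validation split has the standard form below (a sketch assuming the usual definition; the paper's exact m and prior, and the formula for combining scores across matching rules, are not shown in the slides):

```python
def m_estimate(correct, total, m=5, prior=0.5):
    """Bayesian m-estimate of a rule's accuracy: (c + m*p0) / (n + m).

    Shrinks the raw validation accuracy toward the prior p0; m controls
    how strongly small samples are discounted.
    """
    return (correct + m * prior) / (total + m)

conf = m_estimate(8, 10)   # a rule right 8 of 10 times on validation data
```

With no evidence at all the estimate is just the prior, which is what makes it safer than raw accuracy for rules that match only a handful of validation fragments.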
  15. Adapting SRV for HTML
  16. Experiments
      - Data source: four university computer science departments (Cornell, U. of Texas, U. of Washington, U. of Wisconsin)
      - Data sets
        - Course: title, number, instructor (105 course pages)
        - Project: title, member (96 project pages)
      - Two experiments
        - Random: 5-fold cross-validation
        - LOUO (leave one university out): 4-fold experiments
  17. OPD coverage: each rule has its own confidence
  18. MPD
  19. Baseline Strategies
      - Random guesser: simply memorizes field instances
      - OPD
      - MPD
  20. Conclusions
      - Increased modularity and flexibility: domain-specific information is separate from the underlying learning algorithm
      - Top-down induction: from general to specific
      - Accuracy-coverage trade-off: a confidence score is associated with each prediction
      - Critique: single-slot extraction rules only
  21. RAPIER: "Relational Learning of Pattern-Match Rules for Information Extraction", M. E. Califf and R. J. Mooney, ACL-97, AAAI-99
  22. Rule Representation
      - Single-slot extraction patterns
        - Syntactic information from a part-of-speech tagger
        - Semantic class information from WordNet
  23. The Learning Algorithm
      - A specific-to-general search
        - The pre-filler pattern contains one item for each word before the filler
        - The filler pattern contains one item for each word in the filler
        - The post-filler pattern contains one item for each word after the filler
      - Compress the rules for each slot
        - Generate the least general generalization (LGG) of each pair of rules
        - When the LGG of two constraints is a disjunction, create two alternatives: (1) keep the disjunction, (2) remove the constraint
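The disjunction-versus-removal step can be sketched with word constraints modeled as sets of allowed words (a simplification of RAPIER's pattern items; the function name and representation are assumptions):

```python
def lgg_word_constraint(c1, c2):
    """Least general generalization of two word constraints (simplified).

    Equal constraints generalize to themselves; differing constraints yield
    two candidate generalizations: the disjunction of the word sets, and
    the constraint removed entirely (None matches any word).
    """
    if c1 == c2:
        return [c1]
    return [c1 | c2, None]

# Generalizing the city words from the two example sentences on the next slide.
alts = lgg_word_constraint({"atlanta"}, {"kansas", "city"})
```

The learner would then evaluate both alternatives and keep whichever compresses the rule set without admitting spurious extractions.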
  24. Example
      - Located in Atlanta, Georgia.
      - Offices in Kansas City, Missouri.
  25. Example
      - Assume there is a semantic class for states, but not one for cities.
      - Located in Atlanta, Georgia.
      - Offices in Kansas City, Missouri.
  27. Experimental Evaluation
      - 300 computer-related job postings
        - 17 slots, including employer, location, salary, job requirements, language, and platform
  28. Experimental Evaluation
      - 485 seminar announcements
        - 4 slots
  29. WHISK, S. Soderland, University of Washington, Journal of Machine Learning, 1999
  30. Semi-structured Text
  31. Free Text (example annotated with person name, position, and verb stems)
  32. WHISK Rule Representation
      - For semi-structured IE
  33. WHISK Rule Representation
      - For free-text IE, over syntactic fields such as person name, position, and verb stem
      - Skipping is allowed only within the same syntactic field
  34. Example: instances tagged by users
  35. The WHISK Algorithm
  36. Creating a Rule from a Seed Instance
      - Top-down rule induction
        - Start from an empty rule
        - Add terms within the extraction boundary (Base_1)
        - Add terms just outside the extraction (Base_2)
        - Repeat until the seed is covered
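The two starting points can be sketched as follows, using WHISK's '*' (skip) and '( ... )' (extraction) notation; the tokenization, the indexing scheme, and the function name are assumptions for illustration:

```python
def seed_patterns(tokens, start, end):
    """Build Base_1 and Base_2 for a tagged seed instance (illustrative).

    Base_1 uses the terms inside the extraction boundary; Base_2 uses the
    terms immediately outside it.  tokens[start:end] is the tagged filler.
    """
    filler = " ".join(tokens[start:end])
    base_1 = f"* ( {filler} ) *"
    left = tokens[start - 1] if start > 0 else ""
    right = tokens[end] if end < len(tokens) else ""
    base_2 = " ".join(p for p in ["*", left, "( * )", right, "*"] if p)
    return base_1, base_2

# Seed: the city "Atlanta" tagged in a tokenized sentence.
b1, b2 = seed_patterns("Located in Atlanta , Georgia .".split(), 2, 3)
```

Induction then extends whichever seed rule covers the instance with fewer errors on the rest of the training set.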
  37. Example
  41. AutoSlog: "Automatically Constructing a Dictionary for Information Extraction Tasks", Ellen Riloff, Dept. of Computer Science, University of Massachusetts, AAAI-93
  42. AutoSlog
      - Purpose: automatically construct a domain-specific dictionary for IE
      - Extraction patterns (concept nodes)
        - Conceptual anchor: a trigger word
        - Enabling conditions: constraints
  43. Concept Node Example: the physical-target slot of a bombing template
  44. Construction of Concept Nodes
      1. Given a target piece of information,
      2. AutoSlog finds the first sentence in the text that contains the string.
      3. The sentence is handed to CIRCUS, which generates a conceptual analysis of the sentence.
      4. The first clause of the sentence is used.
      5. A set of heuristics is applied to suggest a good conceptual anchor point for a concept node.
      6. If no heuristic is satisfied, AutoSlog searches for the next matching sentence and returns to step 3.
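The control flow of these steps can be sketched like this; `parse` stands in for CIRCUS and the heuristic shown in the toy run is a placeholder, so treat this purely as an outline of the loop, not AutoSlog's implementation:

```python
def propose_concept_node(sentences, target, parse, heuristics):
    """AutoSlog-style proposal loop (sketch).

    Scan for sentences containing the target string, run the conceptual
    analyzer on each, and apply the anchor-point heuristics in order until
    one proposes a concept node.
    """
    for sent in sentences:
        if target not in sent:
            continue
        clause = parse(sent)          # stand-in for CIRCUS's analysis of the first clause
        for heuristic in heuristics:
            node = heuristic(clause, target)
            if node is not None:
                return node
    return None                        # no heuristic fired on any sentence

# Toy run: a passive-verb heuristic that anchors on "bombed".
node = propose_concept_node(
    ["The mayor spoke.", "The embassy was bombed by terrorists."],
    "embassy",
    parse=str.split,
    heuristics=[lambda clause, t: {"trigger": "bombed"} if "bombed" in clause else None],
)
```
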
  45. Conceptual Anchor Point Heuristics
  46. Background Knowledge
      - Concept node construction
        - Slot: the slot of the answer key
        - Hard and soft constraints
          - Type: uses template types such as bombing or kidnapping
        - Enabling condition: a heuristic pattern
      - Domain specification
        - The type of each template
        - The constraints for each template slot
  47. Another good concept node definition: the perpetrator slot of a perpetrator template
  48. A bad concept node definition: the victim slot of a kidnapping template
  49. Empirical Results
      - Input
        - An annotated corpus in which the targeted information is marked and annotated with semantic tags denoting the type of information (e.g., victim) and the type of event (e.g., kidnapping)
        - 1500 texts whose 1258 answer keys contain 4780 string fillers
      - Output
        - 1237 concept node definitions
        - Human intervention: 5 user-hours to sift through all generated concept nodes
        - 450 definitions are kept
  50. Conclusion
      - In 5 person-hours, AutoSlog creates a dictionary that achieves 98% of the performance of a hand-crafted dictionary
      - Each concept node is a single-slot extraction pattern
      - Reasons for bad definitions
        - A sentence contains the targeted string but does not describe the event
        - A heuristic proposes the wrong conceptual anchor point
        - CIRCUS incorrectly analyzes the sentence
  51. CRYSTAL: "Inducing a Conceptual Dictionary", S. Soderland, D. Fisher, J. Aseltine, W. Lehnert, University of Massachusetts, IJCAI-95
  52. Concept Nodes (CN)
      - CN type
      - Subtype
      - Extracted syntactic constituents
      - Linguistic patterns
      - Constraints on syntactic constituents
  53. The CRYSTAL Induction Tool
      - Create initial CN definitions, one for each instance
      - Induce generalized CN definitions by relaxing constraints on highly similar definitions
        - Word constraints: intersect the strings of words
        - Class constraints: move up the semantic hierarchy
  55. Inducing Generalized CN Definitions
      1. Start from a CN definition D.
      2. Find a second definition D' that is similar to D.
      3. Create a unified definition U.
      4. Delete from the dictionary all definitions covered by U (e.g., D and D').
      5. Test whether U extracts only marked information: if yes, set D = U and go to step 2; if no, start over with another definition as D.
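The induction loop above can be sketched with definitions represented as sets of constraints and unification as set intersection (both simplifications; `find_similar`, `unify`, and the error test are hypothetical callbacks standing in for CRYSTAL's machinery):

```python
def induce(definitions, find_similar, unify, extracts_only_marked):
    """CRYSTAL-style generalization (sketch).

    Repeatedly unify a definition with a similar one, replacing both with
    the unified definition as long as it still extracts only marked text.
    """
    defs = set(definitions)
    for d in list(defs):
        if d not in defs:            # already absorbed by an earlier unification
            continue
        while True:
            d2 = find_similar(d, defs - {d})
            if d2 is None:
                break
            u = unify(d, d2)
            if not extracts_only_marked(u):
                break                # relaxation went too far; keep d as-is
            defs -= {d, d2}          # u covers both, so drop them
            defs.add(u)
            d = u
    return defs

# Toy run: constraints as frozensets, unification as intersection.
result = induce(
    [frozenset({"verb=damaged", "subj=building"}), frozenset({"verb=damaged", "subj=bridge"})],
    find_similar=lambda d, others: next(iter(others), None),
    unify=lambda a, b: a & b,
    extracts_only_marked=lambda u: True,
)
```

Because every successful unification shrinks the dictionary by one definition, the loop terminates even under this maximally permissive error test.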
  57. Implementation Issues
      - Finding similar definitions: index CN definitions by verbs and by extraction buffers
      - Similarity metric: intersecting classes or intersecting strings of words
      - Testing the error rate of a generalized definition: build a database of instances segmented by the sentence analyzer
  58. Experimental Results
      - 385 annotated hospital discharge reports; 14719 training instances
      - An error-tolerance parameter controls the trade-off between precision and recall
      - Output: CN definitions
        - 194 with coverage = 10
        - 527 with 2 < coverage < 10
  59. Comparison
      - Bottom-up (specific to general)
        - CRYSTAL [Soderland, 1996]
        - RAPIER [Califf & Mooney, 1997]
      - Top-down (general to specific)
        - SRV [Freitag, 1998]
        - WHISK [Soderland, 1999]
  60. References
      - I. Muslea, "Extraction Patterns for Information Extraction Tasks: A Survey", AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
      - E. Riloff, "Automatically Constructing a Dictionary for Information Extraction Tasks", AAAI-93, pp. 811-816, 1993.
      - S. Soderland et al., "CRYSTAL: Inducing a Conceptual Dictionary", IJCAI-95.
      - D. Freitag, "Information Extraction from HTML: Application of a General Machine Learning Approach", AAAI-98.
      - M. E. Califf and R. J. Mooney, "Relational Learning of Pattern-Match Rules for Information Extraction", AAAI-99, Orlando, FL, pp. 328-334, July 1999.
      - S. Soderland, "Learning Information Extraction Rules for Semi-structured and Free Text", Journal of Machine Learning, 1999.