CARTIC RAMAKRISHNAN, MEENAKSHI NAGARAJAN, AMIT SHETH
We have used material from several popular books, papers, course notes and presentations made by experts in this area. We have provided all references to the best of our knowledge. This list, however, serves only as a pointer to work in this area and is by no means a comprehensive resource.
An Overview of Empirical Natural Language Processing, Eric Brill, Raymond J. Mooney:
Word Sequence → Syntactic Parser → Parse Tree → Semantic Analyzer → Literal Meaning → Discourse Analyzer → Meaning
[Figure: KB, Text, NLP System, Analysis]
[Figure: KB, Text, NLP System, Analysis, Corpus, Learning System]
A bag of words
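A bag-of-words representation keeps only which words occur and how often, and throws away their order. The minimal Python sketch below assumes a naive tokenizer (lowercased runs of ASCII letters); the function name `bag_of_words` is hypothetical, and practical systems usually add stemming, stop-word removal or weighting such as tf-idf.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a text to a multiset of lowercased word tokens.

    Assumption: a token is any run of ASCII letters; punctuation and
    digits are dropped, and word order is discarded.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


if __name__ == "__main__":
    a = bag_of_words("The dog bit the man.")
    b = bag_of_words("The man bit the dog!")
    # Both sentences yield the same bag because order is ignored.
    print(a == b)      # True
    print(a["the"])    # 2
```

The two example sentences mean different things yet map to the same bag, which is exactly the information this representation gives up.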
Discovering connections hidden in text UNDISCOVERED PUBLIC KNOWLEDGE
Documents → Select terms → Build core tree (WordNet) → Augment core tree → Remove top-level categories → Compress tree → Divide into facets
Domains are used to prune the applicable senses in WordNet (e.g. “dip”).
[Figure: WordNet hypernym trees over the nodes entity; substance, matter; nutriment; dessert; frozen dessert; ice cream sundae; sundae; sherbet, sorbet; sherbet]
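The build-core-tree, remove-top-level-categories and compress steps of the pipeline can be illustrated on hand-written hypernym paths. This is a minimal sketch under stated assumptions, not the exact procedure behind the slide: the paths are simplified and only loosely modeled on the figure (they are not real WordNet output), the number of top levels to drop (`levels=2`) is arbitrary, "Augment core tree" and "Divide into facets" are omitted, and "compress" here means replacing any node that has exactly one child by that child. All function names are hypothetical.

```python
from typing import Dict, List

Tree = Dict[str, dict]  # label -> subtree (nested dicts; {} is a leaf)


def build_core_tree(paths: List[List[str]]) -> Tree:
    """Merge root-to-leaf hypernym paths into one nested-dict tree."""
    tree: Tree = {}
    for path in paths:
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree


def remove_top_levels(tree: Tree, levels: int) -> Tree:
    """Drop the `levels` most general layers, promoting their children.

    Simplification: children with the same label overwrite each other.
    """
    for _ in range(levels):
        promoted: Tree = {}
        for children in tree.values():
            promoted.update(children)
        tree = promoted
    return tree


def compress(tree: Tree) -> Tree:
    """Replace every node that has exactly one child by that child."""
    out: Tree = {}
    for label, children in tree.items():
        kids = compress(children)
        if len(kids) == 1:
            only_label, grandkids = next(iter(kids.items()))
            out[only_label] = grandkids
        else:
            out[label] = kids
    return out


if __name__ == "__main__":
    # Hypothetical, simplified hypernym paths (not real WordNet output).
    paths = [
        ["entity", "substance, matter", "nutriment", "dessert", "frozen dessert", "sundae"],
        ["entity", "substance, matter", "nutriment", "dessert", "sherbet, sorbet", "sherbet"],
    ]
    tree = compress(remove_top_levels(build_core_tree(paths), levels=2))
    print(tree)  # {'dessert': {'sundae': {}, 'sherbet': {}}}
```

With these paths the two most general layers disappear and the single-child chains (frozen dessert to sundae, sherbet, sorbet to sherbet) collapse, leaving "dessert" as the root of a small facet.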
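[Figure: Fish Oils and Raynaud’s Disease with an unknown connection (???????) between them; UMLS Semantic Network types Biologically active substance, Lipid, and Disease or Syndrome, related by affects, causes and complicates; instance_of links connect the two terms to these types. Sources: UMLS Semantic Network, MeSH, PubMed. Document counts shown: 9284 documents, 4733 documents, 5 documents.]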
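The fish oils / Raynaud’s disease case is Swanson’s classic example of undiscovered public knowledge: two literatures that rarely cite each other share intermediate "B" terms that link A to C. The sketch below shows the basic closed-discovery co-occurrence idea (A and C given, B sought) on a hypothetical toy corpus in which each document is just a set of terms; the documents and the function name `abc_bridges` are invented for illustration. Real systems work over MeSH-indexed PubMed records and rank candidate B terms statistically, which is not attempted here.

```python
from typing import Iterable, List, Set, Tuple


def abc_bridges(docs: Iterable[Set[str]], a: str, c: str) -> Tuple[List[str], bool]:
    """Return (sorted B terms, whether A and C already co-occur).

    A B term must co-occur with A in some document and with C in some
    (possibly different) document. If A and C co-occur directly, the
    connection is not "hidden", which the boolean flag reports.
    """
    docs = list(docs)
    with_a = set().union(*(d for d in docs if a in d))
    with_c = set().union(*(d for d in docs if c in d))
    direct = any(a in d and c in d for d in docs)
    bridges = (with_a & with_c) - {a, c}
    return sorted(bridges), direct


if __name__ == "__main__":
    # Hypothetical toy documents, each reduced to a set of index terms.
    docs = [
        {"fish oil", "blood viscosity"},
        {"fish oil", "platelet aggregation"},
        {"fish oil", "omega-3"},
        {"raynaud", "blood viscosity"},
        {"raynaud", "platelet aggregation", "vasoconstriction"},
    ]
    print(abc_bridges(docs, "fish oil", "raynaud"))
    # (['blood viscosity', 'platelet aggregation'], False)
```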
[Hearst92] Finding class instances; [Ramakrishnan et al. 08], [Nguyen07] Finding attribute “like” relation instances
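Hearst-style class-instance extraction matches lexico-syntactic patterns such as "NP such as NP, NP and NP". The sketch below handles only that single pattern and makes strong simplifying assumptions: the class is the one word immediately before "such as", and the instance list runs to the next period or semicolon and is split on commas, "and" and "or". The names `SUCH_AS` and `such_as_pairs` are hypothetical; a real implementation would match noun phrases from a chunker or parser.

```python
import re
from typing import List, Tuple

# Simplified "NP such as NP, NP and NP": class = one word,
# instance list = text up to the next '.' or ';'.
SUCH_AS = re.compile(r"(\w+),?\s+such\s+as\s+([^.;]+)")
LIST_SEP = re.compile(r",|\band\b|\bor\b")


def such_as_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (instance, class) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        cls = match.group(1)
        for item in LIST_SEP.split(match.group(2)):
            item = item.strip()
            if item:
                pairs.append((item, cls))
    return pairs


if __name__ == "__main__":
    print(such_as_pairs("He plays instruments such as guitar, piano, and violin."))
    # [('guitar', 'instruments'), ('piano', 'instruments'), ('violin', 'instruments')]
```

Because the list is cut only at "." or ";", a sentence that continues after the list ("such as guitar and piano in the evening") would produce a noisy last item; that is the price of avoiding a parser.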
http://compbio.uchsc.edu/Hunter_lab/Cohen/Hunter_Cohen_Molecular_Cell.pdf
POS Tag  Description                            Example
CC       coordinating conjunction               and
CD       cardinal number                        1, third
DT       determiner                             the
EX       existential there                      there is
FW       foreign word                           d'oeuvre
IN       preposition/subordinating conjunction  in, of, like
JJ       adjective                              green
JJR      adjective, comparative                 greener
JJS      adjective, superlative                 greenest
LS       list marker                            1)
MD       modal                                  could, will
NN       noun, singular or mass                 table
NNS      noun, plural                           tables
NNP      proper noun, singular                  John
NNPS     proper noun, plural                    Vikings
PDT      predeterminer                          both the boys
POS      possessive ending                      friend's
PRP      personal pronoun                       I, he, it
PRP$     possessive pronoun                     my, his
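To make the tag set concrete, here is a hypothetical baseline tagger: it looks words up in a tiny lexicon built from the Example column of the table and falls back on a few surface heuristics (numbers to CD, capitalized words to NNP/NNPS, a trailing "s" to NNS, otherwise NN). The lexicon, the heuristics and the name `baseline_tag` are all assumptions for illustration; it ignores context entirely, whereas practical taggers are trained statistically (for example HMM or maximum-entropy models) on annotated corpora.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicon built from the "Example" column of the table.
LEXICON = {
    "and": "CC", "the": "DT", "there": "EX", "in": "IN", "of": "IN",
    "like": "IN", "green": "JJ", "greener": "JJR", "greenest": "JJS",
    "could": "MD", "will": "MD", "table": "NN", "tables": "NNS",
    "both": "PDT", "i": "PRP", "he": "PRP", "it": "PRP",
    "my": "PRP$", "his": "PRP$",
}


def baseline_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Assign a Penn Treebank tag to each token: lexicon lookup, then heuristics."""
    tagged = []
    for tok in tokens:
        low = tok.lower()
        if low in LEXICON:
            tag = LEXICON[low]
        elif re.fullmatch(r"\d+(?:\.\d+)?", tok):
            tag = "CD"    # plain numbers
        elif tok == "'s":
            tag = "POS"   # possessive ending, assuming the tokenizer split it off
        elif tok[:1].isupper():
            tag = "NNPS" if tok.endswith("s") else "NNP"
        elif low.endswith("s"):
            tag = "NNS"
        else:
            tag = "NN"    # default guess for unknown words
        tagged.append((tok, tag))
    return tagged


if __name__ == "__main__":
    print(baseline_tag(["John", "'s", "greener", "tables", "and", "3", "Vikings"]))
    # [('John', 'NNP'), ("'s", 'POS'), ('greener', 'JJR'), ('tables', 'NNS'),
    #  ('and', 'CC'), ('3', 'CD'), ('Vikings', 'NNPS')]
```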
http://www.cs.umanitoba.ca/~comp4190/2006/NLP-Parsing.ppt
Natural Language Parsers, Peter Hellwig, Heidelberg:
Constituency parse: nested phrasal structures. Dependency parse: role-specific structures.
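The two representations are related: given head rules that say which child of each phrase is its head, a constituency tree can be converted into unlabeled dependency arcs, because the head word of every non-head child becomes a dependent of the phrase's head word. The sketch below assumes a nested-tuple tree format and a hypothetical three-entry head table (`HEAD_CHILD`) that only covers its example sentence; real converters rely on much larger head-rule tables.

```python
from typing import List, Tuple

# Hypothetical head table: which child label heads each phrase label.
HEAD_CHILD = {"S": "VP", "VP": "VBZ", "NP": "NNP"}


def to_dependencies(tree: tuple, arcs: List[Tuple[str, str]]) -> str:
    """Return the head word of `tree`, appending (head, dependent) arcs.

    A tree is (label, child, child, ...); a preterminal is (POS, word).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]  # preterminal: its word is its own head
    heads = [to_dependencies(child, arcs) for child in children]
    head_idx = [child[0] for child in children].index(HEAD_CHILD[label])
    for i, word in enumerate(heads):
        if i != head_idx:
            arcs.append((heads[head_idx], word))
    return heads[head_idx]


if __name__ == "__main__":
    tree = ("S", ("NP", ("NNP", "John")),
                 ("VP", ("VBZ", "loves"), ("NP", ("NNP", "Mary"))))
    arcs: List[Tuple[str, str]] = []
    root = to_dependencies(tree, arcs)
    print(root, arcs)  # loves [('loves', 'Mary'), ('loves', 'John')]
```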
From Hearst 97
http://www.stanford.edu/class/cs224u/224u.07.lec2.ppt
TO COME: USAGE EXAMPLES OF WHAT WE COVERED THUS FAR
This MEK dependency was observed in BRAF mutant cells regardless of tissue lineage, and correlated with both downregulation of cyclin D1 protein expression and the induction of G1 arrest.
*MEK dependency ISA Dependency_on_an_Organic_chemical
*BRAF mutant cells ISA Cell_type
*downregulation of cyclin D1 protein expression ISA Biological_process
*tissue lineage ISA Biological_concept
*induction of G1 arrest ISA Biological_process
Information Extraction = segmentation + classification + association + mining
Text mining = entity identification + named relationship extraction + discovering association chains…
[Figure: Segmentation → Classification → Named Relationship Extraction, producing the graph: MEK dependency observed in BRAF mutant cells; MEK dependency correlated with downregulation of cyclin D1 protein expression; MEK dependency correlated with induction of G1 arrest]
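The segmentation and classification steps can be sketched as dictionary lookup. In this minimal sketch the gazetteer simply copies the ISA annotations listed above, and matching is longest-phrase-first so that overlapping entries cannot claim the same text twice. This is an illustrative assumption, not how the annotations were produced; the names `GAZETTEER` and `segment_and_classify` are hypothetical, a real system would rely on trained taggers and larger vocabularies, and the named relationship extraction step is not attempted here.

```python
import re
from typing import Dict, List, Tuple

# Entity types copied from the ISA annotations above.
GAZETTEER: Dict[str, str] = {
    "MEK dependency": "Dependency_on_an_Organic_chemical",
    "BRAF mutant cells": "Cell_type",
    "downregulation of cyclin D1 protein expression": "Biological_process",
    "tissue lineage": "Biological_concept",
    "induction of G1 arrest": "Biological_process",
}


def segment_and_classify(text: str, gazetteer: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (offset, mention, type) for non-overlapping whole-word matches,
    trying longer phrases first."""
    taken = [False] * len(text)
    found = []
    for phrase in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            if not any(taken[m.start():m.end()]):
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
                found.append((m.start(), phrase, gazetteer[phrase]))
    return sorted(found)


if __name__ == "__main__":
    sentence = ("This MEK dependency was observed in BRAF mutant cells regardless of "
                "tissue lineage, and correlated with both downregulation of cyclin D1 "
                "protein expression and the induction of G1 arrest.")
    for offset, mention, etype in segment_and_classify(sentence, GAZETTEER):
        print(offset, mention, "ISA", etype)
```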
Frequency based: "China International Trust and Investment Corp", "Suspended Ceiling Contractors Ltd", and "Hughes" when "Hughes Communications Ltd." is already marked as an organization
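One simple reading of the "Hughes" example is a document-level consistency rule: once a full organization name has been marked, a standalone occurrence of its first word elsewhere in the same text is marked as well. The sketch below assumes exactly that rule (first word only, case-sensitive, whole-word match, not inside a full mention); the function name `propagate_short_forms` is hypothetical, and a real system would weigh such evidence against other features to avoid false matches.

```python
import re
from typing import List, Tuple


def propagate_short_forms(text: str, marked_orgs: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, short_form) spans where the first word of an
    already-marked organization name appears outside any full mention."""
    full_spans = [
        (m.start(), m.end())
        for org in marked_orgs
        for m in re.finditer(re.escape(org), text)
    ]
    hits = []
    for org in marked_orgs:
        short = org.split()[0]  # assumption: the short form is the first word
        for m in re.finditer(r"\b" + re.escape(short) + r"\b", text):
            inside = any(s <= m.start() and m.end() <= e for s, e in full_spans)
            if not inside:
                hits.append((m.start(), m.end(), short))
    return sorted(hits)


if __name__ == "__main__":
    text = "Hughes Communications Ltd. said on Monday that Hughes would expand its network."
    for start, end, name in propagate_short_forms(text, ["Hughes Communications Ltd."]):
        print(start, end, text[start:end])  # only the second "Hughes" is reported
```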
Several features of the same word can affect the parameters.
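In log-linear (maximum-entropy style) taggers, several overlapping features of the same word can fire at once, and each contributes its own weight to every label's score. The sketch below scores a single token with hypothetical hand-set weights and a softmax over two labels; the weights, labels and function names are assumptions, since a real model estimates the weights from training data. A CRF normalizes over whole label sequences instead of one token at a time, but the per-feature weighting is the same idea.

```python
import math
from typing import Dict, List, Tuple

LABELS = ("ORG", "O")

# Hypothetical weights for (feature, label) pairs; unseen pairs weigh 0.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("is_capitalized", "ORG"): 1.2,
    ("is_capitalized", "O"): -0.3,
    ("suffix=Ltd", "ORG"): 2.0,
    ("prev=the", "O"): 0.4,
}


def token_features(tokens: List[str], i: int) -> List[str]:
    """Several overlapping features describing the same token."""
    tok = tokens[i]
    feats = ["suffix=" + tok[-3:]]
    if tok[:1].isupper():
        feats.append("is_capitalized")
    if i > 0:
        feats.append("prev=" + tokens[i - 1].lower())
    return feats


def label_probs(tokens: List[str], i: int) -> Dict[str, float]:
    """Softmax over labels of the summed weights of all active features."""
    feats = token_features(tokens, i)
    scores = {y: sum(WEIGHTS.get((f, y), 0.0) for f in feats) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


if __name__ == "__main__":
    probs = label_probs(["Hughes", "Communications", "Ltd"], 2)
    print(round(probs["ORG"], 2))  # 0.97: suffix and capitalization both push toward ORG
```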
Pruned extraction patterns → Feature generation for CRF
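Extraction patterns can be reused as CRF features by turning each surviving pattern into a binary per-token feature that is true when the token lies inside a pattern match. The sketch below assumes one hypothetical pattern (`pat_org_suffix`: capitalized words followed by "Corp" or "Ltd") and produces one feature dict per token, a layout many CRF toolkits accept; it does not train a CRF. Note that on "China International Trust and Investment Corp" this naive pattern would only cover "Investment Corp", because the lowercase "and" breaks the run of capitalized words.

```python
import re
from typing import Dict, List, Union

# Hypothetical pruned pattern set: feature name -> regex over the raw sentence.
PATTERNS = {
    "pat_org_suffix": re.compile(r"\b(?:[A-Z]\w+ )+(?:Corp|Ltd)\b"),
}
TOKEN = re.compile(r"\w+|[^\w\s]")


def crf_features(sentence: str) -> List[Dict[str, Union[str, bool]]]:
    """One feature dict per token: lexical features plus pattern-match flags."""
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(sentence)]
    spans = {name: [(m.start(), m.end()) for m in pat.finditer(sentence)]
             for name, pat in PATTERNS.items()}
    feats = []
    for i, (tok, start, end) in enumerate(tokens):
        f: Dict[str, Union[str, bool]] = {
            "word.lower": tok.lower(),
            "is_title": tok.istitle(),
            "prev.lower": tokens[i - 1][0].lower() if i > 0 else "<BOS>",
        }
        for name, matches in spans.items():
            # True when the token lies entirely inside some pattern match.
            f[name] = any(a <= start and end <= b for a, b in matches)
        feats.append(f)
    return feats


if __name__ == "__main__":
    feats = crf_features("Shares of Suspended Ceiling Contractors Ltd rose.")
    print([f["pat_org_suffix"] for f in feats])
    # [False, False, True, True, True, True, False, False]
```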
IMPLICIT EXPLICIT
[Figure: relationship head, subject head, object head, object head]
[Figure: modifiers, modified entities, composite entities]
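Relationship heads, subject and object heads, and modifier expansion into composite entities can be combined into a small triple extractor over dependency arcs. Everything here is an assumption for illustration: the arcs are hand-written for a shortened version of the MEK sentence, the labels follow a Stanford collapsed-dependency style (`nsubj`, `dobj`, `prep_*`, `conj_and`, `nn`, `amod`), words double as node identifiers (so repeated words would collide), and a composite entity is formed by prefixing a head with its `nn`/`amod` modifiers in arc order. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str, str]  # (head, label, dependent)

MODIFIER_LABELS = {"nn", "amod"}


def dependents(arcs: List[Arc], head: str, labels: Set[str]) -> List[str]:
    return [d for h, l, d in arcs if h == head and l in labels]


def composite(arcs: List[Arc], head: str) -> str:
    """Composite entity: modifiers (in arc order) followed by the head word."""
    return " ".join(dependents(arcs, head, MODIFIER_LABELS) + [head])


def extract_triples(arcs: List[Arc]) -> List[Tuple[str, str, str]]:
    """(subject, relationship, object) triples from nsubj + dobj/prep_* arcs;
    conj_and spreads a relation over coordinated object heads."""
    triples = []
    for verb, label, subj in arcs:
        if label != "nsubj":
            continue
        for head, rel_label, obj in arcs:
            if head != verb:
                continue
            if rel_label == "dobj":
                relation = verb
            elif rel_label.startswith("prep_"):
                relation = verb + " " + rel_label[len("prep_"):]
            else:
                continue
            for o in [obj] + dependents(arcs, obj, {"conj_and"}):
                triples.append((composite(arcs, subj), relation, composite(arcs, o)))
    return triples


if __name__ == "__main__":
    # Hypothetical arcs for "MEK dependency correlated with downregulation and induction".
    arcs = [
        ("correlated", "nsubj", "dependency"),
        ("dependency", "nn", "MEK"),
        ("correlated", "prep_with", "downregulation"),
        ("downregulation", "conj_and", "induction"),
    ]
    for triple in extract_triples(arcs):
        print(triple)
    # ('MEK dependency', 'correlated with', 'downregulation')
    # ('MEK dependency', 'correlated with', 'induction')
```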


Text Analytics for Semantic Computing

  • 1. CARTIC RAMAKRISHNAN MEENAKSHI NAGARAJAN AMIT SHETH
  • 2. We have used material from several popular books, papers, course notes and presentations made by experts in this area. We have provided all references to the best of our knowledge. This list however, serves only as a pointer to work in this area and is by no means a comprehensive resource.
  • 3.
  • 4. An Overview of Empirical Natural Language Processing, Eric Brill, Raymond J. Mooney Word Sequence Syntactic Parser Parse Tree Semantic Analyzer Literal Meaning Discourse Analyzer Meaning
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. Discovering connections hidden in text UNDISCOVERED PUBLIC KNOWLEDGE
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.  
  • 23.
  • 24.
  • 25. Documents Select terms WordNet Build core tree Augment core tree Remove top level categories Compress Tree Divide into facets
  • 26. Domains used to prune applicable senses in WordNet (e.g. “dip”). [Figure: hypernym paths for “sundae” (ice cream sundae) and “sherbet” (sherbet, sorbet), each running entity > substance, matter > nutriment > dessert > frozen dessert, merged into a single tree.]
  • 27. [Figure: UMLS Semantic Network types Biologically active substance, Lipid and Disease or Syndrome, linked by the relations affects, causes and complicates; instances Fish Oils and Raynaud’s Disease (instance_of), with an unknown connection (???????) between them. Sources: UMLS Semantic Network, MeSH, PubMed; document counts shown: 9284, 4733 and 5.]
  • 28. [Hearst92] Finding class instances; [Ramakrishnan et al. 08] [Nguyen07] Finding attribute-“like” relation instances
  • 62. [Figure from Natural Language Parsers, Peter Hellwig, Heidelberg: constituency parse (nested phrasal structures) vs. dependency parse (role-specific structures)]
  • 100. TO COME: USAGE EXAMPLES OF WHAT WE COVERED THUS FAR
  • 102. This MEK dependency was observed in BRAF mutant cells regardless of tissue lineage, and correlated with both downregulation of cyclin D1 protein expression and the induction of G1 arrest.
    Entity classification: MEK dependency ISA Dependency_on_an_Organic_chemical; BRAF mutant cells ISA Cell_type; downregulation of cyclin D1 protein expression ISA Biological_process; tissue lineage ISA Biological_concept; induction of G1 arrest ISA Biological_process.
    Information Extraction = segmentation + classification + association + mining
    Text mining = entity identification + named relationship extraction + discovering association chains…
    [Figure: segmentation, classification and named relationship extraction applied to the sentence, yielding: MEK dependency observed in BRAF mutant cells; MEK dependency correlated with downregulation of cyclin D1 protein expression; MEK dependency correlated with induction of G1 arrest.]
  • 103. [Figure: MEK dependency observed in BRAF mutant cells; MEK dependency correlated with downregulation of cyclin D1 protein expression; MEK dependency correlated with induction of G1 arrest]
  • 124. [Figure: modifiers, modified entities, composite entities]

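Slide 28 cites [Hearst92] for finding class instances. Hearst's lexico-syntactic patterns include the form "<class> such as <instance>, <instance> and <instance>".

This is a minimal sketch of that one pattern, written as a regular expression. It assumes every noun phrase is a single word; a real system would find noun-phrase boundaries with a chunker or parser. The names `SUCH_AS`, `SEPARATOR` and `hearst_such_as` are hypothetical.

```python
import re

# Simplified "<class> such as <X>, <Y> and <Z>" pattern (single-word noun phrases only).
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?:,? (?:and|or) \w+)?)")
# Separators between listed instances: ", ", " and ", ", or ", ...
SEPARATOR = re.compile(r",? (?:and|or) |, ")


def hearst_such_as(text):
    """Return (instance, class) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        cls = match.group(1)
        for instance in SEPARATOR.split(match.group(2)):
            pairs.append((instance, cls))
    return pairs


print(hearst_such_as("We studied kinases such as BRAF, MEK and ERK in melanoma cells."))
# → [('BRAF', 'kinases'), ('MEK', 'kinases'), ('ERK', 'kinases')]
```

With multi-word phrases, "fish oils such as ..." would yield only "oils" as the class. This is why NP chunking matters in practice.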
Editor's Notes

  1. One of the primary goals of AI has been natural language understanding. Understanding language requires not only lexical and grammatical information but also semantic, pragmatic and general world knowledge. It is a complex task that involves many levels of processing and a variety of subtasks. Typical components of a language understanding system: understanding language deals with what is said (the syntax and structure of language) and with what the thing being said asks or informs about the world (semantics, pragmatics, discourse).
  2. In the 1970s, AI systems were developed that demonstrated interesting aspects of language understanding using hand-coded symbolic grammars and knowledge bases. Although there was some corpus-based language learning in the 1950s following Shannon's information theory, Chomsky's argument that language is more an innate faculty than a learned one was instrumental in redefining the goals of linguistics in the 1950s. The emphasis shifted to symbolic grammars and to representing innate linguistic knowledge (universal grammar).
  3. Developing such systems, however, was very human-intensive and required extensive knowledge engineering; the systems typically ran on toy examples and were rather brittle. Partially in response to these issues, there was a paradigm shift in natural language understanding. Approaches moved from rationalist methods based on hand-coded rules derived through introspection to empirical, corpus-based methods. Development is more data-driven and at least partially automated through the use of statistical or machine learning methods.
  5. When we analyze text for semantic computing, we are doing one of two things. The first is finding out more about what we know. Often termed the “needle in a haystack” paradigm, this search/browse approach contrasts with the other goal of text analysis: finding what we do not know, that is, discovering undiscovered knowledge.
  6. Wikipedia-like text: assertional, simple, easier to parse and understand. Biomedical literature, however, contains text that describes complex scientific investigations which do not always contain explicit factual assertions. Instead, there is often a series of arguments, opinions and experiments supported by evidence that collectively corroborate or refute a hypothesis that may not be explicitly stated in a simple sentence. Sentences tend to be rather long and convoluted. Furthermore, domain-specific terms, abbreviations, number ranges and symbols often make sentences hard for the human reader to parse, further complicating automated information extraction. These factors make mining biomedical text substantially more complex than mining Wikipedia-like text. Casual text: the goal is largely interactive as opposed to informative, and grammatical errors, misspellings and entity variations are not uncommon.
  8. Using Text to Form Hypotheses about Disease. For more than a decade, Don Swanson has eloquently argued why it is plausible to expect new information to be derivable from text collections: experts can only read a small subset of what is published in their fields and are often unaware of developments in related fields. Thus it should be possible to find useful linkages between information in related literatures, if the authors of those literatures rarely refer to one another's work. Swanson has shown how chains of causal implication within the medical literature can lead to hypotheses for causes of rare diseases, some of which have received supporting experimental evidence [Swanson 1987, Swanson 1991, Swanson and Smalheiser 1994, Swanson and Smalheiser 1997]. For example, when investigating causes of migraine headaches, he extracted various pieces of evidence from titles of articles in the biomedical literature. Some of these clues can be paraphrased as follows: stress is associated with migraines; stress can lead to loss of magnesium; calcium channel blockers prevent some migraines; magnesium is a natural calcium channel blocker; spreading cortical depression (SCD) is implicated in some migraines; high levels of magnesium inhibit SCD; migraine patients have high platelet aggregability; magnesium can suppress platelet aggregability. These clues suggest that magnesium deficiency may play a role in some kinds of migraine headache, a hypothesis which did not exist in the literature at the time Swanson found these links. The hypothesis has to be tested via non-textual means. The important point is that a new, potentially plausible medical hypothesis was derived from a combination of text fragments and the explorer's medical expertise. (According to Swanson (1991), subsequent study found support for the magnesium-migraine hypothesis [Ramadan et al. 1989].) This approach has been only partially automated.
There is, of course, a potential for combinatorial explosion of potentially valid links. Beeferman (1998) has developed a flexible interface and analysis tool for exploring certain kinds of chains of links among lexical relations within WordNet. However, sophisticated new algorithms are needed to help with pruning, since a good pruning algorithm will want to take into account various kinds of semantic constraints. This may be an interesting area of investigation for computational linguists.
  9. Beyond search: analytical operations over text to answer complex questions that require aggregating information across a corpus; such operations are context-specific, domain-specific and application-specific. Text mining, also known as intelligent text analysis, text data mining or knowledge discovery in text (KDT), refers generally to the process of extracting interesting and non-trivial information and knowledge from unstructured text. Text mining is a young interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. As most information (over 80%) is stored as text, text mining is believed to have high potential commercial value.
  10. The first query is “Flu Epidemic.” In Table 1, we see that the first storyline contains information about flu (identified by terms like ‘vaccines’, ‘strains’), the second contains seasonal news (identified by terms like ‘deaths’, ‘reported’), the third is about bird flu (identified by terms like ‘avian’, ‘bird’), and the fourth is about the 1918 Spanish flu epidemic (identified by terms like ‘spanish’, ‘1918’).
  11. http://turing.cs.washington.edu/papers/ijcai07.pdf
  12. Select well-distributed terms from the collection: eliminate stopwords, and retain only those terms with a distribution higher than a threshold (default: top 10%). Build a “backbone”: create paths from unambiguous terms only, biasing the structure towards the appropriate senses of words. Get the hypernym path for a term if it (a) has only one sense, or (b) matches a pre-selected WordNet domain. Adding a new term increases the count at each node on its path by the number of documents containing the term.
  13. We will cover some fundamentals that are a core part of most text mining (TM) systems.
  14. Tabulate task, tools/methods, resource and frameworks, e.g.:
      Task | Tools/Methods | Resource | Frameworks
      NEI | Syntactic-parse based | POS taggers | UIMA, GATE
      NEI | Lexicon based | Lexicons available for download | UIMA, GATE
  15. Syntactic analysis involves determining the grammatical structure of a sentence. One subtask is part-of-speech (POS) tagging or
  16. Local word characteristics: a word ending in -s is not always a plural noun (compare “he works” with “his works”).
  17. Noun phrases are headed by nouns and provide information about the noun in the sentence. Prepositional phrases are headed by prepositions, contain noun phrases and express spatial, temporal and other attributes. Verb phrases organize all elements of a sentence that syntactically depend on the verb.
  18. A grammar describes which of the possible sequences of symbols (strings) in a language constitute valid words or statements in that language, but it does not describe their semantics (i.e. what they mean).
  19. Typical sources of knowledge: meanings of words; meanings of grammatical constructs; knowledge about the structure of discourse; common-sense knowledge about the topic; knowledge about the state of affairs in which the discourse is occurring.
  20. Derivational morphology: forming new words from old words. Inflectional morphology: suffixes (inflections).
  21. Synonym test for words/lemmas (propositional meaning): two words are synonyms when one can be substituted for the other without changing the meaning of the sentence. Car and automobile are substitutable, yet not identical in meaning.
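  22. $\mathrm{pathlen}(c_1, c_2)$ = number of edges in the shortest path in the thesaurus graph between the sense nodes $c_1$ and $c_2$
      $\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = -\log \mathrm{pathlen}(c_1, c_2)$
      $\mathrm{wordsim}(w_1, w_2) = \max_{c_1 \in \mathrm{senses}(w_1),\; c_2 \in \mathrm{senses}(w_2)} \mathrm{sim}(c_1, c_2)$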
  23. If "Adam Kluver Ltd" had already been recognised as an organisation by the sure-fire rule, then in this second step any occurrences of "Kluver Ltd", "Adam Ltd" and "Adam Kluver" are also tagged as possible organisations. This assignment, however, is not definite, since some of these words (such as "Adam") could refer to a different entity. This information goes to a pre-trained maximum entropy model (see Mikheev (1998) for more details on this approach).
  24. SRV: Two trends are evident in the recent evolution of the field of information extraction. The first is a preference for simple, often corpus-driven techniques over linguistically sophisticated ones; the second is a broadening of the central problem definition to include many non-traditional text domains. This development calls for information extraction systems which are as retargetable and general as possible. Here, we describe SRV, a learning architecture for information extraction which is designed for maximum generality and flexibility. SRV can exploit domain-specific information, including linguistic syntax and lexical information, in the form of features provided to the system explicitly as input for training. This process is illustrated using a domain created from Reuters corporate acquisitions articles. Features are derived from two general-purpose NLP systems, Sleator and Temperley's link grammar parser and WordNet. Experiments compare the learner's performance with and without such linguistic information. Surprisingly, in many cases, the system performs as well without this information as with it.
  25. The label bias problem can be illustrated with a simple finite-state model designed to distinguish between the two words rib and rob. In the first time step, r matches both transitions from the start state, so the probability mass gets distributed roughly equally between those two transitions. Next we observe i. Both states 1 and 4 have only one outgoing transition. State 1 has seen this observation often in training, while state 4 has almost never seen it. Like state 1, however, state 4 has no choice but to pass all its mass to its single outgoing transition, since it is not generating the observation, only conditioning on it. Thus, states with a single outgoing transition effectively ignore their observations. The top path and the bottom path will be about equally likely, independently of the observation sequence. If one of the two words is slightly more common in the training set, the transitions out of the start state will slightly prefer its corresponding transition, and that word's state sequence will always win.
  26. i. Starting with a few seed entities, it is possible to induce high-precision context patterns by exploiting entity context redundancy. ii. New entity instances of the same category can be extracted from unlabeled data with the induced patterns to create high-precision extensions of the seed lists. iii. Features derived from token membership in the extended lists improve the accuracy of learned named-entity taggers.
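Note 8 describes chaining links from A (migraine) through intermediate B terms to a C term (magnesium) that is never directly linked to A. This idea is often called Swanson's ABC model.

This is a minimal sketch of that chaining step. It assumes each clue has already been reduced to an undirected co-occurrence link between two concepts. `LINKS`, `build_graph` and `abc_candidates` are hypothetical names, and the links are a toy encoding of the paraphrased clues, not real title data.

```python
from collections import defaultdict

# Hypothetical encoding of the paraphrased clues: an edge means "mentioned together".
LINKS = [
    ("migraine", "stress"), ("stress", "magnesium"),
    ("migraine", "calcium channel blockers"), ("calcium channel blockers", "magnesium"),
    ("migraine", "spreading cortical depression"), ("spreading cortical depression", "magnesium"),
    ("migraine", "platelet aggregability"), ("platelet aggregability", "magnesium"),
]


def build_graph(links):
    """Undirected adjacency sets from (concept, concept) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def abc_candidates(graph, a):
    """Rank C terms reachable from A via some B, excluding terms already linked to A."""
    support = defaultdict(set)
    for b in graph[a]:
        for c in graph[b]:
            if c != a and c not in graph[a]:
                support[c].add(b)
    # More distinct intermediates = stronger (but still untested) hypothesis.
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))


graph = build_graph(LINKS)
for c, intermediates in abc_candidates(graph, "migraine"):
    print(c, sorted(intermediates))
```

Ranking by the number of distinct intermediates is only one possible heuristic. As the note says, the real difficulty is pruning the combinatorial explosion of candidate links.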
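Note 12 outlines how the backbone tree is built: select well-distributed terms, keep only unambiguous ones, and add each term's hypernym path while counting documents at every node.

This is a minimal sketch of those steps under several assumptions:

* `HYPERNYM_PATHS` is a hypothetical stand-in for WordNet lookups, and its paths are illustrative.
* "Distribution" is approximated by document-frequency rank.
* The WordNet-domain rule for ambiguous terms is omitted.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "with"}

# Hypothetical stand-in for WordNet: term -> one hypernym path per sense (root first).
HYPERNYM_PATHS = {
    "sundae": [["entity", "substance", "nutriment", "dessert", "frozen dessert", "sundae"]],
    "sherbet": [["entity", "substance", "nutriment", "dessert", "frozen dessert", "sherbet"]],
    "dip": [["entity", "substance", "food", "dip"], ["entity", "act", "dip"]],  # ambiguous
}


def doc_freq(docs):
    """Number of documents containing each non-stopword term."""
    df = Counter()
    for doc in docs:
        df.update({w for w in doc.lower().split() if w not in STOPWORDS})
    return df


def select_terms(df, top_fraction=0.10):
    """Keep the top fraction of terms by document frequency (ties broken alphabetically)."""
    ranked = sorted(df, key=lambda t: (-df[t], t))
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep]


def build_backbone(terms, df):
    """Add each unambiguous term's path; every node on it gains the term's doc count."""
    counts, edges = Counter(), set()
    for term in terms:
        paths = HYPERNYM_PATHS.get(term, [])
        if len(paths) != 1:  # skip ambiguous or unknown terms (domain matching omitted)
            continue
        path = paths[0]
        for node in path:
            counts[node] += df[term]
        edges.update(zip(path, path[1:]))
    return counts, edges


docs = ["the sundae with sherbet", "a sundae and a dip", "sherbet of the day"]
df = doc_freq(docs)
counts, edges = build_backbone(select_terms(df, top_fraction=0.5), df)
print(counts["frozen dessert"], counts["sundae"], counts["sherbet"])  # → 4 2 2
```

The demo raises `top_fraction` to 0.5 only because the toy collection has four terms. With the default of 10%, just one term would survive.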
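Note 16 points out that a local cue such as the -s suffix is ambiguous ("he works" vs. "his works"). This is a minimal, hypothetical tagger showing how one word of left context resolves the case that the suffix alone cannot. The word lists and rules are illustrative assumptions, not a real tagger; only the tag names (PRP, PRP$, VBZ, NNS, NN) follow the Penn Treebank tag set.

```python
# Hypothetical mini-tagger: a suffix cue plus one left-context rule.
SUBJECT_PRONOUNS = {"he", "she", "it"}
POSSESSIVES = {"his", "her", "its", "their"}


def tag(tokens):
    """Return (token, tag) pairs using the suffix heuristic and one context rule."""
    tagged = []
    for i, token in enumerate(tokens):
        word = token.lower()
        prev = tokens[i - 1].lower() if i > 0 else None
        if word in SUBJECT_PRONOUNS:
            tagged.append((token, "PRP"))
        elif word in POSSESSIVES:
            tagged.append((token, "PRP$"))
        elif prev in SUBJECT_PRONOUNS and word.endswith("s"):
            tagged.append((token, "VBZ"))  # context: -s after a subject pronoun is a 3rd-person verb
        elif word.endswith("s"):
            tagged.append((token, "NNS"))  # local cue alone: guess plural noun
        else:
            tagged.append((token, "NN"))
    return tagged


print(tag(["he", "works"]))   # → [('he', 'PRP'), ('works', 'VBZ')]
print(tag(["his", "works"]))  # → [('his', 'PRP$'), ('works', 'NNS')]
```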
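Note 18 says a grammar decides only which strings are well formed, not what they mean. A recognizer makes that concrete. This sketch runs the standard CYK algorithm over a hypothetical toy grammar in Chomsky normal form and answers only yes or no. `BINARY`, `LEXICAL` and `cyk_accepts` are made-up names, and the rules are illustrative.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: binary rules and word -> preterminal rules.
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}


def cyk_accepts(words, start="S"):
    """Return True if the grammar derives the word sequence from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] = nonterminals that can derive words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split: words[i..k] + words[k+1..j]
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_accepts("the dog saw the cat".split()))  # → True
print(cyk_accepts("saw the dog the".split()))      # → False
```

The recognizer would accept "the cat saw the dog" just as readily as "the dog saw the cat". Telling them apart is the job of semantic analysis.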
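The formulas in note 22 can be computed directly with a breadth-first search over a thesaurus graph. This is a minimal sketch under the following assumptions:

* The graph, sense names and `SENSES` mapping are a hypothetical toy, not WordNet data.
* Natural log is used.
* Unreachable senses get $-\infty$, and identical senses ($\mathrm{pathlen} = 0$, where $-\log 0$ is undefined) get $+\infty$.

```python
import math
from collections import deque

# Hypothetical toy thesaurus: undirected hypernym edges between sense nodes.
EDGES = [
    ("nickel#coin", "coin"), ("dime#coin", "coin"), ("coin", "coinage"),
    ("coinage", "currency"), ("currency", "medium_of_exchange"),
    ("money#1", "medium_of_exchange"), ("nickel#metal", "metal"),
]
SENSES = {"nickel": ["nickel#coin", "nickel#metal"], "dime": ["dime#coin"], "money": ["money#1"]}

GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def pathlen(c1, c2):
    """Edges on the shortest path between two sense nodes, or None if unreachable."""
    seen, queue = {c1}, deque([(c1, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == c2:
            return dist
        for nxt in GRAPH.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def sim_path(c1, c2):
    """-log pathlen(c1, c2), with the edge cases described above."""
    n = pathlen(c1, c2)
    if n is None:
        return -math.inf  # assumption: unconnected senses are maximally dissimilar
    if n == 0:
        return math.inf   # assumption: identical senses
    return -math.log(n)


def wordsim(w1, w2):
    """Similarity of the closest pair of senses of the two words."""
    return max(sim_path(c1, c2) for c1 in SENSES[w1] for c2 in SENSES[w2])


print(round(wordsim("nickel", "dime"), 3), round(wordsim("nickel", "money"), 3))
```

Taking the maximum over sense pairs means the coin sense of "nickel" decides its similarity to "dime". The unrelated metal sense does not affect the result.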
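Note 23 describes a second step: once "Adam Kluver Ltd" is recognised, its shorter forms are marked as possible organisations. This is a minimal sketch of generating those variants and finding them in text. Two assumptions are baked in: variants are the ordered multi-word sub-sequences of the name, and matches inside the already-tagged full name are skipped. The function names are hypothetical. The maximum entropy model that makes the final decision is not sketched.

```python
import re
from itertools import combinations


def partial_name_variants(name):
    """Ordered multi-word sub-sequences of a recognised name, excluding the full name."""
    tokens = name.split()
    variants = set()
    for size in range(2, len(tokens)):  # single words such as "Adam" are too ambiguous
        variants.update(" ".join(combo) for combo in combinations(tokens, size))
    return variants


def find_partial_mentions(text, name):
    """(offset, variant) pairs to pass on as *possible* organisations."""
    full_spans = [m.span() for m in re.finditer(re.escape(name), text)]
    found = []
    for variant in partial_name_variants(name):
        for match in re.finditer(r"\b" + re.escape(variant) + r"\b", text):
            start, end = match.span()
            if any(s <= start and end <= e for s, e in full_spans):
                continue  # already tagged by the sure-fire rule
            found.append((start, variant))
    return sorted(found)


print(sorted(partial_name_variants("Adam Kluver Ltd")))
# → ['Adam Kluver', 'Adam Ltd', 'Kluver Ltd']
```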
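The rib/rob argument in note 25 can be checked numerically with a maximum entropy Markov model (MEMM)-style local normalisation, where each state's outgoing probabilities sum to 1 given the observation. This is a minimal sketch. The state numbering follows the note (1 and 4 after the start state). The weights, the smoothing value and all names are hypothetical.

```python
# Hypothetical MEMM-style model: states 0 (start), 1-2 (rib path), 4-5 (rob path), 3 (final).
SUCCESSORS = {0: [1, 4], 1: [2], 2: [3], 4: [5], 5: [3]}
SMOOTH = 0.01  # weight for (transition, observation) pairs rarely or never seen in training
WEIGHTS = {
    (0, 1, "r"): 0.55, (0, 4, "r"): 0.45,  # "rib" assumed slightly more frequent in training
    (1, 2, "i"): 1.0, (2, 3, "b"): 1.0,
    (4, 5, "o"): 1.0, (5, 3, "b"): 1.0,
}


def local_prob(state, nxt, obs):
    """P(nxt | state, obs), normalised only over the outgoing transitions of `state`."""
    scores = {s: WEIGHTS.get((state, s, obs), SMOOTH) for s in SUCCESSORS[state]}
    return scores[nxt] / sum(scores.values())


def path_prob(path, word):
    """Product of the per-state conditional probabilities along a state path."""
    prob = 1.0
    for (state, nxt), obs in zip(zip(path, path[1:]), word):
        prob *= local_prob(state, nxt, obs)
    return prob


RIB_PATH, ROB_PATH = [0, 1, 2, 3], [0, 4, 5, 3]
for word in ("rib", "rob"):
    print(word, round(path_prob(RIB_PATH, word), 2), round(path_prob(ROB_PATH, word), 2))
# → rib 0.55 0.45
# → rob 0.55 0.45
```

State 4 barely ever saw "i" in training, yet it still passes probability 1.0 forward, because its only successor is normalised against itself. The rib path therefore wins even when the input is "rob", which is the behaviour the note describes.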
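Steps i and ii of note 26 (inducing context patterns from seeds, then extracting new instances) can be sketched in a few lines. This sketch makes three simplifying assumptions:

* A "pattern" is the pair of words immediately left and right of an entity.
* Precision comes from requiring a context to be shared by at least two distinct seeds, one way of exploiting context redundancy.
* Entities are single tokens.

The function names and the toy corpus are hypothetical. Step iii, using list membership as tagger features, is not shown.

```python
from collections import defaultdict


def context(tokens, i):
    """(left word, right word) around position i, padded at sentence edges."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return left, right


def induce_patterns(sentences, seeds, min_seed_support=2):
    """Keep contexts shared by at least min_seed_support distinct seed entities."""
    support = defaultdict(set)
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token in seeds:
                support[context(tokens, i)].add(token)
    return {ctx for ctx, entities in support.items() if len(entities) >= min_seed_support}


def extend_seeds(sentences, patterns, seeds):
    """New candidate entities that occur in an induced context."""
    return {
        token
        for tokens in sentences
        for i, token in enumerate(tokens)
        if token not in seeds and context(tokens, i) in patterns
    }


corpus = [s.split() for s in [
    "mutations in BRAF cause melanoma",
    "mutations in KRAS cause tumours",
    "mutations in NRAS cause melanoma",
    "the BRAF inhibitor was tested",
    "the PTEN inhibitor was tested",
]]
seeds = {"BRAF", "KRAS"}
patterns = induce_patterns(corpus, seeds)
print(patterns, extend_seeds(corpus, patterns, seeds))  # → {('in', 'cause')} {'NRAS'}
```

The context "the _ inhibitor" is seen with only one seed, so it is discarded and PTEN is not extracted. Lowering `min_seed_support` would trade precision for recall.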