Text


National Center for Supercomputing Applications
   University of Illinois at Urbana-Champaign
MONK Project
MONK provides:
•  1,400 works of literature in English from the 16th to the
   19th century (108 million words), POS-tagged and
   TEI-tagged, in a MySQL database.
•  Several different open-source interfaces for
   working with this data
•  A public API to the datastore
•  SEASR under the hood, for analytics
MONK Project
Executes flows for
  each analysis
  requested
  –  Predictive
     modeling using
     Naïve Bayes
  –  Predictive
     modeling using
     Support Vector
     Machines (SVM)
Dunning Loglikelihood TagCloud
•  Words that are under-represented in writings by Victorian
   women as compared to Victorian men. —Sara Steger
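
A minimal sketch of the statistic behind a tag cloud like this one, assuming the simplified two-corpus form of Dunning's log-likelihood that sums only over the word's own counts in each corpus. The function names and the example counts are illustrative assumptions, not values or code from MONK.

```python
import math

def log_likelihood(count_a, count_b, size_a, size_b):
    """Simplified log-likelihood (G2) for one word in two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b:   total number of words in corpus A and corpus B.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def is_underrepresented(count_a, count_b, size_a, size_b):
    """True when the word's relative frequency is lower in corpus A."""
    return count_a / size_a < count_b / size_b

# Hypothetical counts: a word seen 5 times in corpus A, 20 times in corpus B.
score = log_likelihood(5, 20, 1000, 1000)
print(round(score, 2), is_underrepresented(5, 20, 1000, 1000))
```

A larger score means the difference in usage between the two corpora is less likely to be chance; the direction (over- or under-represented) has to be read from the relative frequencies separately.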
Feature Lens




“The discussion of the children introduces each of the short
internal narratives. This champions the view that her method of
repetition was patterned: controlled, intended, and a measured
means to an end.

It would have been impossible to discern through traditional
reading”
Semantic Analysis: Information Extraction

  •  Definition: Information extraction is the
     identification of specific semantic elements within
     a text (e.g., entities, properties, relations)

  •  Extract the relevant information and ignore
     non-relevant information (important!)

  •  Link related information and output in a
     predetermined format

Information Extraction

  Information Type                                   State of the art (Accuracy)

  Entities                                           90-98%
     an object of interest such as a person or organization.

  Attributes                                         80%
     a property of an entity such as its name, alias, descriptor, or type.

  Facts                                              60-70%
     a relationship held between two or more entities such as
     Position of a Person in a Company.

  Events                                             50-60%
     an activity involving several entities such as a terrorist act,
     airline crash, management change, new product introduction.

“Introduction to Text Mining,” Ronen Feldman, Computer Science Department, Bar-Ilan University, ISRAEL
Information Extraction Approaches
•  Terminology (name) lists
   –  This works very well if the list of names and name expressions is
      stable and available
•  Tokenization and morphology
   –  This works well for things like formulas or dates, which are readily
      recognized by their internal format (e.g., DD/MM/YY or chemical
      formulas)
•  Use of characteristic patterns
   –  This works fairly well for novel entities
   –  Rules can be created by hand or learned via machine learning or
      statistical algorithms
   –  Rules capture local patterns that characterize entities from
      instances of annotated training data
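
The first two approaches above can be sketched as follows: a name-list lookup plus a format-based pattern for dates. This is only an illustration under stated assumptions: the name list, the `DD/MM/YY` regular expression, the function name, and the sample sentence are hypothetical and not part of SEASR.

```python
import re

# Assumed terminology (name) list; a real list would be far larger.
NAME_LIST = {
    "Rex Luthor": "Person",
    "Alderwood": "Location",
    "Boynton Laboratory": "Organization",
}

# Dates are recognized purely by their internal format (DD/MM/YY).
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{2}\b")

def extract_entities(text):
    """Return (surface form, type) pairs in order of appearance."""
    found = []
    for name, label in NAME_LIST.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "Date"))
    return [(surface, label) for _, surface, label in sorted(found)]

sample = "Rex Luthor opened Boynton Laboratory in Alderwood on 04/05/09."
for surface, label in extract_entities(sample):
    print(label, surface)
```

The lookup only finds names already on the list, which is why pattern-based or learned rules are needed for novel entities.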

Semantic Analytics
   Named Entity (NE) Tagging




        NE:Person               NE:Time
Mayor Rex Luthor announced today the establishment
                                            NE:Location
of a new research facility in Alderwood. It will be
                        NE:Organization
known as Boynton Laboratory.
Semantic Analysis
Co-reference Resolution for entities and unnamed
 entities




  Mayor Rex Luthor announced today the establishment
         UNE:Organization
  of a new research facility in Alderwood. It will be

  known as Boynton Laboratory.
Semantic Analysis
Semantic Role Analysis




            ACTOR           ACTION   WHEN               OBJECT
       Mayor Rex Luthor announced today the establishment

                                  WHERE           OBJECT
       of a new research facility in Alderwood.      It will be

       ACTION          COMPL
       known as Boynton Laboratory
Semantic Analysis
Concept-Relation Extraction




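[Diagram: concept-relation graph for the example sentence]

   Rex Luthor (person)  -- actor (who) --      announce (action)
   announce (action)    -- time (when) --      today (time)
   announce (action)    -- object (what) --    establ. (event)
   establ. (event)      -- object (what) --    Boynton Lab (organiz.)
   establ. (event)      -- location (where) -- Alderwood (location)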
Results: Timeline
Results: Maps
UIMA Structured data
•  Two SEASR examples using UIMA POS data
  –  Frequent patterns (rule associations) on nouns
     (fpgrowth)
  –  Sentiment analysis on adjectives
UIMA 
Unstructured Information Management Architecture
UIMA + P.O.S. tagging

Four analysis engines analyze a document and
 record part-of-speech information.



OpenNLP     OpenNLP       OpenNLP
                                             POSWriter
Tokenizer   PosTagger     SentenceDetector




            Serialization of the UIMA CAS
UIMA to SEASR: Experiment I
•  Finding patterns
SEASR + UIMA: Frequent Patterns
Frequent Pattern Analysis on nouns
•  Goal:
   –  Discover a cast of characters within the text
   –  Discover nouns that frequently occur together
      •  character relationships
Frequent Patterns: visualization
             Analysis of Tom Sawyer
                 10 paragraph window
                 Support set to 10%
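
A rough sketch of the idea behind this analysis, assuming each paragraph has already been reduced to its set of nouns. It counts co-occurring noun pairs over sliding paragraph windows and keeps those above a support threshold; it is a plain pair count, not the FP-growth algorithm named earlier, and the function names and toy data are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def sliding_windows(paragraph_nouns, size):
    """Merge the noun sets of each run of `size` consecutive paragraphs."""
    count = max(len(paragraph_nouns) - size + 1, 1)
    return [set().union(*paragraph_nouns[i:i + size]) for i in range(count)]

def frequent_pairs(windows, min_support):
    """Return noun pairs whose share of windows is at least min_support."""
    counts = Counter()
    for nouns in windows:
        for pair in combinations(sorted(nouns), 2):
            counts[pair] += 1
    total = len(windows)
    return {
        pair: seen / total
        for pair, seen in counts.items()
        if seen / total >= min_support
    }

# Toy input: nouns per paragraph (hypothetical, not real analysis output).
paragraphs = [{"Tom", "Huck"}, {"Tom", "Becky"}, {"Tom", "Huck"}, {"Polly"}]
windows = sliding_windows(paragraphs, size=2)
print(frequent_pairs(windows, min_support=0.9))
```

With the settings on the slide, `size` would be 10 and `min_support` would be 0.10; the toy call uses smaller values only so that the example stays readable.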
UIMA to SEASR: Experiment II
•  Sentiment Analysis
UIMA + SEASR: Sentiment Analysis
•  Classifying text based on its sentiment
   –  Determining the attitude of a speaker or a writer
   –  Determining whether a review is positive/negative 
•  Ask: What emotion is being conveyed within a body of text?
   –  Look at only adjectives (UIMA POS)
       •  lots of issues, challenges, and “but …” caveats

•  Need to Answer:
   –  What emotions to track?
   –  How to measure/classify an adjective to one of the selected
      emotions?
   –  How to visualize the results?
UIMA + SEASR: Sentiment Analysis
•  Which emotions:
   –  http://en.wikipedia.org/wiki/List_of_emotions
   –  http://changingminds.org/explanations/emotions/basic%20emotions.htm
   –  http://www.emotionalcompetency.com/recognizing.htm

•  Parrott’s classification (2001)
   –  six core emotions
   –  Love, Joy, Surprise, Anger, Sadness, Fear
UIMA + SEASR: Sentiment Analysis
UIMA + SEASR: Sentiment Analysis
•  How to classify adjectives:
   –  Lots of metrics we could use …
      •  Lists of adjectives already classified
          –  http://www.derose.net/steve/resources/emotionwords/ewords.html

          –  Need a “nearness” metric for missing adjectives

   –  How about the thesaurus game?

•  Using only a thesaurus, find a path between two words
   –  no antonyms
   –  no colloquialisms or slang
UIMA + SEASR: Sentiment Analysis
•  How to get from delightful to rainy?
     ['delightful', 'fair', 'balmy', 'moist', 'rainy']

•  sexy to joyless?
     ['sexy', 'provocative', 'blue', 'joyless']

•  bitter to lovable?
     ['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
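
The thesaurus game can be sketched as a breadth-first search over a synonym graph. The toy graph below contains only the links that appear in the example paths, and treating every link as two-way is a simplifying assumption; a real run would load a full thesaurus. All names in the code are illustrative.

```python
from collections import deque

# Toy synonym links taken from the example paths; not a real thesaurus.
TOY_LINKS = [
    ("delightful", "fair"), ("fair", "balmy"),
    ("balmy", "moist"), ("moist", "rainy"),
    ("sexy", "provocative"), ("provocative", "blue"), ("blue", "joyless"),
]

def build_graph(links):
    """Build an undirected adjacency map from (word, synonym) pairs."""
    graph = {}
    for left, right in links:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph

def thesaurus_path(graph, start, goal):
    """Return a shortest synonym path from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbour in sorted(graph.get(word, ())):
            if neighbour in parents:
                continue
            parents[neighbour] = word
            if neighbour == goal:
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None

GRAPH = build_graph(TOY_LINKS)
print(thesaurus_path(GRAPH, "delightful", "rainy"))
```

Breadth-first search visits words in order of distance from the start, so the first path that reaches the goal is also a shortest one.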
UIMA + SEASR: Sentiment Analysis

         •  Use this game as a metric for
            assigning a given adjective to one
            of the six emotions.
           •  Assume the longer the path, the “farther
              away” the two words are.
              •  addresses some of the issues
SynNet: rainy to pleasant
UIMA + SEASR: Sentiment Analysis

         •  SynNet Metrics
           •  Common nodes
           •  Path length
           •  Symmetric: a->b->c and c->b->a
           •  Link strength: 
              •  tangy->sweet

              •  sweet->lovable
              •  Use of slang or informal usage
UIMA + SEASR: Sentiment Analysis

                •  Common Nodes
                  •  depth of common
UIMA + SEASR: Sentiment Analysis
•  Symmetry of path in common nodes
UIMA + SEASR: Sentiment Analysis

         •  Find the shortest path between
            adjective and each emotion:
            •  ['delightful', 'beatific', 'joyful']
            •  ['delightful', 'ineffable', 'unspeakable',
               'fearful']

         •  Pick the emotion with shortest path
            length
            •  tie breaking procedures
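
A minimal sketch of this labeling step, assuming each emotion is represented by a single seed adjective and that path length is counted in links. The seed words, the toy links, and the alphabetical tie-break are all assumptions; the slides do not specify the actual tie-breaking procedures.

```python
from collections import deque

# Assumed seed adjective per emotion (only two shown for brevity).
EMOTION_SEEDS = {"joy": "joyful", "fear": "fearful"}

# Toy synonym links taken from the two example paths above.
TOY_LINKS = [
    ("delightful", "beatific"), ("beatific", "joyful"),
    ("delightful", "ineffable"), ("ineffable", "unspeakable"),
    ("unspeakable", "fearful"),
]

def build_graph(links):
    """Build an undirected adjacency map from (word, synonym) pairs."""
    graph = {}
    for left, right in links:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph

def distances_from(graph, start):
    """Breadth-first search: number of links from start to each word."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbour in graph.get(word, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[word] + 1
                queue.append(neighbour)
    return distance

def classify(graph, adjective, seeds):
    """Return (emotion, path length) for the nearest seed, or None."""
    distance = distances_from(graph, adjective)
    reachable = {e: distance[s] for e, s in seeds.items() if s in distance}
    if not reachable:
        return None
    best = min(reachable.values())
    # Placeholder tie-break: alphabetical order of the emotion name.
    winner = sorted(e for e, d in reachable.items() if d == best)[0]
    return winner, best

GRAPH = build_graph(TOY_LINKS)
print(classify(GRAPH, "delightful", EMOTION_SEEDS))
```

Running one search from the adjective gives the distance to every seed at once, which avoids a separate search per emotion.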
UIMA + SEASR: Sentiment Analysis

•  Not a perfect solution
   –  still need context to get quality
      •  Vain
          –  ['vain', 'insignificant', 'contemptible', 'hateful']
          –  ['vain', 'misleading', 'puzzling', 'surprising']
      •  Animal
          –  ['animal', 'sensual', 'pleasing', 'joyful']
          –  ['animal', 'bestial', 'vile', 'hateful']
          –  ['animal', 'gross', 'shocking', 'fearful']
          –  ['animal', 'gross', 'grievous', 'sorrowful']
      •  Negation
          –  “My mother was not a hateful person.”
UIMA + SEASR: Sentiment Analysis

•  Process Overview
  •  Extract the adjectives (UIMA POS analysis)
  •  Read in adjectives (SEASR library)
  •  Label each adjective (SynNet)
  •  Summarize windows of adjectives
     •  lots of experimentation here

  •  Visualize the windows
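
The "summarize windows" step might look like the sketch below, assuming the adjectives have already been labeled in reading order. Fixed-size, non-overlapping windows and a simple per-emotion count are assumptions chosen for illustration; the slides note that this step involved a lot of experimentation.

```python
from collections import Counter

def summarize_windows(labels, window_size):
    """Count emotion labels in consecutive, non-overlapping windows."""
    summaries = []
    for start in range(0, len(labels), window_size):
        window = labels[start:start + window_size]
        summaries.append(Counter(window))
    return summaries

# Hypothetical labels for six adjectives in reading order.
labels = ["joy", "joy", "fear", "sadness", "sadness", "anger"]
summaries = summarize_windows(labels, window_size=3)
for index, summary in enumerate(summaries):
    emotion, count = summary.most_common(1)[0]
    print(index, emotion, count)
```

Each `Counter` can then be passed to a visualization as one time step, with the window index standing in for position in the text.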
UIMA + SEASR: Sentiment Analysis

•  Visualization
   •  New SEASR visualization component
      •  Based on flare ActionScript Library
          •  http://flare.prefuse.org/

      •  Still in development

      •  http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html
UIMA + SEASR: Sentiment Analysis
