DocuBurst:
               Visualizing Document Content
               Using Language Structure

EuroVis 2009   Christopher Collins, Sheelagh Carpendale, and Gerald Penn
Document Content Visualization


     Navigation in collections of digital text
     Content analysis (digital humanities)

     Plagiarism detection

     Authorship attribution
...Using Language Structure


       Traditional glyph techniques use unstructured
        word counts (e.g. tag clouds)
       DocuBurst structure is based on a carefully
        designed ontology called WordNet
WordNet Background


       Basic data unit is a set of synonyms called a synset:
        {lawyer, attorney}, {jump, hop, skip}


       Words can occur in multiple synsets:
        {bank, financial institution}
        {bank, slope, riverside}


       Free resource from Princeton University
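
The synset idea can be sketched with a hand-built toy lexicon. This does not use the real WordNet database; `SYNSETS` and `senses_of` are hypothetical names introduced only for illustration, and the entries are the examples from the slide.

```python
# Toy lexicon: each synset is a set of synonymous words (examples from the slide).
SYNSETS = [
    frozenset({"lawyer", "attorney"}),
    frozenset({"jump", "hop", "skip"}),
    frozenset({"bank", "financial institution"}),
    frozenset({"bank", "slope", "riverside"}),
]

def senses_of(word):
    """Return every synset containing the word; more than one means the word is ambiguous."""
    return [synset for synset in SYNSETS if word in synset]

print(len(senses_of("bank")))    # "bank" appears in two synsets
print(len(senses_of("lawyer")))  # "lawyer" appears in one
```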
Hyponymy Relation


       X is a Y or X is a kind of Y
       transitive, asymmetric relationship
       example:
         {robin, redbreast} IS-A {bird}
         robin and redbreast are hyponyms of bird

       forms the basic structure of the noun network

        {robin, redbreast} IS-A {bird} IS-A
          {animal, animate_being} IS-A
          {organism, life_form, living_thing} IS-A {entity}
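
A small sketch of how the transitive, asymmetric IS-A relation behaves, assuming a toy parent table built from the chain above. `HYPERNYM`, `trace_to_root`, and `is_a` are illustrative names, and each synset is abbreviated to its first word.

```python
# Toy hypernym links (child -> parent), abbreviating each synset to its first word.
HYPERNYM = {
    "robin": "bird",
    "bird": "animal",
    "animal": "organism",
    "organism": "entity",
}

def trace_to_root(label):
    """Follow IS-A links upward until a node with no parent is reached."""
    path = [label]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def is_a(x, y):
    """Transitive IS-A: true when y is a proper ancestor of x."""
    return y in trace_to_root(x)[1:]

print(" IS-A ".join(trace_to_root("robin")))
print(is_a("robin", "entity"), is_a("entity", "robin"))
```

The second line of output shows the asymmetry: a robin is an entity, but an entity is not a robin.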
Creating DocuBurst

          games → game
          taken → take




          absolute,noun,10
          chair,noun,2
          moment,noun,11
          game,noun,30
          reality,noun,3
          take,verb,13
          represent,verb,17
          ...




          game IS-A activity
          chair IS-A furniture
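
The first two stages on this slide (reduce each word to its base form, then count base form and part of speech pairs) might look roughly like the sketch below. The `LEMMAS` table is a hypothetical stand-in for a real stemmer or lemmatizer, and the input is assumed to be already tagged with parts of speech.

```python
from collections import Counter

# Hypothetical lemma table standing in for a real stemmer/lemmatizer.
LEMMAS = {"games": "game", "taken": "take", "chairs": "chair"}

def count_lemmas(tagged_tokens):
    """Count (lemma, part-of-speech) pairs from (token, pos) input."""
    counts = Counter()
    for token, pos in tagged_tokens:
        lemma = LEMMAS.get(token.lower(), token.lower())
        counts[(lemma, pos)] += 1
    return counts

tokens = [("games", "noun"), ("game", "noun"), ("taken", "verb"), ("Chairs", "noun")]
for (lemma, pos), n in sorted(count_lemmas(tokens).items()):
    print(f"{lemma},{pos},{n}")
```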
Hyponymy Structure
Word Sense Ambiguity


        Man = {mankind,world}, {male human}, ...
        Water = {H2O}, {water supply}, {body of water}, ...
        Word senses are roughly ordered by frequency in
         WordNet
Alternative Scoring Models


        Count for all senses
          undue prominence to ambiguous words
        Count first sense only
          loses too much information
        Divide by sense count (same for all senses)
          high penalty on polysemous words
        Divide by sense index
          decreased prominence for uncommon senses
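
The four models can be compared side by side in a short sketch. It assumes senses are ranked from 1 (most frequent) and reads "divide by sense index" as count divided by that rank; the function name and the model labels are hypothetical.

```python
def sense_scores(count, n_senses, model):
    """Return one score per sense, most frequent sense first."""
    ranks = range(1, n_senses + 1)
    if model == "all":        # full count credited to every sense
        return [float(count) for _ in ranks]
    if model == "first":      # only the most frequent sense is credited
        return [float(count) if rank == 1 else 0.0 for rank in ranks]
    if model == "by_count":   # count split evenly across senses
        return [count / n_senses for _ in ranks]
    if model == "by_index":   # assumed reading: count divided by sense rank
        return [count / rank for rank in ranks]
    raise ValueError(f"unknown model: {model}")

# A word seen 12 times with 3 senses under each model.
for model in ("all", "first", "by_count", "by_index"):
    print(model, sense_scores(12, 3, model))
```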
Visual Encoding



        Node Size: # of leaves in subtree
            Stability across documents
        Node Position: IS-A relation
            Multi-level linguistic abstraction
            Additive
             (2 ducks + 3 geese = 5 birds)
        Node Hue: sense index
            Differentiates subtrees
        Node Saturation: word count
            Ordering & approximate scale is perceived
        Node Label: First word in synset
            Words are ordered by commonality in the
             language, reveals well-known words
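
A sketch of the two tree quantities used in this encoding, on a toy subtree: the additive count (2 ducks + 3 geese = 5 birds) and the number of leaves that drives node size. `CHILDREN`, `OWN_COUNT`, and both function names are illustrative assumptions.

```python
# Toy IS-A subtree (parent -> children) and per-node word counts; illustrative only.
CHILDREN = {"bird": ["duck", "goose"], "duck": [], "goose": []}
OWN_COUNT = {"bird": 0, "duck": 2, "goose": 3}

def cumulative_count(node):
    """Occurrences of the node itself plus all of its hyponyms."""
    return OWN_COUNT.get(node, 0) + sum(cumulative_count(c) for c in CHILDREN[node])

def leaf_count(node):
    """Number of leaves in the node's subtree; a leaf counts as 1."""
    kids = CHILDREN[node]
    return 1 if not kids else sum(leaf_count(c) for c in kids)

print(cumulative_count("bird"))  # additive count for "bird"
print(leaf_count("bird"))        # depends on the tree only, not on the document
```

Because `leaf_count` never reads `OWN_COUNT`, node sizes stay the same from one document to the next, which is the stability property listed above.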
Node Colouring Alternatives

         Cumulative Counts: supports visual summaries
         Single Node Counts: supports precision and selection
Interaction
Trace-to-Root




     Cattle IS-A bovine IS-A bovid IS-A ... Mammal IS-A vertebrate IS-A chordate IS-A animal
Roll Up
Drill Down
Concordance
Level of Detail Filter


        Nodes > N away from root are hidden
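
The filter amounts to a depth-limited traversal. A minimal sketch, assuming the tree is stored as a parent-to-children dictionary; `visible_nodes` and the sample tree are hypothetical.

```python
from collections import deque

def visible_nodes(children, root, max_depth):
    """Return nodes at most max_depth steps from root, in breadth-first order."""
    visible, queue = [], deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        visible.append(node)
        if depth < max_depth:  # deeper nodes are never enqueued, so they stay hidden
            queue.extend((child, depth + 1) for child in children.get(node, []))
    return visible

tree = {"animal": ["bird", "mammal"], "bird": ["robin"], "mammal": ["bovid"]}
print(visible_nodes(tree, "animal", 1))
```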
Search
Design Trade-Offs
Node Size Mapping


        Size by # leaves
         + consistent
         – visual artifacts (highly relevant words with few leaves
           are too small)


        Size by score
         + redundant encoding
         + important words more prominent
         – disrupts inter-document comparison
Font Size Mapping


        Size to fit cell
         + maximize legibility
         – short words have huge font


        Font size proportional to cell size
         + short words not more prominent
         – small maximum size to accommodate long words
Inclusion of Zero-count Words


      + provides context (what is not in document)
      – more cluttered
Case Studies
2008 U.S. Presidential Debate
Unexpected Uses


        WordNet Visualization
Unexpected Uses


        Language Education
          “invaluable potential for writing and vocabulary
           development at the secondary level”
          “I'm very interested in using the program, I'm an English
           teacher”
Related Work
Types of Document Visualization
Features of Document Visualization
        Semantic:    indicate meaning
        Cluster:     generalize into concepts
        Overview:    provide quick gist
        Zoom:        support varying level of detail
        Compare:     multi-document comparisons
        Search:      find specific words/phrases
        Read:        drill-down to original text
        Pattern:     reveal patterns of repetition
        Features:    reveal extracted features such as emotion
        Suggest:     automatically select interesting focus words
        Phrases:     can show multi-word phrases
        All words:   can show all parts of speech
Semantics & Clustering

        Provides word definitions and relations
        Clusters of related terms allow variable level of abstraction
Phrases & All Words


        Cannot visualize multi-word phrases that are not
         ‘words’ in WordNet
        Only English nouns, verbs
Future Work
Uneven Tree Cut Models
DocuBurst Comparative Views


        Embed small multiples in e-libraries
        Colour scale based on text difference
          From each other
          From corpus average
Simplification

        Root suggestion
          How to know where to start exploring?
        Word sense disambiguation
          Attempt to select a sense
          Use a less detailed ontology
Thanks for your Attention!


    Acknowledgements:
    Ravin Balakrishnan and helpful reviewers.
    Contact: ccollins@cs.utoronto.ca




