DH TOOLS
Introduction to Text Analysis


         Cameron Buckner
    Visiting Assistant Professor
    Department of Philosophy
        cjbuckner@uh.edu
Our Initiative
• Promote, facilitate, interact
  • Reading group                   • Tools workshops
  • Speaker series                  • Grantwriting support
  • Infrastructure advocacy




            http://www.uh.edu/class/digitalhumanities/
Roadmap
Goal today: Analyze texts using cutting-edge analyses
from computational psycholinguistics with an off-the-shelf
tool, word2word

1.   What can you do with text analysis?
2.   A little bit of theory: Semantic spaces
3.   BEAGLE: The holographic lexicon
4.   MDS: Visualizing multidimensional networks
5.   Examples
6.   Hands-on play
What is DH?
• Computation and interpretation
  • The use of computational tools for the
    production, exploration, analysis, and
    dissemination of humanistic knowledge
      • Thread common between new and old:
        pattern recognition
• Includes
  •   Digitization and archiving, markup
  •   Analysis & visualization
  •   Search & dissemination
  •   Pedagogy
Methods of Text Analysis I
• Statistical analysis, information extraction, machine
  learning
  • Syntactic: word frequencies (Google n-grams), vocabulary
    usage, stylometry (authorship and genre), PageRank




     http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
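A minimal Python sketch of the raw word-frequency count behind comparisons like the convention word-count graphic. The helper name `word_counts` and the sample sentence are illustrative inventions, not data from that analysis. Tokenization here is a simple lowercase, letters-only regex.

```python
import re
from collections import Counter

def word_counts(text, top=5):
    """Lowercase the text, split it into alphabetic tokens, return the most frequent ones."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common(top)

speech = "Jobs, jobs, jobs. We will create jobs and protect Medicare."
print(word_counts(speech, top=1))
```

Real comparisons usually normalize counts by speech length before comparing speakers, since longer speeches inflate raw counts.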
Methods of Text Analysis II
• Semantic: tf-idf, latent semantic analysis, latent Dirichlet
  allocation, entropy-based measures, ontologies
  • Aim to model relevance, semantic similarity, taxonomic
    relationships, object properties and relations
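To make tf-idf concrete, here is a small sketch assuming raw term counts for tf and log(N / df) for idf; many other weighting variants exist, and the three toy documents are made up. Terms that appear in every document (like "the") get zero weight, while rarer terms are boosted.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document: raw term frequency
    times log(N / document frequency), one common tf-idf variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = ["the mind is the brain", "the mind is a machine", "the soul is immortal"]
w = tf_idf(docs)
print(sorted(w[0].items(), key=lambda kv: -kv[1]))
```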
Reminders
• Be creative and have fun, but if you want to publish…
• Be principled:
  • Junk in, junk out
  • Always know assumptions required by a method
     • Analyses should hold up under trivial transformations of data
       representation
  • Be prepared for pragmatic design decisions
  • Go in with hypotheses and structured questions
  • Confirm with careful humanistic interpretation
The Mental Lexicon
• A “mental dictionary”
  • Contains information about:
     • Word meaning, grammatical roles, taxonomic relations, typical
       properties
     • Behavioral indicators: recognition speed, synonymy and relevance
       judgments, priming, frequency effects, categorization
BEAGLE
• A model that learns (unsupervised) a holographic mental
  lexicon automatically from text
• History: Two approaches to semantic analysis
  • Co-occurrence based measures (“bag of words”, LSA,
    tf-idf)
     • Good at determining relevance, bad at determining roles and
       relations
  • Order-based measures (n-gram models, generative
    grammars, hidden Markov models)
     • Good at identifying grammatical and structural relations, bad at
       identifying relevance and meaning
• Challenge: Can the two be combined?
Context + Role
• Assumption: People acquire an
  idiosyncratic mental lexicon from
  patterns of co-occurrence and syntactic
  relationships they encounter in natural
  language.
  • “You shall know a word by both the
    company it keeps and how it keeps it.”
• Goal: If we could build a representation
  of a text’s context/role distributions, we
  could predict the structure of a mental
  lexicon that produced a corpus and/or
  that would be produced by it
  • Texts as “mental fingerprints”
How Holograms Work
Basic Vector Approach
1. Start with a multi-dimensional vector space
2. Each term meaning is initially represented by a random,
   constant environment vector and an empty memory
   vector
3. Associations between terms are represented by adding (or
   averaging) the environment vectors of co-occurring terms
   into each term's memory vector
4. Terms that keep appearing in similar contexts accumulate
   similar memory vectors, so they move closer together in
   multi-dimensional similarity space
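The context half of steps 1 to 4 can be sketched in Python with NumPy, under simplifying assumptions: each sentence is the context window, environment vectors are drawn from N(0, 1/d), and memory vectors are plain sums (no averaging or normalization). The function names are hypothetical. Note that "cat" and "dog" never co-occur, yet end up with similar memory vectors because they share contexts.

```python
import numpy as np

def learn_context_vectors(sentences, dim=1024, seed=0):
    """Give each word a fixed random environment vector; its memory vector is the
    sum of the environment vectors of the other words in every sentence it occurs in."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for s in sentences for w in s})  # sorted for reproducibility
    env = {w: rng.normal(0.0, 1.0 / np.sqrt(dim), dim) for w in vocab}
    mem = {w: np.zeros(dim) for w in vocab}
    for sent in sentences:
        for i, word in enumerate(sent):
            for j, other in enumerate(sent):
                if j != i:
                    mem[word] += env[other]
    return mem

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sentences = [["cat", "chased", "mouse"],
             ["dog", "chased", "mouse"],
             ["senate", "passed", "bill"]]
mem = learn_context_vectors(sentences)
print(round(cosine(mem["cat"], mem["dog"]), 2), round(cosine(mem["cat"], mem["senate"]), 2))
```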
Representing Order Info
• Circular convolution: compresses the outer-product matrix of
  two term vectors into a single vector of the same size that
  still contains recoverable information about both
• Example: z = x * y
  • Association vector z contains information about both x and
    y
  • Can (approximately) reconstruct source vector y by probing
    z (deconvolution) with x (and vice versa)
• Combined BEAGLE memory vector: Context memory
  comes from vector addition, and order information comes
  from n-gram binding using convolution
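A sketch of binding and probing with circular convolution, computed via the FFT. The helper names are hypothetical, and vectors are drawn from N(0, 1/n), a common choice for holographic representations. Decoding uses circular correlation as the approximate inverse, so the recovered vector is only a noisy copy of y.

```python
import numpy as np

def cconv(a, b):
    """Circular convolution via the FFT; the result has the same length as a and b."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, z):
    """Circular correlation: the approximate inverse used to probe z with a."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(z)))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

n = 1024
rng = np.random.default_rng(1)
# Elements drawn from N(0, 1/n) so each vector has roughly unit length.
x, y, unrelated = (rng.normal(0.0, 1.0 / np.sqrt(n), n) for _ in range(3))

z = cconv(x, y)      # association vector holding information about x and y
y_hat = ccorr(x, z)  # probe z with x to recover a noisy copy of y
print(round(cosine(y_hat, y), 2), round(cosine(y_hat, unrelated), 2))
```

The recovered `y_hat` should be much more similar to `y` than to an unrelated random vector. Because the match is approximate, retrieval in practice compares the probe result against stored vectors by similarity rather than expecting exact recovery.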
Combined Memory Vector

$$m_i = m_i + \sum_{j \neq i} e_j + \sum_{j=1}^{p\lambda - (p^2 - p) - 1} \mathrm{bind}_{i,j}$$

•   m = memory vector
•   e = initial random environment vector
•   p = position in sentence
•   lambda = constant chunking factor (size of n-gram window)
•   bind i,j = the j-th n-gram around word i, encoded as a
    non-commutative convolution of a constant placeholder vector
    (standing in for word i) with the environment vectors of the
    other words in the n-gram
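A hedged sketch of the order component for one word. Assumptions not stated on the slide: binding is made non-commutative by scrambling each operand with a different fixed random permutation before circular convolution, n-grams are chained left to right, and every n-gram of length 2 to λ (here `lam=3`) containing the word is summed, which may not match the exact n-gram bookkeeping of the original model. `PLACEHOLDER`, `bind`, and `order_vector` are hypothetical names. The toy sentences show why order matters: "dog" has identical context information in "dog bites man" and "man bites dog", but nearly unrelated order vectors.

```python
import numpy as np

DIM = 1024
rng = np.random.default_rng(2)
# Assumed scheme: two different fixed permutations make binding order-sensitive.
PERM_LEFT = rng.permutation(DIM)
PERM_RIGHT = rng.permutation(DIM)
# Constant placeholder vector standing in for the word being encoded.
PLACEHOLDER = rng.normal(0.0, 1.0 / np.sqrt(DIM), DIM)

def cconv(a, b):
    """Circular convolution via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def bind(a, b):
    """Non-commutative binding: scramble each operand differently, then convolve."""
    return cconv(a[PERM_LEFT], b[PERM_RIGHT])

def order_vector(sentence, i, env, lam=3):
    """Sum the bindings of every n-gram (2 <= n <= lam) that contains position i,
    with the word at position i replaced by the placeholder."""
    o = np.zeros(DIM)
    for start in range(max(0, i - lam + 1), i + 1):
        for end in range(max(i, start + 1), min(len(sentence), start + lam)):
            vecs = [PLACEHOLDER if k == i else env[sentence[k]]
                    for k in range(start, end + 1)]
            chained = vecs[0]
            for v in vecs[1:]:
                chained = bind(chained, v)  # chain the n-gram left to right
            o += chained
    return o

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

env = {w: rng.normal(0.0, 1.0 / np.sqrt(DIM), DIM) for w in ["dog", "bites", "man"]}
s1, s2 = ["dog", "bites", "man"], ["man", "bites", "dog"]

# Context information for "dog" is the same in both sentences...
ctx1 = sum(env[w] for w in s1 if w != "dog")
ctx2 = sum(env[w] for w in s2 if w != "dog")
# ...but its order information is not.
o1 = order_vector(s1, s1.index("dog"), env)
o2 = order_vector(s2, s2.index("dog"), env)
print(np.allclose(ctx1, ctx2), round(cosine(o1, o2), 2))
```

Adding the two parts (`ctx1 + o1`) gives a combined memory contribution in the spirit of the formula above.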
Resonance retrieval…
So, the BEAGLE method:
1. Choose the number of dimensions for the vector space and the
   size of the n-gram window for order information
2. Clean up source documents using standard NLP preprocessing
   (stop-word removal, stemming, etc.)
3. Learn context and order vectors from the corpus and combine them
4. Select words of interest
5. Visualize the multi-dimensional space using your preferred
   method (e.g. MDS)
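Step 2 can be sketched as below. The stop-word list is a tiny illustrative sample and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer (such as the Porter stemmer); both are assumptions, not part of the BEAGLE or word2word pipeline. Sentences are kept separate because BEAGLE's context window is one sentence.

```python
import re

# Tiny illustrative stop-word list; real projects use a curated list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "that", "it"}

def crude_stem(word):
    """Strip a few common English suffixes (a naive stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Split into sentences, lowercase, drop stop words, and stem each token."""
    cleaned = []
    for sent in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", sent.lower())
        tokens = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
        if tokens:  # skip empty fragments, e.g. after the final period
            cleaned.append(tokens)
    return cleaned

print(preprocess("The dragons burned the city. The knights watched the dragons."))
```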
Limitations of BEAGLE
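• Only considers 1-sentence context windows
• Lexical ambiguity (one vector per word form, so different
  senses get blended)
• Valence (e.g. synonyms and antonyms appear in similar contexts,
  so they end up close together)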
MDS
• A way to view a multi-dimensional similarity space
• Projects the multi-dimensional space into fewer dimensions
  (e.g. 2) while trying to preserve pairwise distances between vectors
  • Collapsing dimensions often reveals most significant
    [higher-order] dimensions
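One common way to do this projection is classical (Torgerson) MDS, sketched here with NumPy: double-center the squared distance matrix and keep the top eigenvectors. This is an assumed choice; the slides do not say which MDS variant word2word uses, and iterative alternatives such as scikit-learn's `sklearn.manifold.MDS` (SMACOF) also exist. For word vectors, `D` could hold cosine distances between memory vectors.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances (Gram matrix)
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    scale = np.sqrt(np.maximum(eigvals[top], 0.0))
    return eigvecs[:, top] * scale

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))
D = pairwise_distances(X)
Y = classical_mds(D, k=2)
```

When the distances come from points that really live in two dimensions, the embedding reproduces them up to rotation and reflection; for high-dimensional memory vectors it yields an approximation, and the leading axes are the directions of greatest spread.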
Uses
• How do two academic reference works compare in their
  coverage of a discipline?
  • Biases? Overlap?



[Figure: InPhO-Semantics. Credit: Robert Rose]
[Figure: Black = SEP (Stanford Encyclopedia of Philosophy), Red = IEP (Internet Encyclopedia of Philosophy). Credit: Jun Otsuka]
Political rhetoric
• What can we learn from the “semantic space” derived
  from a party or candidate’s rhetoric?
  •   Central issues?
  •   Key comparisons?
  •   Ideological focus/big tent?
  •   Location on ideological spectrum?
• Example: compare speeches from Republican and
  Democratic political conventions
Heat Map: Terms most diagnostic of a speech’s being delivered by a Democrat
“Hotter” means more diagnostic in the comparison. Hottest terms =
aarp, experience, affordable, abuelo, billionaires, afghanistan, beijing, biofuels, aliens
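The slides do not say how "diagnostic" was computed. As one simple, hypothetical way to rank such terms, the sketch below scores each term by the smoothed log ratio of its relative frequency in one corpus versus another (add-`alpha` smoothing). The tiny word counts are made up for illustration and are not the convention data.

```python
import math
from collections import Counter

def diagnostic_scores(counts_a, counts_b, alpha=1.0):
    """Smoothed log ratio of each term's relative frequency in corpus A vs corpus B.
    Positive scores mean the term is more characteristic of A."""
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    # Counter returns 0 for missing terms, so unseen words are handled by smoothing.
    return {
        t: math.log((counts_a[t] + alpha) / total_a) - math.log((counts_b[t] + alpha) / total_b)
        for t in vocab
    }

dem = Counter("affordable care affordable jobs".split())
rep = Counter("jobs business jobs taxes".split())
scores = diagnostic_scores(dem, rep)
print(sorted(scores, key=scores.get, reverse=True))
```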
Character Analysis
• Moretti: “protagonist is the character that minimized the
  sum of the distances to all other vertices”
  • (But Moretti did it by hand!)
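Moretti's criterion is easy to apply once you have a distance matrix between characters. A minimal sketch, assuming distances are 1 minus the cosine similarity of character vectors (Moretti measured distances in a hand-built network, so this is an adaptation). The function names, labels, and toy matrix are hypothetical.

```python
import numpy as np

def cosine_distance_matrix(vectors):
    """1 - cosine similarity between every pair of rows (e.g. character memory vectors)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def protagonist(names, D):
    """Moretti's criterion: the character with the smallest summed distance to all others."""
    totals = D.sum(axis=1)
    return names[int(np.argmin(totals))]

names = ["A", "B", "C", "D"]  # toy labels, not real characters
D = np.array([[0, 1, 1, 1],
              [1, 0, 2, 2],
              [1, 2, 0, 2],
              [1, 2, 2, 0]], dtype=float)
print(protagonist(names, D))  # "A": summed distance 3 versus 5 for each other character
```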
Character similarity analysis from A Dance with Dragons
Acknowledgements
  InPhO Team
  Brent Kievit-Kylar (word2word)
  Mike Jones (BEAGLE)

DH TOOLS: Introduction to Text Analysis Using BEAGLE and MDS

  • 1. DH TOOLS Introduction to Text Analysis Cameron Buckner Visiting Assistant Professor Department of Philosophy cjbuckner@uh.edu
  • 2. Our Initiative • Promote, facilitate, interact • Reading group • Tools workshops • Speaker series • Grantwriting support • Infrastructure advocacy http://www.uh.edu/class/digitalhumanities/
  • 3. Roadmap Goal today: Analyze texts using cutting-edge analyses from computational psycholinguistics with an off-the-shelf tool, word2word 1. What can you do with text analysis? 2. A little bit of theory: Semantic spaces 3. BEAGLE: The holographic lexicon 4. MDS: Visualizing multidimensional networks 5. Examples 6. Hands-on play
  • 4. What is DH? • Computation and interpretation • The use of computational tools for the production, exploration, analysis, and dissemination of humanistic knowledge • Thread common between new and old: pattern recognition • Includes • Digitization and archiving, markup • Analysis & visualization • Search & dissemination • Pedagogy
  • 5. Methods of Text Analysis I • Statistical analysis, information extraction, machine learning • Syntactic: word frequencies (Google n-grams), vocabulary usage, stylometry (authorship and genre), PageRank http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html
  • 6. Methods of Text Analysis II • Semantic: tf-idf, latent semantic analysis, latent Dirichlet allocation, entropy-based measures, ontologies • Aim to model relevance, semantic similarity, taxonomic relationships, object properties and relations
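As a concrete reference point for the first of these semantic measures, here is a minimal tf-idf sketch in plain Python. It assumes documents that are already tokenized and uses the simple weighting raw count × log(N / df); real toolkits offer several variants (smoothing, sublinear tf, normalization), so treat this as one illustrative choice, not the standard definition. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document (sketch).

    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

docs = [
    "the economy and the jobs".split(),
    "the war and the peace".split(),
    "the jobs the jobs".split(),
]
w = tf_idf(docs)
print(w[0]["the"])  # 0.0, because "the" occurs in every document
```

A term found in every document gets zero weight, while a term concentrated in one document ("economy") outweighs a term spread across several ("jobs"); that is the sense in which tf-idf models relevance.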
  • 7. Reminders • Be creative and have fun, but if you want to publish… • Be principled: • Junk in, junk out • Always know assumptions required by a method • Analyses should hold up under trivial transformations of data representation • Be prepared for pragmatic design decisions • Go in with hypotheses and structured questions • Confirm with careful humanistic interpretation
  • 8. The Mental Lexicon • A “mental dictionary” • Contains information about: • Word meaning, grammatical roles, taxonomic relations, typical properties • Behavioral indicators: recognition speed, synonymy and relevance judgments, priming, frequency effects, categorization
  • 9. BEAGLE • A model that automatically learns a holographic mental lexicon from text, without supervision • History: Two approaches to semantic analysis • Co-occurrence-based measures (“bag of words”, LSA, tf-idf) • Good at determining relevance, bad at determining roles and relations • Order-based measures (n-gram models, generative grammars, hidden Markov models) • Good at identifying grammatical and structural relations, bad at identifying relevance and meaning • Challenge: Can the two be combined?
  • 10. Context + Role • Assumption: People acquire an idiosyncratic mental lexicon from patterns of co-occurrence and syntactic relationships they encounter in natural language. • “You shall know a word by both the company it keeps and how it keeps it.” • Goal: If we could build a representation of a text’s context/role distributions, we could predict the structure of a mental lexicon that produced a corpus and/or that would be produced by it • Texts as “mental fingerprints”
  • 12. Basic Vector Approach 1. Start with a multi-dimensional vector space 2. Each term's meaning is initially represented by a random, constant environment vector and an empty memory vector 3. Associations between terms are represented by adding (or averaging) the environment vectors of co-occurring terms into each term's memory vector 4. Each time terms co-occur, their memory vectors move closer together in multi-dimensional similarity space
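The four steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not word2word's or BEAGLE's actual code: it assumes pre-tokenized sentences, a one-sentence context window, and environment vectors drawn from a normal distribution with mean 0 and variance 1/D (a common choice for holographic vectors). It learns only the context (co-occurrence) part of the memory vectors, not order information. The names `learn_context_vectors` and `cosine` are hypothetical.

```python
import numpy as np

def learn_context_vectors(sentences, dim=1024, seed=0):
    """Sketch: each word's memory vector accumulates the environment
    vectors of the other words in the same sentence."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for s in sentences for w in s})
    # Step 2: a random, fixed environment vector and an empty memory vector per term.
    env = {w: rng.normal(0.0, 1.0 / np.sqrt(dim), dim) for w in vocab}
    mem = {w: np.zeros(dim) for w in vocab}
    # Step 3: add co-occurring terms' environment vectors into each memory vector.
    for sent in sentences:
        for i, word in enumerate(sent):
            for j, other in enumerate(sent):
                if i != j:
                    mem[word] += env[other]
    return mem

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sentences = [
    "dog chased cat".split(),
    "wolf chased cat".split(),
    "senate passed bill".split(),
]
mem = learn_context_vectors(sentences)
# Step 4: "dog" and "wolf" appear in the same contexts, so their memory vectors are close;
# "senate" shares no context with "dog", so their similarity stays near 0.
print(cosine(mem["dog"], mem["wolf"]), cosine(mem["dog"], mem["senate"]))
```

Note that "dog" and "wolf" never co-occur with each other here; they become similar because they share neighbors, which is the distributional idea behind the slide's "company it keeps".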
  • 13. Representing Order Info • Convolution: compressing the outer-product matrix of two term vectors into a single vector that retains recoverable information about both • Example: z = x * y • Association vector z contains information about both x and y • Can (approximately) reconstruct source vector y by probing z with x (deconvolution), and vice versa • Combined BEAGLE memory vector: context memory comes from vector addition, and order information comes from n-gram binding using convolution
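Binding and probing can be illustrated with circular convolution computed through the FFT, and approximate deconvolution computed as circular correlation. This is a minimal sketch assuming random Gaussian vectors with variance 1/D; it shows only the plain, commutative operation, not the non-commutative variant BEAGLE uses for its order bindings. The helper names `cconv`, `ccorr` and `cosine` are hypothetical.

```python
import numpy as np

def cconv(x, y):
    """Circular convolution: a D-dimensional compression of the D x D outer product."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def ccorr(x, z):
    """Circular correlation: probe z with x to approximately recover y."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

D = 2048
rng = np.random.default_rng(1)
x, y, w = (rng.normal(0.0, 1.0 / np.sqrt(D), D) for _ in range(3))

z = cconv(x, y)          # association vector z = x * y
y_hat = ccorr(x, z)      # probe z with x (deconvolution)
print(cosine(y_hat, y))  # well above 0: y_hat is a noisy reconstruction of y
print(cosine(y_hat, w))  # near 0: w was never bound into z
```

The reconstruction is only approximate, which is why retrieval in such models works by comparing the noisy result against stored vectors and picking the most similar one.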
  • 16. Combined Memory Vector • m = memory vector • e = initial random environment vector • p = position in sentence • λ = constant chunking factor (maximum n-gram window size) • bind_{i,j} = a non-commutative convolution of the constant order vector with the other environment vectors in the n-gram
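Written out from the variable definitions above, one plausible form of the update for word i in a sentence w_1 … w_L, with i at position p, is the following. This is a hedged reconstruction, not a transcription of the slide: N(p, λ) is a hypothetical label for the set of n-grams of length 2 to λ that contain position p (with the constant order vector standing in for word i), and the exact bounds of the order sum should be taken from the original BEAGLE description.

```latex
m_i \leftarrow m_i + c_i + o_i, \qquad
c_i = \sum_{k=1,\; k \neq p}^{L} e_{w_k}, \qquad
o_i = \sum_{j \in N(p,\lambda)} \mathrm{bind}_{i,j}
```

The context term c_i is the vector addition of the other words' environment vectors; the order term o_i adds one convolution binding per n-gram that includes word i.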
  • 20. So, the BEAGLE method: 1. Choose the number of dimensions for the vector space and the size of the n-gram window for order information 2. Clean up source documents using standard NLP preprocessing (stop-word removal, stemming, etc.) 3. Learn context and order vectors from the corpus and combine them 4. Select words of interest 5. Visualize the multi-dimensional space using your preferred method (e.g. MDS)
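Steps 2, 4 and 5 bracket the learning itself. The sketch below shows one hypothetical way to do step 2 (split into sentences, lowercase, drop stop words) and to turn learned vectors for the words chosen in step 4 into the pairwise distance matrix that a visualization such as MDS takes as input. The stop-word list and sentence splitter are deliberately tiny placeholders, stemming is omitted, and the vectors are assumed to come from step 3; the toy vectors below are invented for illustration.

```python
import re
import numpy as np

# Hypothetical, deliberately tiny stop-word list; real projects use a curated one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was"}

def clean(text):
    """Step 2 (simplified): split into sentences, lowercase, drop stop words."""
    sentences = re.split(r"[.!?]+", text.lower())
    cleaned = []
    for s in sentences:
        tokens = [t for t in re.findall(r"[a-z']+", s) if t not in STOP_WORDS]
        if tokens:
            cleaned.append(tokens)
    return cleaned

def cosine_distance_matrix(vectors, words):
    """Steps 4-5: pairwise cosine distances (1 - cosine) among the selected words."""
    m = np.array([vectors[w] for w in words], dtype=float)
    m /= np.linalg.norm(m, axis=1, keepdims=True)
    return 1.0 - m @ m.T

print(clean("The dog chased the cat. A wolf chased the cat!"))
# [['dog', 'chased', 'cat'], ['wolf', 'chased', 'cat']]

# Toy 2-D vectors standing in for learned memory vectors.
toy = {"dog": np.array([1.0, 0.0]), "wolf": np.array([1.0, 0.1]), "bill": np.array([0.0, 1.0])}
print(cosine_distance_matrix(toy, ["dog", "wolf", "bill"]).round(3))
```

Cosine distance is used here because the length of a memory vector mostly reflects word frequency, while its direction carries the learned context.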
  • 21. Limitations of BEAGLE • Only considers 1-sentence windows • Lexical ambiguity • Valence (e.g. synonyms, antonyms)
  • 22. MDS • A way to view a multi-dimensional similarity space • Collapses the multi-dimensional space in a way that tries to preserve the pairwise distances between vectors • Collapsing dimensions often reveals the most significant [higher-order] dimensions
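The slide does not say which MDS variant word2word uses. As an illustration, here is classical (Torgerson) MDS in NumPy: it double-centers the squared distance matrix and keeps the top eigenvectors, giving low-dimensional coordinates whose distances approximate the originals. Metric and non-metric MDS, which iteratively minimize a "stress" function, are common alternatives. The function name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions
    given an n x n matrix d of pairwise distances."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Keep the k largest eigenpairs (eigh returns eigenvalues in ascending order).
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[idx], vecs[:, idx]
    # Coordinates = eigenvectors scaled by sqrt(eigenvalue); clip tiny negatives.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

# Toy check: the corners of a 3 x 4 rectangle are recovered up to rotation/reflection.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(recovered, dist))  # True
```

When the input comes from a high-dimensional semantic space, the 2-D distances are only an approximation, and the discarded eigenvalues indicate how much structure the plot leaves out.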
  • 23. Uses • How do two academic reference works compare in their coverage of a discipline? • Biases? Overlap? • InPhO-Semantics. Credit: Robert Rose
  • 24. Black = SEP, Red = IEP. Credit: Jun Otsuka
  • 25. Political rhetoric • What can we learn from the “semantic space” derived from a party or candidate’s rhetoric? • Central issues? • Key comparisons? • Ideological focus/big tent? • Location on ideological spectrum? • Example: compare speeches from Republican and Democratic political conventions
  • 26. Heat map: terms most diagnostic of a speech being delivered by a Democrat. “Hotter” indicates a more diagnostic term. Hottest terms = aarp, experience, affordable, abuelo, billionaires, afghanistan, beijing, biofuels, aliens
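The slide does not specify how diagnosticity was scored. One simple, commonly used option, shown here as a swapped-in illustration and not necessarily what produced this heat map, is a smoothed log-odds ratio comparing a term's relative frequency in one party's transcript with the other's: positive scores lean toward corpus A, negative toward corpus B. The smoothing constant `alpha`, the function name, and the two short token lists are all invented for illustration.

```python
import math
from collections import Counter

def log_odds(tokens_a, tokens_b, alpha=0.5):
    """Smoothed log-odds of each term in corpus A versus corpus B.
    alpha is an add-alpha smoothing constant (an assumption of this sketch)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    vocab = set(ca) | set(cb)
    na, nb = sum(ca.values()), sum(cb.values())
    v = len(vocab)
    scores = {}
    for t in vocab:
        # Smoothed relative frequencies, so unseen terms never give log(0).
        pa = (ca[t] + alpha) / (na + alpha * v)
        pb = (cb[t] + alpha) / (nb + alpha * v)
        scores[t] = math.log(pa / (1 - pa)) - math.log(pb / (1 - pb))
    return scores

# Invented toy token lists, not convention data.
dem = "affordable care affordable college jobs".split()
rep = "jobs business taxes business jobs".split()
s = log_odds(dem, rep)
for term, score in sorted(s.items(), key=lambda kv: -kv[1]):
    print(f"{term:12s} {score:+.2f}")
```

Mapping such scores to a color scale gives a heat map in which the "hottest" terms are those most characteristic of one side.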
  • 27. Character Analysis • Moretti: “protagonist is the character that minimized the sum of the distances to all other vertices” • (But Moretti did it by hand!)
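Moretti's criterion is what network analysis calls closeness: the protagonist is the vertex with the smallest total shortest-path distance to every other vertex. A minimal sketch using breadth-first search on an unweighted, connected character graph follows; the function names and the four-edge graph are hypothetical toy data, not Moretti's network or the one on the next slide.

```python
from collections import deque

def total_distance(graph, source):
    """Sum of shortest-path (hop) distances from source to every reachable vertex, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return sum(dist.values())

def protagonist(graph):
    """Vertex minimizing the sum of distances to all others (assumes a connected graph)."""
    return min(graph, key=lambda v: total_distance(graph, v))

# Hypothetical toy graph: an edge means two characters interact.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
print(protagonist(graph))  # A
```

Doing this by hand is feasible for a small cast; for a novel with hundreds of characters the same computation is a few milliseconds of code.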
  • 28. Character similarity analysis from A Dance with Dragons
  • 29. Acknowledgements • InPhO Team • Brent Kievit-Kylar (word2word) • Mike Jones (BEAGLE)

Speaker notes

  1. You shall know a word by the company it keeps. You shall know a word by the company it keeps and how it keeps it.
  2. If you have a photographic medium that can record and reproduce not only the amount of light that strikes it but also its direction, then you can represent multiple dimensions of an object simultaneously and recall the desired dimensions by shining a reconstruction beam at different angles.
  3. Snow/slow
  4. SEP = Stanford Encyclopedia of Philosophy, IEP = Internet Encyclopedia of Philosophy
  5. Analysis method = LSA
  6. Analysis computed on composite transcripts from 2012 Democratic and Republican national conventions.