Visual Analytics for Linguistics - Day 3
Olga Scrivner
Outline: Course Info, Charts, Text Visualization, ITMS, Preprocessing Data, Data Visualization, Cluster Analysis, Topic Modeling, Google Book API
What You Will Learn
DAY 1 Introduction to Visual Analytics
DAY 2 Visualization Methods, Design, and Tools
DAY 3 Working with Unstructured Data
DAY 4 Working with Structured Data
DAY 5 Advanced Analytics
Our Materials - Web Site
http://obscrivn.wixsite.com/visualization
What We Need
Interactive Text Mining Suite
Voyant
R and RStudio
R libraries: ggplot2, plotly, reshape2
Quiz: Which Chart Are You?
https://www.sisense.com/blog/quiz-chart/
Creating a Bar Chart
Bar heights in ggplot2 can represent one of two things:
The value of a column in the data set. This is done with
stat=“identity”, which leaves the y values unchanged.
The count of cases for each group, where each x value
represents one group.
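The slides build these charts with ggplot2 in R. As a language-neutral illustration of the two cases, the following minimal Python sketch computes the bar heights that each option would draw. The sample data and the helper names `heights_identity` and `heights_count` are hypothetical and do not come from the course materials.

```python
from collections import Counter

# Hypothetical sample: one row per observation (word class of a token).
observations = ["noun", "verb", "noun", "adj", "noun", "verb"]

# Hypothetical pre-aggregated table: one row per group with a value column.
summary = [("noun", 3), ("verb", 2), ("adj", 1)]


def heights_identity(rows):
    """Bar height is the value already stored in the data (like stat="identity")."""
    return {group: value for group, value in rows}


def heights_count(values):
    """Bar height is the number of cases per group (like stat="count")."""
    return dict(Counter(values))


print(heights_identity(summary))
print(heights_count(observations))
```

Both calls describe the same chart here: the first reads the heights from a summary table, the second derives them by counting raw rows.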
Creating a Bar Chart - Sample
Creating a Bar Chart - Values
Creating a Bar Chart - Counts
To get a bar graph of counts, we do not map a variable to y,
and we use stat=“count”
Creating Line Chart
Creating Area Chart
http://www.r-graph-gallery.com/136-stacked-area-chart/
Creating Scatter Plot
http://www.r-graph-gallery.com/272-basic-scatterplot-with-ggplot2/
Creating Bubble Plot
https://plot.ly/r/bubble-charts/
Creating Heatmap
http://www.r-graph-gallery.com/215-interactive-heatmap-with-plotly/
Creating Word Cloud
Word Cloud - Contest - 10 min
Create your own word cloud
Look at the function documentation: type ?wordcloud2 and run it
Can you change the shape of your cloud?
Save it (or take a screenshot) and post it on
Twitter, Facebook, etc.
Why Analyze Text?
The “epic transformation of archives” - shifting from print to
digital archival form (Folsom, 2007)
“As our collective knowledge continues to be digitized and
stored (...) it becomes more difficult to find and discover
what we are looking for.” (Blei 2012)
Text Mining Challenges
source - 1) Dan Jurafsky, 2) Text Mining with R for Social Science Research (Ryan Wesslen)
Basic Terminology
What is Bag of Words?
Simplest way to quantify text
Word order ignored
Term counts per document
N-grams (uni-grams, bi-grams)
Source - Chris Manning
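The bag-of-words idea above can be sketched in a few lines of Python. This is a minimal illustration using whitespace tokenization; the two sample documents and the helper names `ngrams` and `bag_of_words` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, n=1):
    """Count the n-grams of one document; only the counts are kept."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


docs = ["the cat sat on the mat", "the dog sat"]
unigram_counts = [bag_of_words(d, 1) for d in docs]
bigram_counts = [bag_of_words(d, 2) for d in docs]
print(unigram_counts[0])
print(bigram_counts[1])
```

With uni-grams, "a b" and "b a" produce identical counts, which is what "word order ignored" means; bi-grams recover a little local order at the cost of a larger vocabulary.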
Preprocessing
Tokenization (splitting words)
Cleaning (lower case, punctuation)
Stemming
works, worked → work
Filter (stopwords)
and, the, a
Source - Wesslen
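The four steps can be sketched as a small Python pipeline. This is only an illustration: the stopword list is deliberately tiny, `crude_stem` is a hypothetical suffix stripper rather than a real stemming algorithm such as Porter's, and cleaning is applied before tokenization for simplicity.

```python
import string

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"and", "the", "a"}


def crude_stem(token):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # Cleaning: lower-case the text and remove punctuation.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Tokenization: split on whitespace.
    tokens = cleaned.split()
    # Filtering: drop stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stemming: reduce inflected forms to a shared stem.
    return [crude_stem(t) for t in tokens]


print(preprocess("The editor works, and the author worked."))
```

After these steps, works and worked are counted as the same term, and the stopwords no longer dominate the frequency list.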
Macro-analysis
Concept: Macro-analysis (Jockers, 2013), “the construction of abstract models” (Jasinski, 2001)
Methods: Tag clouds, heat maps, clusters, topics, network graphs
Tools: GUI: Voyant, Papermachine, ITMS; TUI: Mallet, Meta, R and Python packages
Visual Analytics
Visual Analytics - “The science of analytical reasoning
facilitated by visual interactive interfaces” (Thomas et al.,
2005)
Graphs, maps and trees for literature analysis (Moretti,
2005)
Visualization Methods
Word clouds to analyze a novel (Vuillemot et al., 2009)
Visualization Methods
Social network graphs of characters in Greek tragedies
(Rydberg-Cox, 2011)
Visualization Methods
Literary fingerprint and summaries (Oelke et al., 2012)
Visualization Methods
Tracking emotion and sentiment in fairy tales
(Mohammad, 2012)
Topic Modeling
Discovering the underlying themes of a collection: Science magazine,
1990-2000 (Blei 2012)
Topics - Word Term
Wikipedia Topics
http://www.princeton.edu/~achaney/tmve/wiki100k/browse/topic-presence.html
Wikipedia Topics - Assignment - 10 min
1. Language Related Topic
2. Words: Dialect
3. Related Document: Macedonian Language
4. Related Document: Egyptian hieroglyphs
5. Go to Full article:
6. Find meaning:
Voyant
http://voyant-tools.org/
Voyant - 10 min
http://voyant-tools.org/
Examine visualization charts (identify types
and properties)
Apply various filters and queries
Voyant Tools - Bubblelines - 7 min
http://docs.voyant-tools.org/tools/
Delete top terms
Search for man and woman
Make sure to have “separate lines for terms” clicked
Change search terms
Voyant Tools - Pair Work - 10 min
http://docs.voyant-tools.org/tools/
Examine visualization methods
Select 5 methods
Look at the documentation and how to use them
Interactive Text Mining Suite
A user-friendly tool for quantitative analysis and
visualization of unstructured data
Platform-independent
Interactive
ITMS Structure
1. File Uploads
Upload files (txt, pdf, rdf and Google books API)
2. Data Preparation
Data preprocessing (stopwords, stemming, metadata)
3. Data Visualization
Word frequencies, Cluster analysis and topic modeling
Workshop Files
Download 3 text files
https://iu.box.com/s/knua9af3bip7g63s3zdax9ti4z243ldz
NY Times articles (3 documents in a plain text format)
ITMS Web site:
http://www.interactivetextminingsuite.com
Upload File
Preprocessing Data
Before performing data analysis we should preprocess data.
Preprocessing Options
Select preprocessing options and click apply.
Stopwords
Stopwords (e.g. the, and): select Default for English
Manual Removal of Stopwords
As needed, remove any additional stopwords that you
consider noise, e.g. paper, shows, etc.
Select apply
Stemming
To improve the analysis, you can stem all your tokens, e.g.
instead of worked, works, working, you will have only one
relevant stem: work
Metadata Extraction
You can extract or upload metadata. You will need
datestamp (year) information for chronological topic
modeling.
Visualization
Word Cloud Representation
Customization
Cluster Analysis
You need to have at least three documents
Documents will be grouped based on their term similarity
measures
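The slides do not say which similarity measure or clustering algorithm ITMS uses. As one common choice, the sketch below compares term-count vectors with cosine similarity and finds the closest pair of documents; the three-document corpus and the function name are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-document corpus (already preprocessed).
docs = {
    "doc1": "market stock trade market",
    "doc2": "stock market price",
    "doc3": "language grammar dialect",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}

# Compare every pair and report the most similar one, which an
# agglomerative (bottom-up) clustering would merge first.
names = sorted(vectors)
pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
closest = max(pairs, key=lambda p: cosine_similarity(vectors[p[0]], vectors[p[1]]))
print(closest)
```

With only two documents there is a single pair and nothing to group, which is one way to read the three-document minimum.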
Topic Modeling
LDA (Latent Dirichlet Allocation)
STM (Structural Topic Model)
Chronological topic visualization (LDA): requires
metadata
Topic Modeling Tuning
Selection of topics (how many different themes)
Selection of words per theme (how many words per
topic)
Selection of the number of iterations
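To show where these three settings enter, here is a toy LDA fitted with collapsed Gibbs sampling in plain Python. It is a sketch for illustration, not the implementation ITMS uses; the function names, the smoothing parameters `alpha` and `beta`, and the sample corpus are assumptions.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model; docs is a list of token lists.

    Returns one Counter of word counts per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per doc
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        current = []
        for word in doc:
            k = rng.randrange(num_topics)
            current.append(k)
            topic_word[k][word] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assignments.append(current)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                # Resample its topic in proportion to the conditional weights.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                topic_word[k][word] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return topic_word


def top_words(topic_word, words_per_topic):
    """Return up to words_per_topic of the most frequent words in each topic."""
    return [
        [w for w, c in tw.most_common(words_per_topic) if c > 0]
        for tw in topic_word
    ]


docs = [
    "stock market trade price".split(),
    "market price stock bank".split(),
    "language dialect grammar word".split(),
    "grammar word language syntax".split(),
]
model = lda_gibbs(docs, num_topics=2, iterations=200)
for topic in top_words(model, words_per_topic=3):
    print(topic)
```

`num_topics` sets how many themes the model looks for, `words_per_topic` only affects how many words are displayed per theme, and `iterations` controls how long the sampler runs before the counts are read off.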
Topic Model Selection
LDA Topic Model
STM Topic Model
Other Formats - Google Books
Before switching to other data formats, refresh your local
browser.
Start with File Uploads and select Structured Data
Other Formats - Google Books
Select your search terms and submit
Current limitation is 40 books
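The 40-book limit matches the cap on the `maxResults` parameter of the public Google Books volume search. The sketch below only builds such a request URL and does not send it; whether ITMS constructs its query this way is an assumption, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public Google Books API endpoint for volume search.
BASE_URL = "https://www.googleapis.com/books/v1/volumes"
# The API returns at most 40 volumes per request.
MAX_RESULTS = 40


def build_search_url(search_terms, max_results=MAX_RESULTS):
    """Build a volume-search URL, capping the result count at the API limit."""
    params = {"q": search_terms, "maxResults": min(max_results, MAX_RESULTS)}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("visual analytics"))
```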
Resources
http://www.rdatamining.com/examples/text-mining
https://en.wikibooks.org/wiki/R_Programming/Text_Processing
http://data.library.virginia.edu/reading-pdf-files-into-r-for-text-mining/
http://www.katrinerk.com/courses/words-in-a-haystack-an-introductory-statistics-course/schedule-words-in-a-haystack/r-code-the-text-mining-package
tm package

 

Visual Analytics for Linguistics - Day 3 ESSLLI

  • 1. Visual Analytics for Linguistics - Day 3, Olga Scrivner. Outline: Course Info, Charts, Text Visualization, ITMS, Preprocessing Data, Data Visualization, Cluster Analysis, Topic Modeling, Google Book API
  • 2. What You Will Learn: DAY 1 Introduction to Visual Analytics; DAY 2 Visualization Methods, Design, and Tools; DAY 3 Working with Unstructured Data; DAY 4 Working with Structured Data; DAY 5 Advanced Analytics
  • 3. Our Materials - Web Site: http://obscrivn.wixsite.com/visualization
  • 4. What We Need: Interactive Text Mining Suite; Voyant; R and RStudio; R libraries: ggplot2, plotly, reshape2
  • 5. What We Need: Interactive Text Mining Suite; Voyant; R and RStudio; R libraries: ggplot2, plotly, reshape2
  • 6. Quiz: Which Chart Are You? https://www.sisense.com/blog/quiz-chart/
  • 7. Creating a Bar Chart: Bar heights can represent either (1) the value of a column in the data set, which is done with stat="identity" and leaves the y values unchanged, or (2) the count of cases for each group, where each x value represents one group.
  • 8. Creating a Bar Chart - Sample
  • 9. Creating a Bar Chart - Sample
  • 10. Creating a Bar Chart - Values
  • 11. Creating a Bar Chart - Counts: To get a bar graph of counts, we do not map a variable to y, and we use stat="count".
  • 12. Creating a Bar Chart - Counts
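The difference between the two bar-height modes on slides 7 to 12 can be sketched outside of R. The following is a minimal Python illustration, not the ggplot2 implementation: the toy data, the `precomputed` table and the `text_bars` helper are all hypothetical. It shows that the "count" mode tallies rows per x value, while the "identity" mode uses a y value that is already in the data.

```python
from collections import Counter

# Hypothetical toy data: one row per observed token, labeled with a category.
rows = ["noun", "verb", "noun", "adj", "noun", "verb"]

# "count" behaviour: no y variable is supplied, so the bar height
# is the number of rows that share each x value.
counts = Counter(rows)

# "identity" behaviour: the data already holds one y value per x value,
# and that value is used unchanged as the bar height.
precomputed = {"noun": 3, "verb": 2, "adj": 1}

def text_bars(heights):
    """Render a crude horizontal bar chart as a list of text lines."""
    return [f"{label:<5} {'#' * height}" for label, height in sorted(heights.items())]

for line in text_bars(counts):
    print(line)
```

Both inputs produce the same chart here because `precomputed` was built to match the tallies; in practice the choice depends on whether the data is raw observations or an already aggregated table.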
  • 13. Title
  • 14. Creating Line Chart
  • 15. Creating Line Chart
  • 16. Creating Area Chart: http://www.r-graph-gallery.com/136-stacked-area-chart/
  • 17. Creating Scatter Plot: http://www.r-graph-gallery.com/272-basic-scatterplot-with-ggplot2/
  • 18. Creating Bubble Plot: https://plot.ly/r/bubble-charts/
  • 19. Creating Bubble Plot: https://plot.ly/r/bubble-charts/
  • 20. Creating Heatmap: http://www.r-graph-gallery.com/215-interactive-heatmap-with-plotly/
  • 21. Creating Heatmap: http://www.r-graph-gallery.com/215-interactive-heatmap-with-plotly/
  • 22. Creating Heatmap
  • 23. Creating Word Cloud
  • 24. Word Cloud - Contest - 10 min: Create your own word cloud. Look at the function: type ?wordcloud2 and run. Can you change the shape of your cloud? Save it (or take a screenshot) and post it on Twitter, Facebook, etc.
  • 25. Why Analyze Text? The “epic transformation of archives”: shifting from print to digital archival form (Folsom, 2007). “As our collective knowledge continues to be digitized and stored (...) it becomes more difficult to find and discover what we are looking for.” (Blei 2012)
  • 26. Text Mining Challenges. Sources: 1) Dan Jurafsky, 2) Text Mining with R for Social Science Research (Ryan Wesslen)
  • 27. Basic Terminology
  • 28. What is Bag of Words? Simplest way to quantify text; word order ignored; term counts per document; n-grams (unigrams, bigrams). Source: Chris Manning
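The bag-of-words idea on slide 28 can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: whitespace tokenization is a simplification, and the two example documents and the helper names `ngrams` and `bag_of_words` are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, n=1):
    """Count the n-grams of a text; splitting on whitespace is a simplification."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))

# Hypothetical two-document corpus with the same words in a different order.
docs = {"doc1": "the dog chased the cat", "doc2": "the cat chased the dog"}

unigram_counts = {name: bag_of_words(text) for name, text in docs.items()}
bigram_counts = {name: bag_of_words(text, n=2) for name, text in docs.items()}

print(unigram_counts["doc1"] == unigram_counts["doc2"])
print(bigram_counts["doc1"] == bigram_counts["doc2"])
```

With unigrams the two documents are indistinguishable, which is what "word order ignored" means; bigrams keep a little local order, so the two documents get different counts.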
  • 29. Preprocessing: Tokenization (splitting words); Cleaning (lower case, punctuation); Stemming; Filter (stopwords). Source: Wesslen
  • 30. Preprocessing: Tokenization (splitting words); Cleaning (lower case, punctuation); Stemming (works, worked → work); Filter (stopwords). Source: Wesslen
  • 31. Preprocessing: Tokenization (splitting words); Cleaning (lower case, punctuation); Stemming (works, worked → work); Filter (stopwords: and, the, a). Source: Wesslen
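The preprocessing steps on slides 29 to 31 can be sketched as a small pipeline. This is a Python illustration only: the stopword list is just the three words from the slide, and `toy_stem` is a hypothetical suffix stripper, not a real stemming algorithm such as Porter's. Stopwords are filtered before stemming here so that they are matched in their unstemmed form.

```python
import re

# Tiny illustrative list taken from the slide example; real lists are much longer.
STOPWORDS = {"and", "the", "a"}

def tokenize(text):
    """Lowercase the text and keep runs of letters, which drops punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def toy_stem(token):
    """Strip a few common English suffixes. NOT a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize and clean, remove stopwords, then stem what is left."""
    tokens = tokenize(text)
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [toy_stem(t) for t in tokens]

print(preprocess("The editor works, and the author worked."))
# → ['editor', 'work', 'author', 'work']
```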
  • 32. Macro-analysis. Concept: Macro-analysis (Jockers, 2013); “the construction of abstract models” (Jasinski, 2001). Methods: tag clouds, heat maps, clusters, topics, network graphs. Tools: GUI: Voyant, Papermachine, ITMS; TUI: Mallet, Meta, R and Python packages
  • 33. Visual Analytics: “The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas et al., 2005). Graphs, maps and trees for literature analysis (Moretti, 2005)
  • 34. Visualization Methods: Word clouds to analyze a novel (Vuillemot et al., 2009)
  • 35. Visualization Methods: Social network graphs of characters in Greek tragedies (Rydberg-Cox, 2011)
  • 36. Visualization Methods: Literary fingerprint and summaries (Oelke et al., 2012)
  • 37. Visualization Methods: Tracking emotion and sentiment in fairy tales (Mohammad, 2012)
  • 38. Topic Modeling: Discovering the underlying themes of a collection from Science magazine, 1990-2000 (Blei 2012)
  • 39. Topics - Word Term
  • 40. Topics - Word Term
  • 41. Wikipedia Topics: http://www.princeton.edu/~achaney/tmve/wiki100k/browse/topic-presence.html
  • 42. Wikipedia Topics - Assignment - 10 min: 1. Language Related Topic 2. Words: Dialect 3. Related Document: Macedonian Language 4. Related Document: Egyptian hieroglyphs 5. Go to Full article: 6. Find meaning:
  • 43. Voyant: http://voyant-tools.org/
  • 44. Voyant: http://voyant-tools.org/
  • 45. Voyant - 10 min: http://voyant-tools.org/ Examine visualization charts (identify types and properties). Apply various filters and queries.
  • 46. Voyant Tools - Bubblelines - 7 min: http://docs.voyant-tools.org/tools/ Delete top terms. Search for man and woman. Make sure “separate lines for terms” is checked. Change search terms.
  • 47. Voyant Tools - Pair Work - 10 min: http://docs.voyant-tools.org/tools/ Examine visualization methods. Select 5 methods. Look at the documentation and how to use them.
  • 48. Interactive Text Mining Suite: A user-friendly tool for quantitative analysis and visualization of unstructured data. Platform-independent. Interactive.
  • 49. ITMS Structure: 1. File Uploads: upload files (txt, pdf, rdf and Google Books API). 2. Data Preparation: data preprocessing (stopwords, stemming, metadata). 3. Data Visualization: word frequencies, cluster analysis and topic modeling.
  • 50. ITMS Structure: 1. File Uploads: upload files (txt, pdf, rdf and Google Books API). 2. Data Preparation: data preprocessing (stopwords, stemming, metadata). 3. Data Visualization: word frequencies, cluster analysis and topic modeling.
  • 51. Workshop Files: Download 3 text files: https://iu.box.com/s/knua9af3bip7g63s3zdax9ti4z243ldz (NY Times articles, 3 documents in plain text format). ITMS Web site: http://www.interactivetextminingsuite.com
  • 52. Upload File
  • 53. Upload File
  • 54. Upload File
  • 55. Preprocessing Data: Before performing data analysis, we should preprocess the data.
  • 56. Preprocessing Options: Select preprocessing options and click apply.
  • 57. Stopwords: Stopwords (e.g. the, and): select Default for English.
  • 58. Manual Removal of Stopwords: Based on your needs, remove any additional stopwords that you consider noise, e.g. paper, shows, etc. Select apply.
  • 59. Stemming: To improve analytics, you can stem all your tokens, e.g. instead of worked, works, working, you will have only one relevant stem, work.
  • 60. Metadata Extraction: You can extract or upload metadata. You will need datestamp (year) information for chronological topic modeling.
  • 61. Visualization
  • 62. Word Cloud Representation
  • 63. Customization
  • 64. Cluster Analysis: You need to have at least three documents. Documents will be grouped based on their term similarity measures.
  • 65. Cluster Analysis
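Grouping documents by term similarity, as described on slide 64, can be sketched as follows. The slides do not say which similarity measure ITMS uses, so cosine similarity over raw term counts is an assumption here, and the three-document corpus and helper names are hypothetical. The most similar pair is the one an agglomerative clustering would merge first.

```python
import math
from collections import Counter
from itertools import combinations

def term_vector(text):
    """Term counts for one document; whitespace tokenization is a simplification."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical corpus; the slide asks for at least three documents.
docs = {
    "sports1": "team wins match team scores goal",
    "sports2": "team loses match after late goal",
    "finance": "bank raises interest rates again",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

# Pairwise similarities between all documents.
similarities = {
    (x, y): cosine_similarity(vectors[x], vectors[y])
    for x, y in combinations(docs, 2)
}

# The pair that a bottom-up clustering would merge first.
closest_pair = max(similarities, key=similarities.get)
print(closest_pair, round(similarities[closest_pair], 3))
```

With only two documents there is a single pair and nothing to compare it against, which is one way to read the three-document requirement.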
  • 66. Topic Modeling: LDA (Latent Dirichlet Allocation); STM (Structural Topic Model); chronological topic visualization (LDA): requires metadata
  • 67. Topic Modeling Tuning: Selection of topics (how many different themes); selection of words per theme (how many words per topic); selection of iterations
  • 68. Topic Model Selection
  • 69. LDA Topic Model
  • 70. STM Topic Model
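The three tuning options on slide 67 (number of topics, words per topic, iterations) map directly onto the parameters of an LDA sampler. The following is a toy collapsed Gibbs sampler in Python, offered as a sketch and not as the code ITMS runs: the function name `lda_gibbs`, the `alpha` and `beta` defaults and the tiny corpus are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, num_iterations, words_per_topic,
              alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns the top words of each topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is in topic k
    topic_total = [0] * num_topics                             # tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(num_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Report the most frequent words of each topic.
    topics = []
    for k in range(num_topics):
        ranked = sorted(range(vocab_size), key=lambda wid: -topic_word[k][wid])
        topics.append([vocab[wid] for wid in ranked[:words_per_topic]])
    return topics

# Hypothetical pre-tokenized corpus.
corpus = [
    "verb noun syntax verb grammar".split(),
    "syntax grammar noun clause".split(),
    "chart plot axis chart color".split(),
    "plot color axis legend".split(),
]
topics = lda_gibbs(corpus, num_topics=2, num_iterations=50, words_per_topic=3)
for k, words in enumerate(topics):
    print(k, words)
```

The result depends on the random seed and on the number of iterations, which is why both are exposed as parameters; on a corpus this small the topics should be read as a demonstration of the mechanics only.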
  • 71. Other Formats - Google Books: Before switching to other data formats, refresh your browser. Start with File Uploads and select Structured Data.
  • 72. Other Formats - Google Books: Select your search terms and submit. The current limit is 40 books.
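A search against the Google Books API can be sketched as a plain URL. The slides do not show how ITMS issues its request, so the helper below is a hypothetical Python illustration; it relies only on the public volumes endpoint and its `q` and `maxResults` parameters, where 40 is the documented maximum per request (possibly the origin of the 40-book limit, though the slides do not say so).

```python
from urllib.parse import urlencode

GOOGLE_BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"
MAX_RESULTS_PER_REQUEST = 40  # documented upper bound for maxResults

def build_search_url(search_terms, max_results=MAX_RESULTS_PER_REQUEST):
    """Build a volumes search URL for a list of search terms (hypothetical helper)."""
    if not 1 <= max_results <= MAX_RESULTS_PER_REQUEST:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_PER_REQUEST}")
    params = {"q": " ".join(search_terms), "maxResults": max_results}
    return f"{GOOGLE_BOOKS_ENDPOINT}?{urlencode(params)}"

url = build_search_url(["visual", "analytics"])
print(url)
# → https://www.googleapis.com/books/v1/volumes?q=visual+analytics&maxResults=40
```

Fetching more than 40 books would require several requests with different `startIndex` values, which this sketch leaves out.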
  • 73. Resources: http://www.rdatamining.com/examples/text-mining https://en.wikibooks.org/wiki/R_Programming/Text_Processing http://data.library.virginia.edu/reading-pdf-files-into-r-for-text-mining/ http://www.katrinerk.com/courses/words-in-a-haystack-an-introductory-statistics-course/schedule-words-in-a-haystack/r-code-the-text-mining-package (tm package)