This document summarizes available data visualization tools and datasets for digital humanities research. It discusses examples of tools for searching, discovery, visualization, analysis, and publishing, including Perseus, JSTOR Data for Research, WordSeer, the Google Ngram Viewer, concordancing tools, Google's Public Data Explorer, NodeXL for network and text analysis, and Google Refine for data cleaning. It also outlines roles for librarians: providing tool comparisons, offering research support, and shifting reference services to support new forms of data-driven research.
Data Visualization and Digital Tools for the Humanities
1. Data visualization and digital humanities research: a survey of available data sets and tools LITA National Forum 2011 St. Louis, MO Friday, September 30, 2011 Erik Mitchell, University of Maryland Susan Sharpless Smith, Wake Forest University
2. Motivation “Digital humanities needs gateway drugs. Kudos to the pushers on the Google Books team.” - Dan Cohen http://www.dancohen.org/2010/12/19/ “Linked open data could have the same leveraging effect that the World Wide Web had on computing, said Micki McGee, an assistant professor of sociology at Fordham University” -Steve Kolowich, The Promise of Digital Humanities, Inside HigherEd
3. Birth of a word "Imagine if you could record your life, everything you said, everything you did, available in a perfect memory store at your fingertips." - Deb Roy, The Birth of a Word http://www.ted.com/
4. Overview Discuss examples of data-focused research tools Explore tools Consider roles for librarians Wrap-up/Q & A
10. Tool exploration Discover / Search What kinds of discovery tools exist and how common are the discovery features across different datasets / systems? Visualization What visualization features exist, are there products that are easy to use, are the skills transferable? Analysis / Annotation What analytical tools are included, what analysis techniques are common?
14. Google's Ngram Viewer books.google.com/ngrams culturomics.org But here's the rub. Google Books, as others point out, wasn't really built for research. . . That means Google Books didn't come with the interfaces scholars need for vast data manipulation . . . http://chronicle.com/article/The-Humanities-Go-Google/65713/
15. TED talk on the Google Ngram Viewer http://www.ted.com/talks/what_we_learned_from_5_million_books.html
16. Concordancing Eric Lease Morgan - http://dh.crc.nd.edu/sandbox/cyl/catalog/
18. Data analysis - NodeXL http://nodexl.codeplex.com/ Analyzing Social Media Networks with NodeXL: Insights from a Connected World
19. Data cleaning – Google Refine http://code.google.com/p/google-refine
20. Data visualization – Google Fusion Tables http://www.google.com/fusiontables/DataSource?dsrcid=332788 http://google.com/fusiontables
21. Research/teaching need Researcher needs vary from advanced linguistic analysis and IT support to need for basic digital content/infrastructure Corpus-based research
22. Librarian contributions Domain specific, tool-type specific comparisons IT and research support – data analysis, data curation, tool/data sources identification Shift from “reference” to “research” in sync with move from resource discovery to thematic analysis
23. Next steps Build new skills, develop new systems Create tutorials and guides Explore connections between data curation, publishing, and these tools Explore the role of library discovery systems and consider new feature implementation.
Today presenting on a "summer exploration" project completed at WFU. Wide scope, exploratory in nature. Here today to share what we found.
From the article "The Promise of Digital Humanities" (Inside Higher Ed), September 28, 2011: They are building tools that could facilitate insights into history, language, art and culture that human researchers might never have been able to glean on their own. And some say that could help restore public interest in the humanities. Digital humanities is a hot topic this year: the NEH held a symposium on Tuesday for 60 recipients of its 2011 Digital Humanities Start-Up Grants, most of whom were given between $25,000 and $50,000. Digital humanities is a branch of scholarship that takes the computational rigor that has long undergirded the sciences and applies it to the study of history, language, art and culture.
Got interested because WFU faculty were talking about DH research. We saw lots of enthusiasm but little knowledge about what really existed. Story: different definitions, WFU DH Institute, computational humanities, linguistics. Point: it is clear that the field has energy and that DH is focusing on the same structures and information tools as libraries.
Discuss how data and computational power is sexy. We pause to mention this video specifically (Deb Roy, 20 minutes). Focuses on the impact of large-scale data collection and cross analysis: "Imagine if you could record your life, everything you said, everything you did, available in a perfect memory store at your fingertips." Picture shows the connection between a televised moment (Obama's State of the Union speech) at the bottom of the screen and all of the social media conversations happening in real time at the top of the screen. Network graph: wider view of experience, understand ideas from more than one perspective. Point: consider the impact if librarians could help students and researchers begin this type of data analysis.
Going to present some examples of "data"-focused research tools. Definition: databases that allow asking research questions focused on data. We are going to explore tools that fit three functions: searching/discovery, visualization, and analysis/publishing. Consider how these tools could impact teaching and research. Consider the roles that librarians can play in this field.
Goal in this chart is to introduce the types of tools and show how they complement each other.
Discovery: text searching; citation chaining (tracing citations both forward and backward, something core to academic research; WOK citation mapping gives a visual of this idea, and this data can be exported); concept exploration, facets and contextual metadata.
Visualization (for both presentation and behind the scenes): mapping, graphing, charting, data cleanup and normalization.
Analysis/publishing: dataset publishing, statistical analysis, annotation (tagging text), drilling in, inverse.
Be aware that there is overlap among the groups.
Discuss types of discovery.
Corpus (collection of written texts) exploration: full text, linguistic components, concepts (copa, coca, ngram, . . .); examples at http://corpus.byu.edu
Bibliometrics: citation trees (Web of Knowledge, DFR). Bibliometrics is a set of methods used to study or measure texts and information; citation analysis and content analysis are commonly used bibliometric methods. Used to study the impact of researchers, papers, journals, and academic output (Eigenfactor recommends); new project coming out.
Metadata: structured data on any topic (Google Public Data, GIS).
Hybrid: JSTOR DFR (Data for Research) is a good example; it includes full-text searching, metadata limiting, and bibliometrics.
Purpose: the main goal of data visualization is to communicate information clearly and effectively through graphical means. Many free tools are available for visualization (link on slide). The purpose of these tools is to provide visualization and data exploration platforms; NodeXL is an Excel plug-in for Windows.
Types of visualization: data cleaning, data analysis, graphical representations of data (table, map, heatmap, line chart, bar graph, pie chart, scatter plot, timeline, storyline or motion/animation over time; Google Fusion Tables does all of these).
One example using GIS: http://inside.uidaho.edu/ Google Fusion Tables: http://www.google.com/fusiontables/Home
These tools allow statistical analysis of data or provide a platform for visualization or publishing. A great, understandable example is the Google Public Data Explorer. We will look at this in a few minutes.
Second thing we did was explore. We tried to compare linguistic tools. Article: Literary & Linguistic Computing, corpus design criteria, Volume 7, 1992. How we explored: interviews, datasets, tools; focused on linguistics. Goal of this slide is to talk through one comparison exercise: corpus.byu.edu, the Corpus of Contemporary American English, Google Books, the British National Corpus. Findings: lack of consistency, new search features. Need here is for published comparative documents. All familiar but in a different context: word frequency, concordancing, lemmatization (roots), semantic and syntactic relationships, KWIC, sense disambiguation, links, population scope (open/closed), random. Point: librarians already know what these tools can do, to an extent.
Word frequency.
Concordancing: an index of words in a text, often shown in the context of sentence structure.
Lemmatization: searching words using roots.
Semantic relationships: derived relationships (e.g. is done by, is described as).
Syntactic relationships: part-of-speech labeling, sentence decomposition (Stanford parser).
Collocation: KWIC, a sequence of words that are taken together.
Sense disambiguation (e.g. run, running, ran).
Link to lexical database: dictionary of words - http://wordnet.princeton.edu/
How is the population defined? Is the corpus open or closed? Was it a random sample, a limited text source? What impact does that have on generalization?
Synchronic/diachronic: does the corpus focus on a "point in time" or on change over time?
Monolingual/bilingual/plurilingual: what languages are represented?
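Several of the features compared above start from simple counting. A minimal Python sketch of word-frequency counting, using an invented sample sentence rather than any of the corpora named above:

```python
from collections import Counter
import re

def word_frequencies(text):
    """Count word occurrences, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Invented example text; a real corpus tool would run this over millions of words.
text = "The cat ran. The cats run, and the cat runs."
freq = word_frequencies(text)
```

Note that without lemmatization, "cat", "cats", "ran", "run", and "runs" all count separately, which is exactly the gap the root-based searching above addresses.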
Now we are going to explore some tools. We grouped them into three areas: discovery, visualization, and analysis. We have included some questions that we asked as we explored.
A tool that I did not know about until recently is Perseus, mentioned in Project Bamboo. Digital humanities research tool at Tufts. Listened to David Mimno from Princeton talk about computational humanities, spanning between distant and close reading. David was the head programmer for Perseus for many years. Features include direct access to text searching; the ability to explore connections between documents (lexicons, concordances, i.e. alphabetical lists of words); and seeing the position of a text in the larger collection.
At the talk I was at, David also talked about his work on computational topic modeling using JSTOR data. It is an interesting talk; you can find it at mith.umd.edu under Digital Dialogues. To recap his idea: if you analyze all of the text in a specific set of journals (Classics journals), you can see changes in topics and language over time (he found that in the 1980s the two fields of philology and archaeology converged in some journals); generate topics that show granular 'aboutness' (some interesting discussions about the value of human vs. computing models); and explore aboutness not from a qualitative 'hunch' but from statistical comparison.
Demo: I want to see what topics academics have explored with Jane Austen.
1. dfr.jstor.org, log in
2. You can search, view chart data, or view citations
3. You can export, although by default you are limited to 1000 records
4. I searched for Jane Austen, limited to research articles, limited to subject language and literature
5. I then downloaded the data >> Data requests > Submit new request
6. Download key terms, CSV -> janeaustenkeyterms
7. Check email, wait, download
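Once the key-terms CSV arrives, it can be filtered with a few lines of Python. A hypothetical sketch: the column names (`keyterm`, `weight`) and sample rows below are invented stand-ins, since the real DFR export format may differ:

```python
import csv
import io

# Invented sample mimicking a DFR key-terms export; real column
# names and values may differ.
sample = """keyterm,weight
austen,1.0
novel,0.97
irony,0.85
"""

def top_terms(csv_text, cutoff=0.9):
    """Return (term, weight) pairs at or above the cutoff, highest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    kept = [(r["keyterm"], float(r["weight"])) for r in rows
            if float(r["weight"]) >= cutoff]
    return sorted(kept, key=lambda kv: -kv[1])

terms = top_terms(sample)
```

The same filtering happens interactively in Google Refine's numeric facets, shown later in the talk.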
From another presentation at UMD MITH: Aditi Muralidharan. This is a highly focused corpus database that includes semantic relationship analysis, visualization tools, and data annotation. Neat hybrid system. WordSeer focuses on slave narratives: discovery, annotation, visualization, semantic relationships.
Demo: link to it. Examples > god, point to chart. Add bless. Click on heat maps, or read/annotate.
Google Ngram. I expect we are familiar with the Ngram Viewer: http://books.google.com/ngrams Work by Jean-Baptiste Michel and lots of others. 2009 snapshot, 5.2 million books; English, French, German, Hebrew, Russian, Spanish, Chinese. Best data is between 1800 and 2000. Searching: date, phrase, language, smoothing (average of occurrence over years), ngrams (how far a word is from other words: within 2, 3, 4). Discover trends: for instance, while the concept of "good cats" has remained steady (but limited), there has been diminishing focus on "good dogs" in the 20th century. Does this point to a disturbing trend in dog goodness? But be careful: culturomics.org asks what this data really says. Paper by Jean-Baptiste Michel and lots of other folks, "Quantitative Analysis of Culture Using Millions of Digitized Books."
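The "smoothing" setting mentioned above is essentially a moving average over neighboring years. A small sketch of the idea; how the viewer handles the edge years is an assumption here (this version just averages over whatever neighbors exist):

```python
def smooth(series, window=3):
    """Centered moving average over a yearly count series,
    analogous to the Ngram Viewer's smoothing slider."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

# Invented yearly counts for illustration.
counts = [2, 4, 6, 8, 10]
smoothed = smooth(counts)
```

Larger windows flatten short-term spikes (often OCR or sampling noise) at the cost of blurring real year-to-year change.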
In fact there was a recent TED talk on the Ngram Viewer. In 15 minutes it gives a good overview of the background and uses of the system.
We found innumerable tools for processing! Eric Lease Morgan at Notre Dame has done some interesting work in this area and has released his Lingua Perl modules for processing. There are other methods; the Stanford parser, for example, offers these tools. He developed concordancing software, available on CPAN. Great iPad demo here. His data is from the Internet Archive, an interesting source of data for harvesting and analysis. You can see he focuses on some other specific search methods. Point of this one: WordSeer and the Catholic portal are both special-collection focused, with different research tools available. Problem: this proves to be very confusing for people trying to practice a research method across multiple data sets.
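Concordancing itself is simple to prototype. A minimal keyword-in-context (KWIC) sketch in Python, using a made-up sentence rather than any of Morgan's actual data:

```python
def kwic(text, keyword, width=3):
    """Key Word In Context: return each hit with up to `width`
    words of context on either side."""
    words = text.split()
    hits = []
    for i, w in enumerate(words):
        if w.lower().strip(".,;") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, w, right))
    return hits

# Invented example sentence.
text = "I was born a slave and I never knew my father"
hits = kwic(text, "slave", width=2)
```

Production tools add the linguistic layers listed earlier (lemmatization, part-of-speech tags), but the core display is this simple.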
Google Public Data Explorer: a visualization tool that animates so you can see change over time. You also can embed charts into your website (link icon in the upper right corner). Over 40 datasets are currently uploaded and ready to use. Allows simple visualization tools to be applied to any dataset.
Quick demo of unemployment rate: do the search, show how you limit results. Views: line chart, bar chart, map, bubble chart.
NodeXL is a tool to display and analyze data through a network graph. It is open source, Windows only, and is an Excel template. Specifically, NodeXL was designed to facilitate learning the concepts and methods of social network analysis, with visualization as a key component. What can you do? *Easily* customize the graph's appearance; zoom, scale and pan the graph; dynamically filter vertices and edges; alter the graph's layout; find clusters of related vertices; and calculate graph metrics. What I like is that I could use it quickly by importing data. Built-in connections for getting networks from Twitter, Flickr, YouTube, and your local email are provided; additional importers for Exchange email, Facebook, and hyperlink networks are available. There is a 47-page tutorial, which was a good indication that it is not totally intuitive to learn; however, it has good flexibility.
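To make "graph metrics" concrete: the simplest one, vertex degree, can be computed from an edge list in a few lines. The edge list below is invented for illustration and does not reflect NodeXL's import format:

```python
from collections import Counter

# Invented edge list standing in for, say, a small Twitter reply network.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]

def degree(edges):
    """Count how many edges touch each vertex -- the most basic
    of the graph metrics NodeXL reports."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

deg = degree(edges)
```

High-degree vertices ("cat" here) are the hubs a network visualization makes visually obvious; metrics like this put numbers behind the picture.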
We also found a number of data cleaning tools. There is a great site, digitalresearchtools.pbworks.com, that lists a lot of these tools. Google Refine runs in Chrome; it supports up to 200K rows, which is actually not that much when we get to humanities data.
1. Go to erikmitchell.dyndns.org:3333 - explain what you are doing
2. I downloaded key terms from JSTOR doing a search for Jane Austen
3. I imported the file using defaults
4. It imported weight and key terms
5. Weight is the relevance or centrality to the document (e.g. every document has a term with rank 1)
6. Let's say I just want to see the central words
7. weight > facet > numeric facet
8. Limit to .98-1
9. You can see this drops the matching rows
10. Now let's say I want to see how many times each of these key terms is used
11. keyterms -> facet -> text facet
12. Sort by count
13. You can include and exclude, perform other data analysis, etc.
If this is interesting there are some good quick video tutorials on the site. Modify for XML or wiki publishing formats.
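The numeric-facet and text-facet steps above can be sketched in plain Python. The rows below are invented stand-ins for the exported key terms; real DFR exports may use different columns:

```python
# Invented rows mimicking the JSTOR key-terms export used in the demo.
rows = [
    {"keyterm": "austen", "weight": 1.0},
    {"keyterm": "novel", "weight": 0.99},
    {"keyterm": "austen", "weight": 0.98},
    {"keyterm": "irony", "weight": 0.72},
]

# Steps 7-8: a numeric facet on weight, limited to .98-1.
central = [r for r in rows if 0.98 <= r["weight"] <= 1.0]

# Steps 10-12: a text facet on keyterm, sorted by count.
counts = {}
for r in central:
    counts[r["keyterm"]] = counts.get(r["keyterm"], 0) + 1
by_count = sorted(counts.items(), key=lambda kv: -kv[1])
```

Refine's advantage over a script like this is that facets are interactive and reversible; the underlying operations are the same filter-and-count.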
Google Refine is also designed to work with their visualization tools. We showed Public Data Explorer; there is also Google Fusion Tables. Fusion Tables makes it very easy to connect and explore data. Here is one link.
So what did we find? We found lots of tools, lots of uses, lots of data. We ultimately decided that there is a strong research and teaching need. This slide is to talk about the data-focused research activities that we found researchers engaged in. A second part of our project was to explore research needs. Widely varied: some statistical, some linguistic, some just wanted to digitize stuff. Jerid is actually doing research on movie subtitles and translation (not sure what this is). Focus on http://francojc.wordpress.com/ List of publications from corpus-based research: http://corpus.byu.edu/publicationSearch.asp
We also found that there are areas for us to contribute. Conversational. One BYU comparison: http://googlebooks.byu.edu/compare-googleBooks.asp compares "possible" and "not possible" for the following functionality: exact words and phrases; related words and cultural insights; searching for concepts; changes in meaning; collocates (nearby words) and cultural shifts; function of words; grammatical changes; language change and genre. A tool to locate research data is being developed by Purdue Libraries (Michael Witt) and Penn State: Databib. The goal is "to create a community-driven, annotated bibliography of research data repositories" http://databib.lib.purdue.edu/
Next steps.
First: librarians already understand metadata interoperability and harvesting; we should expand our understanding of these fields to include full-text data and develop toolkits to facilitate harvesting and meshing of research data from different sources. This includes tools like the Stanford NLP parser (nlp.stanford.edu/software/lex-parser.shtml), a tool that facilitates the coding and parsing of text data.
Second: librarians understand searching across multiple systems; we need to build on this skill by honing our abilities to perform content analysis and generalize results.
Third: we need to better understand the landscape of research data. This means understanding types of datasets and sources of data. It also means having the ability to crosswalk data between databases, and getting past resource discovery and into resource analysis.
Fourth: we need qualitative and quantitative research skills; we need to be able to help researchers know when they have a representative sample and how to harvest, code, and analyze that data.
Fifth: we bring a multi-disciplinary understanding of domains of knowledge; we need to leverage that familiarity with active research agendas.
Story here is about the HathiTrust search in Summon and in OCLC. These search platforms are trying to leverage book full text in a new way, but what else could they do?
Can we add the list of tools? https://digitalresearchtools.pbworks.com/w/page/17801672/FrontPage