Your spoken paper cannot be the same as your written paper. Read more: Museums and the Web 2011 (MW2011): Presentation Guidelines | conference.archimuse.com
Computational Linguistics in Museums: Applications for Cultural Datasets. Judith Klavans, Susan Chun, Robert Stein, Raul Guerra
Computational Linguistics: Language - Words, Words, Words. Use, Meaning, Syntax, Shape of words, Sounds
Applications: Speech synthesis – 1980s talking machines for the blind; Intelligent search – pre-Google; Finding names – who, what, where; Translation; Speech recognition; Answering questions – What is Watson?
Domains for Computational Linguistics: Healthcare – interpreting patient records; Government – helping people find information; International Affairs – cross-language translation; Law – analyzing the Enron scandal email; Marketing – opinions on products; Museums – analyzing text and tags associated with objects for better access
Computational Linguistics for Metadata Building +
Interdisciplinary Research: Computational Linguistics in Museums
Text, Tags, Trust: Funded in 2008 by IMLS, with the University of Maryland and a collaborative of museum partners. Studying the relationships between social tags, scholarly text and resources, and the application of trust networks to improve access to museum collections.
MW2011 Contributions: Which computational linguistic tools can or should be applied to tags? How do these tools affect tag analysis? Which results differ from the initial steve.museum results reported in Trant 2007? So what – for CL? So what – for museums?
Hard Challenges: How can tags be related to other tags (across languages, across users)? How can they be used?
Gallery Label: This canvas was the first one Gauguin painted during the two months he spent in Provence.... Gauguin had rebelled against Impressionism's reliance on the visible world, and he altered nature's shapes and colors to suggest his own more subjective reaction to the landscape. While the rural subject and acidic colors show the influence of van Gogh, this image is more indebted to Paul Cézanne. In his careful integration of the haystack and farm buildings, Gauguin has echoed Cézanne's emphasis on geometric form.
Tools for Tags: Morphological analysis – conflate when possible (cats, cat; haystacks, haystack; painting, paint?). Which words are verbs, nouns, adjectives? How should multi-word tags be handled?
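To make the "conflate when possible" step concrete, here is a minimal Python sketch that assumes a crude suffix-stripping rule for English plurals. It is a stand-in for illustration, not the morphological analyzer used in this work. It deliberately leaves -ing forms alone, because "painting" (an artwork) and "paint" are not safely interchangeable.

```python
from collections import defaultdict


def conflate_plural(tag: str) -> str:
    """Map a simple English plural to a singular form (rough, illustrative heuristic)."""
    word = tag.lower().strip()
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"  # "galleries" -> "gallery"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]  # "cats" -> "cat", "haystacks" -> "haystack"
    return word  # "painting", "glass", etc. are left untouched


def group_tags(tags):
    """Group raw tags under a shared conflated key."""
    groups = defaultdict(set)
    for tag in tags:
        groups[conflate_plural(tag)].add(tag)
    return dict(groups)


print(group_tags(["Cats", "cat", "Haystacks", "haystack", "painting", "paint"]))
```

Here "Cats" and "cat" share one group, while "painting" and "paint" stay apart. A real pipeline would use a proper lemmatizer; this heuristic will mishandle irregular plurals.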
Raw Tags or Tokens
Results: 25%, 93%, 68%
1. NN=25205
2. JJ=6319
3. NNS=4041
4. NN_NN=2257
5. JJ_NN=1792
6. VBG=1043
7. VBN=727
8. NP=708
9. OD_NN=454
10. JJ_NNS=413
Top 10 POS Patterns:
1. NN=6706
2. NN_NN=1713
3. JJ_NN=1194
4. JJ=921
5. NNS=757
6. JJ_NNS=303
7. NN_NNS=300
8. VBG=238
9. NP=209
10. VBN_NN=202
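The pattern labels above appear to join the part-of-speech tag of each word in a tag with an underscore (for example, JJ_NN for an adjective followed by a noun). A minimal sketch of such a tally, assuming the tags have already been POS-tagged; the hand-tagged sample of phrases from the gallery label is hypothetical and is not project data.

```python
from collections import Counter


def pos_pattern(tagged_tag):
    """Join the POS labels of a (possibly multi-word) tag, e.g. JJ + NN -> 'JJ_NN'."""
    return "_".join(pos for _, pos in tagged_tag)


def top_patterns(tagged_tags, k=10):
    """Count POS patterns across all tags and return the k most frequent."""
    counts = Counter(pos_pattern(t) for t in tagged_tags)
    return counts.most_common(k)


# Hypothetical hand-tagged tags, each a list of (word, POS) pairs.
SAMPLE_TAGS = [
    [("haystack", "NN")],
    [("farm", "NN"), ("buildings", "NNS")],
    [("acidic", "JJ"), ("colors", "NNS")],
    [("landscape", "NN")],
    [("geometric", "JJ"), ("form", "NN")],
]
print(top_patterns(SAMPLE_TAGS, k=3))
```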
Hard Challenges: How can tags be related to other tags (across languages, across users)? How can they be used?
… precursor to parsing. However, for social tags, parsing is not a meaningful step. Research: Link part-of-speech information with other lexical resources for disambiguation.
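One way to read "link part-of-speech information with other lexical resources" is to use a tag's part of speech to select among senses in a lexicon. The sketch below assumes a hypothetical two-entry lexicon (`LEXICON`) keyed by word and coarse part of speech; a real system would consult an actual lexical resource, and the coarse mapping of tags such as NN, NP, VBG and JJ is a simplification.

```python
from typing import Optional

# Hypothetical mini-lexicon keyed by (word, coarse part of speech).
# The glosses are illustrative placeholders, not entries from a real resource.
LEXICON = {
    ("painting", "NOUN"): "a work of art made with paint",
    ("painting", "VERB"): "the act of applying paint",
}


def coarse_pos(tag: str) -> str:
    """Collapse fine-grained tags (NN, NNS, NP, VBG, VBN, JJ, ...) into coarse classes."""
    if tag.startswith("NN") or tag.startswith("NP"):
        return "NOUN"
    if tag.startswith("VB"):
        return "VERB"
    if tag.startswith("JJ"):
        return "ADJ"
    return "OTHER"


def gloss(word: str, pos_tag: str) -> Optional[str]:
    """Return the sense matching the word's part of speech, or None if unknown."""
    return LEXICON.get((word.lower(), coarse_pos(pos_tag)))


print(gloss("painting", "NN"))
print(gloss("painting", "VBG"))
```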
What about “New England”? Idioms and lexicalized phrases are more difficult. A heuristic comparison to Wikipedia titles matched 46% (30% distinct) of multi-word tags, e.g. “Grapes of Wrath”, “Irish Wolfhound”, “Franco-Prussian War” (*Klavans and Golbeck, 2010).
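The slide does not spell out the matching heuristic. A minimal sketch under two assumptions: a multi-word tag matches if its lowercased, whitespace-normalized form equals a normalized title, and "46% (30% distinct)" is read as an occurrence-level versus distinct-tag match rate. The title list is a hypothetical in-memory stand-in for a set of Wikipedia titles, not a Wikipedia API.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'Irish  Wolfhound' matches 'irish wolfhound'."""
    return " ".join(text.lower().split())


def title_match_rates(tag_occurrences, titles):
    """Return (occurrence-level, distinct-level) match rates for multi-word tags."""
    title_set = {normalize(t) for t in titles}
    multiword = [normalize(t) for t in tag_occurrences if len(t.split()) > 1]
    if not multiword:
        return 0.0, 0.0
    occurrence_rate = sum(t in title_set for t in multiword) / len(multiword)
    distinct = set(multiword)
    distinct_rate = sum(t in title_set for t in distinct) / len(distinct)
    return occurrence_rate, distinct_rate


# Hypothetical inputs: a few tag occurrences and a tiny title list.
TITLES = ["Grapes of Wrath", "Irish Wolfhound", "Franco-Prussian War"]
TAGS = ["Irish Wolfhound", "irish wolfhound", "Grapes of Wrath", "red barn", "cat"]
print(title_match_rates(TAGS, TITLES))
```

Single-word tags such as "cat" are excluded before the rates are computed, and repeated tags count once in the distinct rate.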

Wish List: Better ways to tame the proliferation of rich but “noisy” content; clustering over tags for similarity; clustering over tags and terms from text; matching against existing terms to identify meaningful units; applying machine learning techniques to guess meaning; bigrams, trigrams, thesauri, corpus analysis.
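As one illustration of "clustering over tags for similarity", the sketch below greedily groups tags whose character-bigram sets have a Jaccard similarity at or above a threshold. The similarity measure, the greedy single pass, and the 0.7 threshold are assumptions chosen for illustration, not the project's method.

```python
def char_bigrams(s: str) -> set:
    """Character bigrams of a tag, ignoring case and whitespace."""
    s = "".join(s.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_tags(tags, threshold=0.7):
    """Greedy single pass: join the first cluster whose first member is similar enough."""
    clusters = []  # list of (bigrams of first member, member tags)
    for tag in tags:
        grams = char_bigrams(tag)
        for rep, members in clusters:
            if jaccard(rep, grams) >= threshold:
                members.append(tag)
                break
        else:
            clusters.append((grams, [tag]))
    return [members for _, members in clusters]


print(cluster_tags(["haystack", "hay stack", "haystacks", "windmill"]))
```

Because each tag is compared only with the first member of each existing cluster, the result depends on input order; a fuller approach would compare against all members or use an established clustering algorithm.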
Acknowledgements: steve.museum project members; T3 and steve.museum museum partners; University of Maryland, T3 group; IMA Museum; …and other participants

Editor's Notes

  1. Take this seriously.
  2. In presenting this paper, start with something not in the paper.
  3. Still need to finish
  4. Words, words, words.