JSTOR Sustainability Collection
Sharon Garewal, JSTOR Senior Metadata Librarian
Ron Snyder, ITHAKA Labs Director of Research and Development
Overview
 Sustainability collection defined
 Utilization of the thesaurus within the sustainability collection
 Subject matter experts enlisted
 Results
 Live demo
JSTOR: a quick primer
 3,200+ journals & 30,000+ books
 9.3 million full-length articles
 70 million pages
 2.9 million book reviews
 138 million content accesses in 2013
 100 million searches per year
http://www.jstor.org/
Sustainability Collection: what will it be?
 Driver: an emerging interdisciplinary area whose research and teaching needs JSTOR wanted to support.
 Core topics: Cities and Urbanization; Food and Agriculture; Industrial Ecology; Resource Economics; Forestry and Land Use; Environmental Policy and Law
 Composed of journals, books, and grey literature (working reports, research reports, technical reports, etc.)
 Specialized functionality to support research, including semantic indexing to help researchers locate related terms and concepts. This is where the JSTOR Thesaurus (JTHES) comes into play!
JTHES
19 top terms; 57,470 terms; 103,129 rules
The challenge
 To assemble a list of key terms in Sustainability
 The terms will be used to organize and tag sustainability-related research articles
on JSTOR starting in 2015.
 These terms will also be used for an autocomplete function in the search component.
 Utilize the JTHES in a live prototype
 This was the first project where we looked at how to use the thesaurus as an
intelligence layer within a collection. How should it work? How do we do this?
How do we get this done? The options…
 Create a new thesaurus for sustainability:
   Pros: Specific to sustainability
   Cons: Changes must be made in more than one place; cost of creating and maintaining a separate thesaurus
 Create a sustainability branch within JTHES:
   Pros: Could link all relevant branches and terms from elsewhere in JTHES into one branch via BT (broader term) relationships
   Cons: Redundant; multiple BTs clutter up JTHES
 Create a facet to tag terms within JTHES as “Sustainability”:
   Pros: Creates a flat list (in faceted view) of all of the terms in that facet; easy to maintain
   Cons: Does not show a hierarchy; a term cannot have multiple facets
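To make the trade-off in the facet option concrete, here is a minimal sketch of how a facet tag can sit alongside the BT hierarchy. The `Term` class, its field names, and the sample terms are hypothetical illustrations, not the JTHES data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Term:
    """Hypothetical thesaurus term: BT links carry the hierarchy, the facet is a flat tag."""
    label: str
    broader: List[str] = field(default_factory=list)  # BT (broader term) links
    facet: Optional[str] = None                       # at most one facet per term

def facet_view(terms, facet):
    """Return the alphabetical flat list of labels tagged with `facet` (no hierarchy shown)."""
    return sorted(t.label for t in terms if t.facet == facet)

# Sample terms are invented for illustration only.
terms = [
    Term("Urban planning", broader=["Urban studies"], facet="Sustainability"),
    Term("Crop rotation", broader=["Agriculture"], facet="Sustainability"),
    Term("Sonnets", broader=["Poetry"]),
]
sustainability_terms = facet_view(terms, "Sustainability")
```

Tagging leaves the existing BT structure untouched, which is why the list is easy to maintain but shows no hierarchy.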
The road to sustainability…
 Research: examined existing
glossaries and thesauri created
by research libraries, discipline
associations and individual
scholars in each of the disciplines.
 Existing terms (pulling lists)
 Existing branches (clean up)
 Adding new terms
 Adding new branches: Food
studies, Urban studies, etc.
 Constructing new rules and
refining existing rules
 Testing content
Enlisting subject matter experts
 Asked faculty members in ten disciplines to review the subset of terms assembled for their discipline, with an eye toward:
 Is this how people in the field express this concept?
 Is it correctly included in the sustainability facet?
 Are there any important terms or concepts that we've missed? (including
acronyms, synonyms, variant spellings, inverted phrases)
SME spreadsheets
Each SME approached their subject area slightly differently: some were reluctant to give much feedback, while others gave large amounts of feedback to sift through.
Example of terms pulled from Law, Public administration/policy and International/global studies
Facet view: provides an alphabetical list of all tagged terms
labs.jstor.org/sustainability
Implementation of the Sustainability Prototype
 The thesaurus and semantic index are used for content discovery and
presentation
 The identification of a “sustainability collection” from the JSTOR corpus was
performed using topic modeling (specifically LDA – Latent Dirichlet
Allocation)
 A model of 100 topics was generated from the content
 Staff assigned sustainability scores for each of the topics based on a review of
the top words in each topic
 Each document in the JSTOR corpus was then assigned a sustainability score of
0-9 based on the sustainability scores for the topics most closely associated with
the document
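The scoring step above can be sketched as follows. The slides do not say how topic scores were combined into a document score, so the weighted average over the top three topics, the rounding, and all names and values here are assumptions for illustration.

```python
def document_score(topic_weights, topic_scores, top_k=3):
    """Weighted average of staff-assigned topic scores (0-9) over a document's top topics.

    topic_weights: topic id -> weight of that topic in the document (e.g. from LDA)
    topic_scores:  topic id -> sustainability score assigned by staff
    top_k is an assumed cutoff for "most closely associated" topics.
    """
    top = sorted(topic_weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(weight for _, weight in top)
    if total == 0:
        return 0
    weighted = sum(weight * topic_scores[topic] for topic, weight in top) / total
    return round(weighted)

# Toy values, invented for illustration.
topic_scores = {"t_agriculture": 9, "t_poetry": 0, "t_energy": 5, "t_law": 2}
doc_topics = {"t_agriculture": 0.6, "t_poetry": 0.1, "t_energy": 0.3, "t_law": 0.0}
score = document_score(doc_topics, topic_scores)  # 7 for this toy input
```

Because the result is a weighted average of values between 0 and 9, it stays in the 0-9 range the slide describes.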
Weighting of document-level indexed terms
 Document-level weights were computed for each semantic term using TF-IDF
 TF-IDF is a measure of how important a word is to a document in a collection
 The TF-IDF value increases proportionally to the number of times the word
appears in a document (the ‘TF’ or term frequency), but is offset by how
common the word is in a corpus (the ‘IDF’ or inverse document frequency)
 The TF-IDF weighted terms are used to:
 order the terms displayed for each document
 boost document relevancy when index terms are used in discovery
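As a rough illustration of the weighting described above, the sketch below computes one common TF-IDF variant (term count divided by document length, times the log of inverse document frequency). The slides do not state which variant was used, and the sample index terms are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of documents, each a list of index terms.
    TF = count / document length; IDF = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented index terms for three documents.
docs = [
    ["urban planning", "urban planning", "land use"],
    ["land use", "forestry"],
    ["land use", "food security"],
]
weights = tf_idf(docs)

# Order the terms displayed for the first document by descending weight.
ranked = sorted(weights[0], key=weights[0].get, reverse=True)
```

Here "land use" appears in every document, so its IDF is zero and it sorts below "urban planning", which matches the idea that terms common across the corpus are offset.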
Auto-suggest and refining results
[Thesaurus slide: a new thing, metadata we create, screenshot(s) of
Sustainability Portal]
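One simple way auto-suggest over thesaurus terms could work is a prefix lookup against a sorted term list. This is only a sketch: the function name, the result limit, and the sample terms are assumptions, and the prototype's actual behavior is not described in the slides.

```python
import bisect

def suggest(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix`.

    sorted_terms must be lowercased and sorted; matching is case-insensitive.
    """
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches

# Invented sample terms.
terms = sorted(t.lower() for t in [
    "Urban planning", "Urbanization", "Urban sprawl", "Forestry", "Food security",
])
urban_suggestions = suggest(terms, "Urb")
```
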
Refinements in our use of the thesaurus and semantic index in sustainability
 Auto calculation of sustainability score using LDA topics and thesaurus
sustainability facet
 Calculate topics and term correlations
 Compute sustainability score for each topic based on the most relevant terms and
sustainability facet
 Compute a sustainability score for each corpus document based on topic weights
and topic sustainability score
 Automated LDA topic labeling
 Labeling topics generated by unsupervised topic modeling is an ongoing challenge
 We’re investigating the feasibility of using the same topic/term correlations used to
compute sustainability scores to assign labels
 The approach attempts to find the thesaurus term that best characterizes the most highly correlated terms for each topic
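The two refinements above might look something like the sketch below. It assumes each topic comes with term correlation weights, scores a topic by the share of its top-term weight that falls in the sustainability facet (scaled to 0-9), and labels a topic by a weighted vote over the broader terms of its top terms. These rules, the function names, and the sample data are illustrative assumptions, not the method used at JSTOR.

```python
from collections import defaultdict

def top_terms(term_weights, top_n):
    """Return the top_n (term, weight) pairs by descending weight."""
    return sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

def topic_score(term_weights, facet_terms, top_n=10):
    """Share of top-term weight inside the facet, scaled to a 0-9 score (assumed rule)."""
    top = top_terms(term_weights, top_n)
    total = sum(weight for _, weight in top)
    if total == 0:
        return 0.0
    in_facet = sum(weight for term, weight in top if term in facet_terms)
    return 9 * in_facet / total

def topic_label(term_weights, broader, top_n=10):
    """Pick the broader term with the most weight among the top terms (assumed rule)."""
    votes = defaultdict(float)
    for term, weight in top_terms(term_weights, top_n):
        votes[broader.get(term, term)] += weight
    return max(votes, key=votes.get) if votes else None

# Invented sample data.
term_weights = {"crop rotation": 0.5, "irrigation": 0.3, "sonnets": 0.2}
facet_terms = {"crop rotation", "irrigation"}
broader = {"crop rotation": "Agriculture", "irrigation": "Agriculture", "sonnets": "Poetry"}

score = topic_score(term_weights, facet_terms)
label = topic_label(term_weights, broader)
```

Reusing the same topic/term correlations for both scoring and labeling mirrors the idea in the slide, but how well a single broader term characterizes a topic would need checking against real topics.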
Other JSTOR Labs projects/tools using the thesaurus and semantic index
http://labs.jstor.org/jthes/
http://labs.jstor.org/snap/
http://labs.jstor.org/readings/
Thesaurus Visualization Tool
And some other JSTOR Labs projects
http://labs.jstor.org/reflowit/
http://labs.jstor.org/shakespeare/
Thank you!
Sharon.Garewal@ithaka.org
Ronald.Snyder@ithaka.org

JSTOR Sustainability Collection - DHUG 2015

  • 1. JSTOR Sustainability Collection Sharon Garewal, JSTOR Senior Metadata Librarian Ron Snyder, ITHAKA Labs Director of Research and Development
  • 2. Overview  Sustainability collection defined  Utilization of the thesaurus within the sustainability collection  Subject matter experts enlisted  Results  Live demo
  • 3. JSTOR- a quick primer  3,200+ journals & 30,000+ books  9.3 million full length articles  70 million pages  2.9 million book reviews  138 million content accesses in 2013  100 million searches per year http://www.jstor.org/
  • 4. Sustainability Collection: what will it be?  Driver: Emerging interdisciplinary area that JSTOR wanted to support in both research and teaching needs.  Core topics of Cities and Urbanization, Food and Agriculture, Industrial Ecology, Resource Economics, Forestry and Land Use and Environmental Policy and Law  Composed of journals, books, grey literature (working reports, research reports, technical reports etc.)  Specialized functionality to support research by including semantic indexing to help researchers locate related terms and concepts. This is where the JSTOR Thesaurus (JTHES) comes into play!
  • 5.
  • 6. JTHES 19 Top terms, 57,470 Terms; 103,129 rules
  • 7. The challenge  To assemble a list of key terms in Sustainability  The terms will be used to organize and tag sustainability-related research articles on JSTOR starting in 2015.  These terms will also be used for an auto complete function in the search component.  Utilize the JTHES in a live prototype  This was the first project where we looked at how to use the thesaurus as an intelligence layer within a collection. How should it work? How do we do this?
  • 8. How do we get this done? The options…  Create a new thesaurus for sustainability:  Pros: Specific to sustainability  Cons: Remembering to make changes in more than one place. Cost associated with creating and maintaining a separate thesaurus  Create a sustainability branch within JTHES:  Pros: Could BT (Broader terms) all relevant branches and terms from elsewhere in the JTHES into 1 branch  Cons: Redundant; Multiple BT’s clutter up the JTHES  Create a facet to tag terms within JTHES as “Sustainability”:  Pros: Creates a flat list (in faceted view) of all of the terms in that facet; Easy to maintain  Cons: Does not show a hierarchy; Cannot have multiple facets
  • 9. The road to sustainability…  Research: examined existing glossaries and thesauri created by research libraries, discipline associations and individual scholars in each of the disciplines.  Existing terms (pulling lists)  Existing branches (clean up)  Adding new terms  Adding new branches: Food studies, Urban studies, etc.  Constructing new rules and refining existing rules  Testing content
  • 10. Enlisting Subject matter experts  Contacted faculty members in ten disciplines to go over the subset of terms assembled in their discipline and review those terms with an eye toward:  Is this how people in the field express this concept?  Is it correctly included in the sustainability facet?  Are there any important terms or concepts that we've missed? (including acronyms, synonyms, variant spellings, inverted phrases)
  • 11. SME spreadsheets Each SME was slightly different in how they approached their subject areas with some SMEs being reluctant to give much feedback and others giving large amounts of feedback to sift through. Example of terms pulled from Law, Public administration/policy and International/global studies
  • 14. Implementation of the Sustainability Prototype  The thesaurus and semantic index are used for content discovery and presentation  The identification of a “sustainability collection” from the JSTOR corpus was performed using topic modeling (specifically LDA – Latent Dirichlet Allocation)  A model of 100 topics was generated from the content  Staff assigned sustainability scores for each of the topics based on a review of the top words in each topic  Each document in the JSTOR corpus was then assigned a sustainability score of 0-9 based on the sustainability scores for the topics most closely associated with the document
  • 15. Weighting of document-level indexed terms  Document-level weights were computed for each semantic term using TF-IDF  TF-IDF is a measure of how important a word is to a document in a collection  The TF-IDF value increases proportionally to the number of times the word appears in a document (the ‘TF’ or term frequency), but is offset by how common the word is in a corpus (the ‘IDF’ or inverse document frequency)  The TF-IDF weighted terms are used to:  order the terms displayed for each document  boost document relevancy when index terms are used in discovery
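As a rough illustration of the weighting on slide 15, the sketch below computes TF-IDF for index terms and orders a document's terms by weight. It uses one common textbook formulation (relative term frequency multiplied by the natural log of N/df); the slides do not say which variant JSTOR used, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: {doc_id: [index term, ...]} -> {doc_id: {term: weight}}"""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))

    weights = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        weights[doc_id] = {
            # TF rises with occurrences in this document;
            # IDF shrinks the weight of terms common across the corpus.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def ranked_terms(term_weights):
    """Order one document's terms for display, highest weight first."""
    return sorted(term_weights, key=term_weights.get, reverse=True)


if __name__ == "__main__":
    corpus = {
        "doc_a": ["Recycling", "Recycling", "Environmental policy"],
        "doc_b": ["Environmental policy", "Forestry"],
        "doc_c": ["Environmental policy"],
    }
    print(ranked_terms(tf_idf(corpus)["doc_a"]))
```

In this toy corpus "Environmental policy" appears in every document, so its IDF is zero and it sorts below "Recycling" for `doc_a`, which is the offsetting effect the slide describes.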
  • 16. Auto-suggest and refining results [Thesaurus slide: a new thing, metadata we create, screenshot(s) of Sustainability Portal]
  • 17. Refinements in our use of the thesaurus and semantic index in sustainability  Auto calculation of sustainability score using LDA topics and thesaurus sustainability facet  Calculate topics and term correlations  Compute sustainability score for each topic based on the most relevant terms and sustainability facet  Compute a sustainability score for each corpus document based on topic weights and topic sustainability score  Automated LDA topic labeling  Labeling topics generated by unsupervised topic modeling is an ongoing challenge  We’re investigating the feasibility of using the same topic/term correlations used to compute sustainability scores to assign labels  Attempts to find the thesaurus term that best characterizes the most highly correlated terms for each topic
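One way the automated topic scoring on slide 17 might work is sketched below, purely as an assumption: a topic's score is the share of correlation weight, among its most highly correlated thesaurus terms, that falls on terms tagged with the Sustainability facet, scaled to 0-9. The slides do not specify the calculation; the function name, the `top_n` cutoff and the sample values are hypothetical.

```python
def topic_sustainability_score(term_correlations, facet_terms, top_n=10):
    """Return a 0.0-9.0 score for one LDA topic.

    term_correlations: {thesaurus term: correlation with the topic}
    facet_terms:       set of terms tagged "Sustainability" in the thesaurus
    top_n:             number of most-correlated terms considered (assumed)
    """
    # Consider only positively correlated terms, strongest first.
    positive = [(t, c) for t, c in term_correlations.items() if c > 0]
    top = sorted(positive, key=lambda tc: tc[1], reverse=True)[:top_n]
    total = sum(c for _, c in top)
    if total == 0:
        return 0.0
    # Fraction of the correlation mass carried by facet-tagged terms.
    in_facet = sum(c for t, c in top if t in facet_terms)
    return 9.0 * in_facet / total


if __name__ == "__main__":
    facet = {"Recycling", "Green buildings"}
    correlations = {"Recycling": 0.5, "Green buildings": 0.25, "Carbon": 0.25}
    print(topic_sustainability_score(correlations, facet))
```

Here "Carbon" is left out of the facet, echoing the editor's note that only terms always and unambiguously about sustainability are tagged, so it lowers the topic's score rather than raising it.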
  • 18. Other JSTOR Labs projects/tools using the thesaurus and semantic index http://labs.jstor.org/jthes/ http://labs.jstor.org/snap/ http://labs.jstor.org/readings/ Thesaurus Visualization Tool
  • 19. And some other JSTOR Labs projects http://labs.jstor.org/reflowit/ http://labs.jstor.org/shakespeare/

Editor's Notes

  1. Third year at DHUG. First year was building our thesaurus; Second year was maintenance and training and this year we are excited to present how we are utilizing the thesaurus in our content.
  2. JTHES partners w/Labs (Their job is to get new ideas off the ground. To seek out new concepts and opportunities for JSTOR, & refine and validate them through research and experimentation) Grey literature=academic literature that is not formally published Natural resource economics deals with the supply, demand, and allocation of the Earth's natural resources
  3. There is no simple definition of 'sustainability'... Most definitions include: 1. living within the limits of what the environment can provide. 2. understanding the many interconnections between economy, society and the environment. 3. the equal distribution of resources and opportunities. http://www.environment.nsw.gov.au/sustainability
  4. Orig. had 18 TT but due to the Sustainability Collection we added Environmental studies.
  5. Terms would be used from multiple JTHES branches. What terms are always about sustainability? (all/some rule) Only terms always and unambiguously pertaining to sustainability topics should be included [e.g. Carbon vs. Green buildings]. I first pulled a list of what I thought sustainability terms would be (recycle/green etc.), which equaled 1200. Then I pulled full branches [using File-Export function] in multiple areas including Architecture, Economics, Law etc., which totaled 18k.
  6. BT-Explain more fully: As a rule we will only have a term living in up to 3 branches (e.g. Sustainable design lives in Sustainability science, Design engineering and Sustainable engineering). Chose the facet approach- it didn’t do everything we wanted but for time/cost and execution it was the best choice. The facet is applied in the Admin module. The facet tab appears at the bottom of the term record along with other tabs such as Definition, History, Scope note etc. In the facet tab we added the word “Sustainability” to each term that was chosen for the collection.
  7. New branches: Development studies, Environmental biology, Environmental social sciences, Environmental studies, Food studies, Sustainable architecture, Sustainable engineering, Urban studies, Wildlife studies New rules: Build*/Buildings, Conservation, Ecology/Ecologic*, Environment, Environmental, Garden*, Green, Industr*/Industry/Industries, Soils, Sustainab*, Urbaniz*, Wildlife
  8. Ended up with 6 SMEs total (Architecture, Engineering, Bio/Env science, Agriculture/Urban studies, Econ, Law/Policy). Once an SME was secured, an introductory phone conversation was set up where the project was defined and discussed. A brief review of basic taxonomic practices was given. The SME was sent a flat list of thesaurus terms (via Excel spreadsheet) within their discipline along with a short document on how to approach looking at the terms and suggestions for feedback. Approx. 100-150 terms to review for each SME. Once the spreadsheet was completed, the taxonomists would review the SME's suggestions and incorporate them into the JTHES.
  9. If SME said “no” to “Correctly included”, term was removed from facet. Suggested terms section (at bottom of the spreadsheet for SME to list out anything missing from the list). Lessons learned: Spreadsheets could have been more user friendly (use drop-down choices for columns); suggested terms at the bottom of the spreadsheet overlapped across many SMEs; collating all feedback prior to implementation would have been more efficient. May have been helpful to send SMEs a hierarchical view so they could see how terms relate to each other.
  10. Since May, when I estimated we began work on adding terms for this project, 385 terms have been added to the JTHES, 1197 rules were added and over 400 rules were updated; Over 1500 terms are tagged in the facet.
  11. Hand off to Ron for live demo portion of presentation.
  12. Shakespeare: Using a primary text as a portal for locating secondary literature, specifically journal content available from JSTOR. Partnership with the Folger Library. Classroom Readings: Helping teachers select content from JSTOR. Usage profiles/patterns that looked at a single institution and the spike of a document's usage on either side of a 2-week period. Additional Labs projects include Reflowit (for mobile device viewing)