Metadata Harvesting and the OAI-PMH
Andrew Schenck, Pamela Russell
LIS 688

What is Metadata Harvesting?
  • An automatic metadata generating method
  • Occurs when metadata is automatically collected from META tags
  • Automatically gathers metadata from individual repositories

Example Metadata Generators
  • Metadata generators are also known as metadata extraction systems
  • Sample metadata extraction systems available for libraries include:
      • DC-dot
      • MarcEdit
      • Metaextract
      • IBM Magic System
  • Some are available via open source

DC-dot
  • DC-dot is open source and it can be redistributed or modified
  • DC-dot creates Dublin Core metadata
  • Metadata creation is initiated by submitting a URL
  • Generates keywords by analyzing hyperlinked concepts and presentation encoding
  • Does not produce description metadata
  • Generates type, format and date metadata

MarcEdit
  • MarcEdit is open source
  • MarcEdit was initially conceived as a graphical user interface designed as a batch MARC editing tool
  • An application suite of metadata editing tools that includes character set conversion, XML crosswalking, and metadata harvesting
  • It allows users to:
      • Customize the existing data conversion rules or create new data conversion rules
      • Harvest metadata from a supported metadata format
      • Create conversion templates for additional metadata formats
      • Customize existing conversion templates to reflect many variations in best practices used among projects

Metaextract
  • Designed for metadata extraction in the domain of math and science education for K-12
  • Also designed to extract Dublin Core and Gateway to Educational Materials metadata on both the item and collection levels
  • Collection-level metadata is generated based on a collection-specific configuration
  • Item-level metadata is extracted from the content of educational documents using three extraction modules:
      • eQuery
      • HTML-based modules
      • Keyword generator module

IBM Magic System
  • Includes various content analytic modules for metadata generation:
      • Audiovisual analysis modules that recognize semantic sound categories
      • Text analysis modules that extract title, keywords, and summary from text documents
  • Facilitates content reuse and repurposing
  • Improves interoperability
  • Creates more timely registration of content

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  • Version 2.0 released in June 2002
  • Provides an application-independent interoperability framework based on metadata harvesting
  • Two levels of participants in the OAI-PMH:
      • Data providers: Administer the systems
      • Service providers: Use the metadata harvested to build their digital collection

OAI-PMH Key terms
  • Harvester
      • Operated by a service provider as a way to collect metadata from a repository
  • Repository
      • A network accessible server that is able to process OAI-PMH requests
      • Managed by the data provider to allow harvesters access to its metadata

Harvesting Problems
  • Lack of consistency
      • Different collections using different DC elements and controlled vocabularies
  • Repositories may have missing data within their metadata
      • The repository may decline to fill out elements
  • Incorrect data
      • Data in the wrong element
  • Harvested metadata can be confusing
      • Strings of names can be ordered in an inconsistent manner or ambiguously separated with commas instead of semicolons
  • Insufficient data

Recommendations for Improving Harvesting
  • Establish guidelines and best practices
  • Develop local standards
  • Evaluate metadata
      • Check to see if there are certain elements where you have local metadata that would not be useful in an aggregated environment
      • Check to see if any fields are populated with unknown or N/A
  • Communicate with the service provider

Conclusion
  • Evidence suggests that OAI-PMH is a successful endeavor
      • Increase in number of repositories
      • Many funded projects based on OAI: eprints.org, Metadata Harvesting Initiative of the Mellon Foundation, NSF National Science Digital Library (NSDL)
  • The importance of metadata is one of the reasons that the Open Archives Initiative created the Protocol for Metadata Harvesting
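
The harvester and repository roles described in the OAI-PMH slides come down to plain HTTP requests carrying a verb argument, answered with XML. The following is a minimal sketch in Python, not a definitive harvester. The base URL https://repository.example.org/oai, the function names, and the sample response are made up for illustration; the ListRecords verb, the oai_dc metadata prefix, and the XML namespaces are taken from the OAI-PMH 2.0 specification. The sketch builds the URL a harvester would request and parses a canned response instead of contacting a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint; substitute a real data provider's base URL.
BASE_URL = "https://repository.example.org/oai"

# Namespaces used by OAI-PMH 2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response containing a single record, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-14T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2002-06-14</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return the URL a harvester would request to list records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Turn a ListRecords response into a list of dicts, one per record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        fields = {}
        for element in record.iterfind(".//oai_dc:dc/*", NS):
            name = element.tag.split("}", 1)[-1]  # drop the namespace part
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


if __name__ == "__main__":
    print(build_list_records_url(BASE_URL))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["identifier"], rec["fields"])
```

A real harvester would also need to fetch the URL over HTTP and follow the resumptionToken that repositories may return when a result list is split across several responses; both are left out here to keep the sketch short.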


Metadata Harvesting via OAI-PMH


Editor's notes

  1. Metadata harvesting and the Open Archives Initiative Protocol for Metadata Harvesting by Andrew Schenck and Pamela Russell
  2. Metadata harvesting is an automatic metadata generating method. Harvesting occurs when metadata is automatically collected from META tags found in the “header” source code of an HTML resource or encoded from another resource format. Metadata harvesting automatically gathers metadata from individual repositories where it has been produced by either automatic or manual approaches.
  3. Much like other automated tasks, there is a multitude of metadata generators available. These generators, also known as metadata extraction systems, can be extremely helpful for libraries wishing to extract metadata from various repositories. Some of the different metadata extraction systems available for libraries to use include DC-dot, MarcEdit, Metaextract, and IBM Magic System. Some of these systems are available via open source and are free, although the people needed to run them must usually be paid. Many of the systems were created to harvest all types of metadata, and some were created to harvest metadata for very specific objects or areas of study.
  4. DC-dot was developed by Andy Powell at UKOLN at the University of Bath. DC-dot is open source and it can be redistributed or modified under the terms of the GNU General Public License as published by the Free Software Foundation. DC-dot creates Dublin Core metadata and can format output according to a number of different metadata schemas. In DC-dot, metadata creation is initiated by submitting a URL. The resource identifier metadata from the Web browser’s address prompt is copied, and metadata included in the title, keywords, description, and type fields is then harvested from the resource META tags. DC-dot will automatically generate keywords by analyzing hyperlinked concepts and presentation encoding (bolding and font size), but will not produce description metadata. DC-dot also automatically generates type, format, and date metadata.
  5. MarcEdit was created by Terry Reese in 1998 and was initially conceived as a graphical user interface designed as a batch MARC editing tool. Currently, MarcEdit is an application suite of metadata editing tools that includes character set conversion, XML crosswalking, and metadata harvesting. Unlike other metadata extraction systems, MarcEdit allows users to customize the existing data conversion rules or create new data conversion rules. This allows users to harvest metadata from a supported metadata format as well as create conversion templates for additional metadata formats. It also allows users to customize existing conversion templates to reflect many variations in best practices used among projects.
  6. Metaextract is an extraction system that was designed for metadata extraction in the domain of math and science education for K-12. It was designed to extract Dublin Core and Gateway to Educational Materials metadata on both the item and collection levels using natural language processing techniques. The collection-level metadata is generated based on a collection-specific configuration, and the item-level metadata is extracted from the content of educational documents using three extraction modules: eQuery, HTML-based modules, and a keyword generator module.
  7. IBM Magic System was presented in 2005 and includes various content analytic modules for metadata generation. Audiovisual analysis modules recognize semantic sound categories and identify narrators and informative text segments, while text analysis modules extract title, keywords, and summary from text documents. The IBM Magic System can facilitate content reuse and repurposing, improve interoperability, and create more timely registration of content by course developers and authors.
  8. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides an application-independent interoperability framework that is based on metadata harvesting. There are two levels of participants in the OAI-PMH: data providers and service providers. Data providers administer the systems that support the OAI-PMH as a means of supplying metadata. Service providers use the metadata harvested from the OAI-PMH to help build their digital collections.
  9. Some other key terms necessary to understand OAI-PMH are harvester and repository. A harvester is a client application that issues OAI-PMH requests. The harvester is operated by a service provider as a way to collect metadata from a repository. A repository is a network accessible server that is able to process OAI-PMH requests. A repository is managed by the data provider to allow harvesters access to its metadata.
  10. The most common problem with harvested metadata is a lack of consistency. For example, inconsistencies across collections can occur when data providers use some Dublin Core elements and controlled vocabularies in one collection but not in another. On a larger scale, some data providers use the same Dublin Core elements in different ways throughout their repository. This can lead to similar kinds of metadata ending up in different fields when harvested. Metadata harvested through the OAI-PMH has other significant problems as well. Many repositories have missing data within their metadata. For example, if an entire collection consists of materials of the same format or type, the repository may decline to fill out the “format” or “type” element in Dublin Core because the information is deemed unnecessary for the collection’s local purposes. Every item is the same type, so why fill out that field? This causes problems when users of an OAI-PMH service provider want to limit a search by format or type: they cannot do so, because that field was left empty by the repository. Harvested metadata can also be incorrect. Examples include creator names repeated in the language element or the identifier of the metadata record repeated in the Dublin Core identifier element, as well as misspelled words and stray characters such as dashes or hyphens. Another problem with harvested metadata is that it can be confusing. Strings of names can be ordered in an inconsistent manner or ambiguously separated with commas instead of semicolons. This type of confusing data can occur when entries are dumped without revision into a metadata record, for instance when records are cut and pasted from Web HTML text.
Insufficient data can also cause problems with harvesting, because the metadata present in the repositories is not useful for limiting searches and retrieving specific information.
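Several of the problems described above can be detected automatically once records have been harvested. The following Python sketch is one possible set of checks, not an established tool: the function name `audit_record`, the record layout (a dictionary mapping Dublin Core element names to lists of values), and the comma-counting heuristic for ambiguous name lists are all assumptions made for illustration.

```python
def audit_record(fields):
    """Return a list of quality problems found in one harvested record.

    `fields` maps Dublin Core element names to lists of string values.
    """
    problems = []

    # Missing data: "type" or "format" absent or left blank.
    for name in ("type", "format"):
        if not any(value.strip() for value in fields.get(name, [])):
            problems.append(f"missing {name}")

    # Incorrect data: a creator name repeated in the language element.
    creators = set(fields.get("creator", []))
    for value in fields.get("language", []):
        if value in creators:
            problems.append(f"creator name in language: {value}")

    # Confusing data (heuristic): several commas and no semicolon suggests
    # multiple names run together, e.g. "Doe, Jane, Smith, John".
    for value in fields.get("creator", []):
        if value.count(",") > 1 and ";" not in value:
            problems.append(f"ambiguous name list: {value}")

    return problems


# Made-up record that exhibits three of the problems.
record = {
    "title": ["Sample map"],
    "creator": ["Doe, Jane, Smith, John"],
    "language": ["Doe, Jane, Smith, John"],
    "type": ["Image"],
}
report = audit_record(record)
```

A service provider could run checks like these over a whole harvest and send the resulting report back to the data provider.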
  11. Recommendations for improving harvesting: Repositories should follow established guidelines and develop local standards. Either use a guidelines and best practices resource that already exists or develop and document standards to meet your local needs. Evaluate your metadata to determine whether there is some that you do not want or need to share. Check whether there are elements containing local metadata that would not be useful in an aggregated environment. If you find that some elements are unnecessary, unmap those fields before allowing them to be harvested. While checking for necessary and unnecessary fields, check whether any fields are populated with "unknown" or "N/A". In an aggregated environment this should not be done. It is better to leave a field blank than to use "unknown" or "N/A" in fields where harvesters might interpret them as meaningful data. Most importantly, communicate with the service provider who is harvesting your records. Review your metadata and determine whether there are ways to make it cleaner and easier to understand.
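Two of these recommendations, unmapping local-only fields and leaving fields blank rather than filling them with "unknown" or "N/A", can be sketched in Python as a filter applied before records are exposed for harvesting. The function name `prepare_for_harvest`, the record layout, and the local field `shelfLocation` are hypothetical and serve only as an illustration.

```python
# Placeholder values that a harvester might mistake for meaningful data.
PLACEHOLDERS = {"unknown", "n/a"}


def prepare_for_harvest(fields, local_only=()):
    """Return a copy of a record that is safe to expose to harvesters.

    Fields named in `local_only` are unmapped (dropped), and placeholder or
    blank values are removed so that the field is simply left out.
    """
    shared = {}
    for name, values in fields.items():
        if name in local_only:
            continue  # local metadata that is not useful once aggregated
        kept = [
            value for value in values
            if value.strip() and value.strip().lower() not in PLACEHOLDERS
        ]
        if kept:
            shared[name] = kept
    return shared


# Made-up local record; "shelfLocation" stands in for any local-only field.
local_record = {
    "title": ["Sample map"],
    "date": ["Unknown"],
    "format": ["N/A", "image/tiff"],
    "shelfLocation": ["Room 4"],
}
exposed = prepare_for_harvest(local_record, local_only={"shelfLocation"})
```

The local record itself is left unchanged, so the repository can keep whatever conventions suit its own catalog while sharing only the cleaned copy.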
  12. Although the OAI-PMH is far from perfect, there is ample evidence to suggest that it is a successful endeavor. The number of repositories that make their metadata available through the OAI-PMH has grown since the initial release in January of 2001. Another way to gauge success is the level of attention garnered from funding agencies. Some examples of funded projects and programs that promote or are based on the OAI are eprints.org, the Metadata Harvesting Initiative of the Mellon Foundation, and the NSF National Science Digital Library (NSDL). The importance of metadata is one of the reasons that the Open Archives Initiative created the Protocol for Metadata Harvesting. Although it is not a perfect process, it has been very successful in helping libraries of all types, both large and small, to create and offer Web access to digital collections.