Update From OCLC Research May 2008
1. Update from OCLC Research
Eric Childress
Consulting Project Manager
OCLC Research
2. Agenda
• OCLC Programs and Research overview
• OCLC Research
• RLG Programs
• Brief descriptions of work themes and sample projects
• Updates on OCLC Research projects
• Crosswalk Web Service
• Terminology Services
• WorldCat Identities
• VIAF (Virtual International Authority File)
3. OCLC Programs and Research (PaR)
OCLC Research
A leading research center devoted to supporting OCLC’s public mission on behalf of libraries
Conducts applied research
Prototypes new systems/services – some prototypes are eventually integrated into OCLC products and services
Active in standards development – OR has been instrumental in the development of key technologies & standards:
• PURLs (Persistent URLs)
• Dublin Core
• OAI (Open Archives Initiative)
• SRU/W (Search & Retrieve)

RLG Programs
Provides a venue and focus for collaboration, problem-solving, and the development of new standards, products, and services among research institutions
Staff with deep expertise in academic research libraries, archives, and museums.
Partnership of ~140 libraries, archives, and museums in 14 countries with:
• Deep, rich research collections
• Mandate to make collections accessible
• Commitment to exploit technology
• Commitment to collaboration
• Commitment and capability to contribute to ‘commons’ (collections, expertise, infrastructure)
4. OCLC PaR Organization
Lorcan Dempsey
Jim Michalko OCLC Research staff
RLG Programs staff
5. OCLC PaR Offices
University of Washington (1) University of St Andrews (1)
OCLC San Mateo (13) OCLC Leiden (1)
OCLC Dublin (29/3)
6. RLG Programs & RLG Partners
The RLG Partnership comprises about 140 libraries, archives and museums in 14 countries
RLG Program meetings and activities are open to RLG Partners
Presentations, reports, other work are made openly available to the entire community
Agenda is informed by the Partners, the RLG Program Council and oversight is provided by the BOT RLG Committee

Partner countries:
• Australia
• Canada
• Egypt
• France
• Germany
• Ireland
• Italy
• Japan
• New Zealand
• Spain
• Switzerland
• United Arab Emirates
• United Kingdom
• United States
8. Brief descriptions
Managing the Collective Collection
- Explore issues and opportunities for gaining value from shared collections
sample activities:
- NYARC Collection Analysis – characterizing aggregate collection of selected art libraries
• Exploring opportunities for evidence-based collaboration
- Last copies – characterizing content and distribution of unique book holdings
• Examining system-wide holdings for shared collection and community profiles

Renovating Descriptive & Organizing Practices
- Consider economics of metadata creation and utility of metadata in a networked environment
sample activities:
- RLG Programs Descriptive Metadata Practices Survey Results
9. Brief descriptions (cont.)
Measurements and Behaviors
- Understand user behaviors
sample activities:
- Seeking Synchronicity: Evaluating Virtual Reference Services from User, Non-User, and Librarian Perspectives (IMLS-funded work with Rutgers)
- Supporting usability work for WorldCat.org and WorldCat Local interface design

Supporting new modes of scholarship
- Address changes to support infrastructure for scholarship
sample activities:
- Investigate scholarly needs & expectations (e.g., data-mining of digital text)
- Personal collections
- Library-Archive-Museum (“LAM”) relationships
10. Brief descriptions (cont.)
Modeling New Service Infrastructure
- Work towards shared understandings, common architecture
sample activities:
- Supporting work for Developer’s Network and WorldCat API
- Creating a special version of OAICat for museum use

Architecture & Standards
- Contribute to standards development and implementation
sample activities:
- OAI-ORE (Open Archives Initiative Object Reuse and Exchange)
- OASIS Search Web Services Technical Committee
- NSF Blue Ribbon Task Force on Sustainable Digital Preservation and Access [NYT article]
11. Updates on selected research projects
Crosswalk Web Service
Terminology services
WorldCat Identities
VIAF (Virtual International Authority File)
12. Crosswalk Web Service
Goal: Develop technology to improve the quality and efficiency of
translation of metadata from one scheme to another
Delivered:
• An innovative and novel approach to the problem
• New technology (SEEL [a new programming language], modular
design, new code)
• Being integrated into OCLC systems (e.g., Connexion, batchload)
• Part of technology under the NextGeneration cataloging services
13. [Diagram: Crosswalk Web Service translation flow]
• Input: file of records in format X
• STRUCTURAL TRANSFORM: transform to intermediate form
• SEMANTIC TRANSLATION: translate input semantics to CORE
• SEMANTIC TRANSLATION: translate CORE to output semantics
• STRUCTURAL TRANSFORM: transform to output format Y
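The flow in the diagram above (structural transform in, semantic translation to CORE, semantic translation out of CORE, structural transform out) can be sketched roughly as below. This is only an illustration in Python, not SEEL and not OCLC's implementation: the `tag=value;tag=value` input syntax, the mapping tables `X_TO_CORE` and `CORE_TO_Y`, and all function names are assumptions made up for the example.

```python
from typing import Dict

Record = Dict[str, str]

# Hypothetical mapping tables; a real crosswalk would be far richer.
X_TO_CORE = {"245a": "title", "100a": "creator"}
CORE_TO_Y = {"title": "dc:title", "creator": "dc:creator"}


def structural_in(raw: str) -> Record:
    """Structural transform: parse 'tag=value;tag=value' into an intermediate dict."""
    return dict(pair.split("=", 1) for pair in raw.split(";") if pair)


def translate(record: Record, mapping: Dict[str, str]) -> Record:
    """Semantic translation: rename fields via a mapping, dropping unmapped ones."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def structural_out(record: Record) -> str:
    """Structural transform: serialize to simple tagged lines (no XML escaping)."""
    return "\n".join(f"<{key}>{value}</{key}>" for key, value in sorted(record.items()))


def crosswalk(raw: str) -> str:
    intermediate = structural_in(raw)           # format X -> intermediate form
    core = translate(intermediate, X_TO_CORE)   # input semantics -> CORE
    output = translate(core, CORE_TO_Y)         # CORE -> output semantics
    return structural_out(output)               # intermediate form -> format Y


if __name__ == "__main__":
    print(crosswalk("245a=Moby Dick;100a=Melville, Herman"))
```

Routing every translation through one shared CORE vocabulary means each new format needs only a mapping to and from CORE, rather than a separate mapping to every other format.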
14. Terminology Services
Goal: Offer accessible, modular, web-based terminology
services
Deliverables:
• Enhanced, modernized versions of controlled vocabularies
in MARC XML, SKOS, more
• Addressing identifier issues
• Adding cross-vocabulary mappings
• Web services to leverage power of disciplinary (‘subject’)
thesauri (e.g., DDC, LCSH, MeSH, TGM I and II, FAST, more)
15. Highlights of prototype service
• Search descriptions of controlled vocabularies
• Search for concepts/headings in a controlled vocabulary
• Retrieve a single concept/heading by its identifier
• Retrieve concepts/headings in multiple representations
including HTML, MARC XML, Zthes, and SKOS.
• Search using SRU CQL syntax
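As a rough illustration of the SRU search mentioned above, the sketch below builds a `searchRetrieve` request URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) are standard SRU request parameters; the base URL, the version number, and the `skos` schema identifier are assumptions for the example, not the actual endpoint or settings of the prototype service.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; the real service address is not given in these slides.
BASE_URL = "https://example.org/terminology/lcsh"


def build_sru_url(cql_query: str,
                  max_records: int = 10,
                  record_schema: Optional[str] = None,
                  base_url: str = BASE_URL) -> str:
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",            # assumed SRU version
        "query": cql_query,          # CQL query string, URL-encoded below
        "maximumRecords": str(max_records),
    }
    if record_schema:
        # Asks the server for a particular representation of each record.
        params["recordSchema"] = record_schema
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_sru_url('"whales"', max_records=5, record_schema="skos"))
```

The response to such a request would be an XML document of matching concepts/headings, which a client could then parse in whichever representation it asked for.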
17. WorldCat Identities
• Using data mining techniques OCLC builds a summary page
for persons and corporate bodies referenced in WorldCat
bibliographic records (25 million+)
• Data is derived from bibliographic data and authority
records and holdings in WorldCat
• Special features, such as a publication timeline, are included
19. VIAF (Virtual International Authority File)
Primary objectives:
1. Identify the same person or organization in
multiple national authority files
2. Link the corresponding authority records
Value:
• Potentially lowers the cost of authority work
• Multiple authorized forms may co-exist
• Better data-mining (including FRBR) and searching
• VIAF will be made available on the Web for wide
use by libraries, search engines, others
20. VIAF (cont.)
How VIAF works:
1. National libraries supply OCLC with national bibliographic files
and authority files
2. OCLC-developed software:
• Identifies key relationships (e.g., author name to works, to co-authors, to subjects)
• Matches same author with different names
• Distinguishes same name but different authors
3. Links between matched authority records are generated and
stored
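The matching idea described above (use relationships such as author-to-works to decide whether two authority records from different national files describe the same person) can be sketched in a very simplified form. This is not OCLC's algorithm: the record layout, the rule of "at least one shared normalized title", the sample data, and every identifier below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class AuthorityRecord:
    source: str              # which national authority file (hypothetical codes)
    record_id: str           # hypothetical identifier
    heading: str             # authorized form of the name in that file
    titles: FrozenSet[str]   # titles linked to the name via bibliographic records


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.lower().split())


def likely_same(a: AuthorityRecord, b: AuthorityRecord, min_shared: int = 1) -> bool:
    """Treat records from different files as a match if they share enough titles."""
    if a.source == b.source:
        return False
    shared = {normalize(t) for t in a.titles} & {normalize(t) for t in b.titles}
    return len(shared) >= min_shared


def link_records(records: List[AuthorityRecord], min_shared: int = 1) -> List[Tuple[str, str]]:
    """Generate links (pairs of record ids) between matched authority records."""
    links = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if likely_same(a, b, min_shared):
                links.append((a.record_id, b.record_id))
    return links


if __name__ == "__main__":
    sample = [  # made-up sample data
        AuthorityRecord("LC", "lc-1", "Twain, Mark, 1835-1910",
                        frozenset({"Adventures of Huckleberry Finn"})),
        AuthorityRecord("DNB", "dnb-1", "Clemens, Samuel Langhorne",
                        frozenset({"adventures of  huckleberry finn"})),
        AuthorityRecord("LC", "lc-2", "Smith, John", frozenset({"A history of Ohio"})),
        AuthorityRecord("DNB", "dnb-2", "Smith, John", frozenset({"Gardening basics"})),
    ]
    print(link_records(sample))
```

In this toy data the two differently named records are linked through their shared title, while the two identically named "Smith, John" records stay unlinked because nothing connects their works.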
23. Recent reports & related articles
• Lavoie, Brian, and Günter Waibel
An Art Resource in New York: The Collective Collection of the NYARC Art Museum Libraries.
• Proffitt, Merrilee, Arnold Arcolio, and Constance Malpas
Copyright Investigation Summary Report (March 2008)
• Smith-Yoshimura, Karen.
RLG Programs Descriptive Metadata Practices Survey Results (November 2007)
• Kaufman, Peter and Jeff Ubois.
"Good Terms—Improving Commercial-Noncommercial Partnerships for Mass Digitization; A
Report Prepared by Intelligent Television for RLG Programs, OCLC Programs and Research."
D-Lib Magazine, 13,11/12. November 2007.
• Payne, Lizanne.
Library Storage Facilities and the Future of Print Collections in North America (November
2007)
• Erway, Ricky, and Jennifer Schaffner.
Shifting Gears: Gearing Up to Get Into the Flow (October 2007)