herbert van de sompel, michael nelson, thomas krichel




        the UPS protoproto project



                   UPS 1 Meeting
             Santa Fe - October 21st 1999
project   description



demo      the UPS protoproto



 dex      the data exchange framework
project why a protoproto?
•  UPS: enable cross-archive end-user services
•  protoproto:
  –  facilitate discussions
  –  identify issues involved in creating cross-archive services
  –  experiment with digital object concepts for archive
     material
  –  does not claim to be a solution
•  protoproto is multi-disciplinary
  –  a special instance of cross-archive
  –  there is a market
  –  promotional value
project who?


•  coordination: herbert van de sompel, michael
   nelson, thomas krichel
•  involvement of:
    – Old Dominion U & NASA Langley
    – U of Surrey
    – U of Ghent
    – Los Alamos National Laboratory - Library
    – Russian Academy of Sciences - Siberian Branch
project sponsors


•  Los Alamos National Laboratory - Research Library
•  JISC eLib WoPEc project
project datasets
 –  metadata only
 –  full text remains at archives
 –  static dumps obtained ca. July 99

                 objects    full-text    !organization
 the arXiv        85,223       85,223           17,983
 CogPrints           742          659               14
 NACA              3,036        3,036              100
 NCSTRL           29,184        9,084               93
 NDLTD             1,590          951                1
 RePEc            73,367       13,582            2,453

 Total           193,142      112,535
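
The totals row can be checked, and the share of records with full text derived, with a few lines of arithmetic. The sketch below only re-uses the objects and full-text counts from the table; the function names are illustrative and not part of the UPS software.

```python
# Counts transcribed from the table above: (objects, full-text).
DATASETS = {
    "arXiv": (85_223, 85_223),
    "CogPrints": (742, 659),
    "NACA": (3_036, 3_036),
    "NCSTRL": (29_184, 9_084),
    "NDLTD": (1_590, 951),
    "RePEc": (73_367, 13_582),
}


def totals(datasets):
    """Sum the objects and full-text columns across all archives."""
    objects = sum(o for o, _ in datasets.values())
    full_text = sum(f for _, f in datasets.values())
    return objects, full_text


def full_text_share(datasets):
    """Fraction of each archive's records that have full text available."""
    return {name: f / o for name, (o, f) in datasets.items()}


if __name__ == "__main__":
    print(totals(DATASETS))
    for name, share in full_text_share(DATASETS).items():
        print(f"{name:10s} {share:6.1%}")
```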
project metadata formats


               format
 the arXiv     internal
 CogPrints     internal
 NACA          Refer
 NCSTRL        RFC1807
 NDLTD         MARC
 RePEc         ReDIF
project metadata extraction


  •  Getting metadata out of archives
     –  not all archives support metadata extraction
        •  some archives have undocumented metadata
           extraction procedures
     –  not all archives support rich criteria for
        extraction
        •  single dump concept only
  •  Intellectual property and use rights not
     always clear
project metadata quality


  •  Metadata has problems with:
     –  record duplication
     –  crucial missing fields
     –  internal errors
     –  ambiguous references to people and places,
        publications
project metadata conversion

•  all datasets converted to ReDIF:
    •  essential to have a single format for the creation
    of services
    •  supply by archives in a single format was not
    realistic
    •  no downgrading of data

 •  data enhancements:
     •  creation of unique identifier
     •  addition of raw subject-classification
     •  normalization of publication types
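
A minimal sketch of what such a conversion step might look like, assuming each source record has already been parsed into a Python dict. The identifier scheme (`ups:<archive>:<local id>`), the publication-type mapping and the `X-` field names are hypothetical and chosen only for illustration; the output follows the general "attribute: value" shape of ReDIF templates but should not be read as the converters actually used for UPS.

```python
# Hypothetical mapping from archive-specific publication types to a small
# normalized vocabulary (the real UPS vocabulary is not given in the slides).
PUBLICATION_TYPES = {
    "tech report": "techreport",
    "technical report": "techreport",
    "thesis": "thesis",
    "dissertation": "thesis",
    "preprint": "preprint",
    "e-print": "preprint",
}


def normalize_type(raw: str) -> str:
    """Map a raw publication type to the normalized vocabulary."""
    return PUBLICATION_TYPES.get(raw.strip().lower(), "other")


def make_identifier(archive: str, local_id: str) -> str:
    """Hypothetical unique identifier: archive name plus archive-local id."""
    return f"ups:{archive.lower()}:{local_id}"


def to_template(archive: str, record: dict) -> str:
    """Render one parsed record as a ReDIF-style attribute/value template."""
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    lines += [f"Author-Name: {name}" for name in record.get("authors", [])]
    lines.append(f"Title: {record['title']}")
    if record.get("subject"):
        # Raw subject classification is carried over unchanged (assumed field name).
        lines.append(f"X-Raw-Subject: {record['subject']}")
    # Normalized publication type (assumed field name).
    lines.append(f"X-Publication-Type: {normalize_type(record.get('type', ''))}")
    lines.append(f"Handle: {make_identifier(archive, record['id'])}")
    return "\n".join(lines)
```

Keeping the raw subject classification alongside the normalized type is one way to respect the "no downgrading of data" goal: nothing from the source record is discarded, only added to.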
project re-creation of archives

•  creation of archives for ReDIF-ed metadata
•  using intelligent digital objects : “buckets”




[Figure: re-created archives, labelled RePEc, arXiv and NCSTRL]
project buckets
•  Buckets were chosen to study the implications
   of using rich, intelligent objects in UPS
•  Buckets are:
  –  DL protocol / system independent
  –  self-contained and mobile
  –  handle their own display, enforcement of terms and
     conditions, and dissemination of their contents
  –  designed for bundling multiple data representations and
     data instance types
•  The aggregative nature of buckets is well
   suited for adding value-added services at the
   object level
project creation of end-user service

•  NCSTRL+ digital library service
•  indexing buckets in archives by requesting their
metadata
•  enhanced user-interface
•  NCSTRL+ search results point at buckets
•  buckets auto-display
•  buckets provide link to full-text in native archive
project scaling problems

  •  UPS contains 193K objects
    –  using buckets consumed inodes (~60 inodes per
       bucket)
       •  filesystem reformatted with more generous amount
          of inodes
    –  Solaris and Dienst conflict
       •  Dienst wants each object in a publishing authority
          to be in a single directory
       •  Solaris has a hard limit of 32K objects in a directory
       •  resolution: use many (100+) authorities for UPS
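
The numbers on this slide can be made concrete with a short calculation. The sketch below assumes "32K" means 32,000 entries, and the hash-based assignment of objects to authorities is a hypothetical scheme shown only to illustrate how objects could be spread evenly; the slides do not say how UPS actually assigned objects to its 100+ authorities.

```python
import hashlib
import math

OBJECTS = 193_142        # total from the datasets slide
INODES_PER_BUCKET = 60   # approximate figure from this slide
DIR_LIMIT = 32_000       # "32K" read as 32,000 entries (an assumption)


def inodes_needed(objects: int = OBJECTS, per_bucket: int = INODES_PER_BUCKET) -> int:
    """Rough inode budget if every object is stored as a bucket."""
    return objects * per_bucket


def min_authorities(objects: int = OBJECTS, limit: int = DIR_LIMIT) -> int:
    """Smallest number of directories that keeps each under the limit."""
    return math.ceil(objects / limit)


def authority_for(object_id: str, n_authorities: int = 128) -> str:
    """Hypothetical: choose an authority by a stable hash of the object id."""
    digest = hashlib.sha1(object_id.encode("utf-8")).hexdigest()
    return f"ups{int(digest, 16) % n_authorities:03d}"


if __name__ == "__main__":
    print(inodes_needed())     # about 11.6 million inodes
    print(min_authorities())   # lower bound on the number of authorities
    print(authority_for("arXiv:hep-th/9907001"))
```

Seven authorities would be the bare minimum under these assumptions; using far more leaves headroom for growth and for uneven archive sizes.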
project addition of linking service

•  integrate the archives with the traditional
communication mechanism
•  context-sensitive linking to deliver extended
services via SFX technology
project SFX linking service



[Figure: SFX linking service. Labels: system A, metadata, evaluate metadata, extended services, system B]
project SFX linking database
project addition of linking service


•  buckets for arXiv, NCSTRL and RePEc are SFX-
aware
     •  CogPrints, NACA, NDLTD not SFX-aware
•  SLAC/SPIRES is SFX-aware
•  linking services for preprint metadata + for
published version
demo the UPS protoproto

•  will be available from the beginning of November
•  UPS list will be notified
•  disclaimer “not a production system”

            http://ups.cs.odu.edu:8000/



               http://ups.cs.odu.edu
dex     some issues (I)

• data exchange framework
    • data provision vs. data implementation
    • central searching, distributed archives
•  need for a framework by which archives can
describe themselves:
    •  content
    •  terms and conditions
    •  protocols, criteria supported to extract (meta)data
    •  metadata scheme, subject classification scheme,
    material-type scheme, ...
dex     some issues (II)


•  need for an identifier scheme for archives and
archive objects
    • (cf. ISSN, ISBN, DOI)
•  metadata quality obstructs the creation of services
•  desirable to extend metadata with citation
information
•  smart objects
    •  archived objects that are active, not passive
dex    providing vs. implementing data


•  Providing data:
  –  publishing into an archive
  –  providing methods for metadata “harvesting”
      •  provide non-technical context for sharing
         information also
•  Implementing Data:
  –  harvest metadata from providers
  –  implement user interface to data
•  Even if provided by the same DL, these are
   distinct functions
dex         providing vs. implementing data




[Figure: two provider configurations]
  –  Left: a provider with an input interface and a native end-user
     interface only. No machine-based way to extract metadata.
  –  Right: a provider with an input interface, a native harvesting
     interface and a native end-user interface. Machine and user
     interfaces for extracting metadata.
dex      providing vs. implementing data


[Figure: an implementor drawn above two providers]
  –  Implementor: native end-user interface; input and harvesting
     interfaces optional
  –  Each provider: input interface and native harvesting interface
  –  Native end-user interface is optional for a provider (e.g., RePEc)
dex          self-describing archives


 •  Much of the learning about the constituent
    UPS archives occurred out of band…
 •  Given an unknown archive, we should be
    able to algorithmically determine the
    archive’s metadata...
 •  Where possible, the harvesting interface
    should provide the same criteria as the
    end-user interface

[Figure: a provider with an input interface, a native harvesting interface and a native end-user interface]
dex      self-describing archives

•  Recommended criteria for metadata
   extraction:
      –  subject classification
      –  accession date
      –  publication date
•  Criteria for archive description
      –  metadata formats employed
      –  contact information for archive
      –  publication type scheme
      –  identifier scheme
      –  subject classification scheme
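
One way to picture these two lists is as a machine-readable archive description plus a harvesting call that accepts the recommended criteria. The sketch below is an assumption-laden illustration: the class names, field names and the `harvest` function are hypothetical and do not correspond to any protocol defined in the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class ArchiveDescription:
    """What an archive would publish about itself (hypothetical field names)."""
    name: str
    contact: str
    metadata_formats: Tuple[str, ...]
    publication_type_scheme: str
    identifier_scheme: str
    subject_classification_scheme: str


@dataclass(frozen=True)
class Record:
    """Minimal metadata record carrying the three recommended criteria."""
    identifier: str
    subject: str
    accession_date: date
    publication_date: date


def harvest(
    records: Iterable[Record],
    subject: Optional[str] = None,
    accessioned_since: Optional[date] = None,
    published_since: Optional[date] = None,
) -> Iterator[Record]:
    """Yield records matching every criterion that was supplied."""
    for record in records:
        if subject is not None and record.subject != subject:
            continue
        if accessioned_since is not None and record.accession_date < accessioned_since:
            continue
        if published_since is not None and record.publication_date < published_since:
            continue
        yield record
```

Filtering on accession date is what would make incremental harvesting possible, in contrast to the "single dump" approach the project had to rely on.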
dex    identifiers


•  Useful in:
   –  reference linking
   –  can be used in citations
   –  resolving duplications
      •  UPS duplications were removed by hand
   –  tracking publication lifecycle
•  Need the ability for an object to have
   multiple unique identifiers
   –  organization, discipline, etc.
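
If every object carries a set of identifiers, duplicate candidates can be found mechanically by grouping records that share at least one identifier, directly or through a chain of other records. The sketch below uses a small union-find for this; the record keys and identifier strings are made up for illustration, and this is not how UPS removed duplicates (that was done by hand).

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_duplicates(records: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group record keys that are linked by shared identifiers.

    `records` maps a record key to the set of identifiers it carries.
    Only groups with more than one record are returned.
    """
    parent = {key: key for key in records}

    def find(key: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: Dict[str, str] = {}
    for key, identifiers in records.items():
        for identifier in identifiers:
            if identifier in first_seen:
                union(first_seen[identifier], key)
            else:
                first_seen[identifier] = key

    groups = defaultdict(set)
    for key in records:
        groups[find(key)].add(key)
    return [group for group in groups.values() if len(group) > 1]
```

For example, a record known by both an archive identifier and a discipline identifier links together two other records that each carry only one of them.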
dex    smart objects
•  Premise: Objects are more important than the
   archives that hold them
      •  SODA: Smart Objects, Dumb Archives
•  Objects should be the canonical authority for
      •  metadata
      •  contents
      •  use
•  Objects should be able to grow and change
      •  correct metadata
      •  add new formats
      •  add new services
      •  reflect the lifecycle of the object
dex   smart objects

 •  It would be beneficial if the archived
    objects could be heterogeneous:
      •  with their own “look-and-feel”
      •  unique functionality / services
         –  e.g., the data archiving needs of an atmospheric scientist
            can be different from those of a computer scientist, engineer
            or medical researcher

 •  yet maintain a standard API for:
      •  extracting metadata
      •  content retrieval
      •  resource discovery on the object
      •  terms and conditions
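
The idea of heterogeneous objects behind one standard API can be sketched as an abstract interface with interchangeable implementations. The method names below are hypothetical and are not the bucket API; they only mirror the four capabilities listed on this slide.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SmartObject(ABC):
    """Hypothetical standard API that every archived object would expose."""

    @abstractmethod
    def metadata(self, fmt: str) -> str:
        """Return the object's metadata in the requested format."""

    @abstractmethod
    def list_contents(self) -> List[str]:
        """Resource discovery: names of the elements held by the object."""

    @abstractmethod
    def content(self, name: str) -> bytes:
        """Retrieve one element by name."""

    @abstractmethod
    def terms_and_conditions(self) -> str:
        """Return the terms and conditions governing use of the object."""


class InMemoryObject(SmartObject):
    """Toy implementation; a real object could add its own services."""

    def __init__(self, metadata: Dict[str, str], contents: Dict[str, bytes], terms: str):
        self._metadata = dict(metadata)
        self._contents = dict(contents)
        self._terms = terms

    def metadata(self, fmt: str) -> str:
        return self._metadata[fmt]

    def list_contents(self) -> List[str]:
        return sorted(self._contents)

    def content(self, name: str) -> bytes:
        return self._contents[name]

    def terms_and_conditions(self) -> str:
        return self._terms

    def add_content(self, name: str, data: bytes) -> None:
        """Objects can grow over time, e.g. when a new format is added."""
        self._contents[name] = data
```

A service written against `SmartObject` would not need to know which discipline or archive an object came from, while each implementation remains free to offer extra functionality.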
dex      lessons learned


 •  A strong distinction between the provision
    of data, and the implementation of data
      –  also, a socio-legal context for sharing metadata
 •  Open, “self-describing” archives
 •  A universal, unique identifier name space
 •  Archived objects with more intelligence and
    flexibility

More Related Content

What's hot

Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionBenoit Perroud
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Cloudera, Inc.
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
 
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsHadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsCloudera, Inc.
 
Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Recordspbajcsy
 

What's hot (7)

Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance ImprovementsHadoop Summit 2012 | HBase Consistency and Performance Improvements
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Records
 

Viewers also liked

Open Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeOpen Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeHerbert Van de Sompel
 
An Overview of the OAI Object Reuse and Exchange Interoperability Framework
An Overview of the OAI Object Reuse and Exchange Interoperability FrameworkAn Overview of the OAI Object Reuse and Exchange Interoperability Framework
An Overview of the OAI Object Reuse and Exchange Interoperability FrameworkHerbert Van de Sompel
 
The Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communicationThe Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communicationHerbert Van de Sompel
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataHerbert Van de Sompel
 
Attempts at innovation in scholarly communication
Attempts at innovation in scholarly communicationAttempts at innovation in scholarly communication
Attempts at innovation in scholarly communicationHerbert Van de Sompel
 
Augmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositoriesAugmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositoriesHerbert Van de Sompel
 
An HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked DataAn HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked DataHerbert Van de Sompel
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Herbert Van de Sompel
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationHerbert Van de Sompel
 
A Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordA Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordHerbert Van de Sompel
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumRobert Sanderson
 
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the PastMemento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the PastHerbert Van de Sompel
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTHerbert Van de Sompel
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemHerbert Van de Sompel
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiativeHerbert Van de Sompel
 

Viewers also liked (20)

Open Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeOpen Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & Exchange
 
An Overview of the OAI Object Reuse and Exchange Interoperability Framework
An Overview of the OAI Object Reuse and Exchange Interoperability FrameworkAn Overview of the OAI Object Reuse and Exchange Interoperability Framework
An Overview of the OAI Object Reuse and Exchange Interoperability Framework
 
The Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communicationThe Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communication
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage data
 
Attempts at innovation in scholarly communication
Attempts at innovation in scholarly communicationAttempts at innovation in scholarly communication
Attempts at innovation in scholarly communication
 
The Roof is on Fire
The Roof is on FireThe Roof is on Fire
The Roof is on Fire
 
Augmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositoriesAugmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositories
 
An HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked DataAn HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked Data
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustration
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
A Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordA Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly Record
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 
The aDORe Federation Architecture
The aDORe Federation ArchitectureThe aDORe Federation Architecture
The aDORe Federation Architecture
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall Forum
 
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the PastMemento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication System
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiative
 

Similar to the UPS protoproto project

Emulation Bridging The Past To The Future Dirk Von Suchodoletz
Emulation Bridging  The Past To The Future Dirk Von SuchodoletzEmulation Bridging  The Past To The Future Dirk Von Suchodoletz
Emulation Bridging The Past To The Future Dirk Von SuchodoletzDigitalPreservationEurope
 
2006 Esug Omnibrowser
2006 Esug Omnibrowser2006 Esug Omnibrowser
2006 Esug Omnibrowserbergel
 
Holistic Aggregate Resource Environment
Holistic Aggregate Resource EnvironmentHolistic Aggregate Resource Environment
Holistic Aggregate Resource EnvironmentEric Van Hensbergen
 
Splunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gxSplunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gxDamien Dallimore
 
Asynchronous Javascript and Rich Internet Aplications
Asynchronous Javascript and Rich Internet AplicationsAsynchronous Javascript and Rich Internet Aplications
Asynchronous Javascript and Rich Internet AplicationsSubramanyan Murali
 
Overview Of .Net 4.0 Sanjay Vyas
Overview Of .Net 4.0   Sanjay VyasOverview Of .Net 4.0   Sanjay Vyas
Overview Of .Net 4.0 Sanjay Vyasrsnarayanan
 
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit
 
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Behar Veliqi
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Yahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely BedfellowsYahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely BedfellowsConSanFrancisco123
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012Lucas Jellema
 
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」Sho Shimizu
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure javaRoman Elizarov
 
Galaxy
GalaxyGalaxy
Galaxybosc
 
MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008
MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008
MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008ChemAxon
 

Similar to the UPS protoproto project (20)

Emulation Bridging The Past To The Future Dirk Von Suchodoletz
Emulation Bridging  The Past To The Future Dirk Von SuchodoletzEmulation Bridging  The Past To The Future Dirk Von Suchodoletz
Emulation Bridging The Past To The Future Dirk Von Suchodoletz
 
2006 Esug Omnibrowser
2006 Esug Omnibrowser2006 Esug Omnibrowser
2006 Esug Omnibrowser
 
Holistic Aggregate Resource Environment
Holistic Aggregate Resource EnvironmentHolistic Aggregate Resource Environment
Holistic Aggregate Resource Environment
 
Splunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gxSplunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gx
 
Asynchronous Javascript and Rich Internet Aplications
Asynchronous Javascript and Rich Internet AplicationsAsynchronous Javascript and Rich Internet Aplications
Asynchronous Javascript and Rich Internet Aplications
 
Overview Of .Net 4.0 Sanjay Vyas
Overview Of .Net 4.0   Sanjay VyasOverview Of .Net 4.0   Sanjay Vyas
Overview Of .Net 4.0 Sanjay Vyas
 
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
 
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Yahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely BedfellowsYahoo Communities Architecture Unlikely Bedfellows
Yahoo Communities Architecture Unlikely Bedfellows
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Qcon
QconQcon
Qcon
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012
 
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
 
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure java
 
Galaxy
GalaxyGalaxy
Galaxy
 
MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008
MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008
MarvinSketch and MarvinView: Tips And Tricks: US UGM 2008
 

More from Herbert Van de Sompel

The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about itHerbert Van de Sompel
 
Researcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebResearcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebHerbert Van de Sompel
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DoneHerbert Van de Sompel
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueHerbert Van de Sompel
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Herbert Van de Sompel
 
Collecting the organizational scholarly record
Collecting the organizational scholarly recordCollecting the organizational scholarly record
Collecting the organizational scholarly recordHerbert Van de Sompel
 
Achieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsAchieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsHerbert Van de Sompel
 
Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Herbert Van de Sompel
 
Interoperability for web based scholarship
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarshipHerbert Van de Sompel
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingHerbert Van de Sompel
 

the UPS protoproto project

  • 1. herbert van de sompel, michael nelson, thomas krichel the UPS protoproto project UPS 1 Meeting Santa Fe - October 21st 1999
  • 2. project description demo the UPS protoproto dex the data exchange framework
  • 3. project why a protoproto? •  UPS: enable cross-archive end-user services •  protoproto: –  facilitate discussions –  identify issues involved in creating cross-archive services –  experiment with digital object concepts for archive material –  does not claim to be a solution •  protoproto is multi-disciplinary –  a special instance of cross-archive –  there is a market –  promotional value
  • 4. project who? •  coordination: herbert van de sompel, michael nelson, thomas krichel •  involvement of: – Old Dominion U & NASA Langley – U of Surrey – U of Ghent – Los Alamos National Laboratory - Library – Russian Academy of Science - Siberian branch
  • 5. project sponsors •  Los Alamos National Laboratory - Research Library •  JISC eLib WoPEc project
  • 6. project datasets –  metadata only –  full text remains at archives –  static dumps obtained ca. July 99

                 objects     full-text   !organization
 the arXiv       85,223       85,223      17,983
 CogPrints         742         659           14
 NACA             3,036       3,036         100
 NCSTRL          29,184       9,084          93
 NDLTD            1,590        951           1
 RePEc           73,367       13,582       2,453

 Total           193,142     112,535

  • 7. project metadata formats: the arXiv (internal), CogPrints (internal), NACA (Refer), NCSTRL (RFC1807), NDLTD (MARC), RePEc (ReDIF)
  • 8. project metadata extraction •  Getting metadata out of archives –  not all archives support metadata extraction •  some archives have undocumented metadata extraction procedures –  not all archives support rich criteria for extraction •  single dump concept only •  Intellectual property and use rights not always clear
  • 9. project metadata quality •  Metadata has problems with: –  record duplication –  crucial missing fields –  internal errors –  ambiguous references to people and places, publications
  • 10. project metadata conversion •  all datasets converted to ReDIF: •  essential to have a single format for the creation of services •  supply by archives in a single format was not realistic •  no downgrading of data •  data enhancements: •  creation of unique identifier •  addition of raw subject-classification •  normalization of publication types
  • 11. project re-creation of archives •  creation of archives for ReDIF-ed metadata •  using intelligent digital objects: “buckets” [diagram labels: RePEc, arXiv, NCSTRL]
  • 12. project buckets •  Buckets were chosen to study the implications of using rich, intelligent objects in UPS •  Buckets are: –  DL protocol / system independent –  self-contained and mobile –  handle their own display, enforcement of terms and conditions, and dissemination of their contents –  designed for bundling multiple data representations and data instance types •  The aggregative nature of buckets is well suited for adding value-added services at the object level
  • 13. project creation of end-user service •  NCSTRL+ digital library service •  indexing buckets in archives by requesting their metadata •  enhanced user-interface •  NCSTRL+ search results point at buckets •  buckets auto-display •  buckets provide link to full-text in native archive
  • 14. project scaling problems •  UPS contains 193K objects –  using buckets consumed inodes (~60 inodes per bucket) •  filesystem reformatted with a more generous number of inodes –  Solaris and Dienst conflict •  Dienst wants each object in a publishing authority to be in a single directory •  Solaris has a hard limit of 32K objects in a directory •  resolution: use many (100+) authorities for UPS
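
The figures on this slide can be checked with a little arithmetic. The sketch below is a rough estimate, not the project's actual tooling. It assumes "32K" means 32 × 1024 directory entries, and `authority_for` is a hypothetical hash-based way to spread object identifiers over authorities (the slides do not say how objects were assigned).

```python
import math
import zlib

TOTAL_OBJECTS = 193_142      # total objects in UPS, from the datasets slide
INODES_PER_BUCKET = 60       # approximate figure quoted on the slide
DIR_LIMIT = 32 * 1024        # assumption: "32K" read as 32,768 entries


def inodes_needed(objects: int, per_bucket: int = INODES_PER_BUCKET) -> int:
    """Rough number of inodes consumed when every object is a bucket."""
    return objects * per_bucket


def min_authorities(objects: int, dir_limit: int = DIR_LIMIT) -> int:
    """Fewest single-directory authorities that can hold all objects."""
    return math.ceil(objects / dir_limit)


def authority_for(object_id: str, n_authorities: int) -> int:
    """Hypothetical assignment: a stable hash of the identifier picks the authority."""
    return zlib.crc32(object_id.encode("utf-8")) % n_authorities


print(inodes_needed(TOTAL_OBJECTS))    # 11588520
print(min_authorities(TOTAL_OBJECTS))  # 6
```

Six is only a lower bound under perfectly even packing. The slide reports that 100+ authorities were used in practice.
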
  • 15. project addition of linking service •  integrate the archives with the traditional communication mechanism •  context-sensitive linking to deliver extended services via SFX technology
  • 16. project SFX linking service [diagram; labels: system A, system B, metadata, evaluate metadata, extended services]
  • 18. project addition of linking service •  buckets for arXiv, NCSTRL and RePEc are SFX-aware •  CogPrints, NACA, NDLTD not SFX-aware •  SLAC/SPIRES is SFX-aware •  linking services for preprint metadata + for published version
  • 19. demo the UPS protoproto •  will be available from the beginning of November •  UPS list will be notified •  disclaimer “not a production system” http://ups.cs.odu.edu:8000/ http://ups.cs.odu.edu
  • 20. dex some issues (I) • data exchange framework • data provision vs. data implementation • central searching, distributed archives •  need for a framework by which archives can describe themselves: •  content •  terms and conditions •  protocols, criteria supported to extract (meta)data •  metadata scheme, subject classification scheme, material-type scheme, ...
  • 21. dex some issues (II) •  need for an identifier scheme for archives and archive objects • (cf. ISSN, ISBN, DOI) •  metadata quality obstructs the creation of services •  desirable to extend metadata with citation information •  smart objects •  archived objects that are active, not passive
  • 22. dex providing vs. implementing data •  Providing data: –  publishing into an archive –  providing methods for metadata “harvesting” •  provide non-technical context for sharing information also •  Implementing Data: –  harvest metadata from providers –  implement user interface to data •  Even if provided by the same DL, these are distinct functions
  • 23. dex providing vs. implementing data [diagram: one Provider with an input interface and a native end-user interface only, captioned “No machine based way to extract metadata…”; another Provider with an input interface, a native end-user interface and a native harvesting interface, captioned “Machine and user interfaces for extracting metadata….”]
  • 24. dex providing vs. implementing data [diagram: an Implementor with a native end-user interface (input and harvesting interfaces optional) above two Providers, each with an input interface, a native harvesting interface and a native end-user interface; the Providers' native end-user interface is marked optional (e.g., RePEc)]
  • 25. dex self-describing archives •  Much of the learning about the constituent UPS archives occurred out of band… •  Given an unknown archive, we should be able to algorithmically determine the archive’s metadata... •  Where possible, the harvesting interface should provide the same criteria as the end-user interface [diagram: Provider with input interface, native harvesting interface and native end-user interface]
  • 26. dex self-describing archives •  Recommended criteria for metadata extraction: –  subject classification –  accession date –  publication date •  Criteria for archive description –  metadata formats employed –  contact information for archive –  publication type scheme –  identifier scheme –  subject classification scheme
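
One way to picture such a self-description is as a small record that a harvester could inspect before extracting anything. The sketch below is illustrative only: the class name, field names and placeholder values are assumptions. Only the list of criteria and the list of description items come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Recommended extraction criteria, taken from the slide.
RECOMMENDED_CRITERIA = frozenset(
    {"subject classification", "accession date", "publication date"}
)


@dataclass
class ArchiveDescription:
    """Hypothetical self-description record; field names are illustrative."""
    name: str
    contact: str
    metadata_formats: List[str]
    publication_type_scheme: str
    identifier_scheme: str
    subject_classification_scheme: str
    extraction_criteria: Set[str] = field(default_factory=set)

    def missing_criteria(self) -> Set[str]:
        """Recommended criteria that this archive does not support."""
        return set(RECOMMENDED_CRITERIA) - self.extraction_criteria


example = ArchiveDescription(
    name="example archive",
    contact="admin@example.org",            # placeholder address
    metadata_formats=["ReDIF"],
    publication_type_scheme="local",        # placeholder value
    identifier_scheme="local",              # placeholder value
    subject_classification_scheme="local",  # placeholder value
    extraction_criteria={"accession date"},
)
print(sorted(example.missing_criteria()))
# ['publication date', 'subject classification']
```
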
  • 27. dex identifiers •  Useful in: –  reference linking –  can be used in citations –  resolving duplications •  UPS duplications were removed by hand –  tracking publication lifecycle •  Need the ability for an object to have multiple unique identifiers –  organization, discipline, etc.
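
The slide notes that UPS duplicates were removed by hand. As a sketch of how shared identifiers could support that step automatically, the function below flags records that carry a common identifier. The record keys and identifier strings are made up for illustration, and this is not the procedure the project used.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_duplicates(records: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group record keys that share at least one identifier.

    `records` maps a record key to the set of identifiers it carries.
    Each returned set lists the records sharing one particular identifier.
    """
    by_identifier: Dict[str, Set[str]] = defaultdict(set)
    for key, identifiers in records.items():
        for ident in identifiers:
            by_identifier[ident].add(key)
    return [keys for keys in by_identifier.values() if len(keys) > 1]


records = {
    "archive-a/1": {"org:0001", "disc:x42"},   # hypothetical identifiers
    "archive-b/7": {"disc:x42"},
    "archive-c/3": {"org:0099"},
}
print(find_duplicates(records))
```

Here the first two records would be flagged because both carry `disc:x42`, which is why an object needs to be able to hold more than one identifier.
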
  • 28. dex smart objects •  Premise: Objects are more important than the archives that hold them •  SODA: Smart Objects, Dumb Archives •  Objects should be the canonical authority for •  metadata •  contents •  use •  Objects should be able to grow and change •  correct metadata •  add new formats •  add new services •  reflect the lifecycle of the object
  • 29. dex smart objects •  It would be beneficial if the archived objects could be heterogeneous: •  with their own “look-and-feel” •  unique functionality / services –  e.g., the data archiving needs of an atmospheric scientist can be different than that of a computer scientist, engineer or medical researcher •  yet maintain a standard API for: •  extracting metadata •  content retrieval •  resource discovery on the object •  terms and conditions
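
The four API areas listed on this slide can be sketched as an abstract interface that heterogeneous objects implement in their own way. This is a minimal illustration, not the bucket API: `SmartObject`, `SimpleBucket` and all method names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SmartObject(ABC):
    """Hypothetical common interface; method names are illustrative."""

    @abstractmethod
    def metadata(self) -> Dict[str, str]:
        """Return the object's metadata."""

    @abstractmethod
    def content(self, fmt: str) -> bytes:
        """Return the content in the requested format."""

    @abstractmethod
    def formats(self) -> List[str]:
        """Resource discovery: list what the object holds."""

    @abstractmethod
    def terms(self) -> str:
        """Return the terms and conditions for use."""


class SimpleBucket(SmartObject):
    """Toy in-memory object that can grow new formats over time."""

    def __init__(self, metadata: Dict[str, str], terms: str) -> None:
        self._metadata = dict(metadata)
        self._terms = terms
        self._representations: Dict[str, bytes] = {}

    def add_format(self, fmt: str, data: bytes) -> None:
        self._representations[fmt] = data

    def metadata(self) -> Dict[str, str]:
        return dict(self._metadata)

    def content(self, fmt: str) -> bytes:
        return self._representations[fmt]

    def formats(self) -> List[str]:
        return sorted(self._representations)

    def terms(self) -> str:
        return self._terms


bucket = SimpleBucket({"title": "example report"}, terms="metadata freely reusable")
bucket.add_format("ps", b"%!PS")
bucket.add_format("pdf", b"%PDF")
print(bucket.formats())  # ['pdf', 'ps']
```

A service written against `SmartObject` does not need to know which archive holds the object or how the object stores its data.
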
  • 30. dex lessons learned •  A strong distinction between the provision of data, and the implementation of data –  also, a socio-legal context for sharing metadata •  Open, “self-describing” archives •  A universal, unique identifier name space •  Archived objects with more intelligence and flexibility