SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
NDSA Digital Preservation 2013
Integrating Repositories for
Research Data Sharing
Stephen Abrams
California Digital Library
Angela Rizk-Jackson
Julia Kochi
University of California, San Francisco
Noah Wittman
University of California, Berkeley
NDSA Digital Preservation 2013
Why is data curation important?
 Accelerating scientific progress
 Enabling appropriate scrutiny and verification of results
 Promoting integrity and debate
 Facilitating new collaborations
 Avoiding needless duplication of effort
 Increasingly, complying with institutional policies, publication
requirements, and funder mandates
Cf. White and Teds (2011), “Making the case for research data management” DCC briefing paper,
www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
NDSA Digital Preservation 2013
Merritt
 Curation repository available to the UC community and
external partners
 Preservation and access
 Content agnostic, model free
 Highly decentralized micro-services architecture
Cf. Abrams, Cruse, Kunze, and Minor (2011), “Curation micro-services: A pipeline metaphor for
repositories,” Journal of Digital Information 12(2), journals.tdl.org/jodi/article/view/1605
 26 curatorial units
 271 collections
 325,000 objects
 450,000 versions
 4,500,000 files
 13 TB
www.cdlib.org/uc3/merritt
merritt.cdlib.org
NDSA Digital Preservation 2013
Merritt
Storage node
Storage
broker
Inventory
ONEShare UNM
storage node
Storage node
UI/API
UI/API
UI/API
LDAP
LDAP
LDAP
RDBMS
Fixity
User
agent
Message
queue
RDBMS
Load
balancer
Ingest
Load
balancer
Ingest
Ingest
EZID
No-SQL
DataCite
…
DataONE
member node
RDBMS
RDBMS
DataONE
coord’ing node
…
IDF
Load
balancer
Web of
Knowledge
Primo
SAN
SDSC
cloud
NDSA Digital Preservation 2013
(Some) issues to address
 Scale
 Individual objects ranging from 0 to 47,000 files
 Individual files ranging from 0 to 14 GB
 Maintaining control
 Concern over potential loss of control over dissemination and
use of data
 User experience
 Switch from organizational to individual interaction
www.flickr.com/photos/vixon/116447718www.flickr.com/photos/traftery/4319529821www.flickr.com/photos/32195273@N05/51076852642
NDSA Digital Preservation 2013
(Some) issues to address
 Scale
 Individual objects ranging from 0 to 47,000 files
 Individual files ranging from 0 to 14 GB
 Maintaining control
 Concern over potential loss of control over dissemination and
use of data
 User experience
 Switch from organizational to individual interaction
Augment repository function by composition (when possible)
and addition (when necessary)
 Loosely-coupled integration with external community supported
systems and services
NDSA Digital Preservation 2013
Scale
 Avoiding client timeout
 ≤ 2 GB: File-based  stream-based AIP-to-DIP processing
 > 2 GB: Asynchronous delivery
 Email notification with personalized, time-limited URL
 Streamlined storage provisioning
 SDSC cloud
cloud.sdsc.edu
www.kevatron.co.uk/converting-8-24-bit-samples-in-coreaudio-on-ios www.flickr.com/photos/paulbhartzog/680749585
NDSA Digital Preservation 2013
Control
 Data use agreements (DUAs)
 Explicit assertion of license requirements and terms of use
 Curatorial and consumer notification of acceptance
Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-
69, doi:10.1016/j.jbi.2006.09.001
From: no-reply-merritt@ucop.edu
Subject:Merritt DUA acceptance
Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-3109:50:34PDT
Terms of use: As part of this agreement, Consumer submits to the following
statements:
(1) I will receive access to de-identified data and will not attempt to establish the
identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer
these data to other research groups. I understand that these data are available to
other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I
share these data to comply with this data use agreement
...
NDSA Digital Preservation 2013
User experience
 Due to its open eligibility policy, Merritt will always provide a
more generic UX than special-purpose or disciplinary systems
 Shifting user roles, shifting expectations
 Institutional  individual researcher
 Behavioral expectations set by the commercial/mobile web
NDSA Digital Preservation 2013
User experience
 Due to its open eligibility policy, Merritt will always provide a
more generic UX than special-purpose or disciplinary systems
 Shifting user roles, shifting expectations
 Institutional  individual researcher
 Behavioral expectations set by the commercial web
 Integration with extant services that better provide the
desired UX
 DataShare
 Research Hub
NDSA Digital Preservation 2013
DataShare
 “The goal of the DataShare project is to catalyze widespread
sharing of scientific research data”
datashare.ucsf.edu
 UCSF Clinical and Translational Science Institute
ctsi.ucsf.edu
 UCSF Library
www.library.ucsf.edu
 UCSF Center for Imaging of Neurodegenerative Disease
www.radiology.ucsf.edu/cind
 Architecture
 DataShare submission client (Ruby/Rails)
 Merritt curation repository
 DataShare discovery portal (XTF/Java)
NDSA Digital Preservation 2013
DataShare
 Prepare
 Describe
 Upload
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
DataShare
 Prepare
 Best practice advice
 Describe
 Upload
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
DataShare
 Prepare
 Describe
 Schema-directed
metadata editor
 DataCite schema
schema.datacite.org
 Upload
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
DataShare
 Prepare
 Describe
 Upload
 File browse or
drag-n-drop
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
DataShare
 Prepare
 Describe
 Upload
 Curate
 Manage datasets
 Discover
 Share
NDSA Digital Preservation 2013
DataShare
 Prepare
 Describe
 Upload
 Curate
 Discover
 Faceted search and
browse
 Share
NDSA Digital Preservation 2013
DataShare
 Prepare
 Describe
 Upload
 Curate
 Discover
 Share
 DataONE
 DataCite
 (soon) Primo
Web of Knowledge
 SEO
NDSA Digital Preservation 2013
Merritt + DataShare
Storage node
Storage
broker
Inventory
ONEShare UNM
storage node
Storage node
UI/API
UI/API
UI/API
LDAP
LDAP
LDAP
RDBMS
Fixity
User
agent
Message
queue
RDBMS
Load
balancer
Ingest
Load
balancer
Ingest
Ingest
EZID
No-SQL
DataCite
…
DataONE
member node
RDBMS
RDBMS
DataONE
coord’ing node
…
IDF
Load
balancer
Web of
Knowledge
Primo
SAN
SDSC
cloud
DataShare
upload
Collection
Atom feed
XTF
xtf.cdlib.org
DataShare
portal
Lucene
NDSA Digital Preservation 2013
Research Hub
 “Research Hub provides powerful tools for content
management and collaboration”
hub.berkeley.edu
 Alfresco CMS
www.alfresco.com
 770 projects, 3,900 users
 Personal file management
 Project collaboration
 Departmental resource pooling
 Research data management
 Desktop sync, mobile app, Adobe Creative Suite
 UC Berkeley Information Services and Technology
ist.berkeley.edu
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Acquire and
arrange
 Describe
 Upload
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Describe
 Schema-directed
metadata editors
 Upload
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Describe
 Upload
 Direct action
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Describe
 Upload
 Policy-based
workflow rules
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Describe
 Upload
 Drag-and-drop
 Curate
 Discover
 Share
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Describe
 Upload
 Curate
 Manage datasets
 Discover
 Share
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Describe
 Upload
 Curate
 Discover
 Search / browse
 Share
NDSA Digital Preservation 2013
Research Hub
 Prepare
 Describe
 Upload
 Curate
 Discover
 Share
 Curatorial
invitation
NDSA Digital Preservation 2013
Merritt + DataShare + Research Hub
Storage node
Storage
broker
Inventory
ONEShare UNM
storage node
Storage node
UI/API
UI/API
UI/API
LDAP
LDAP
LDAP
RDBMS
Fixity
User
agent
Message
queue
RDBMS
Load
balancer
Ingest
Load
balancer
Ingest
Ingest
EZID
No-SQL
DataCite
…
DataONE
member node
RDBMS
RDBMS
DataONE
coord’ing node
…
IDF
Load
balancer
Web of
Knowledge
Primo
SAN
SDSC
cloud
DataShare
upload
Collection
Atom feed
XTF
xtf.cdlib.org
DataShare
portal
Lucene
Research
Hub
NDSA Digital Preservation 2013
Future integrations
 UCTrust/InCommon federation
Incommon.org
 Open Context archaeological portal
opencontext.org
 Nuxeo
www.nuxeo.com
 UC system-wide DAMS
 Islandora
islandora.ca
 Fedora  Merritt
 DPN
www.dpn.org
NDSA Digital Preservation 2013
Sharing research through repositories
 Conform to institutional policy, publication requirements, and
funder mandates
 Pro-active curation of valuable research outputs
 Stable citation and access
 High visibility publication and discovery
 Use metrics
NDSA Digital Preservation 2013
Sharing research through repositories
 Conform to institutional policy, publication requirements, and
funder mandates
 Pro-active curation of valuable research outputs
 Stable citation and access
 High visibility publication and discovery
 Use metrics
 Repository layering as an appropriate division of labor
 Exploiting existing capabilities already in local use
NDSA Digital Preservation 2013
For more information
 Merritt
www.cdlib.org/uc3/merritt
uc3@ucop.edu
Stephen Abrams David Loy
Patricia Cruse Mark Reyes
Shirin Faenza Joan Starr
Scott Fisher Carly Strasser
Erik Hetzner Marisa Strong
Joshua Hubbard Bhavitavya Vedula
Greg Janée Kenneth Weiss
John Kunze Perry Willet
Rosalie Lack
 DataShare
datashare.ucsf.edu
Geoffrey Boushey Julia Kochi
Anirvan Chatterjee Angela Rizk-Jackson
Maninder Kahlon Michael Weiner
 Research Hub
hub.berkeley.edu
Ian Crew Michael McCarthy (Tribloom)
Noah Wittman Patrick McGrath
www.slideshare.net/UC3/ndsa-2013abramsintegratingrepositoriesfordatasharing

Contenu connexe

Tendances

Distributed Large Dataset Deployment with Improved Load Balancing and Perform...
Distributed Large Dataset Deployment with Improved Load Balancing and Perform...Distributed Large Dataset Deployment with Improved Load Balancing and Perform...
Distributed Large Dataset Deployment with Improved Load Balancing and Perform...IJERA Editor
 
THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009Fiona Salvage
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsMerce Crosas
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it worldChris Dwan
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
Efficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big databaseEfficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big databaseijccsa
 
Guaranteed Availability of Cloud Data with Efficient Cost
Guaranteed Availability of Cloud Data with Efficient CostGuaranteed Availability of Cloud Data with Efficient Cost
Guaranteed Availability of Cloud Data with Efficient CostIRJET Journal
 
Developing research data management policy & services
Developing research data management policy & servicesDeveloping research data management policy & services
Developing research data management policy & servicesSarah Jones
 
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...datascienceiqss
 
The Extreme Data Cloud (XDC) Project
The Extreme Data Cloud (XDC) ProjectThe Extreme Data Cloud (XDC) Project
The Extreme Data Cloud (XDC) ProjectEUDAT
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Jenn Riley
 
147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...
147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...
147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...Gayathri Dili
 
IRJET- Redsc: Reliablity of Data Sharing in Cloud
IRJET- Redsc: Reliablity of Data Sharing in CloudIRJET- Redsc: Reliablity of Data Sharing in Cloud
IRJET- Redsc: Reliablity of Data Sharing in CloudIRJET Journal
 
Paper id 252014139
Paper id 252014139Paper id 252014139
Paper id 252014139IJRAT
 
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud ComputingAn Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computingijtsrd
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...
An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...
An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...Editor IJCATR
 
A novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingA novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingJoão Gabriel Lima
 
Komatsoulis internet2 executive track
Komatsoulis internet2 executive trackKomatsoulis internet2 executive track
Komatsoulis internet2 executive trackGeorge Komatsoulis
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Blue BRIDGE
 

Tendances (20)

Distributed Large Dataset Deployment with Improved Load Balancing and Perform...
Distributed Large Dataset Deployment with Improved Load Balancing and Perform...Distributed Large Dataset Deployment with Improved Load Balancing and Perform...
Distributed Large Dataset Deployment with Improved Load Balancing and Perform...
 
THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
Efficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big databaseEfficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big database
 
Guaranteed Availability of Cloud Data with Efficient Cost
Guaranteed Availability of Cloud Data with Efficient CostGuaranteed Availability of Cloud Data with Efficient Cost
Guaranteed Availability of Cloud Data with Efficient Cost
 
Developing research data management policy & services
Developing research data management policy & servicesDeveloping research data management policy & services
Developing research data management policy & services
 
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
 
The Extreme Data Cloud (XDC) Project
The Extreme Data Cloud (XDC) ProjectThe Extreme Data Cloud (XDC) Project
The Extreme Data Cloud (XDC) Project
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
 
147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...
147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...
147bc3d2e2ffdb1c4f10d673600dd786.Maintaining Integrity and Security for the D...
 
IRJET- Redsc: Reliablity of Data Sharing in Cloud
IRJET- Redsc: Reliablity of Data Sharing in CloudIRJET- Redsc: Reliablity of Data Sharing in Cloud
IRJET- Redsc: Reliablity of Data Sharing in Cloud
 
Paper id 252014139
Paper id 252014139Paper id 252014139
Paper id 252014139
 
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud ComputingAn Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...
An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...
An proficient and Confidentiality-Preserving Multi- Keyword Ranked Search ove...
 
A novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingA novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computing
 
Komatsoulis internet2 executive track
Komatsoulis internet2 executive trackKomatsoulis internet2 executive track
Komatsoulis internet2 executive track
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
 

En vedette

Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Jian Qin
 
Instutional repositories and data
Instutional repositories and dataInstutional repositories and data
Instutional repositories and dataAndrew Treloar
 
Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Chris Rusbridge
 
Data Publishing and Institutional Repositories
Data Publishing and Institutional RepositoriesData Publishing and Institutional Repositories
Data Publishing and Institutional RepositoriesVarsha Khodiyar
 
Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Leon Osinski
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
 
Open Data Repositories
Open Data RepositoriesOpen Data Repositories
Open Data RepositoriesXavier Ochoa
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...EUDAT
 

En vedette (8)

Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08
 
Instutional repositories and data
Instutional repositories and dataInstutional repositories and data
Instutional repositories and data
 
Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...
 
Data Publishing and Institutional Repositories
Data Publishing and Institutional RepositoriesData Publishing and Institutional Repositories
Data Publishing and Institutional Repositories
 
Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...Research data management : Open Research Data pilot, data management (plans),...
Research data management : Open Research Data pilot, data management (plans),...
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
Open Data Repositories
Open Data RepositoriesOpen Data Repositories
Open Data Repositories
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
 

Similaire à Ndsa 2013-abrams-integrating-repositories-for-data-sharing

How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useMatthew Vaughn
 
Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster LEARN Project
 
Policy-based Data Management
Policy-based Data Management Policy-based Data Management
Policy-based Data Management Gary Wilhelm
 
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourKNOWeSCAPE2014
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.pptNileshkuGiri
 
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...AIRCC Publishing Corporation
 
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...AIRCC Publishing Corporation
 
3 rd International Conference on Signal Processing, VLSI Design & Communicati...
3 rd International Conference on Signal Processing, VLSI Design & Communicati...3 rd International Conference on Signal Processing, VLSI Design & Communicati...
3 rd International Conference on Signal Processing, VLSI Design & Communicati...AIRCC Publishing Corporation
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008Ian Foster
 
Research Data Management at the University of Salford
Research Data Management at the University of SalfordResearch Data Management at the University of Salford
Research Data Management at the University of SalfordDavid Clay
 
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...DuraSpace
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVEUDAT
 
Keynote Presentation at MTSR07
Keynote Presentation at MTSR07Keynote Presentation at MTSR07
Keynote Presentation at MTSR07Gauri Salokhe
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOCMerce Crosas
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceLizLyon
 

Similaire à Ndsa 2013-abrams-integrating-repositories-for-data-sharing (20)

Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-use
 
Cyberistructure
CyberistructureCyberistructure
Cyberistructure
 
Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster
 
Policy-based Data Management
Policy-based Data Management Policy-based Data Management
Policy-based Data Management
 
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
 
DataShare for UC Campuses
DataShare for UC CampusesDataShare for UC Campuses
DataShare for UC Campuses
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.ppt
 
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
 
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
 
3 rd International Conference on Signal Processing, VLSI Design & Communicati...
3 rd International Conference on Signal Processing, VLSI Design & Communicati...3 rd International Conference on Signal Processing, VLSI Design & Communicati...
3 rd International Conference on Signal Processing, VLSI Design & Communicati...
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
Sgci esip-7-20-18
Sgci esip-7-20-18Sgci esip-7-20-18
Sgci esip-7-20-18
 
Research Data Management at the University of Salford
Research Data Management at the University of SalfordResearch Data Management at the University of Salford
Research Data Management at the University of Salford
 
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROV
 
Keynote Presentation at MTSR07
Keynote Presentation at MTSR07Keynote Presentation at MTSR07
Keynote Presentation at MTSR07
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOC
 
Uc3 ucacc-2015-11-16
Uc3 ucacc-2015-11-16Uc3 ucacc-2015-11-16
Uc3 ucacc-2015-11-16
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalface
 

Plus de University of California Curation Center

ETDs: Electronic Thesis and Dissertation Service at the University of California
ETDs: Electronic Thesis and Dissertation Service at the University of CaliforniaETDs: Electronic Thesis and Dissertation Service at the University of California
ETDs: Electronic Thesis and Dissertation Service at the University of CaliforniaUniversity of California Curation Center
 
The UC Curation Center (UC3): Developing Tools & Services for Managing Research
The UC Curation Center (UC3): Developing Tools & Services for Managing ResearchThe UC Curation Center (UC3): Developing Tools & Services for Managing Research
The UC Curation Center (UC3): Developing Tools & Services for Managing ResearchUniversity of California Curation Center
 

Plus de University of California Curation Center (20)

ETDs: Electronic Thesis and Dissertation Service at the University of California
ETDs: Electronic Thesis and Dissertation Service at the University of CaliforniaETDs: Electronic Thesis and Dissertation Service at the University of California
ETDs: Electronic Thesis and Dissertation Service at the University of California
 
Dash UCCSC 2016
Dash UCCSC 2016Dash UCCSC 2016
Dash UCCSC 2016
 
Dash: data sharing made easy
Dash: data sharing made easyDash: data sharing made easy
Dash: data sharing made easy
 
CDL research lifecycle
CDL research lifecycleCDL research lifecycle
CDL research lifecycle
 
Ucmp 20150407
Ucmp 20150407Ucmp 20150407
Ucmp 20150407
 
What does "data publication" mean to researchers?
What does "data publication" mean to researchers?What does "data publication" mean to researchers?
What does "data publication" mean to researchers?
 
Researcher perspectives on publication and peer review of data.
Researcher perspectives on publication and peer review of data.Researcher perspectives on publication and peer review of data.
Researcher perspectives on publication and peer review of data.
 
Enhancing DMPTool: Further Streamlineing Data Mangement Planning Process
Enhancing DMPTool: Further Streamlineing Data Mangement Planning ProcessEnhancing DMPTool: Further Streamlineing Data Mangement Planning Process
Enhancing DMPTool: Further Streamlineing Data Mangement Planning Process
 
DataShare: Empowering Researcher Data Curation
DataShare: Empowering Researcher Data CurationDataShare: Empowering Researcher Data Curation
DataShare: Empowering Researcher Data Curation
 
Future of web archiving
Future of web archivingFuture of web archiving
Future of web archiving
 
Data preservation 101
Data preservation 101Data preservation 101
Data preservation 101
 
Creating superior data management plans with the DMPTool
Creating superior data management plans with the DMPToolCreating superior data management plans with the DMPTool
Creating superior data management plans with the DMPTool
 
ESA Ignite talk on the DMPTool by S Abrams
ESA Ignite talk on the DMPTool by S AbramsESA Ignite talk on the DMPTool by S Abrams
ESA Ignite talk on the DMPTool by S Abrams
 
DMPTool2 Webinar #1 for Administrators
DMPTool2 Webinar #1 for AdministratorsDMPTool2 Webinar #1 for Administrators
DMPTool2 Webinar #1 for Administrators
 
DMPTool2 Administrator Webinar #2
DMPTool2 Administrator Webinar #2DMPTool2 Administrator Webinar #2
DMPTool2 Administrator Webinar #2
 
Helping librarians use the DMPTool as a centerpiece for data management
Helping librarians use the DMPTool as a centerpiece for data managementHelping librarians use the DMPTool as a centerpiece for data management
Helping librarians use the DMPTool as a centerpiece for data management
 
The UC Curation Center (UC3): Developing Tools & Services for Managing Research
The UC Curation Center (UC3): Developing Tools & Services for Managing ResearchThe UC Curation Center (UC3): Developing Tools & Services for Managing Research
The UC Curation Center (UC3): Developing Tools & Services for Managing Research
 
Dataset Metadata Publication Through EZID
Dataset Metadata Publication Through EZIDDataset Metadata Publication Through EZID
Dataset Metadata Publication Through EZID
 
DMPTool2: Improvements and Outreach
DMPTool2: Improvements and Outreach DMPTool2: Improvements and Outreach
DMPTool2: Improvements and Outreach
 
DMPTool Webinar 11: Complementary Tools
DMPTool Webinar 11: Complementary ToolsDMPTool Webinar 11: Complementary Tools
DMPTool Webinar 11: Complementary Tools
 

Dernier

activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 

Dernier (20)

activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 

Ndsa 2013-abrams-integrating-repositories-for-data-sharing

  • 1. NDSA Digital Preservation 2013 Integrating Repositories for Research Data Sharing Stephen Abrams California Digital Library Angela Rizk-Jackson Julia Kochi University of California, San Francisco Noah Wittman University of California, Berkeley
  • 2. NDSA Digital Preservation 2013 Why is data curation important?  Accelerating scientific progress  Enabling appropriate scrutiny and verification of results  Promoting integrity and debate  Facilitating new collaborations  Avoiding needless duplication of effort  Increasingly, complying with institutional policies, publication requirements, and funder mandates Cf. White and Teds (2011), “Making the case for research data management” DCC briefing paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
  • 3. NDSA Digital Preservation 2013 Merritt  Curation repository available to the UC community and external partners  Preservation and access  Content agnostic, model free  Highly decentralized micro-services architecture Cf. Abrams, Cruse, Kunze, and Minor (2011), “Curation micro-services: A pipeline metaphor for repositories,” Journal of Digital Information 12(2), journals.tdl.org/jodi/article/view/1605  26 curatorial units  271 collections  325,000 objects  450,000 versions  4,500,000 files  13 TB www.cdlib.org/uc3/merritt merritt.cdlib.org
  • 4. NDSA Digital Preservation 2013 Merritt Storage node Storage broker Inventory ONEShare UNM storage node Storage node UI/API UI/API UI/API LDAP LDAP LDAP RDBMS Fixity User agent Message queue RDBMS Load balancer Ingest Load balancer Ingest Ingest EZID No-SQL DataCite … DataONE member node RDBMS RDBMS DataONE coord’ing node … IDF Load balancer Web of Knowledge Primo SAN SDSC cloud
  • 5. NDSA Digital Preservation 2013 (Some) issues to address  Scale  Individual objects ranging from 0 to 47,000 files  Individual files ranging from 0 to 14 GB  Maintaining control  Concern over potential loss of control over dissemination and use of data  User experience  Switch from organizational to individual interaction www.flickr.com/photos/vixon/116447718www.flickr.com/photos/traftery/4319529821www.flickr.com/photos/32195273@N05/51076852642
  • 6. NDSA Digital Preservation 2013 (Some) issues to address  Scale  Individual objects ranging from 0 to 47,000 files  Individual files ranging from 0 to 14 GB  Maintaining control  Concern over potential loss of control over dissemination and use of data  User experience  Switch from organizational to individual interaction Augment repository function by composition (when possible) and addition (when necessary)  Loosely-coupled integration with external community supported systems and services
  • 7. NDSA Digital Preservation 2013 Scale  Avoiding client timeout  ≤ 2 GB: File-based  stream-based AIP-to-DIP processing  > 2 GB: Asynchronous delivery  Email notification with personalized, time-limited URL  Streamlined storage provisioning  SDSC cloud cloud.sdsc.edu www.kevatron.co.uk/converting-8-24-bit-samples-in-coreaudio-on-ios www.flickr.com/photos/paulbhartzog/680749585
  • 8. NDSA Digital Preservation 2013 Control  Data use agreements (DUAs)  Explicit assertion of license requirements and terms of use  Curatorial and consumer notification of acceptance Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252- 69, doi:10.1016/j.jbi.2006.09.001 From: no-reply-merritt@ucop.edu Subject:Merritt DUA acceptance Name: Stephen Abrams Affiliation: California Digital Library Collection: UCSF DataShare Object: Frontotemporal Lobar Degeneration (FTLD) Date: 2013-05-3109:50:34PDT Terms of use: As part of this agreement, Consumer submits to the following statements: (1) I will receive access to de-identified data and will not attempt to establish the identity of any of the study subjects. (2) I will share these data only with my immediate co-workers, and I will not transfer these data to other research groups. I understand that these data are available to other research groups through the process by which I obtain them. (3) I will require anyone in my group who utilizes these data, or anyone with whom I share these data to comply with this data use agreement ...
  • 9. NDSA Digital Preservation 2013 User experience  Due to its open eligibility policy, Merritt will always provide a more generic UX than special-purpose or disciplinary systems  Shifting user roles, shifting expectations  Institutional  individual researcher  Behavioral expectations set by the commercial/mobile web
  • 10. NDSA Digital Preservation 2013 User experience  Due to its open eligibility policy, Merritt will always provide a more generic UX than special-purpose or disciplinary systems  Shifting user roles, shifting expectations  Institutional  individual researcher  Behavioral expectations set by the commercial web  Integration with extant services that better provide the desired UX  DataShare  Research Hub
  • 11. NDSA Digital Preservation 2013 DataShare  “The goal of the DataShare project is to catalyze widespread sharing of scientific research data” datashare.ucsf.edu  UCSF Clinical and Translational Science Institute ctsi.ucsf.edu  UCSF Library www.library.ucsf.edu  UCSF Center for Imaging of Neurodegenerative Disease www.radiology.ucsf.edu/cind  Architecture  DataShare submission client (Ruby/Rails)  Merritt curation repository  DataShare discovery portal (XTF/Java)
  • 12. NDSA Digital Preservation 2013 DataShare  Prepare  Describe  Upload  Curate  Discover  Share
  • 13. NDSA Digital Preservation 2013 DataShare  Prepare  Best practice advice  Describe  Upload  Curate  Discover  Share
  • 14. NDSA Digital Preservation 2013 DataShare  Prepare  Describe  Schema-directed metadata editor  DataCite schema schema.datacite.org  Upload  Curate  Discover  Share
  • 15. NDSA Digital Preservation 2013 DataShare  Prepare  Describe  Upload  File browse or drag-n-drop  Curate  Discover  Share
  • 16. NDSA Digital Preservation 2013 DataShare  Prepare  Describe  Upload  Curate  Manage datasets  Discover  Share
  • 17. NDSA Digital Preservation 2013 DataShare  Prepare  Describe  Upload  Curate  Discover  Faceted search and browse  Share
  • 18. NDSA Digital Preservation 2013 DataShare  Prepare  Describe  Upload  Curate  Discover  Share  DataONE  DataCite  (soon) Primo Web of Knowledge  SEO
  • 19. NDSA Digital Preservation 2013 Merritt + DataShare Storage node Storage broker Inventory ONEShare UNM storage node Storage node UI/API UI/API UI/API LDAP LDAP LDAP RDBMS Fixity User agent Message queue RDBMS Load balancer Ingest Load balancer Ingest Ingest EZID No-SQL DataCite … DataONE member node RDBMS RDBMS DataONE coord’ing node … IDF Load balancer Web of Knowledge Primo SAN SDSC cloud DataShare upload Collection Atom feed XTF xtf.cdlib.org DataShare portal Lucene
  • 20. NDSA Digital Preservation 2013 Research Hub  “Research Hub provides powerful tools for content management and collaboration” hub.berkeley.edu  Alfresco CMS www.alfresco.com  770 projects, 3,900 users  Personal file management  Project collaboration  Departmental resource pooling  Research data management  Desktop sync, mobile app, Adobe Creative Suite  UC Berkeley Information Services and Technology ist.berkeley.edu
  • 21. NDSA Digital Preservation 2013 Research Hub  Prepare  Acquire and arrange  Describe  Upload  Curate  Discover  Share
  • 22. NDSA Digital Preservation 2013 Research Hub  Prepare  Describe  Schema-directed metadata editors  Upload  Curate  Discover  Share
  • 23. NDSA Digital Preservation 2013 Research Hub  Prepare  Describe  Upload  Direct action  Curate  Discover  Share
  • 24. NDSA Digital Preservation 2013 Research Hub  Prepare  Describe  Upload  Policy-based workflow rules  Curate  Discover  Share
  • 25. NDSA Digital Preservation 2013 Research Hub  Prepare  Describe  Upload  Drag-and-drop  Curate  Discover  Share
  • 26. NDSA Digital Preservation 2013 Research Hub  Prepare  Describe  Upload  Curate  Manage datasets  Discover  Share
  • 27. NDSA Digital Preservation 2013 Research Hub  Prepare  Describe  Upload  Curate  Discover  Search / browse  Share
  • 28. NDSA Digital Preservation 2013 Research Hub  Prepare  Describe  Upload  Curate  Discover  Share  Curatorial invitation
  • 29. NDSA Digital Preservation 2013 Merritt + DataShare + Research Hub Storage node Storage broker Inventory ONEShare UNM storage node Storage node UI/API UI/API UI/API LDAP LDAP LDAP RDBMS Fixity User agent Message queue RDBMS Load balancer Ingest Load balancer Ingest Ingest EZID No-SQL DataCite … DataONE member node RDBMS RDBMS DataONE coord’ing node … IDF Load balancer Web of Knowledge Primo SAN SDSC cloud DataShare upload Collection Atom feed XTF xtf.cdlib.org DataShare portal Lucene Research Hub
  • 30. NDSA Digital Preservation 2013 Future integrations  UCTrust/InCommon federation Incommon.org  Open Context archaeological portal opencontext.org  Nuxeo www.nuxeo.com  UC system-wide DAMS  Islandora islandora.ca  Fedora  Merritt  DPN www.dpn.org
  • 31. NDSA Digital Preservation 2013 Sharing research through repositories  Conform to institutional policy, publication requirements, and funder mandates  Pro-active curation of valuable research outputs  Stable citation and access  High visibility publication and discovery  Use metrics
  • 32. NDSA Digital Preservation 2013 Sharing research through repositories  Conform to institutional policy, publication requirements, and funder mandates  Pro-active curation of valuable research outputs  Stable citation and access  High visibility publication and discovery  Use metrics  Repository layering as an appropriate division of labor  Exploiting existing capabilities already in local use
  • 33. NDSA Digital Preservation 2013 For more information  Merritt www.cdlib.org/uc3/merritt uc3@ucop.edu Stephen Abrams David Loy Patricia Cruse Mark Reyes Shirin Faenza Joan Starr Scott Fisher Carly Strasser Erik Hetzner Marisa Strong Joshua Hubbard Bhavitavya Vedula Greg Janée Kenneth Weiss John Kunze Perry Willet Rosalie Lack  DataShare datashare.ucsf.edu Geoffrey Boushey Julia Kochi Anirvan Chatterjee Angela Rizk-Jackson Maninder Kahlon Michael Weiner  Research Hub hub.berkeley.edu Ian Crew Michael McCarthy (Tribloom) Noah Wittman Patrick McGrath www.slideshare.net/UC3/ndsa-2013abramsintegratingrepositoriesfordatasharing

Notes de l'éditeur

  1. Copyright © 2013 by The Regents of the University of CaliforniaThis work is made available under the terms of the Creative Commons Attribution-ShareAlike 3.0 license
  2. MatijaGrguric, Scale up! – Curved elements!, http://www.flickr.com/photos/32195273@N05/5107685264Tom Rafferty, Remote control!, http://www.flickr.com/photos/67945918@N00/4319529821Barry Egan, File rio 2006, http://www.flickr.com/photos/vixon/116447718
  3. Kevin Smith, Converting 8.24 bit samples in CoreAudio on iOS, http://www.kevatron.co.uk/converting-8-24-bit-samples-in-coreaudio-on-ios/Paul B. Hartzog, Server room, http://www.flickr.com/photos/paulbhartzog/680749585