2. Where we’re going
Background
Demo of UCSF DataShare
Technical details
Other details
Future plans
Q&A
From Flickr by Leo Hidalgo
4. Goal
How
Catalyze widespread research data
sharing
Develop a system that lowers data
sharing barriers and builds an engaged
user community
5. Survey of users by Angela Rizk-Jackson
Has your research group provided public access to data? Yes / No
Why? Journal required / Funder required / Other
How? Repository / Website / Other
n = 114
7. Repository choices…
Repositories
for data
Discipline-specific
General content
Institutional
Non-institutional
Publishers/for-profits
Short-term projects
8. Repository choices…
Which is more important? Depends
Which should a researcher use? Both
Institutional
• All data associated with a paper
• Tells a story
• Clearinghouse for researcher’s works
Discipline-specific
• Some of the data for a given paper
• Discoverable
• Integrated systems
• Collection policies
9. Institutional
• All data associated with a paper
• Tells a story
• Clearinghouse for researcher’s works
10. IRs are SO 2002.
From Flickr by Colin ZHU
From Flickr by johnsons531
From Flickr by Ludie Cochrane
From Flickr by Kapil Karekar
11. Last
year…
… “Federal agencies investing in research and
development (more than $100 million in annual
expenditures) must have clear and coordinated
policies for increasing public access to research
products.”
13. But…
From Flickr by jackcheng
Not always self-service
Sometimes complicated
Data?
“Old” user interfaces
14. Simplify data deposit for UC researchers
Simple metadata
Self-service upload and download
Branded for campus
Most Important:
Institutional Control Over Data
15. Background
Demo of UCSF DataShare
Technical details
Other details
Future plans
Q&A
From Flickr by Leo Hidalgo
17. Technical goals
• Easy submission
• Persistent citation
• Preservation assurance
• Effective discovery
• Control over terms of use
• All the benefits of a centrally hosted service, while maintaining campus branding and identity
From www.dimensionsinfo.com
From Flickr by Eric Peacock
18. System components
• Easy submission: UCSF drag-n-drop client
• Persistent citation
• Preservation assurance
• Effective discovery
• Control over terms of use: Data use agreements (DUAs)
• All the benefits of a centrally hosted service, while maintaining campus branding and identity: DNS, Apache, CSS, and campus Shibboleth IdPs
datashare.berkeley.edu
datashare.ucdavis.edu
datashare.uci.edu
datashare.ucla.edu
…
19. Deposit interactions
Researcher (data producer)
DataShare portal (datashare.campus.edu)
Campus IdP (Shib): Authenticate with campus credentials
Drag-n-drop client: Assemble dataset, Add metadata, Submit to Merritt
Merritt
SDSC cloud: Preservation storage
CSS
Atom
Discovery (XTF): Populate XTF index
Data use agreement
EZID: Request DOI, Register metadata, Assign DOI
Primo: Harvest for A&I discovery
DataCite
Data Citation Index: Harvest for A&I discovery
20. Download interactions
Researcher (data consumer)
DataShare portal (datashare.campus.edu)
Campus IdP
Drag-n-drop client
Discovery (XTF): Faceted search / browse
Primo: Faceted search / browse
DataCite
Data Citation Index: Faceted search / browse
Data use agreement: Accept DUA terms
Merritt
CSS
SDSC cloud
EZID
Retrieve data
Download data: Synchronous for small datasets; asynchronous for large (> 500 MB)
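The download slide says small datasets are delivered synchronously and large ones (> 500 MB) asynchronously. A minimal sketch of that size check follows. The function name, the return labels, and the use of decimal megabytes are assumptions for illustration; the slides do not show how DataShare or Merritt actually implements it.

```python
# Hypothetical sketch: pick a delivery mode from the dataset size.
# The 500 MB cutoff comes from the slide; decimal megabytes are assumed.
ASYNC_THRESHOLD_BYTES = 500 * 1000 * 1000


def delivery_mode(size_bytes: int) -> str:
    """Return 'synchronous' for small datasets, 'asynchronous' above 500 MB."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > ASYNC_THRESHOLD_BYTES:
        return "asynchronous"
    return "synchronous"


if __name__ == "__main__":
    for size in (10_000_000, 500_000_000, 2_000_000_000):
        print(size, delivery_mode(size))
```

A dataset of exactly 500 MB is treated as small here because the slide states the cutoff as "> 500 MB".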
21. Background
Demo of UCSF DataShare
Technical details
Other details
Future plans
Q&A
From Flickr by Leo Hidalgo
22. Roles
Campus Library
• Delivers service to community
• Shapes user interface, URL, branding
• Customizes key components
• Develops help, training
UC3 / CDL
• Guides the campus
• Preserves content in Merritt
• Connects to EZID
• Deploys XTF for discovery
• Works with vendors
SDSC
• Maintains production storage infrastructure
• Holds three independent copies of content
23. Branding & Customization
From Flickr by Diorama Sky
• Logo
• URL
• Contact information
• Other…?
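One way to picture the per-campus customization points (logo, URL, contact information) is as a small lookup table keyed by hostname. The sketch below is illustrative only. The hostnames appear on the system components slide, but the class, the logo paths, and the contact addresses are hypothetical placeholders, not the actual DataShare configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CampusBranding:
    hostname: str       # campus URL, e.g. datashare.berkeley.edu
    logo_path: str      # hypothetical location of the campus logo
    contact_email: str  # hypothetical contact address


# Hypothetical entries: only the hostnames are taken from the slides.
_CAMPUSES = (
    CampusBranding("datashare.berkeley.edu", "/static/berkeley.png", "help@example.org"),
    CampusBranding("datashare.ucdavis.edu", "/static/ucdavis.png", "help@example.org"),
)
BRANDING = {campus.hostname: campus for campus in _CAMPUSES}


def branding_for(host_header: str) -> CampusBranding:
    """Look up branding from an HTTP Host header, ignoring port and case."""
    host = host_header.split(":")[0].strip().lower()
    return BRANDING[host]  # raises KeyError for an unconfigured campus
```

Keying on the hostname mirrors the slides' point that each campus keeps its own URL and identity on top of a shared, centrally hosted service.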
25. Cost
Anticipated cost of providing all campus ladder-track faculty with 5 GB for 10 years
Campus          Faculty   Threshold   Paid-up cost
Berkeley          1,260       10 TB        $29,300
Davis             1,240       10 TB        $29,300
Irvine            1,051       10 TB        $29,300
Los Angeles       1,701       10 TB        $29,300
Merced              159        1 TB         $2,930
Riverside           561        5 TB        $14,650
San Diego         1,109       10 TB        $29,300
San Francisco       366        2 TB         $5,860
Santa Barbara       746        5 TB        $14,650
Santa Cruz          485        5 TB        $14,650
Source: http://legacy-its.ucop.edu/uwnews/stat/headcount_fte/oct2013/welcome.html
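The figures in the cost table are consistent with a simple rule: multiply the faculty count by 5 GB, round up to the next storage tier, and charge a flat rate per terabyte. The sketch below reproduces the table under assumptions inferred from its numbers rather than stated in the slides: tiers of 1, 2, 5, and 10 TB, a paid-up rate of $2,930 per TB, and 1 TB counted as 1,000 GB.

```python
# Assumptions inferred from the table, not stated in the slides.
TIERS_TB = (1, 2, 5, 10)   # available storage thresholds
COST_PER_TB = 2930         # paid-up cost in dollars per TB
GB_PER_FACULTY = 5         # allocation per ladder-track faculty member
GB_PER_TB = 1000           # decimal terabytes assumed


def threshold_tb(faculty: int) -> int:
    """Smallest tier that covers 5 GB for every faculty member."""
    needed_tb = faculty * GB_PER_FACULTY / GB_PER_TB
    for tier in TIERS_TB:
        if needed_tb <= tier:
            return tier
    raise ValueError("storage need exceeds the largest tier")


def paid_up_cost(faculty: int) -> int:
    """Paid-up cost in dollars for the campus threshold."""
    return threshold_tb(faculty) * COST_PER_TB


if __name__ == "__main__":
    for campus, faculty in {"Berkeley": 1260, "Merced": 159, "San Francisco": 366}.items():
        print(campus, threshold_tb(faculty), "TB", paid_up_cost(faculty))
```

For example, San Francisco's 366 faculty need 1.83 TB, which rounds up to the 2 TB tier.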
26. Governance & Agreements
Goal: Simplify & Scale Data Use & Deposit Agreements
27. Governance & Agreements
Data User
ODL or similar
CDL
Terms of service
UC Campus
ODL or similar
Terms of service
Data Depositor
28. Background
Demo of UCSF DataShare
Technical details
Other details
Next steps & future plans
Q&A
From Flickr by Leo Hidalgo
29. Who Decides?
• CDL to work with each campus to implement & shape service
• Campus-to-campus interaction
• Group meetings as needed
• SAG1 check-ins
• Communication (…)
37. DASH: Helping Community Repositories
(To be revised)
What Makes DASH Unique:
• Modern, intuitive user interface for superior user experience
• Freely available code for download and use by anyone
• User-friendly API(s) to ensure interoperability with existing repositories (e.g., SWORD for deposit; Atom, OAI-PMH, ResourceSync for populating the discovery index).
• Customizable interfaces that can be altered easily to reflect service provider branding
• Authentication via institutional Identity Management Systems
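To make the interoperability bullet concrete, the sketch below builds an OAI-PMH ListRecords request of the kind a discovery index could use to harvest metadata from a repository. The verb and parameter names (`verb`, `metadataPrefix`, `set`, `from`) come from the OAI-PMH protocol itself. The base URL is a hypothetical placeholder, and nothing here describes an actual DASH endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # optional: restrict to one set
    if from_date:
        params["from"] = from_date  # optional: harvest records changed since this date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # repo.example.org is a placeholder, not a real repository.
    print(list_records_url("https://repo.example.org/oai", from_date="2014-01-01"))
```

A harvester would fetch this URL, parse the XML response, and follow any resumption token until the record list is exhausted.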
38. Next Steps – Next 2 Weeks
• details to be established
– who’s interested
– tech contact for interested campuses
– communication lines
From Flickr by Themactep
39. Next Steps – Next 2 Months
• get DataShare up and running
– Shibboleth configuration & other authentication
– Domains/URLs established
– Customizations – logos etc.
From Flickr by Themactep
40. Next Steps – Longer term
• in-person meeting?
• CDL camp?
• communication/outreach?
From Flickr by Themactep