Supporting Data-Rich Research on Many Fronts
1. Supporting Data-Rich Research on Many Fronts
21 May 2012
University of California Curation Center
California Digital Library
2. California Digital Library
Serving the University of California
• 10 campuses
• 360K students, faculty, and staff
• 100s of museums, art galleries, observatories, marine centers, botanical gardens
• 5 medical centers
• 5 law schools
• 3 National Laboratories
CDL supports the research lifecycle
• Collections
• Digital Special Collections
• Discovery & Delivery
• Publishing Group
• UC Curation Center (UC3)
4. Our environment circa 2002-2008
Focus on preservation
For memory organizations
Infrastructure: static
Services: hosted
Content: museum & library
Sustainability: ?
5. Our environment since 2008
Focus on preservation, curation (lifecycle)
For memory organizations and now data producers
Infrastructure: static + cloud, VM, bitbucket
Services: hosted + partnered, self-serve
Content: museum & library + research, web crawls
Sustainability: ? cost recovery, pay once
6. Today’s journey
Data service basics at CDL
• Stable storage (Merritt)
• Stable identifiers (EZID)
• Data citation (DataCite)
• Management (DMPTool)
• Preservation cost modeling
... that enable
• Federation (DataONE)
• Data papers
• Capture (WAS web archiving)
• Excel add-in (DCXL)
7. The scientific record is at risk
Data dissemination is rare, risky, expensive, labor-intensive, domain-specific, and receives little credit as research output
(Images: Global Change, Galactic Change)
8. The changing landscape
• Ever increasing number, size, and diversity of content
• Ever increasing diversity of partners and stakeholders
• Decreasing resources
• Inevitability of disruptive change
– Technology
– Institutional mission
(Chart axes: Resources, Time)
9. Stable storage: Merritt repository
• Curation repository open to the UC community and beyond
• Discipline / content agnostic
• Micro-services architecture
• Easy-to-use UI or API
• Hosted or locally deployed
Primary Functions
1. Deposit
2. Manage (metadata, versions, etc.)
3. Access (expose)
4. Share (with other researchers)
5. Preserve
10. EZID: Long term identifiers made easy
• Precise identification of a dataset (DOI or ARK)
• Credit to data producers and data publishers
• A link from the traditional literature to the data (DataCite)
• Exposure and research metrics for datasets (Web of Knowledge, Google)
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
Primary Functions
1. Create persistent identifiers
2. Manage identifiers (and associated metadata) over time
3. Resolve identifiers
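The create / manage / resolve cycle can be illustrated with a toy in-memory registry. This is only a sketch of the idea and not the EZID API: the class `ToyIdentifierRegistry`, its method names, and the identifier prefix are all hypothetical, chosen for illustration.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Record:
    """What a persistent identifier points at, plus descriptive metadata."""
    target_url: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyIdentifierRegistry:
    """Hypothetical in-memory stand-in for an identifier service (not EZID)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix  # placeholder prefix, an assumption
        self._records: Dict[str, Record] = {}
        self._counter = itertools.count(1)

    def create(self, target_url: str, **metadata: str) -> str:
        """Mint a new identifier and bind it to a target URL."""
        identifier = f"{self.prefix}{next(self._counter):06d}"
        self._records[identifier] = Record(target_url, dict(metadata))
        return identifier

    def manage(self, identifier: str, target_url: Optional[str] = None,
               **metadata: str) -> None:
        """Update the target and/or metadata; the identifier never changes."""
        record = self._records[identifier]
        if target_url is not None:
            record.target_url = target_url
        record.metadata.update(metadata)

    def resolve(self, identifier: str) -> str:
        """Return the current location for an identifier."""
        return self._records[identifier].target_url


registry = ToyIdentifierRegistry("ark:/99999/fk4")
dataset_id = registry.create("https://example.org/data/v1", title="Survey data")
registry.manage(dataset_id, target_url="https://example.org/data/v2")
print(dataset_id, "->", registry.resolve(dataset_id))
```

The point of the sketch is the decoupling: the data can move (the target URL changes) while the identifier cited in the literature stays the same.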
11. Discovery: DataCite consortium
• Technische Informationsbibliothek (TIB), Germany
• Canada Institute for Scientific and Technical Information (CISTI)
• L’Institut de l’Information Scientifique et Technique (INIST), France
• Australian National Data Service (ANDS)
• The British Library
• Library of the ETH Zürich
• California Digital Library, USA
• Library of TU Delft, The Netherlands
• Office of Scientific and Technical Information, US Department of Energy
• Purdue University, USA
• Technical Information Center of Denmark
12. DMPTool
Meeting funding agencies’ data management plan requirements
• Connect researchers to resources to create a data management plan
• NSF and directorates, NIH, NEH, IMLS, plus foundations
• Customizable
Primary Functions
1. Step-by-step “wizard”
2. Templates and examples
3. Links to institutional resources and agency information
4. Plan publication and sharing
14. Cost Model 1: Pay as you go
• Billed/paid annually
$\begin{cases} P & \text{if year} = 0 \\ 0 & \text{if year} > 0 \end{cases}$
– Costs for archival System (A), Workflows (W), Content Types (C), Monitoring (M), and Interventions (V) are considered common goods, and are apportioned equally across all n Producers (P)
• Model components are represented by two terms: the number of units and the per-unit cost, e.g., k·S
– Storage cost (S) accounted on a per-Producer basis
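A minimal sketch of how a pay-as-you-go annual charge might be computed under this description. The full formula is not given here, so the combination below is an assumption: common-good costs (A, W, C, M, V) split equally across n Producers, plus per-Producer storage k·S, plus the Producer cost P in year 0 only. The function and parameter names are hypothetical, and the common-good costs are passed as totals rather than as unit-count and unit-cost pairs.

```python
def annual_charge(year: int, n_producers: int,
                  system: float, workflows: float, content_types: float,
                  monitoring: float, interventions: float,
                  storage_units: float, storage_unit_cost: float,
                  producer_cost: float) -> float:
    """Assumed yearly bill for one Producer under a pay-as-you-go model."""
    if n_producers < 1:
        raise ValueError("n_producers must be at least 1")
    # Common goods (A, W, C, M, V) are apportioned equally across Producers.
    common_share = (system + workflows + content_types
                    + monitoring + interventions) / n_producers
    # Storage is accounted per Producer: number of units times unit cost (k * S).
    storage = storage_units * storage_unit_cost
    # The Producer cost P applies in year 0 only.
    engagement = producer_cost if year == 0 else 0.0
    return common_share + storage + engagement


# Illustrative numbers only, not taken from the presentation.
first_year = annual_charge(0, 10, 1000, 200, 300, 400, 100, 5, 20, 150)
later_year = annual_charge(1, 10, 1000, 200, 300, 400, 100, 5, 20, 150)
print(first_year, later_year)
```

Under these assumptions a Producer's bill drops after the first year, because the engagement cost is charged once while the shared and storage costs recur.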
15. Model 2: Pay once, preserve for “T” years
• Paid-up price for fixed term T
– A function of r, the annual investment return, and d, the annual decrease in unit cost of preservation
– G is the cost of providing a year’s preservation service; G0 includes the added first year expense of Producer engagement and registration
– Setting T = ∞ calculates the price for “forever”
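One plausible way to read the pay-once model is as a discounted sum: each future year's cost G shrinks by the factor (1 - d) and is discounted by (1 + r). The formula below is an assumption made for illustration, not the formula from the presentation, and `paid_up_price` is a hypothetical name.

```python
import math


def paid_up_price(g0: float, g: float, r: float, d: float,
                  term_years: float) -> float:
    """Assumed one-time price covering `term_years` of preservation.

    g0: first-year cost (includes Producer engagement and registration)
    g:  cost of one later year of service at today's prices
    r:  annual investment return, d: annual decrease in unit cost
    """
    # Per-year shrink factor: costs fall by d, invested money grows by r.
    q = (1.0 - d) / (1.0 + r)
    if math.isinf(term_years):
        if q >= 1.0:
            raise ValueError("price for 'forever' is unbounded unless q < 1")
        # Closed form of the geometric series g*q + g*q**2 + ...
        return g0 + g * q / (1.0 - q)
    # Year 0 costs g0; years 1 .. T-1 each cost g * q**t in today's money.
    return g0 + sum(g * q ** t for t in range(1, int(term_years)))


# Illustrative numbers only.
print(paid_up_price(100.0, 50.0, 0.03, 0.10, 10))
print(paid_up_price(100.0, 50.0, 0.03, 0.10, math.inf))
```

Under this reading, a finite "forever" price exists only when the combined effect of falling costs and investment return makes each later year cheaper than the one before it.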
16. New distributed framework
Flexible, scalable, sustainable network
Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
19. Need to save data + processing
Algorithms + Data Structures = Programs
20. Vision for a “data paper”
• Wrap the unfamiliar in a familiar façade
• A “data paper” is minimally a cover sheet and a set of links to archived artifacts
• Cover sheet contains familiar elements: title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
• Just enough to permit basic exposure and discovery
– Building a basic data citation
– Indexing by services such as Web of Science, Google Scholar
– Instilling confidence in the identifier’s stability
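The "cover sheet plus links" idea maps naturally onto a small data structure. The sketch below is illustrative only: the class names, field names, and the citation layout are assumptions, not a format defined by CDL or DataCite.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoverSheet:
    """The familiar elements of a data paper's cover sheet."""
    title: str
    date: str            # assumed ISO format, e.g. "2012-05-21"
    authors: List[str]
    abstract: str
    identifier: str      # persistent identifier (DOI, ARK, etc.)


@dataclass
class DataPaper:
    """Minimal data paper: a cover sheet and links to archived artifacts."""
    cover: CoverSheet
    artifact_links: List[str] = field(default_factory=list)

    def citation(self) -> str:
        """Build a basic citation string (layout is an assumption)."""
        authors = "; ".join(self.cover.authors)
        year = self.cover.date[:4]
        return f"{authors} ({year}). {self.cover.title}. {self.cover.identifier}"


# Placeholder values for illustration.
paper = DataPaper(
    cover=CoverSheet(
        title="Example tide pool survey data",
        date="2012-05-21",
        authors=["Doe, J.", "Roe, R."],
        abstract="Placeholder abstract.",
        identifier="doi:10.5072/FK2EXAMPLE",
    ),
    artifact_links=["https://example.org/archive/survey.csv"],
)
print(paper.citation())
```

The cover sheet alone carries enough to build a citation, while the artifact links keep the archived data itself one step away.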
21. 43 public archives
120+ archives total
58K crawls
7,500+ sites
600 million+ URLs
40+ TB
24 institutions
Developed with LoC support by CDL, UNT, and others
22. What are people using WAS for?
Archiving at-risk government websites and publications
Archiving their own university domains
Building web archives to complement library collections
Documenting web coverage of significant events
23. Data curation for Excel
• Excel is the database of choice for many researchers
• Make it easy to share, archive, and publish data
• Keep up to date at dcxl.cdlib.org
Surveyed users and found:
• Most researchers are unaware of preservation options
• Documentation practices are poor
• Excel is just one tool in workflows
Primary Functions
1. An Excel add-in and web application
2. Metadata description (through extraction and augmentation)
3. Check for good data practices
4. Transfer to repository
24. A data curation approach at CDL
• New “data paper” publishing model [GBMF]
• DataCite consortium and citation standards
• Other fronts:
– DataONE global data network [NSF]
– Merritt: general-purpose data repository
– EZID: scheme-agnostic & de-coupled creation, resolution, and management of persistent ids
– Data management plan generator
– Web archiving service [Library of Congress]
– Open-source Excel add-in [MS Research & GBMF]