Supporting Data-Rich Research on Many Fronts
21 May 2012
University of California Curation Center
California Digital Library
  
California Digital Library
Serving the University of California
•  10 campuses
•  360K students, faculty, and staff
•  100s of museums, art galleries, observatories, marine centers, botanical gardens
•  5 medical centers
•  5 law schools
•  3 National Laboratories
CDL supports the research lifecycle
•  Collections
•  Digital Special Collections
•  Discovery & Delivery
•  Publishing Group
•  UC Curation Center (UC3)
  
California Digital Library (CDL)
  
Our environment circa 2002-2008
Focus on preservation
For memory organizations
Infrastructure: static
Services: hosted
Content: museum & library
Sustainability: ?
  
Our environment since 2008
Focus on preservation, now curation (lifecycle)
For memory organizations and now data producers
Infrastructure: static + cloud, VM, bitbucket
Services: hosted + partnered, self-serve
Content: museum & library + research, web crawls
Sustainability: ?, now cost recovery, pay once
  
Today’s journey
Data service basics at CDL
•  Stable storage (Merritt)
•  Stable identifiers (EZID)
•  Data citation (DataCite)
•  Management (DMPTool)
•  Preservation cost modeling
... that enable
•  Federation (DataONE)
•  Data papers
•  Capture (WAS web archiving)
•  Excel add-in (DCXL)
  
The scientific record is at risk
Data dissemination is rare, risky, expensive, labor-intensive, domain-specific, and receives little credit as research output
[Figure panels: Global Change; Galactic Change]
  
The changing landscape
•  Ever increasing number, size, and diversity of content
•  Ever increasing diversity of partners and stakeholders
•  Decreasing resources
•  Inevitability of disruptive change
   –  Technology
   –  Institutional mission
[Figure: RESOURCES plotted against TIME]
  
Stable storage: Merritt repository
•  Curation repository open to the UC community and beyond
•  Discipline / content agnostic
•  Micro-services architecture
•  Easy-to-use UI or API
•  Hosted or locally deployed
Primary Functions
1. Deposit
2. Manage (metadata, versions, etc.)
3. Access (expose)
4. Share (with other researchers)
5. Preserve
  
EZID: Long term identifiers made easy
•  Precise identification of a dataset (DOI or ARK)
•  Credit to data producers and data publishers
•  A link from the traditional literature to the data (DataCite)
•  Exposure and research metrics for datasets (Web of Knowledge, Google)
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
Primary Functions
1. Create persistent identifiers
2. Manage identifiers (and associated metadata) over time
3. Resolve identifiers
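
Resolving an identifier amounts to turning a DOI or ARK string into a URL at a resolver. The sketch below is illustrative only: it assumes the public resolvers https://doi.org/ (for DOIs) and https://n2t.net/ (for ARKs), neither of which is named on the slide, and the function name `resolver_url` is hypothetical rather than part of any EZID interface.

```python
def resolver_url(identifier: str) -> str:
    """Return a resolver URL for a DOI or ARK string.

    Assumptions (not from the slides): DOIs are resolved at
    https://doi.org/ and ARKs at https://n2t.net/.
    """
    ident = identifier.strip()
    lowered = ident.lower()
    if lowered.startswith("doi:"):
        # Drop the "doi:" label; the resolver expects the bare DOI name.
        return "https://doi.org/" + ident[4:]
    if lowered.startswith("ark:"):
        # ARKs keep their "ark:" label when appended to the resolver.
        return "https://n2t.net/" + ident
    raise ValueError(f"unsupported identifier scheme: {identifier!r}")
```

For example, `resolver_url("ark:/12345/x1example")` builds a URL for a made-up ARK; the identifier itself is a placeholder, not a registered one.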
  
Discovery: DataCite consortium
•  Technische Informationsbibliothek (TIB), Germany
•  Australian National Data Service (ANDS)
•  The British Library
•  California Digital Library, USA
•  Canada Institute for Scientific and Technical Information (CISTI)
•  L’Institut de l’Information Scientifique et Technique (INIST), France
•  Library of the ETH Zürich
•  Library of TU Delft, The Netherlands
•  Office of Scientific and Technical Information, US Department of Energy
•  Purdue University, USA
•  Technical Information Center of Denmark
  
DMPTool
Meeting funding agencies’ data management plan requirements
•  Connect researchers to resources to create a data management plan
•  NSF and directorates, NIH, NEH, IMLS, foundations plus
•  Customizable
Primary Functions
1. Step-by-step “wizard”
2. Templates and examples
3. Links to institutional resources and agency information
4. Plan publication and sharing
  
[Figure: Number of Plans Created, Oct 2011 – Feb 2012]
  
Cost Model 1: Pay as you go
•  Billed/paid annually
   \[ \begin{cases} P & \text{if year} = 0 \\ 0 & \text{if year} > 0 \end{cases} \]
   –  Costs for archival System (A), Workflows (W), Content Types (C), Monitoring (M), and Interventions (V) are considered common goods, and are apportioned equally across all n Producers (P)
      •  Model components are represented by two terms: the number of units and the per-unit cost, e.g., k·S
   –  Storage cost (S) accounted on a per-Producer basis
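
The apportioning rule can be made concrete with a short sketch. The full price formula is not recoverable from the slide, so the function below is an assumption built only from the bullets: the common-good costs A, W, C, M, and V (treated here as annual totals) are split equally across n Producers, storage is charged per Producer as k·S, and the Producer cost P applies only in year 0. The function name and every figure in the usage note are invented for illustration.

```python
def pay_as_you_go(year, n, A, W, C, M, V, k, S, P):
    """Annual charge to one Producer under an assumed reading of Model 1.

    A, W, C, M, V: annual totals for the common-good components (assumed).
    k, S: units of storage held by this Producer and the per-unit cost.
    P: Producer cost, charged in year 0 only.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of Producers")
    common_share = (A + W + C + M + V) / n  # common goods, split equally
    storage = k * S                         # accounted per Producer
    producer = P if year == 0 else 0        # piecewise term from the slide
    return common_share + storage + producer
```

With made-up inputs (n = 10, common goods totalling 2000, k = 2, S = 50, P = 200), the first-year charge includes the one-time Producer term and later years do not.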
  
Model 2: Pay once, preserve for “T” years
•  Paid-up price for fixed term T
   –  A function of r, the annual investment return, and d, the annual decrease in unit cost of preservation
   –  G is the cost of providing a year’s preservation service; G0 includes the added first year expense of Producer engagement and registration
   –  Setting T = ∞ calculates the price for “forever”
New distributed framework
Flexible, scalable, sustainable network
Coordinating Nodes
•  retain complete metadata catalog
•  subset of all data
•  perform basic indexing
•  provide network-wide services
•  ensure data availability (preservation)
•  provide replication services
Member Nodes
•  diverse institutions
•  serve local community
•  provide resources for managing their data
  
Traditional articles vs data papers
  
The collective data product
  
Need to save data + processing
Algorithms + Data Structures = Programs
  	
  
Vision for a “data paper”
•  Wrap the unfamiliar in a familiar façade
•  A “data paper” is minimally a cover sheet and a set of links to archived artifacts
•  Cover sheet contains familiar elements: title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
•  Just enough to permit basic exposure and discovery
   –  Building a basic data citation
   –  Indexing by services such as Web of Science, Google Scholar
   –  Instilling confidence in the identifier’s stability
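
The cover-sheet idea above can be sketched as a small data structure. This is a hypothetical illustration, not a CDL format: the class name, field names, and the citation layout are assumptions, and the sample values in the usage note are invented. It holds the familiar elements listed on the slide plus the links to archived artifacts, and derives a basic citation from them.

```python
from dataclasses import dataclass, field

@dataclass
class DataPaper:
    """Minimal 'data paper': a cover sheet plus links to archived artifacts."""
    title: str
    date: str                    # ISO date string, e.g. "2012-05-21"
    authors: list[str]
    abstract: str
    identifier: str              # persistent identifier (DOI, ARK, etc.)
    artifact_links: list[str] = field(default_factory=list)

    def citation(self) -> str:
        """Build a basic data citation (the layout here is an assumption)."""
        names = "; ".join(self.authors)
        year = self.date[:4]
        return f"{names} ({year}). {self.title}. {self.identifier}"
```

A paper created with two authors, a date, and a placeholder ARK yields a one-line citation of the form "Authors (year). Title. Identifier", which is just enough for basic exposure and discovery.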
  	
  
43 public archives
120+ archives total
58K crawls
7,500+ sites
600 million+ URLs
40+ TB
24 institutions
Developed with LoC support by CDL, UNT, and others
What are people using WAS for?
Archiving at-risk government websites and publications
Archiving their own university domains
Building web archives to complement library collections
Documenting web coverage of significant events
  
Data curation for Excel
•  Excel is the database of choice for many researchers
•  Make it easy to share, archive, and publish data
•  Keep up to date at dcxl.cdlib.org
Primary Functions
1. An Excel add-in and web application
2. Metadata description (through extraction and augmentation)
3. Check for good data practices
4. Transfer to repository
Surveyed users and found:
•  Most researchers are unaware of preservation options
•  Documentation practices are poor
•  Excel is just one tool in workflows
  	
  
A data curation approach at CDL
•  New “data paper” publishing model [GBMF]
•  DataCite consortium and citation standards
•  Other fronts:
   •  DataONE global data network [NSF]
   •  Merritt: general-purpose data repository
   •  EZID: scheme-agnostic & de-coupled creation, resolution, and management of persistent ids
   •  Data management plan generator
   •  Web archiving service [Library of Congress]
   •  Open-source Excel add-in [MS Research & GBMF]
  
Questions?
John.Kunze@ucop.edu
California Digital Library
http://www.cdlib.org/
  

Supporting Data-Rich Research on Many Fronts

  • 1. Supporting Data-Rich Research on Many Fronts
      21 May 2012
      University of California Curation Center, California Digital Library
  • 2. California Digital Library
      Serving the University of California
      •  10 campuses
      •  360K students, faculty, and staff
      •  100s of museums, art galleries, observatories, marine centers, botanical gardens
      •  5 medical centers
      •  5 law schools
      •  3 National Laboratories
      CDL supports the research lifecycle
      •  Collections
      •  Digital Special Collections
      •  Discovery & Delivery
      •  Publishing Group
      •  UC Curation Center (UC3)
  • 4. Our environment circa 2002-2008
      Focus on preservation
      For memory organizations
      Infrastructure: static
      Services: hosted
      Content: museum & library
      Sustainability: ?
  • 5. Our environment since 2008
      Focus on preservation, now curation (lifecycle)
      For memory organizations, and now data producers
      Infrastructure: static + cloud, VM, bitbucket
      Services: hosted + partnered, self-serve
      Content: museum & library + research, web crawls
      Sustainability: ?, now cost recovery, pay once
  • 6. Today's journey
      Data service basics at CDL
      •  Stable storage (Merritt)
      •  Stable identifiers (EZID)
      •  Data citation (DataCite)
      •  Management (DMPTool)
      •  Preservation cost modeling
      ... that enable
      •  Federation (DataONE)
      •  Data papers
      •  Capture (WAS web archiving)
      •  Excel add-in (DCXL)
  • 7. The scientific record is at risk
      Data dissemination is rare, risky, expensive, labor-intensive, domain-specific, and receives little credit as research output
      Global Change
      Galactic Change
  • 8. The changing landscape
      •  Ever increasing number, size, and diversity of content
      •  Ever increasing diversity of partners and stakeholders
      •  Decreasing resources
      •  Inevitability of disruptive change
         –  Technology
         –  Institutional mission
      [Chart axes: Resources, Time]
  • 9. Stable storage: Merritt repository
      •  Curation repository open to the UC community and beyond
      •  Discipline / content agnostic
      •  Micro-services architecture
      •  Easy-to-use UI or API
      •  Hosted or locally deployed
      Primary Functions
      1.  Deposit
      2.  Manage (metadata, versions, etc.)
      3.  Access (expose)
      4.  Share (with other researchers)
      5.  Preserve
  • 10. EZID: Long term identifiers made easy
      •  Precise identification of a dataset (DOI or ARK)
      •  Credit to data producers and data publishers
      •  A link from the traditional literature to the data (DataCite)
      •  Exposure and research metrics for datasets (Web of Knowledge, Google)
      Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
      Primary Functions
      1.  Create persistent identifiers
      2.  Manage identifiers (and associated metadata) over time
      3.  Resolve identifiers
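The three primary functions listed on this slide (create, manage, resolve) can be illustrated with a toy in-memory registry. This is only a sketch of the idea that an identifier stays fixed while its target and metadata change; it is not EZID's API, and the class name, method names, identifier string, and URLs are all hypothetical placeholders.

```python
class IdentifierRegistry:
    """Toy in-memory registry (hypothetical, not the EZID service)."""

    def __init__(self):
        # Maps identifier string -> {"target": URL, "metadata": dict}
        self._records = {}

    def create(self, identifier, target, **metadata):
        """Register a new identifier pointing at a target location."""
        if identifier in self._records:
            raise ValueError(f"identifier already exists: {identifier}")
        self._records[identifier] = {"target": target, "metadata": dict(metadata)}

    def update(self, identifier, target=None, **metadata):
        """Change the target and/or metadata; the identifier itself never changes."""
        record = self._records[identifier]
        if target is not None:
            record["target"] = target
        record["metadata"].update(metadata)

    def metadata(self, identifier):
        """Return a copy of the metadata currently associated with the identifier."""
        return dict(self._records[identifier]["metadata"])

    def resolve(self, identifier):
        """Return the current target location for the identifier."""
        return self._records[identifier]["target"]


registry = IdentifierRegistry()
# Placeholder identifier and URLs, made up for illustration only.
registry.create("ark:/99999/fk4example", "https://repo-a.example.org/obj/1", title="Sample dataset")
registry.update("ark:/99999/fk4example", target="https://repo-b.example.org/obj/1")
print(registry.resolve("ark:/99999/fk4example"))
```

The point of the sketch is the decoupling: a citation that uses the identifier keeps working after the data moves, because only the registry entry is updated.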
  • 11. Discovery: DataCite consortium
      •  Technische Informationsbibliothek (TIB), Germany
      •  L'Institut de l'Information Scientifique et Technique (INIST), France
      •  Library of the ETH Zürich
      •  Library of TU Delft, The Netherlands
      •  Purdue University, USA
      •  Technical Information Center of Denmark
      •  Canada Institute for Scientific and Technical Information (CISTI)
      •  Australian National Data Service (ANDS)
      •  The British Library
      •  California Digital Library, USA
      •  Office of Scientific and Technical Information, US Department of Energy
  • 12. DMPTool
      Meeting funding agencies' data management plan requirements
      •  Connect researchers to resources to create a data management plan
      •  NSF and directorates, NIH, NEH, IMLS, foundations plus
      •  Customizable
      Primary Functions
      1.  Step-by-step "wizard"
      2.  Templates and examples
      3.  Links to institutional resources and agency information
      4.  Plan publication and sharing
  • 13. Number of Plans Created, Oct 2011 – Feb 2012
  • 14. Cost Model 1: Pay as you go
      •  Billed/paid annually
         $\begin{cases} P & \text{if year} = 0 \\ 0 & \text{if year} > 0 \end{cases}$
         –  Costs for archival System (A), Workflows (W), Content Types (C), Monitoring (M), and Interventions (V) are considered common goods, and are apportioned equally across all n Producers (P)
      •  Model components are represented by two terms: the number of units and the per-unit cost, e.g., k·S
         –  Storage cost (S) accounted on a per-Producer basis
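The pay-as-you-go terms on this slide can be sketched in Python. The slide's complete formula did not survive in the transcript, so the way the terms are combined here (an equal share of the common costs, plus per-Producer storage, plus a term charged only in year 0) is an assumption, and every function and parameter name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CommonCosts:
    """Annual costs treated as common goods on the slide."""
    system: float          # A: archival System
    workflows: float       # W: Workflows
    content_types: float   # C: Content Types
    monitoring: float      # M: Monitoring
    interventions: float   # V: Interventions

    def total(self) -> float:
        return (self.system + self.workflows + self.content_types
                + self.monitoring + self.interventions)


def annual_charge(common: CommonCosts, n_producers: int,
                  storage_units: float, storage_unit_cost: float,
                  first_year_cost: float, year: int) -> float:
    """Assumed annual bill for one Producer under pay-as-you-go.

    Assumption: bill = common costs / n  +  k * S  +  (first-year term if year == 0).
    """
    if n_producers <= 0:
        raise ValueError("n_producers must be positive")
    shared = common.total() / n_producers          # apportioned equally across n Producers
    storage = storage_units * storage_unit_cost    # k * S, accounted per Producer
    one_time = first_year_cost if year == 0 else 0.0  # piecewise term from the slide
    return shared + storage + one_time


# Illustrative numbers only; none of these figures come from the slides.
costs = CommonCosts(system=100.0, workflows=50.0, content_types=20.0,
                    monitoring=20.0, interventions=10.0)
print(annual_charge(costs, n_producers=4, storage_units=10, storage_unit_cost=2.0,
                    first_year_cost=30.0, year=0))
print(annual_charge(costs, n_producers=4, storage_units=10, storage_unit_cost=2.0,
                    first_year_cost=30.0, year=1))
```

Under this reading, a Producer's bill drops after the first year because the year-0 term disappears, while the shared and storage components recur annually.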
  • 15. Model 2: Pay once, preserve for "T" years
      •  Paid-up price for fixed term T
         –  A function of r, the annual investment return, and d, the annual decrease in unit cost of preservation
         –  G is the cost of providing a year's preservation service; G0 includes the added first year expense of Producer engagement and registration
         –  Setting T = ∞ calculates the price for "forever"
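One way to read the pay-once model is as a discounted sum of yearly costs. The slide's own formula is not present in the transcript, so the sketch below assumes a standard present-value form: year 0 costs G0, and each later year t costs G scaled by ((1 - d) / (1 + r))^t. The function name and the sample numbers are hypothetical.

```python
import math


def paid_up_price(g0: float, g: float, r: float, d: float, years: float) -> float:
    """Assumed one-time price to fund preservation for `years` years.

    g0:    first-year cost (includes Producer engagement and registration)
    g:     cost of one year's preservation service
    r:     annual investment return (e.g. 0.03)
    d:     annual decrease in unit cost of preservation (e.g. 0.10)
    years: term T; pass math.inf for the "forever" price
    """
    if r <= -1:
        raise ValueError("r must be greater than -1")
    if years < 1:
        raise ValueError("years must be at least 1")
    q = (1 - d) / (1 + r)  # yearly discount factor under the stated assumption
    if math.isinf(years):
        if abs(q) >= 1:
            raise ValueError("the infinite-term price diverges unless |(1-d)/(1+r)| < 1")
        # Closed form of the geometric series sum_{t>=1} g * q**t
        return g0 + g * q / (1 - q)
    # Finite term: year 0 costs g0, years 1..T-1 cost g * q**t
    return g0 + sum(g * q ** t for t in range(1, int(years)))


# Illustrative numbers only; none of these figures come from the slides.
print(paid_up_price(g0=100.0, g=50.0, r=0.0, d=0.5, years=3))
print(paid_up_price(g0=100.0, g=50.0, r=0.0, d=0.5, years=math.inf))
```

Under this assumption the "forever" price is finite whenever costs fall or returns accrue fast enough that the discount factor stays below 1, which is what makes setting T = ∞ meaningful.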
  • 16. New distributed framework
      Flexible, scalable, sustainable network
      Coordinating Nodes
      •  retain complete metadata catalog
      •  subset of all data
      •  perform basic indexing
      •  provide network-wide services
      •  ensure data availability (preservation)
      •  provide replication services
      Member Nodes
      •  diverse institutions
      •  serve local community
      •  provide resources for managing their data
  • 17. Traditional articles vs data papers
  • 18. The collective data product
  • 19. Need to save data + processing
      Algorithms + Data Structures = Programs
  • 20. Vision for a "data paper"
      •  Wrap the unfamiliar in a familiar façade
      •  A "data paper" is minimally a cover sheet and a set of links to archived artifacts
      •  Cover sheet contains familiar elements: title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
      •  Just enough to permit basic exposure and discovery
         –  Building a basic data citation
         –  Indexing by services such as Web of Science, Google Scholar
         –  Instilling confidence in the identifier's stability
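The minimal "data paper" described on this slide (a cover sheet plus links to archived artifacts) maps naturally onto a small record type. The sketch below is an illustration only: the class name, field names, citation layout, identifier, and URLs are assumptions, not a format defined in the slides.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataPaper:
    """Hypothetical cover sheet plus links to archived artifacts."""
    title: str
    published: date
    authors: list[str]
    abstract: str
    identifier: str                      # persistent identifier (DOI, ARK, etc.)
    artifact_links: list[str] = field(default_factory=list)

    def citation(self) -> str:
        """Build a basic citation string; the layout is an assumed convention."""
        names = "; ".join(self.authors)
        return f"{names} ({self.published.year}): {self.title}. {self.identifier}"


# Placeholder values, made up for illustration only.
paper = DataPaper(
    title="Sample Ocean Temperature Readings",
    published=date(2012, 5, 21),
    authors=["Researcher, A.", "Curator, B."],
    abstract="A short description of what the dataset contains.",
    identifier="ark:/99999/fk4example",
    artifact_links=["https://repo.example.org/obj/1/data.csv"],
)
print(paper.citation())
```

Keeping the record this small reflects the slide's "just enough" goal: the cover-sheet fields are what a citation or an indexing service needs, while the data itself stays in the archive behind the links.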
  • 21. 43 public archives; 120+ archives total; 58K crawls; 7,500+ sites; 600 million+ URLs; 40+ TB; 24 institutions
      Developed with LoC support by CDL, UNT, and others
  • 22. What are people using WAS for?
      •  Archiving at-risk government websites and publications
      •  Archiving their own university domains
      •  Building web archives to complement library collections
      •  Documenting web coverage of significant events
  • 23. Data curation for Excel
      •  Excel is the database of choice for many researchers
      •  Make it easy to share, archive, and publish data
      •  Keep up to date at dcxl.cdlib.org
      Surveyed users and found:
      •  Most researchers are unaware of preservation options
      •  Documentation practices are poor
      •  Excel is just one tool in workflows
      Primary Functions
      1.  An Excel add-in and web application
      2.  Metadata description (through extraction and augmentation)
      3.  Check for good data practices
      4.  Transfer to repository
  • 24. A data curation approach at CDL
      •  New "data paper" publishing model [GBMF]
      •  DataCite consortium and citation standards
      •  Other fronts:
         •  DataONE global data network [NSF]
         •  Merritt: general-purpose data repository
         •  EZID: scheme-agnostic & de-coupled creation, resolution, and management of persistent ids
         •  Data management plan generator
         •  Web archiving service [Library of Congress]
         •  Open-source Excel add-in [MS Research & GBMF]
  • 25. Questions?
      John.Kunze@ucop.edu
      California Digital Library
      http://www.cdlib.org/