Communicating with Data
MacKenzie Smith
Associate Director for Technology, MIT Libraries
Science Commons Research Fellow, Creative Commons
CrossRef Annual Meeting ©2010, MIT
That Was Then
The world’s first hard drive (5 MB)
IBM Almaden Research Center, 1952-1954
This Is Now
Current-capacity hard drive (>2 TB)
Google Data Center, 2010
How Much Information?
“IDC research shows that the digital universe — information that is
either created, captured, or replicated in digital form — was 281
exabytes in 2007. In 2011, the amount of digital information
produced in the year should equal nearly 1,800 exabytes, or 10
times that produced in 2006. The compound annual growth rate
between now and 2011 is expected to be almost 60%”
The Diverse and Exploding Digital Universe, 2008 IDC White Paper
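The quoted growth rate can be checked with the usual compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes the 281 EB figure for 2007 and the roughly 1,800 EB figure for 2011 from the quote, which gives four yearly steps. The function name `cagr` is only an illustration.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate from `start` to `end` over `years` years."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Assumed inputs: 281 EB in 2007 and ~1,800 EB in 2011, from the IDC quote.
    rate = cagr(281, 1800, 2011 - 2007)
    print(f"Implied CAGR 2007-2011: {rate:.1%}")  # Implied CAGR 2007-2011: 59.1%
```

The result is about 59% a year, which agrees with the quote's "almost 60%".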
How Much Information?
[Chart: Sequence submissions to the DNA Data Bank of Japan (DDBJ), 1993-2005]
What Is Research Data?
Observational (e.g. sensor, telemetry, survey, sample data)
Experimental (e.g. genetic sequences, chromatograms)
Simulation (e.g. climate, economic, 3D models)
Media (e.g. images, audio, video)
Derived/compiled (e.g. text/data mining, compiled databases)
Often expensive or impossible to reproduce
What Is Research Data?
Text (e.g. flat text files, Word, PDF)
Numerical (e.g. SPSS, Stata, Excel, MySQL)
Media (e.g. JPEG, TIFF, DICOM, MPEG, QuickTime)
Models (e.g. 3D, statistical)
Software (e.g. Java, C programs)
Domain-specific (e.g. FITS in astronomy, CIF in chemistry)
Instrument-specific (e.g. Olympus confocal microscope)
Not always in neat packages like books
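Research data arrive in many formats, so a curation workflow often begins by working out what a file actually is. The sketch below checks a handful of well-known leading byte signatures ("magic numbers"). The signature table is an illustrative assumption that covers only a few of the formats listed above. Registries such as PRONOM, which the DROID tool uses, cover many more.

```python
from pathlib import Path

# Illustrative subset of signatures: (byte offset, expected bytes, label).
SIGNATURES = [
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"II*\x00", "TIFF image (little-endian)"),
    (0, b"MM\x00*", "TIFF image (big-endian)"),
    (0, b"%PDF-", "PDF document"),
    (0, b"SIMPLE  =", "FITS (astronomy)"),          # first FITS header card
    (128, b"DICM", "DICOM (medical imaging)"),     # follows a 128-byte preamble
]


def identify(data: bytes) -> str:
    """Return the label of the first matching signature, or 'unknown'."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def identify_file(path: Path) -> str:
    """Identify a file on disk by reading only its first 512 bytes."""
    with path.open("rb") as fh:
        return identify(fh.read(512))


if __name__ == "__main__":
    print(identify(b"%PDF-1.4 ..."))        # PDF document
    print(identify(bytes(128) + b"DICM"))   # DICOM (medical imaging)
```

Signature checks are a quick first pass. Some formats, such as CIF or plain text, have no fixed signature and need content-level inspection.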
What Do Researchers Do With Data?
- Analyze (e.g. process, visualize)
- Share
- Review (evaluate methods)
- Annotate
- Cite
- Re-use (reproduce results)
- Re-purpose (e.g. integrate)
Data Sharing Innovations
- New-fangled Hybrid Articles
  - Integrate text, data and tools
- Enhanced PDFs
- Linked Open Data
  - Access to data via Web standards to encourage large-scale interoperability
- “Data Papers”
Issues in Data Curation
- Storage: very large scale
- Metadata: what standard to use?
- Provenance: research methods
- Identifiers: scalability, persistence
- Preservation: many formats (see “What Is Research Data?” above)
- Sharing: laws confusing, not interoperable
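Persistent identifiers such as DOIs address the identifier issue by separating a stable name from the data's current location. The sketch below splits a DOI into its prefix, which always starts with "10.", and its suffix, then builds a URL for the https://doi.org/ resolver. Validation here is deliberately loose, and the example DOI 10.1234/... is hypothetical.

```python
from urllib.parse import quote

# Resolver URLs and labels that commonly precede a bare DOI.
_LEADS = ("https://doi.org/", "http://doi.org/",
          "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix); raise ValueError if it looks malformed."""
    doi = doi.strip()
    for lead in _LEADS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a resolver URL, percent-encoding unsafe characters in the suffix."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    print(resolver_url("doi:10.1234/example.dataset.1"))
```

The identifier stays the same when data move between repositories. Only the resolver's record of the location has to be updated, which is what makes it persistent.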
Data Sharing Trends
“The NIH expects and supports the timely release
and sharing of final research data from NIH-supported
studies for use by other researchers.”
NIH grant proposal guide
Similar data management and sharing mandates from the
US NSF and other funding agencies worldwide
Journals mandating data deposit
(e.g. Journal of Evolutionary Biology)
Data Interoperability
IPR (intellectual property rights) and data licenses
- Much data is not copyrightable, since facts cannot be copyrighted
- The UK, the EU, and some other countries have sui generis database rights
- Laws are not “interoperable”
Big problem for international scientific collaborations and data re-purposing
BWIN presentation ©2010, MIT
Libraries and Data
Established curation for some data types:
- statistical (Harvard-MIT Data Center)
- geospatial (Geodata Repository)
- bioinformatics (via NLM NCBI)
- digital media (e.g. images, videos)
- datasets (digital archives in the institutional repository, IR)
Libraries and Data
Applies to both faculty-authored and externally acquired data
- Consultation services (in person, via website)
- Liaise with data archives (e.g. ICPSR)
- Develop (meta)data standards (e.g. DDI)
- Manage and preserve data
Robotics Data in DSpace@MIT
The Library:
- Defined a local taxonomy for metadata values
- Customized metadata records
- Adapted/simplified the deposit workflow
- Loaded data from the previous repository
- Added CC0 licenses
New deposits are reviewed by the community
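A local taxonomy for metadata values and customized records amount to checking each deposit against a controlled vocabulary before it enters the repository. The sketch below assumes qualified Dublin Core field names of the kind DSpace uses (for example `dc.title` and `dc.rights.uri`). The taxonomy terms, the required-field list, and the choice of `dc.subject` for taxonomy values are hypothetical. They are not the actual MIT robotics configuration.

```python
# Hypothetical local taxonomy and required fields, for illustration only.
ROBOTICS_TAXONOMY = {"manipulation", "locomotion", "perception", "navigation"}
REQUIRED_FIELDS = ("dc.title", "dc.contributor.author", "dc.date.issued")
CC0_URI = "https://creativecommons.org/publicdomain/zero/1.0/"


def validate_record(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems with a deposit record; an empty list means it passes."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    for term in record.get("dc.subject", []):
        if term not in ROBOTICS_TAXONOMY:
            problems.append(f"subject not in local taxonomy: {term!r}")
    return problems


def with_cc0(record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of the record with a CC0 public domain dedication attached."""
    updated = {key: list(values) for key, values in record.items()}
    updated["dc.rights.uri"] = [CC0_URI]
    return updated


if __name__ == "__main__":
    deposit = {"dc.title": ["Arm trajectories"], "dc.contributor.author": ["Doe, J."],
               "dc.date.issued": ["2010"], "dc.subject": ["perception", "cooking"]}
    print(validate_record(deposit))
```

With automated checks like these, community reviewers can concentrate on the content of new deposits rather than on metadata formatting.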
Researcher’s Role: Data Provision
e.g. Sage Commons
“The Sage Commons is a novel information platform being built by an
international partnership of researchers and stakeholders to define the
molecular basis of disease and guide the development of effective
human therapeutics and diagnostics.
The Sage Commons will be used to integrate diverse molecular mega-data
sets, to build predictive bionetworks and to offer advanced tools proven
to provide unique new insights into human disease biology. Users will
also be contributors that advance the knowledge base and tools
through their cumulative participation.
The public access mission of the Sage Commons requires the development
of a new strategic and legal framework to protect the rights of
contributors while providing widespread access to integrative genomics
resources.”
Library’s Role: Data Curation
- Data organization and annotation
  (e.g. ontologies and metadata)
- Data archiving, preservation
  (e.g. perpetual access)
- Outreach and support to local researchers
Publisher’s Role: Data Accreditation
- Require data deposit to archives
- Publish data journals
- Manage peer review (quality control)
- Provide credit for data publishing
  (an evolution of the promotion and tenure system)
Data Papers Revisited
“a formal publication whose primary purpose is to
expose and describe data, as opposed to
analyze and draw conclusions from it.”
1. Organize peer review, establish quality-control measures
2. Create a citable entity
3. Establish cross-linking mechanisms with traditional papers, to enforce
separation of concerns (methodology vs. analysis)
4. Specify required documentation to make data re-usable, re-purposable
5. Apply a standard interoperable legal license
(CC0 or PDDL with normative attribution, CC BY with URI attribution)
6. Ensure archiving strategy in place
Jonathan Rees, Recommendations for independent scholarly publication of data sets, Creative Commons Working Paper, March 2010, http://neurocommons.org/report/data-publication.pdf
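Creating a citable entity (recommendation 2) is what allows a dataset to be cited the way an article is. The sketch below builds a citation string in the general pattern "Creators (Year): Title. Publisher. Identifier", which resembles the format DataCite recommends. The field names, the exact ordering, and the sample values are assumptions made for illustration. This is not a prescribed citation style.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, e.g. "10.1234/example.1" (hypothetical)


def cite(ds: Dataset) -> str:
    """Format 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
    if not ds.creators:
        raise ValueError("a citable dataset needs at least one creator")
    creators = "; ".join(ds.creators)
    return f"{creators} ({ds.year}): {ds.title}. {ds.publisher}. https://doi.org/{ds.doi}"


if __name__ == "__main__":
    example = Dataset(["Doe, J.", "Roe, R."], 2010, "Hypothetical robot telemetry",
                      "Example Repository", "10.1234/example.1")
    print(cite(example))
```

Ending the citation with a resolvable DOI connects recommendation 2 to recommendation 3: a traditional paper can cite the data paper's dataset by the same persistent link.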
Questions?


Communicating with Data 2010 Annual Meeting

  • 1. MacKenzie Smith Associate Director for Technology, MIT Libraries Science Commons Research Fellow, Creative Commons CrossRef Annual Meeting ©2010, MIT
  • 2. 2
  • 3. 3
  • 4. The world’s first hard drive (5Mb) IBM Almaden Research Center, 1952-1954 That Was Then 4 CrossRef Annual Meeting ©2010, MIT
  • 5. Current capacity hard drive (>2Tb) Google Data Center, 2010 This is Now 5 CrossRef Annual Meeting ©2010, MIT
  • 6. How Much Information? “IDC research shows that the digital universe —information that is either created, captured, or replicated in digital form — was 281 exabytes in 2007. In 2011, the amount of digital information produced in the year should equal nearly 1,800 exabytes, or 10 times that produced in 2006. The compound annual growth rate between now and 2011 is expected to be almost 60%” The Diverse and Exploding Digital Universe, 2008 IDC White Paper 6 CrossRef Annual Meeting ©2010, MIT
  • 7. How Much Information? Chart: sequence submissions to the DNA Data Bank of Japan (DDBJ), 1993-2005.
  • 8. What Is Research Data? Observational (e.g. sensor, telemetry, survey, and sample data); experimental (e.g. genetic sequences, chromatograms); simulation (e.g. climate, economic, and 3-D models); media (e.g. images, audio, video); derived or compiled (e.g. text and data mining, compiled databases). Such data is often expensive or impossible to reproduce.
  • 9. What Is Research Data? Text (e.g. flat text files, Word, PDF); numerical (e.g. SPSS, Stata, Excel, MySQL); media (e.g. JPEG, TIFF, DICOM, MPEG, QuickTime); models (e.g. 3-D, statistical); software (e.g. Java and C programs); domain-specific (e.g. FITS in astronomy, CIF in chemistry); instrument-specific (e.g. Olympus confocal microscope). Data does not always come in neat packages like books.
  • 10. What Do Researchers Do With Data? Analyze (e.g. process, visualize); share; review (evaluate methods); annotate; cite; re-use (reproduce results); re-purpose (e.g. integrate).
  • 11. Data Sharing Innovations: new-fangled hybrid articles that integrate text, data, and tools; enhanced PDFs; Linked Open Data, i.e. access to data via Web standards to encourage large-scale interoperability; “data papers”.
  • 12. Issues in Data Curation: storage (very large scale); metadata (what standard to use?); provenance (research methods); identifiers (scalability, persistence); preservation (see slide 9 on formats); sharing (laws are confusing and not interoperable).
  • 13. Data Sharing Trends: “The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.” (NIH grant proposal guide). The US NSF and other funding agencies worldwide have similar data management and sharing mandates, and journals are mandating data deposit (e.g. Journal of Evolutionary Biology).
  • 14. Data Interoperability: IPR and data licenses. Much data is not copyrightable, since facts cannot be copyrighted. The UK, the EU, and some other countries have sui generis database rights. These laws are not “interoperable”, which is a big problem for international scientific collaborations and for data re-purposing.
  • 16. Libraries and Data: curation is already established for some data types, including statistical (Harvard-MIT Data Center), geospatial (Geodata Repository), bioinformatics (via NLM NCBI), digital media (e.g. images, videos), and datasets (institutional repository digital archives).
  • 18. Libraries and Data (covering both faculty-authored and externally acquired data): consultation services (in person and via the website); liaising with data archives (e.g. ICPSR); developing (meta)data standards (e.g. DDI); managing and preserving data.
  • 20. Robotics Data in DSpace@MIT: the Library defined a local taxonomy for metadata values, customized the metadata records, adapted and simplified the deposit workflow, loaded data from the previous repository, and added CC0 licenses. Review of new deposits is done by the community.
  • 23. Researcher’s Role: Data Provision, e.g. Sage Commons: “The Sage Commons is a novel information platform being built by an international partnership of researchers and stakeholders to define the molecular basis of disease and guide the development of effective human therapeutics and diagnostics. The Sage Commons will be used to integrate diverse molecular mega-data sets, to build predictive bionetworks and to offer advanced tools proven to provide unique new insights into human disease biology. Users will also be contributors that advance the knowledge base and tools through their cumulative participation. The public access mission of the Sage Commons requires the development of a new strategic and legal framework to protect the rights of contributors while providing widespread access to integrative genomics resources.”
  • 24. Library’s Role: Data Curation: data organization and annotation (e.g. ontologies and metadata); data archiving and preservation (e.g. perpetual access); outreach and support to local researchers.
  • 25. Publisher’s Role: Data Accreditation: require data deposit in archives; publish data journals; manage peer review (quality control); provide credit for data publishing (an evolution of the promotion and tenure system).
  • 26. Data Papers Revisited: “a formal publication whose primary purpose is to expose and describe data, as opposed to analyze and draw conclusions from it.” 1. Organize peer review and establish quality-control measures. 2. Create a citable entity. 3. Establish cross-linking mechanisms with traditional papers, to enforce separation of concerns (methodology vs. analysis). 4. Specify the documentation required to make data re-usable and re-purposable. 5. Apply a standard interoperable legal license (CC0 or PDDL with normative attribution, CC-BY with URI attribution). 6. Ensure an archiving strategy is in place. Source: Jonathan Rees, Recommendations for independent scholarly publication of data sets, Creative Commons Working Paper, March 2010, http://neurocommons.org/report/data-publication.pdf
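The IDC figures quoted on slide 6 can be checked against each other. The sketch below applies the standard compound-annual-growth formula, (end / start)^(1 / years) - 1, to the two volumes in the quote (281 EB in 2007, about 1,800 EB in 2011). The function name is illustrative only.

```python
"""Sanity check of the growth figures quoted on slide 6 (IDC, 2008)."""


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Volumes taken from the IDC quote: 281 EB in 2007, ~1,800 EB forecast for 2011.
    rate = cagr(281, 1800, 2011 - 2007)
    print(f"Implied CAGR 2007-2011: {rate:.1%}")
    # "10 times that produced in 2006" implies a 2006 volume of roughly 180 EB.
    print(f"Implied 2006 volume: about {1800 / 10:.0f} EB")
```

The implied rate comes out at roughly 59% a year, which matches the quote's “almost 60%”.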
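Slide 11 describes Linked Open Data as access to data via Web standards. Here is a minimal sketch of what that can look like: a dataset description serialized as N-Triples, a W3C RDF syntax. It uses real vocabularies: Dublin Core terms, the DCAT `Dataset` class, and the CC0 dedication URI. The dataset URI, title, and description are hypothetical, and the choice of properties is an assumption rather than a recommended profile.

```python
"""Minimal Linked Open Data sketch: describe a dataset as N-Triples."""

DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"


def iri(value: str) -> str:
    """Serialize an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triples_for_dataset(uri: str, title: str, description: str) -> list[str]:
    """Return N-Triples lines describing one dataset (hypothetical record)."""
    s = iri(uri)
    return [
        f"{s} {iri(RDF_TYPE)} {iri(DCAT_DATASET)} .",
        f"{s} {iri(DCTERMS + 'title')} {literal(title)} .",
        f"{s} {iri(DCTERMS + 'description')} {literal(description)} .",
        f"{s} {iri(DCTERMS + 'license')} {iri(CC0)} .",
    ]


if __name__ == "__main__":
    for line in triples_for_dataset(
        "http://example.org/dataset/robot-arm-2010",  # hypothetical dataset URI
        'Robot arm "pick" trials',
        "Joint trajectories recorded during grasping experiments",
    ):
        print(line)
```

Each line is a self-contained subject-predicate-object statement. Any RDF-aware tool can merge such descriptions from different sources, and that merging is the interoperability the slide refers to.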
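Slide 12 lists identifiers (scalability, persistence) as a curation issue, and slide 26 calls for data papers to create a citable entity. DOIs address both. The sketch below splits a DOI into its "10."-prefixed registrant code and suffix, builds an `https://doi.org/` resolver link, and formats a data citation. The citation layout (Creator (Year): Title. Publisher. Link) is an assumed house style, not a quoted standard, and the example DOIs use a hypothetical `10.1234` prefix.

```python
"""Sketch: handling DOIs as persistent identifiers for datasets."""
import re
from urllib.parse import quote

# A DOI name is "10." plus a registrant code (digits, optionally dot-separated),
# a slash, and a registrant-chosen suffix.
DOI_PATTERN = re.compile(r"^(10\.\d+(?:\.\d+)*)/(\S+)$")
LEADS = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def parse_doi(text: str) -> tuple[str, str]:
    """Split a DOI (bare, 'doi:'-prefixed, or a doi.org URL) into prefix and suffix."""
    doi = text.strip()
    for lead in LEADS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    match = DOI_PATTERN.match(doi)
    if match is None:
        raise ValueError(f"not a DOI: {text!r}")
    return match.group(1), match.group(2)


def resolver_url(prefix: str, suffix: str) -> str:
    """Build a doi.org link, percent-encoding characters that are unsafe in URLs."""
    return "https://doi.org/" + quote(f"{prefix}/{suffix}", safe="/")


def cite_dataset(creator: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Format a data citation in an assumed 'Creator (Year): Title. Publisher. Link' style."""
    prefix, suffix = parse_doi(doi)
    return f"{creator} ({year}): {title}. {publisher}. {resolver_url(prefix, suffix)}"


if __name__ == "__main__":
    # Hypothetical DOI; 10.1234 is used here purely as an example prefix.
    print(cite_dataset("Example, A.", 2010, "Robot arm trials", "MIT", "doi:10.1234/robot.2010"))
```

The citation carries the resolver link instead of a storage location. The record can then move between archives without breaking citations, which is the persistence concern the slide raises.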
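Slide 20 says the Library defined a local taxonomy for metadata values and added CC0 licenses to the robotics deposits. The sketch below shows one way such a deposit check could work. It is a standalone illustration, not DSpace code or DSpace’s API. The field names follow the qualified Dublin Core style DSpace uses (`dc.title`, `dc.subject`, `dc.rights.uri`). The taxonomy terms and the required-field list are hypothetical.

```python
"""Sketch of a deposit check: controlled vocabulary plus a CC0 dedication."""
from dataclasses import dataclass, field

# Hypothetical local taxonomy for the subject field.
LOCAL_TAXONOMY = frozenset({"manipulation", "locomotion", "perception", "planning"})
CC0_URI = "http://creativecommons.org/publicdomain/zero/1.0/"
# Hypothetical minimum set of fields a deposit must carry.
REQUIRED_FIELDS = ("dc.title", "dc.contributor.author", "dc.date.issued")


@dataclass
class DepositRecord:
    """Metadata values keyed by qualified Dublin Core field name (e.g. 'dc.title')."""
    fields: dict[str, list[str]] = field(default_factory=dict)

    def add(self, name: str, value: str) -> None:
        self.fields.setdefault(name, []).append(value)


def validate(record: DepositRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.fields.get(name)]
    for subject in record.fields.get("dc.subject", []):
        if subject not in LOCAL_TAXONOMY:
            problems.append(f"subject {subject!r} is not in the local taxonomy")
    return problems


def apply_cc0(record: DepositRecord) -> None:
    """Attach the CC0 dedication URI unless a rights URI is already present."""
    if not record.fields.get("dc.rights.uri"):
        record.add("dc.rights.uri", CC0_URI)


if __name__ == "__main__":
    record = DepositRecord()
    record.add("dc.title", "Arm trajectory logs")
    record.add("dc.contributor.author", "Example, A.")
    record.add("dc.date.issued", "2010")
    record.add("dc.subject", "manipulation")
    apply_cc0(record)
    print(validate(record))  # prints []
```

A controlled vocabulary keeps subject values consistent across many depositors. That consistency is presumably what makes community review of new deposits (slide 20) manageable.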
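Slide 24 lists data archiving and preservation (e.g. perpetual access) among the library’s curation roles. One common preservation technique is fixity checking: record a checksum for every archived file, then audit it later to detect loss or silent corruption. The sketch below uses SHA-256 from Python’s standard library. The manifest layout and function names are illustrative assumptions.

```python
"""Sketch of fixity checking for archived data files."""
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so very large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root against a stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "changed": sorted(k for k in manifest.keys() & current.keys() if manifest[k] != current[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data.csv").write_text("x,y\n1,2\n")
        manifest = build_manifest(root)
        (root / "data.csv").write_text("x,y\n1,3\n")  # simulate silent corruption
        print(audit(root, manifest))  # {'missing': [], 'added': [], 'changed': ['data.csv']}
```

Hashing in fixed-size chunks keeps memory use flat, which matters given the very large storage scale flagged on slide 12.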