SlideShare une entreprise Scribd logo
1  sur  34
Data Publishing Workflows
with Dataverse
Mercè Crosas, Ph.D.
Twitter: @mercecrosas
Director of Data Science
Institute for Quantitative Social Science, Harvard University
MIT, May 6, 2014
Intro to our Data Science
Team and Projects
Data Science at the Institute for Quantitative Social Science
http://datascience.iq.harvard.edu
Combines Expertise
Data Science
Applications
and Tools
Researchers
Software
Engineers
Information
Scientists
Statistical
Innovation
Tool Building &
Computer Science
Data Curation &
Stewardship
With a Team of 20
Mercè Crosas,
Director of Data Science
Statistics and Analytics
James Honaker
Christine Choirat
Vito d’Orazio
Software Development
Gustavo Durand
Robert Treacy
Ellen Kraffmiller
Michael Bar-Sinai
Leonid Andreev
Phil Durbin
Steve Kraffmiller
Xiangqing Yang
Raman Prasad (BARI)
Data Curation and Archiving
Sonia Barbosa
Eleni Castro
Dwayne Liburd
QA
Kevin Condon
Elda Sotiri
Usability and UI
Elizabeth Quigley
Michael Heppler
Gary King,
Director of IQSS
Cris Rothfuss,
Excutive Director
Two widely-Used Frameworks
Developed in the last Decade
A framework that allows analysts to use and interpret
a large body of R statistical models from
heterogeneous contributors through a common
interface.
A data publishing framework that allows researchers
to share, preserve, cite and analyze data, while
keeping control and gaining credit for their data.
New Tools that Integrate with our Initial Work
An interactive web interface that allows users at all
levels of statistical expertise to explore their data and
appropriately construct statistical models.
Integrates with Zelig and Dataverse.
A framework that allows data contributors to set a
level of sensitivity for their dataset based on legal
regulations, which defines how the data can be
stored and shared.
Integrates with Dataverse.
In collaboration with NSF Privacy Tools project
Expanding in other Areas
A web application that assists researchers to discover
new clusters to categorize large document sets,
leveraging all the clustering methods in the literature.
An application that provides a continuous
integration build solution for R packages shared in
Git to archived published code in CRAN.
Support Throughout the Research Cycle
Develop
Quantitative
Methods
Analyze
Quantitative
Datasets
Analyze
Unstructured
Text
Publish Data
Cite Data from
Published Results
Explore,
reanalyze and
reuse dataShare Sensitive Data
Develop > Analyze > Share > Explore > Validate & Reuse
Current Research Interests and Efforts
Reproducible and Reusable Science: “encourage open data and
methodological transparency, and promote and enable data
citation” (with Dataverse, Zelig and SolaFide)
Computationally Assisted Exploration: “with Consilience and SolaFide,
assist researchers to understand and discover new insights from their
data”
Interdisciplinary Quantitative Scientific Scope: “our tools and research
frameworks address broad methodological issues in quantitative
science and are often employed in other domains”
When Data are Not Open: “solutions to preserve privacy, while still
providing science the fundamental ability to learn, access and
replicate findings, with DataTags and PrivateZelig”
Large-Scale Data Sets: “will handle large-scale data sets, as Big Data
science reaches all disciplines: Consilience for millions of text
documents, and Zelig and Dataverse to handle TB-PB-scale data sets.”
Harvard Dataverse
The Harvard Dataverse Repository
 In collaboration with the Harvard Library, Harvard hosts a
Dataverse instance free and open to all researchers.
 It currently holds > 53,000 datasets, with 735,000 files.
 Find or deposit data at: http://thedata.harvard.edu
Collaborations with MIT
 Membership through the Harvard-MIT Data Center (e.g.,
statistics training, access to ICPSR collection)
 The MIT Libraries Dataverse disseminates data purchased
by the MIT Libraries (with Kate McNeill):
 http://thedata.harvard.edu/dvn/dv/mit
 MIT faculty and research groups are already
disseminating their data through the Harvard Dataverse
 Research collaborations (with Micah Altman):
 Integration of Publications with Data (Funded by Sloan):
http://projects.iq.harvard.edu/ojs-dvn
 Privacy Tools for Sharing Research Data (Funded by NSF):
http://privacytools.seas.harvard.edu/
Dataverse 4.0
Target release date: June 23
• New UI
• New rich, faceted search
• New data file ingest
(excel, CSV, R, Stata,
SPSS)
• New metadata for social
sciences, astronomy,
biomedical sciences.
• Integration with SolaFide.
SolaFide Demo
Data Publishing Workflows
Data Publishing Guidelines
Three pillars to Data Publishing:
 A trusted data repository to guarantee long-term access
 A formal data citation*
 Sufficient information to understand and reuse the data
(metadata, documentation, code)
* Data Citation Principles: https://www.force11.org/datacitation
A Rigorous Publishing Workflow
Release Version 1
A Published Dataset
cannot be deleted
(only deaccession, if
legally needed)
Push Version 1.1: small metadata
change; citation doesn’t change
Push Version 2: big metadata
change, or file change; citation
changes
Authors, Title, Year, DOI Repository, UNF, V1
Authors, Title, Year, DOI Repository, UNF, V2
Workflows that Integrate with Journals
1. Publish a dataset to your Dataverse, then provide the
Data Citation to the journal.
2. Contribute to a journal Dataverse:
1. Add dataset to Journal Dataverse as a draft.
2. Journal Editor reviews it, and approves it for release.
3. Dataset is published with Data Citation and link from journal
article to the data.
3. Seamless Integration between journal system and
Dataverse.
OJS and Dataverse Integration
 Sloan funded project to integrate PKP’s Open Journal
System with the Dataverse software.
 Pilot with ~ 50 journals
 OJS Dataverse plugin now available with latest OJS
release
 http://projects.iq.harvard.edu/ojs-dvn
Detailed System Integration
 XML file: AtomPub "entry" with Dublin Core Terms (e.g., title, creator)
 Zip file: All data files associated with that dataset.
 HTTP header "In-Progress: false" to publish datasets.
 Support HTTP verbs: GET, PUT, POST, and DELETE.
 XML file: “Deposit Receipt”
 HTTP status code: 200, 201, 204, 404, 405, 406, 412, 415
Client can query repository (server) any time to get status
Deposit API based on SWORD
 Follows SWORD2 specifications
 SWORD is known and supported within academic
publishing; a “profile” of the AtomPub standard.
 The SWORD project provides client libraries for Python,
Java, Ruby, and PHP:
 OJS uses the PHP client library
 OSF uses the Python client library
 DataUp and DVN-R built a custom Dataverse client
How it differs from SWORD
 Dataverse does not use SWORD download API:
 Use own Data API
 Plan to add this support in the future
 Add XML attribute to pass article citation from client:
 Allow DCterms:isReferencedby to contain attributes such as
HoldingsURI to link back to article from Dataverse
 This is now part of the SWORD PHP client library
 Use “In-Progress: false” to indicate that dataset is ready
to be published (In SWORD spec means deposit
complete)
Support for Metadata Standards
 A core or citation metadata that applies to all datasets –
Supported currently by Data Deposit API
 Extensible metadata blocks for specific domains:
 Social sciences:
 Maps to DDI schema;
 file metadata extracted from tabular data file
 Astronomy:
 Maps to VO schema;
 partially extracted from FITS file
 Biomedical sciences:
 Maps to ISA-tab schema
 Controlled vocabularies maps to EFO, OBI, and Ontology of
Clinical Research
 Extended and managed using SKOS (support taxonomies
within the framework of the semantic web)
Upcoming
Expanding to Larger and More Types
of Data
 Sharing sensitive data with DataTags and Secure
Dataverse
 Integration with other systems:
 OSF
 DataUp
 WorldMap
 DataBridge
 ORCID
 DASH (at Harvard)
 Expand to Larger data sets
DataTags: For Sharing Sensitive Data
THANKS
mcrosas@iq.harvard.edu Twitter: mercecrosas
http://datascience.iq.harvard.edu (Beta)

Contenu connexe

Tendances

Introduction to Altmetrics
Introduction to Altmetrics Introduction to Altmetrics
Introduction to Altmetrics Linda Galloway
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...Susanna-Assunta Sansone
 
Scopus:Workshops on Scopus for Literature Searching and Research Impact
Scopus:Workshops on Scopus for Literature Searching and Research ImpactScopus:Workshops on Scopus for Literature Searching and Research Impact
Scopus:Workshops on Scopus for Literature Searching and Research Impactmotqin
 
Introduction to Altmetrics for Medical and Special Librarians
Introduction to Altmetrics for Medical and Special LibrariansIntroduction to Altmetrics for Medical and Special Librarians
Introduction to Altmetrics for Medical and Special LibrariansLinda Galloway
 
Joining the ‘buzz’ : the role of social media in raising research visibility
Joining the ‘buzz’ : the role of social media in raising research visibilityJoining the ‘buzz’ : the role of social media in raising research visibility
Joining the ‘buzz’ : the role of social media in raising research visibility Eileen Shepherd
 
Citation analysis: State of the art, good practices, and future developments
Citation analysis: State of the art, good practices, and future developmentsCitation analysis: State of the art, good practices, and future developments
Citation analysis: State of the art, good practices, and future developmentsLudo Waltman
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databasesTarek Tawfik Amin
 
Altmetrics apples and oranges
Altmetrics apples and orangesAltmetrics apples and oranges
Altmetrics apples and orangesPat Loria
 
Mendeley Open API
Mendeley Open APIMendeley Open API
Mendeley Open APIBen Dowling
 
Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13DataDryad
 
Google Scholar Citations... Own your profile!
Google Scholar Citations... Own your profile!Google Scholar Citations... Own your profile!
Google Scholar Citations... Own your profile!Linda Galloway
 
Are the scientists on to something altmetrics 6 16
Are the scientists on to something altmetrics 6 16Are the scientists on to something altmetrics 6 16
Are the scientists on to something altmetrics 6 16Katie Brown
 
Scientometric Mapping of Library and Information Science in Web of Science
Scientometric Mapping of Library and Information Science in Web of Science Scientometric Mapping of Library and Information Science in Web of Science
Scientometric Mapping of Library and Information Science in Web of Science 8638812142
 
Presentation on Scopus
Presentation on ScopusPresentation on Scopus
Presentation on ScopusRajiv Mahmud
 
Citation Management Using Mendeley Software
Citation Management  Using Mendeley SoftwareCitation Management  Using Mendeley Software
Citation Management Using Mendeley SoftwareDave Marcial
 
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014Susanna-Assunta Sansone
 

Tendances (20)

Introduction to Altmetrics
Introduction to Altmetrics Introduction to Altmetrics
Introduction to Altmetrics
 
Sociology 270 final
Sociology 270 finalSociology 270 final
Sociology 270 final
 
Scientometrics
Scientometrics Scientometrics
Scientometrics
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
 
Scopus:Workshops on Scopus for Literature Searching and Research Impact
Scopus:Workshops on Scopus for Literature Searching and Research ImpactScopus:Workshops on Scopus for Literature Searching and Research Impact
Scopus:Workshops on Scopus for Literature Searching and Research Impact
 
Introduction to Altmetrics for Medical and Special Librarians
Introduction to Altmetrics for Medical and Special LibrariansIntroduction to Altmetrics for Medical and Special Librarians
Introduction to Altmetrics for Medical and Special Librarians
 
Joining the ‘buzz’ : the role of social media in raising research visibility
Joining the ‘buzz’ : the role of social media in raising research visibilityJoining the ‘buzz’ : the role of social media in raising research visibility
Joining the ‘buzz’ : the role of social media in raising research visibility
 
Citation analysis: State of the art, good practices, and future developments
Citation analysis: State of the art, good practices, and future developmentsCitation analysis: State of the art, good practices, and future developments
Citation analysis: State of the art, good practices, and future developments
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databases
 
Altmetrics apples and oranges
Altmetrics apples and orangesAltmetrics apples and oranges
Altmetrics apples and oranges
 
Mendeley Open API
Mendeley Open APIMendeley Open API
Mendeley Open API
 
Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13Pulverer-embo-source data-nfdp13
Pulverer-embo-source data-nfdp13
 
Google Scholar Citations... Own your profile!
Google Scholar Citations... Own your profile!Google Scholar Citations... Own your profile!
Google Scholar Citations... Own your profile!
 
Are the scientists on to something altmetrics 6 16
Are the scientists on to something altmetrics 6 16Are the scientists on to something altmetrics 6 16
Are the scientists on to something altmetrics 6 16
 
Scientometric Mapping of Library and Information Science in Web of Science
Scientometric Mapping of Library and Information Science in Web of Science Scientometric Mapping of Library and Information Science in Web of Science
Scientometric Mapping of Library and Information Science in Web of Science
 
Presentation on Scopus
Presentation on ScopusPresentation on Scopus
Presentation on Scopus
 
Citation Management Using Mendeley Software
Citation Management  Using Mendeley SoftwareCitation Management  Using Mendeley Software
Citation Management Using Mendeley Software
 
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
 
Presentation1
Presentation1Presentation1
Presentation1
 
hack4knowledge - Mendeley API
hack4knowledge - Mendeley APIhack4knowledge - Mendeley API
hack4knowledge - Mendeley API
 

En vedette

Dataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth QuigleyDataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth Quigleydatascienceiqss
 
Dataverse Netowrk Project
Dataverse Netowrk ProjectDataverse Netowrk Project
Dataverse Netowrk ProjectJulie Goldman
 
Dataverse: Helping Researchers Publish Their Data Through Automation
Dataverse: Helping Researchers Publish Their Data Through Automation�Dataverse: Helping Researchers Publish Their Data Through Automation�
Dataverse: Helping Researchers Publish Their Data Through AutomationEleni Castro, MLIS
 
Dataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. BorgmanDataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. Borgmandatascienceiqss
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 
APLIC 2014 - Dataverse Project
APLIC 2014 - Dataverse ProjectAPLIC 2014 - Dataverse Project
APLIC 2014 - Dataverse ProjectAPLICwebmaster
 

En vedette (6)

Dataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth QuigleyDataverse 4.0 UX by Elizabeth Quigley
Dataverse 4.0 UX by Elizabeth Quigley
 
Dataverse Netowrk Project
Dataverse Netowrk ProjectDataverse Netowrk Project
Dataverse Netowrk Project
 
Dataverse: Helping Researchers Publish Their Data Through Automation
Dataverse: Helping Researchers Publish Their Data Through Automation�Dataverse: Helping Researchers Publish Their Data Through Automation�
Dataverse: Helping Researchers Publish Their Data Through Automation
 
Dataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. BorgmanDataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. Borgman
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
APLIC 2014 - Dataverse Project
APLIC 2014 - Dataverse ProjectAPLIC 2014 - Dataverse Project
APLIC 2014 - Dataverse Project
 

Similaire à Data Publishing Workflows with Dataverse

Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumMerce Crosas
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseAnita de Waard
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCarole Goble
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSBROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSMicah Altman
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOCMerce Crosas
 
Effective research data management
Effective research data managementEffective research data management
Effective research data managementCatherine Gold
 
Dataverse for Journals
Dataverse for JournalsDataverse for Journals
Dataverse for JournalsMerce Crosas
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing dataWorld Agroforestry (ICRAF)
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...DeVonne Parks, CEM
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordJisc
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Merce Crosas
 
ODIN Final Event - The Care and Feeding of Scientific Data
ODIN Final Event - The Care and Feeding of Scientific DataODIN Final Event - The Care and Feeding of Scientific Data
ODIN Final Event - The Care and Feeding of Scientific Datadatacite
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Anita de Waard
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Laurent Alquier
 
Collaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna WorkflowsCollaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna WorkflowsAndrea Wiggins
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataStuart Chalk
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?Anita de Waard
 

Similaire à Data Publishing Workflows with Dataverse (20)

Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSBROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOC
 
Effective research data management
Effective research data managementEffective research data management
Effective research data management
 
Dataverse for Journals
Dataverse for JournalsDataverse for Journals
Dataverse for Journals
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published record
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
 
ODIN Final Event - The Care and Feeding of Scientific Data
ODIN Final Event - The Care and Feeding of Scientific DataODIN Final Event - The Care and Feeding of Scientific Data
ODIN Final Event - The Care and Feeding of Scientific Data
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
 
Collaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna WorkflowsCollaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna Workflows
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 

Plus de Micah Altman

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesMicah Altman
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset ConversationMicah Altman
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Micah Altman
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Micah Altman
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset ConversationMicah Altman
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer ReviewMicah Altman
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer ReviewMicah Altman
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An OverviewMicah Altman
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral DistrictingMicah Altman
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk Micah Altman
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Micah Altman
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Micah Altman
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsMicah Altman
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...Micah Altman
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenaryMicah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...Micah Altman
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 

Plus de Micah Altman (20)

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategies
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset Conversation
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset Conversation
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer Review
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer Review
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An Overview
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral Districting
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenary
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 

Dernier

Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Dernier (20)

Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Data Publishing Workflows with Dataverse

  • 1. Data Publishing Workflows with Dataverse Mercè Crosas, Ph.D. Twitter: @mercecrosas Director of Data Science Institute for Quantitative Social Science, Harvard University MIT, May 6, 2014
  • 2. Intro to our Data Science Team and Projects
  • 3. Data Science at the Institute for Quantitative Social Science http://datascience.iq.harvard.edu
  • 4. Combines Expertise Data Science Applications and Tools Researchers Software Engineers Information Scientists Statistical Innovation Tool Building & Computer Science Data Curation & Stewardship
  • 5. With a Team of 20 Mercè Crosas, Director of Data Science Statistics and Analytics James Honaker Christine Choirat Vito d’Orazio Software Development Gustavo Durand Robert Treacy Ellen Kraffmiller Michael Bar-Sinai Leonid Andreev Phil Durbin Steve Kraffmiller Xiangqing Yang Raman Prasad (BARI) Data Curation and Archiving Sonia Barbosa Eleni Castro Dwayne Liburd QA Kevin Condon Elda Sotiri Usability and UI Elizabeth Quigley Michael Heppler Gary King, Director of IQSS Cris Rothfuss, Excutive Director
  • 6. Two widely-Used Frameworks Developed in the last Decade A framework that allows analysts to use and interpret a large body of R statistical models from heterogeneous contributors through a common interface. A data publishing framework that allows researchers to share, preserve, cite and analyze data, while keeping control and gaining credit for their data.
  • 7. New Tools that Integrate with our Initial Work An interactive web interface that allows users at all levels of statistical expertise to explore their data and appropriately construct statistical models. Integrates with Zelig and Dataverse. A framework that allows data contributors to set a level of sensitivity for their dataset based on legal regulations, which defines how the data can be stored and shared. Integrates with Dataverse. In collaboration with NSF Privacy Tools project
  • 8. Expanding in other Areas A web application that assists researchers to discover new clusters to categorize large document sets, leveraging all the clustering methods in the literature. An application that provides a continuous integration build solution for R packages shared in Git to archived published code in CRAN.
  • 9. Support Throughout the Research Cycle Develop Quantitative Methods Analyze Quantitative Datasets Analyze Unstructured Text Publish Data Cite Data from Published Results Explore, reanalyze and reuse dataShare Sensitive Data Develop > Analyze > Share > Explore > Validate & Reuse
  • 10. Current Research Interests and Efforts Reproducible and Reusable Science: “encourage open data and methodological transparency, and promote and enable data citation” (with Dataverse, Zelig and SolaFide) Computationally Assisted Exploration: “with Consilience and SolaFide, assist researchers to understand and discover new insights from their data” Interdisciplinary Quantitative Scientific Scope: “our tools and research frameworks address broad methodological issues in quantitative science and are often employed in other domains” When Data are Not Open: “solutions to preserve privacy, while still providing science the fundamental ability to learn, access and replicate findings, with DataTags and PrivateZelig” Large-Scale Data Sets: “will handle large-scale data sets, as Big Data science reaches all disciplines: Consilience for millions of text documents, and Zelig and Dataverse to handle TB-PB-scale data sets.”
  • 12. The Harvard Dataverse Repository  In collaboration with the Harvard Library, Harvard hosts a Dataverse instance free and open to all researchers.  It currently holds > 53,000 datasets, with 735,000 files.  Find or deposit data at: http://thedata.harvard.edu
  • 13. Collaborations with MIT  Membership through the Harvard-MIT Data Center (e.g., statistics training, access to ICPSR collection)  The MIT Libraries Dataverse disseminates data purchased by the MIT Libraries (with Kate McNeill):  http://thedata.harvard.edu/dvn/dv/mit  MIT faculty and research groups are already disseminating their data through the Harvard Dataverse  Research collaborations (with Micah Altman):  Integration of Publications with Data (Funded by Sloan): http://projects.iq.harvard.edu/ojs-dvn  Privacy Tools for Sharing Research Data (Funded by NSF): http://privacytools.seas.harvard.edu/
  • 14. Dataverse 4.0 Target release date: June 23 • New UI • New rich, faceted search • New data file ingest (excel, CSV, R, Stata, SPSS) • New metadata for social sciences, astronomy, biomedical sciences. • Integration with SolaFide.
  • 17. Data Publishing Guidelines Three pillars to Data Publishing:  A trusted data repository to guarantee long-term access  A formal data citation*  Sufficient information to understand and reuse the data (metadata, documentation, code) * Data Citation Principles: https://www.force11.org/datacitation
  • 18. A Rigorous Publishing Workflow Release Version 1 A Published Dataset cannot be deleted (only deaccession, if legally needed) Push Version 1.1: small metadata change; citation doesn’t change Push Version 2: big metadata change, or file change; citation changes Authors, Title, Year, DOI Repository, UNF, V1 Authors, Title, Year, DOI Repository, UNF, V2
  • 19. Workflows that Integrate with Journals 1. Publish a dataset to your Dataverse, then provide the Data Citation to the journal. 2. Contribute to a journal Dataverse: 1. Add dataset to Journal Dataverse as a draft. 2. Journal Editor reviews it, and approves it for release. 3. Dataset is published with Data Citation and link from journal article to the data. 3. Seamless Integration between journal system and Dataverse.
  • 20. OJS and Dataverse Integration  Sloan funded project to integrate PKP’s Open Journal System with the Dataverse software.  Pilot with ~ 50 journals  OJS Dataverse plugin now available with latest OJS release  http://projects.iq.harvard.edu/ojs-dvn
  • 21. Detailed System Integration  XML file: AtomPub "entry" with Dublin Core Terms (e.g., title, creator)  Zip file: All data files associated with that dataset.  HTTP header "In-Progress: false" to publish datasets.  Support HTTP verbs: GET, PUT, POST, and DELETE.  XML file: “Deposit Receipt”  HTTP status code: 200, 201, 204, 404, 405, 406, 412, 415 Client can query repository (server) any time to get status
  • 22. Deposit API based on SWORD  Follows SWORD2 specifications  SWORD is known and supported within academic publishing; a “profile” of the AtomPub standard.  The SWORD project provides client libraries for Python, Java, Ruby, and PHP:  OJS uses the PHP client library  OSF uses the Python client library  DataUp and DVN-R built a custom Dataverse client
  • 23. How it differs from SWORD  Dataverse does not use SWORD download API:  Use own Data API  Plan to add this support in the future  Add XML attribute to pass article citation from client:  Allow DCterms:isReferencedby to contain attributes such as HoldingsURI to link back to article from Dataverse  This is now part of the SWORD PHP client library  Use “In-Progress: false” to indicate that dataset is ready to be published (In SWORD spec means deposit complete)
  • 24. Support for Metadata Standards  A core or citation metadata that applies to all datasets – Supported currently by Data Deposit API  Extensible metadata blocks for specific domains:  Social sciences:  Maps to DDI schema;  file metadata extracted from tabular data file  Astronomy:  Maps to VO schema;  partially extracted from FITS file  Biomedical sciences:  Maps to ISA-tab schema  Controlled vocabularies maps to EFO, OBI, and Ontology of Clinical Research  Extended and managed using SKOS (support taxonomies within the framework of the semantic web)
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 31. Expanding to Larger and More Types of Data  Sharing sensitive data with DataTags and Secure Dataverse  Integration with other systems:  OSF  DataUp  WorldMap  DataBridge  ORCID  DASH (at Harvard)  Expand to Larger data sets
  • 32. DataTags: For Sharing Sensitive Data
  • 33.