The Rise of Data Publishing
in the Digital World
(and how Dataverse and DataTags help)
Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer
Institute for Quantitative Social Science
Harvard University
@mercecrosas
NDSR 2016 Symposium
From 1665 to late 20th century:
A steady increase in size and
complexity of research output
The number of journals doubles every 20 years
since the 1750s, with growth of the number of scientists
• 1700: 3 journals
• 1800: ~10 journals
• 1900: ~400 journals
• 2000: ~14,000 journals (peer-reviewed)
[Figure: number of journals (axis marks at 100 and 10,000) by year, 1665 to 1965; Mabe, 2003]
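As a rough check on the doubling claim, the implied doubling time between two of the journal counts above can be computed as T = Δt · ln 2 / ln(N2 / N1). The sketch below assumes pure exponential growth between the two dates; the function name is illustrative, and the counts are the approximate figures quoted on the slide.

```python
import math


def doubling_time(n_start: float, n_end: float, years: float) -> float:
    """Return the doubling time in years, assuming exponential growth."""
    if n_start <= 0 or n_end <= n_start or years <= 0:
        raise ValueError("need 0 < n_start < n_end and years > 0")
    return years * math.log(2) / math.log(n_end / n_start)


# Approximate journal counts quoted above (Mabe, 2003).
journal_counts = {1700: 3, 1800: 10, 1900: 400, 2000: 14_000}

if __name__ == "__main__":
    years = sorted(journal_counts)
    for start, end in zip(years, years[1:]):
        t = doubling_time(journal_counts[start], journal_counts[end], end - start)
        print(f"{start}-{end}: doubling time of about {t:.1f} years")
```

With these figures, the interval from 1900 to 2000 gives a doubling time a little under 20 years, in line with the headline claim.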
Data Tables and Visuals Become Increasingly
Common, and Part of the Scientific Argument
[Timeline figure, 1665 to 1965, with these annotations:]
• a few tables & visuals, as part of the text
• 50% of articles have tables & figures
• most articles have tables & figures, often standalone
• 50% cite previous work
• method sections appear
• 100% with citations (1 per 100 words), part of scholarly credit
• First line graphs and bar charts (Playfair, 1786)
• First scatterplots (Herschel, 1833; Galton, 1896)
Scholarly Publishing Adapts to the
Increase of Cognitive Complexity (Gross et al 2001)
• 18th century:
  • formal components appear in articles (introduction, conclusions, tables, figures, citations)
• 19th century:
  • explaining data instead of establishing observations of facts
  • wide use of visuals, high citation density, methods section
• 20th century:
  • structured quantitative data with increased use of statistics
  • wide range of data types with new technologies
• Number of scientists increases from 100s to a few million
• Science becomes extremely specialized:
  • from 1 journal to 14,000 peer-reviewed journals
  • one new journal for each 150 authors, read by 500
In the last decades, more
and more publications
and data
A Steeper Growth of Scholarly Output
Since 1950, the total number of journals doubles every ~15 years
2010: 80,000 journals
2010: 33,000 peer-reviewed
An Outburst of Research Data and Specialization
Results in > 1000 Community Repositories
• 1920 - 1950s: First Social Science Data Archives (ODUM, ICPSR, ...)
• 1970 - 1980s: First Biomedical Databases (PDB, GenBank, ...)
• 2016: A wide range of Research Data Repositories
1500 repositories listed in re3data.org
Data Publishing Emerges as the Union of
Scholarly Publishing and Data Archiving
Scholarly publishing:
Distribute research output
• Attribution and credit
• Dissemination
• Finding & Reuse
Data Archiving:
Long-term access to data
• Accessibility
• Preservation
• Finding & Reuse
Why Data Publishing now?
• Data (and software) have become common input and output of research
• A scholarly article cannot hold or accurately describe these vast amounts of data and software
• As input and output of research, data must be citable and accessible to enable validation and reuse, with attribution
Extending the thesis of Gross et al., data publishing accommodates the
complexity of research input and output in the digital world.
What is needed for FAIR Data Publishing
FAIR = Findable, Accessible, Interoperable, Reusable
Data Citation
• Persistent id to reference data uniquely
• Support for versions and fixity
• Attribution to authors and repository
Metadata
• Catalog to discover and locate the data
• Sufficient information to understand and reuse the data
Repository
• Digital access to metadata and data
• Archive and preservation for long-term access
• Interoperability through standards and APIs
A data repository system that serves as a
solution for publishing FAIR research data
Around the World
Harvard Dataverse:
Generic data repository open
to researchers world wide
Dataverse repositories serve a community, an institution, an archive, ...
Dataverses contain datasets,
datasets contain metadata and data files
Data Citation in Dataverse
Components of a dataset citation:
• Authors
• Published Year
• Dataset Title
• Global Persistent Identifier
• Repository (= Data Publisher)
• Version (or time range)
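The citation components listed above can be sketched as a small record type. This is a minimal illustration, not Dataverse's own code: the class name, the field names, and the ordering and punctuation of the formatted string are assumptions, and the identifier is a placeholder rather than a real DOI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    authors: List[str]
    year: int
    title: str
    persistent_id: str   # global persistent identifier, e.g. a DOI
    repository: str      # the repository acts as the data publisher
    version: str         # version label (or a time range)

    def format(self) -> str:
        """Join the components into a single citation string."""
        author_part = "; ".join(self.authors)
        return (
            f'{author_part}, {self.year}, "{self.title}", '
            f"{self.persistent_id}, {self.repository}, {self.version}"
        )


# Hypothetical dataset used only for illustration.
example = DatasetCitation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2016,
    title="Example Survey Data",
    persistent_id="doi:10.5072/FK2/EXAMPLE",  # placeholder, not a real DOI
    repository="Example Dataverse",
    version="V1",
)

if __name__ == "__main__":
    print(example.format())
```

Keeping the version in the citation string is what lets a reader tell which published state of the dataset a result was based on.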
Data Citation Basics
Force11, Joint Declaration of Data Citation Principles; Starr et al, 2015
The dataset landing page is accessible and guaranteed by the repository
(or data publisher), even when data are restricted or deaccessioned
Metadata In Dataverse
Citation Metadata
  Fields: author, title, repository, year published, version, etc.
  Standards: Dublin Core; DataCite
Domain-specific Metadata
  Fields: data collection info (methods, organism, observation, survey, experiment, etc.)
  Standards: DDI (social sciences); ISA-Tab, BioCaddie (biomed); Virtual Observatory (astro); custom metadata blocks
File-level Metadata
  Fields: metadata inside the data file (variables, instrument details, geospatial info, etc.)
  Standards: DDI (for variables); more to be determined
Dataverse JSON Schema
Information Extraction: Tabular Files
File formats: RData, Stata, SPSS, Excel, CSV
        var 1   var 2   var 3
obs 1   2       a       0
obs 2   4       c       0
obs 3   6       b       1
obs 4   1       e       0
obs 5   2       a       1
obs 6   3       b       1
Variable Metadata: variable name, label, type, stats, geospatial coordinates
Data Values: independent of format
Universal Numerical Fingerprint (UNF): checksum on data values, from canonical format
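The two extraction steps above (variable metadata, and a checksum computed on the data values rather than on the file bytes) can be sketched with the standard library. This is only an illustration of the idea: the hashing scheme below is a simplified stand-in and NOT the actual UNF algorithm, and all function names are hypothetical.

```python
import csv
import hashlib
import io
from statistics import mean

# The example table from the slide, written as CSV.
CSV_TEXT = """var1,var2,var3
2,a,0
4,c,0
6,b,1
1,e,0
2,a,1
3,b,1
"""


def read_columns(text: str) -> dict:
    """Parse CSV text into {variable name: list of string values}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def variable_metadata(columns: dict) -> dict:
    """Derive a type and simple summary statistics for each variable."""
    meta = {}
    for name, values in columns.items():
        if all(is_number(v) for v in values):
            nums = [float(v) for v in values]
            meta[name] = {"type": "numeric", "min": min(nums),
                          "max": max(nums), "mean": mean(nums)}
        else:
            meta[name] = {"type": "character", "distinct": len(set(values))}
    return meta


def fingerprint(columns: dict) -> str:
    """Hash a canonical rendering of the values (simplified, not real UNF)."""
    digest = hashlib.sha256()
    for values in columns.values():  # column order is preserved
        for value in values:
            # Numbers are normalized so that "2" and "2.0" hash identically.
            canonical = repr(float(value)) if is_number(value) else value
            digest.update(canonical.encode("utf-8") + b"\n")
        digest.update(b"\x00")  # column separator
    return digest.hexdigest()


if __name__ == "__main__":
    cols = read_columns(CSV_TEXT)
    print(variable_metadata(cols))
    print(fingerprint(cols))
```

Because the hash is taken over normalized values, the same table saved with a different numeric rendering (for example `2.0` instead of `2`) yields the same fingerprint, which is the property the slide describes as "independent of format".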
Information Extraction: FITS (astro) Files
Header Metadata: coordinates (R.A., declination), photometric info, ...
Data Objects:
• Image Files
• Spectra
• Data cubes
• Tables
• ...
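To make the header-extraction step concrete, here is a minimal sketch of reading keyword/value pairs from a FITS primary header using only the standard library. It relies on the basic FITS layout (80-character ASCII records in 2880-byte blocks, ending with an END record) and ignores many details such as continued strings and doubled quotes; it is not how Dataverse does it, and real code would more likely use a dedicated library. The `RA` and `DEC` keywords and all values are assumed for the demo.

```python
CARD_LEN = 80      # each header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def make_card(keyword: str, value: str, comment: str = "") -> bytes:
    """Build one simplified 80-character header card (for the demo only)."""
    text = f"{keyword:<8}= {value:>20}"
    if comment:
        text += f" / {comment}"
    return text.ljust(CARD_LEN)[:CARD_LEN].encode("ascii")


def parse_value(raw: str):
    """Convert the value field of a card to str, bool, int or float."""
    raw = raw.strip()
    if raw.startswith("'"):
        # Quoted string; doubled quotes inside strings are not handled here.
        end = raw.find("'", 1)
        return raw[1:end if end != -1 else None].rstrip()
    raw = raw.split("/", 1)[0].strip()  # drop the trailing comment
    if raw in ("T", "F"):
        return raw == "T"
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw  # fall back to the raw text


def parse_header(data: bytes) -> dict:
    """Read cards until END and return {keyword: value}."""
    header = {}
    for offset in range(0, len(data), CARD_LEN):
        card = data[offset:offset + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":  # value indicator in columns 9-10
            header[keyword] = parse_value(card[10:])
    return header


def build_demo_header() -> bytes:
    """Create a small in-memory header with made-up values."""
    cards = [
        make_card("SIMPLE", "T", "conforms to FITS standard"),
        make_card("BITPIX", "16"),
        make_card("NAXIS", "0"),
        make_card("OBJECT", "'EXAMPLE'"),
        make_card("RA", "10.6847", "right ascension in degrees (assumed keyword)"),
        make_card("DEC", "41.2690", "declination in degrees (assumed keyword)"),
        b"END".ljust(CARD_LEN),
    ]
    data = b"".join(cards)
    return data + b" " * (-len(data) % BLOCK_LEN)


if __name__ == "__main__":
    for key, value in parse_header(build_demo_header()).items():
        print(f"{key}: {value!r}")
```

The extracted dictionary is the kind of file-level metadata (coordinates, object name) that a repository can index for search without touching the data objects themselves.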
In addition to data citation and
metadata features, Dataverse
has a rich set of features that
facilitate data publishing
Tiered Access
Tier                | Metadata | Files      | How to Access
Open (default): CC0 | Open     | Open       | Click to download
GuestBook           | Open     | Open       | Fill in guestbook before download
Terms of Use        | Open     | Open       | Click through terms of use before download
Data Restricted     | Open     | Restricted | Request access via click through
Data Restricted     | Open     | Restricted | Request access via application
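The access tiers above can be written down as plain data, which makes the key property easy to check: metadata stays open in every tier, and only the files become restricted. The class and field names below are hypothetical and do not reflect Dataverse's internal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessTier:
    name: str
    metadata_open: bool
    files_open: bool
    how_to_access: str


# One entry per row of the tiered-access table.
TIERS = [
    AccessTier("Open (default): CC0", True, True, "Click to download"),
    AccessTier("GuestBook", True, True, "Fill in guestbook before download"),
    AccessTier("Terms of Use", True, True,
               "Click through terms of use before download"),
    AccessTier("Data Restricted (click through)", True, False,
               "Request access via click through"),
    AccessTier("Data Restricted (application)", True, False,
               "Request access via application"),
]


def tiers_requiring_request(tiers):
    """Return the tiers where files cannot be downloaded without a request."""
    return [tier for tier in tiers if not tier.files_open]


if __name__ == "__main__":
    for tier in TIERS:
        files = "open" if tier.files_open else "restricted"
        print(f"{tier.name}: files {files}; {tier.how_to_access}")
```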
Data Publishing Workflows
• Create Dataset (landing page restricted)
• Review (collaborators or anonymous reviewers)
• Publish v. 1
• Minor change (metadata only)
• Publish v. 1.1
• Major change (might include new data file)
• Publish v. 2
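The version numbering in this workflow can be sketched as a small function: a minor change bumps the second number, a major change bumps the first. The function names and the (major, minor) tuple representation are assumptions for illustration; deciding whether a change counts as major is left to the caller.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]  # (major, minor)


def next_version(current: Optional[Version], major_change: bool) -> Version:
    """Return the version assigned at the next publish step."""
    if current is None:
        return (1, 0)  # first publication
    major, minor = current
    if major_change:
        return (major + 1, 0)
    return (major, minor + 1)


def label(version: Version) -> str:
    """Render a version the way the slide writes it, e.g. 'v. 1.1'."""
    major, minor = version
    return f"v. {major}" if minor == 0 else f"v. {major}.{minor}"


if __name__ == "__main__":
    v = next_version(None, major_change=False)
    print(label(v))                          # first publish
    v = next_version(v, major_change=False)  # metadata-only edit
    print(label(v))
    v = next_version(v, major_change=True)   # e.g. a new data file
    print(label(v))
```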
And more at dataverse.org guides ...
Biomedical Dataverse addresses data
publication of large files: SBGridData
The Biomedical Dataverse at Harvard Medical School,
also tested as a persistent repository for LINCS data
(NIH Library of Integrated Network-based Cellular Signatures)
Collaboration with Piotr Sliz and Caroline Shamu (HMS)
An additional challenge
for data publishing:
Sensitive Data
“User Uploads must be void of all identifiable
information, such that re-identification of any subjects
from the amalgamation of the information available
from all of the materials (across datasets and
dataverses) uploaded under any one author and/or
user should not be possible.”
“Submitter represents and warrants that the Content
does not contain any information (i) which identifies, or
which can be used in conjunction with other publicly
available information to personally identify, any
individual;”
“If you are submitting human sequences to GenBank,
do not include any data that could reveal the personal
identity of the source. It is our assumption that you
have received any necessary informed consent
authorizations that your organizations require prior to
submitting your sequences.”
GenBank
How can we maximize
publishing sensitive data while
being mindful of privacy?
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science. 2015101601.
October 16, 2015. http://techscience.org/a/2015101601
The DataTags System
A datatag is a set of security features and access
requirements for file handling
A datatags repository is one that stores and shares
data files in accordance with standardized and
ordered levels of security and access requirements
Datatags Levels
Tag Type | Description          | Security Features                         | Access Requirements
Blue     | Public               | Clear storage, Clear transmission         | Open
Green    | Controlled public    | Clear storage, Clear transmission         | Email, OAuth verified registration
Yellow   | Accountable          | Clear storage, Encrypted transmit         | Password, Registered, Approval, Click DUA
Orange   | More accountable     | Encrypted storage, Encrypted transmit     | Password, Registered, Approval, Signed DUA
Red      | Fully accountable    | Encrypted storage, Encrypted transmit     | Two-factor authentication, Approval, Signed DUA
Crimson  | Maximally restricted | Multi-Encrypt store, Encrypted transmit   | Two-factor authentication, Approval, Signed DUA
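Because the tag levels are ordered, they map naturally onto an ordered enumeration. The sketch below encodes the table as data; the names are hypothetical, and the rule that a collection of files takes the strictest tag among them is an assumption made for illustration, not something stated on the slide.

```python
from enum import IntEnum


class DataTag(IntEnum):
    """Tag levels ordered from least to most restrictive."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


# Security features and access requirements, copied from the table.
REQUIREMENTS = {
    DataTag.BLUE: {"storage": "clear", "transmission": "clear",
                   "access": ["open"]},
    DataTag.GREEN: {"storage": "clear", "transmission": "clear",
                    "access": ["email, OAuth verified registration"]},
    DataTag.YELLOW: {"storage": "clear", "transmission": "encrypted",
                     "access": ["password", "registered", "approval",
                                "click DUA"]},
    DataTag.ORANGE: {"storage": "encrypted", "transmission": "encrypted",
                     "access": ["password", "registered", "approval",
                                "signed DUA"]},
    DataTag.RED: {"storage": "encrypted", "transmission": "encrypted",
                  "access": ["two-factor authentication", "approval",
                             "signed DUA"]},
    DataTag.CRIMSON: {"storage": "multi-encrypted", "transmission": "encrypted",
                      "access": ["two-factor authentication", "approval",
                                 "signed DUA"]},
}


def requires_encrypted_storage(tag: DataTag) -> bool:
    return REQUIREMENTS[tag]["storage"] != "clear"


def strictest(tags) -> DataTag:
    """Assumed policy: a group of files is handled at its strictest tag."""
    return max(tags)


if __name__ == "__main__":
    tag = strictest([DataTag.BLUE, DataTag.ORANGE, DataTag.GREEN])
    print(tag.name, REQUIREMENTS[tag])
```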
DataTags Workflow in a Dataverse Repository
(under development)
[Workflow diagram labels: Data File Ingestion; Sensitive Dataset; Direct Access; Privacy Preserving Access; Automatic Interview; Review Board Approval; Two-factor Authentication, Signed DUA]
http://datatags.org
http://privacytools.seas.harvard.edu
Example of DataTags Interview
Thanks!
And join us at this year’s
Dataverse Community Meeting
References
• http://dataverse.org
• http://dataverse.harvard.edu
• http://datatags.org
• Sweeney L, Crosas M, Bar-Sinai M, 2015, Sharing Sensitive Data with Confidence: The DataTags System. Technology Science, http://techscience.org/a/2015101601
• Gross, Harmon, Reidy, 2001, Communicating Science
• Mabe, 2003, The Growth and Number of Journals
• Friendly, 2006, A Brief History of Data Visualization

More Related Content

Similar to The Rise of Data Publishing in the Digital World

Managing Scholarly Research Output The Smithsonian Institution Experience: An...
Managing Scholarly Research Output The Smithsonian Institution Experience: An...Managing Scholarly Research Output The Smithsonian Institution Experience: An...
Managing Scholarly Research Output The Smithsonian Institution Experience: An...Martin Kalfatovic
 
Webs of Life and Data: Impacts of open and networked data on scientific pract...
Webs of Life and Data: Impacts of open and networked data on scientific pract...Webs of Life and Data: Impacts of open and networked data on scientific pract...
Webs of Life and Data: Impacts of open and networked data on scientific pract...Sarah Anna Stewart
 
Periodicals Archive Online: Past, Present, and Future
Periodicals Archive Online: Past, Present, and FuturePeriodicals Archive Online: Past, Present, and Future
Periodicals Archive Online: Past, Present, and FutureProQuest
 
How We Used to Build the Future: 30 Years of Collection Development Trends
How We Used to Build the Future: 30 Years of Collection Development TrendsHow We Used to Build the Future: 30 Years of Collection Development Trends
How We Used to Build the Future: 30 Years of Collection Development TrendsNASIG
 
Gauging Research Output and Influence
Gauging Research Output and InfluenceGauging Research Output and Influence
Gauging Research Output and Influenceguestb4248d
 
The Rise of Data Publishing in the Digital World

  • 1. The Rise of Data Publishing in the Digital World (and how Dataverse and DataTags help) Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science Harvard University @mercecrosas NDSR 2016 Symposium
  • 2. From 1665 to late 20th century: A steady increase in size and complexity of research output
  • 3. The number of journals doubles every 20 years since 1750s, with growth of number of scientists 1665 1765 1865 1965 100 10000 Mabe, 2003
  • 4. The number of journals doubles every 20 years since 1750s, with growth of number of scientists 1700: 3 journals 1665 1765 1865 1965 100 10000 Mabe, 2003
  • 5. The number of journals doubles every 20 years since 1750s, with growth of number of scientists 1700: 3 journals 1800: ~10 journals 1665 1765 1865 1965 100 10000 Mabe, 2003
  • 6. The number of journals doubles every 20 years since 1750s, with growth of number of scientists 1700: 3 journals 1800: ~10 journals 1900: ~400 journals 1665 1765 1865 1965 100 10000 Mabe, 2003
  • 7. The number of journals doubles every 20 years since 1750s, with growth of number of scientists 1700: 3 journals 1800: ~10 journals 1900: ~400 journals 2000: ~14,000 journals (peer-reviewed) 1665 1765 1865 1965 100 10000 Mabe, 2003
  • 8. 1665 1765 1865 1965 100 10000
  • 9. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument 1665 1765 1865 1965 100 10000
  • 10. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 1665 1765 1865 1965 100 10000
  • 11. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 50% cite previous work 1665 1765 1865 1965 100 10000
  • 12. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 50% cite previous work First Line Graphs and bar charts (Playfair, 1786) 1665 1765 1865 1965 100 10000
  • 13. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 50% of articles have tables & figures 50% cite previous work First Line Graphs and bar charts (Playfair, 1786) 1665 1765 1865 1965 100 10000
  • 14. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 50% of articles have tables & figures 50% cite previous work method sections appear First Line Graphs and bar charts (Playfair, 1786) 1665 1765 1865 1965 100 10000
  • 15. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 50% of articles have tables & figures 50% cite previous work method sections appear First Line Graphs and bar charts (Playfair, 1786) First Scatterplots (Herschel, 1833; Galton 1896) 1665 1765 1865 1965 100 10000
  • 16. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 50% of articles have tables & figures most articles have tables & figures, often standalone 50% cite previous work method sections appear First Line Graphs and bar charts (Playfair, 1786) First Scatterplots (Herschel, 1833; Galton 1896) 1665 1765 1865 1965 100 10000
  • 17. Data Tables and Visuals Become Increasingly Common, and part of the Scientific Argument a few tables & visuals, as part of the text 50% of articles have tables & figures most articles have tables & figures, often standalone 50% cite previous work 100% with citations (1 per 100 words) part of scholarly credit method sections appear First Line Graphs and bar charts (Playfair, 1786) First Scatterplots (Herschel, 1833; Galton 1896) 1665 1765 1865 1965 100 10000
  • 18. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001)
  • 19. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century:
  • 20. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations)
  • 21. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century:
  • 22. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts
  • 23. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section
  • 24. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section • 20th century:
  • 25. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section • 20th century: • structured quantitative data with increased use of statistics
  • 26. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section • 20th century: • structured quantitative data with increased use of statistics • wide range of data types with new technologies
  • 27. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section • 20th century: • structured quantitative data with increased use of statistics • wide range of data types with new technologies • Number of scientists increases from 100s to a few million
  • 28. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section • 20th century: • structured quantitative data with increased use of statistics • wide range of data types with new technologies • Number of scientists increases from 100s to a few million • Science becomes extremely specialized:
  • 29. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section • 20th century: • structured quantitative data with increased use of statistics • wide range of data types with new technologies • Number of scientists increases from 100s to a few million • Science becomes extremely specialized: • from 1 journal to 14,000 peer-reviewed journals
  • 30. Scholarly Publishing Adapts to the Increase of Cognitive Complexity (Gross et al 2001) • 18th century: • formal components appear in articles (introduction, conclusions, table, figures, citations) • 19th century: • explain data instead of establishing observations of facts • wide use of visuals, high citation density, methods section • 20th century: • structured quantitative data with increased use of statistics • wide range of data types with new technologies • Number of scientists increases from 100s to a few million • Science becomes extremely specialized: • from 1 journal to 14,000 peer-reviewed journals • one new journal for each 150 authors, read by 500
  • 31. In the last decades, more and more publications and data
  • 32. A Steeper Growth of Scholarly Output Since 1950, the total number of journals doubles every ~15 years 2010: 80,000 journals 2010: 33,000 peer-reviewed
  • 33. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories
  • 34. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories 1920 - 1950s
  • 35. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories First Social Science Data Archives (ODUM, ICPSR, ...) 1920 - 1950s
  • 36. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories First Social Science Data Archives (ODUM, ICPSR, ...) 1920 - 1950s 1970 - 1980s
  • 37. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories First Social Science Data Archives (ODUM, ICPSR, ...) First Biomedical Databases (PDB, GenBank, ...) 1920 - 1950s 1970 - 1980s
  • 38. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories First Social Science Data Archives (ODUM, ICPSR, ...) First Biomedical Databases (PDB, GenBank, ...) 1920 - 1950s 1970 - 1980s 2016
  • 39. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories First Social Science Data Archives (ODUM, ICPSR, ...) A wide range of Research Data Repositories First Biomedical Databases (PDB, GenBank, ...) 1920 - 1950s 1970 - 1980s 2016
  • 40. An Outburst of Research Data and Specialization Results in > 1000 Community Repositories First Social Science Data Archives (ODUM, ICPSR, ...) A wide range of Research Data Repositories First Biomedical Databases (PDB, GenBank, ...) 1500 repositories listed in re3data.org 1920 - 1950s 1970 - 1980s 2016
  • 41. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving
  • 42. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output
  • 43. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output • Attribution and credit
  • 44. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output • Attribution and credit • Dissemination
  • 45. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output • Attribution and credit • Dissemination • Finding & Reuse
  • 46. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output • Attribution and credit • Dissemination • Finding & Reuse Data Archiving: Long-term access to data
  • 47. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output • Attribution and credit • Dissemination • Finding & Reuse Data Archiving: Long-term access to data • Accessibility
  • 48. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output • Attribution and credit • Dissemination • Finding & Reuse Data Archiving: Long-term access to data • Accessibility • Preservation
  • 49. Data Publishing Emerges as the Union of Scholarly Publishing and Data Archiving Scholarly publishing: Distribute research output • Attribution and credit • Dissemination • Finding & Reuse Data Archiving: Long-term access to data • Accessibility • Preservation • Finding & Reuse
  • 51. Why Data Publishing now? Extending the thesis of Gross et al., data publishing accommodates the complexity of research input and output in the digital world.
  • 52. Why Data Publishing now? Extending Gross et al. thesis, data publishing accommodates the complexity of research input and output in the digital world.
  • 53. Why Data Publishing now? • Data (and software) have become common input and output of research Extending Gross et al. thesis, data publishing accommodates the complexity of research input and output in the digital world.
  • 54. Why Data Publishing now? • Data (and software) have become common input and output of research • A scholarly article cannot hold or describe accurately these vast amounts of data and software Extending Gross et al. thesis, data publishing accommodates the complexity of research input and output in the digital world.
  • 55. Why Data Publishing now? Extending the thesis of Gross et al., data publishing accommodates the complexity of research input and output in the digital world.
      • Data (and software) have become common input and output of research
      • A scholarly article cannot hold or accurately describe these vast amounts of data and software
      • As input and output of research, data must be citable and accessible to enable validation and reuse, with attribution
  • 67. What is needed for FAIR Data Publishing (FAIR = Findable, Accessible, Interoperable, Reusable)
      • Data Citation: persistent id to reference data uniquely; support for versions and fixity; attribution to authors and repository
      • Metadata: catalog to discover and locate the data; sufficient information to understand and reuse the data
      • Repository: digital access to metadata and data; archive and preservation for long-term access; interoperability through standards and APIs
  • 69. A data repository system that serves as a solution for publishing FAIR research data
  • 71. Around the World: Dataverse repositories serve a community, an institution, an archive, ... Harvard Dataverse: generic data repository open to researchers worldwide
  • 72. Dataverses contain datasets, datasets contain metadata and data files
  • 74. Data Citation in Dataverse. Citation components: Authors; Published Year; Dataset Title; Global Persistent Identifier; Repository (= Data Publisher); Version (or time range)
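The citation components listed on this slide can be illustrated with a minimal Python sketch. The class name, field names, separator style, and the sample values (including the placeholder identifier `doi:10.0000/EXAMPLE`) are assumptions made for illustration; this is not Dataverse's actual citation formatter or its exact citation layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the citation components named on the slide."""
    authors: Tuple[str, ...]
    year: int                 # published year
    title: str                # dataset title
    persistent_id: str        # global persistent identifier
    repository: str           # repository acting as data publisher
    version: str              # version (or time range)

    def format(self) -> str:
        # Join the components in one fixed order; the order and the
        # punctuation are illustrative choices, not a standard.
        authors = "; ".join(self.authors)
        return (
            f'{authors}, {self.year}, "{self.title}", '
            f"{self.persistent_id}, {self.repository}, {self.version}"
        )


# Placeholder values, not a real dataset.
example = DataCitation(
    authors=("Doe, Jane", "Roe, Richard"),
    year=2016,
    title="Example Survey Data",
    persistent_id="doi:10.0000/EXAMPLE",
    repository="Example Dataverse",
    version="V1",
)
print(example.format())
```

Keeping the version as a separate field means a citation can point at one specific published state of the dataset rather than at whatever the latest state happens to be.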
  • 76. Data Citation Basics (Force11, Joint Declaration of Data Citation Principles; Starr et al., 2015): The dataset landing page is accessible and guaranteed by the repository (or data publisher), even when data are restricted or deaccessioned
  • 81. Metadata in Dataverse (Metadata Level: Fields; Standards)
      • Citation Metadata: author, title, repository, year published, version, etc.; standards: Dublin Core, DataCite
      • Domain-specific Metadata: data collection info (methods, organism, observation, survey, experiment, etc.); standards: DDI (social sciences), ISA-Tab BioCaddie (biomed), Virtual Observatory (astro), + custom metadata blocks
      • File-level Metadata: metadata inside the data file (variables, instrument details, geospatial info, etc.); standards: DDI (for variables), + more to be determined
      • Dataverse JSON Schema
  • 86. Information Extraction: Tabular Files (RData, Stata, SPSS, Excel, CSV)
      • Example table (columns var 1, var 2, var 3): obs 1: 2, a, 0; obs 2: 4, c, 0; obs 3: 6, b, 1; obs 4: 1, e, 0; obs 5: 2, a, 1; obs 6: 3, b, 1
      • Variable Metadata: variable name, label, type, stats, geospatial coordinates
      • Data Values: independent of format
      • Universal Numerical Fingerprint (UNF): checksum on data values, from canonical format
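The idea of a checksum computed on data values in a canonical format, rather than on the bytes of a particular file format, can be sketched as follows. This is a simplified illustration only and not the actual UNF algorithm: the canonical form used here (numbers written in exponential notation with a fixed number of significant digits), the SHA-256 hash, the separators, and the function names `canonical` and `fingerprint` are all assumptions.

```python
import hashlib


def canonical(value, digits=7):
    """Map a value to a format-independent string (illustrative rule only)."""
    if isinstance(value, (int, float)):
        # Same text for 2, 2.0, or any numeric encoding of the same value.
        return format(float(value), f".{digits - 1}e")
    return str(value)


def fingerprint(columns, digits=7):
    """Hash the canonical form of every value, column by column."""
    h = hashlib.sha256()
    for column in columns:
        for value in column:
            h.update(canonical(value, digits).encode("utf-8"))
            h.update(b"\n")       # value separator
        h.update(b"\x00")         # column separator
    return h.hexdigest()


# The six observations from the slide, stored as integers ...
as_ints = [[2, 4, 6, 1, 2, 3], list("acbeab"), [0, 0, 1, 0, 1, 1]]
# ... and the same values as another format might store them (floats).
as_floats = [[2.0, 4.0, 6.0, 1.0, 2.0, 3.0], list("acbeab"), [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]]

print(fingerprint(as_ints) == fingerprint(as_floats))
```

Because the hash depends only on the canonical values, the same table exported from different statistical packages would yield the same fingerprint under this scheme, while any change to a data value would change it.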
  • 90. Information Extraction: FITS (astro) Files. Header Metadata: coordinates (R.A., declination), photometric info, ... Data Objects: image files, spectra, data cubes, tables, ...
  • 91. In addition to data citation and metadata features, Dataverse has a rich set of features that facilitate data publishing
  • 97. Tiered Access (Tier: Metadata; Files; How to Access)
      • Open (default), CC0: metadata open; files open; click to download
      • GuestBook: metadata open; files open; fill in guestbook before download
      • Terms of Use: metadata open; files open; click through terms of use before download
      • Data Restricted: metadata open; files restricted; request access via click-through
      • Data Restricted: metadata open; files restricted; request access via application
  • 107. Data Publishing Workflows: Create Dataset (landing page restricted) → Review (collaborators or anonymous reviewers) → Publish v. 1 → Minor change (metadata only) → Publish v. 1.1 → Major change (might include new data file) → Publish v. 2
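The version numbering in this workflow (a minor change leads to v. 1.1, a major change to v. 2) can be written as a small Python sketch. The function names and the `(major, minor)` tuple representation are hypothetical; the sketch only mirrors the numbering shown on the slide and makes no claim about how Dataverse implements it.

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor); hypothetical representation


def next_version(current: Version, major_change: bool) -> Version:
    """Return the version number to publish after a change."""
    major, minor = current
    if major_change:
        # Major change (might include a new data file): bump major, reset minor.
        return (major + 1, 0)
    # Minor change (metadata only): bump the minor number.
    return (major, minor + 1)


def label(version: Version) -> str:
    """Render a version the way the slide does: 'v. 1', 'v. 1.1', 'v. 2'."""
    major, minor = version
    return f"v. {major}" if minor == 0 else f"v. {major}.{minor}"


v1 = (1, 0)
v1_1 = next_version(v1, major_change=False)
v2 = next_version(v1_1, major_change=True)
print(label(v1), label(v1_1), label(v2))
```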
  • 108. And more at dataverse.org guides ...
  • 109. Biomedical Dataverse addresses data publication of large files: SBGridData
  • 110. The Biomedical Dataverse at Harvard Medical School - also tested as a persistent repository for LINCS data (NIH Library of Integrated Network-based Cellular Signatures). Collaboration with Piotr Sliz and Caroline Shamu (HMS)
  • 112. An additional challenge for data publishing: Sensitive Data
  • 113. “User Uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or user should not be possible.”
  • 114. “Submitter represents and warrants that the Content does not contain any information (i) which identifies, or which can be used in conjunction with other publicly available information to personally identify, any individual;”
  • 115. “If you are submitting human sequences to GenBank, do not include any data that could reveal the personal identity of the source. It is our assumption that you have received any necessary informed consent authorizations that your organizations require prior to submitting your sequences.” GenBank
  • 116. How can we maximize publishing sensitive data while being mindful of privacy?
  • 117. The DataTags System. Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601
  • 120. A datatag is a set of security features and access requirements for file handling. A datatags repository is one that stores and shares data files in accordance with standardized and ordered levels of security and access requirements.
  • 121. DataTags Levels (Tag Type: Description; Security Features; Access Requirements)
      • Blue: Public; clear storage, clear transmission; open
      • Green: Controlled public; clear storage, clear transmission; email, OAuth verified registration
      • Yellow: Accountable; clear storage, encrypted transmit; password, registered, approval, click DUA
      • Orange: More accountable; encrypted storage, encrypted transmit; password, registered, approval, signed DUA
      • Red: Fully accountable; encrypted storage, encrypted transmit; two-factor authentication, approval, signed DUA
      • Crimson: Maximally restricted; multi-encrypt store, encrypted transmit; two-factor authentication, approval, signed DUA
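Because the tag levels are ordered, they map naturally onto an ordered enumeration. The sketch below encodes the six levels and derives the storage and transmission rules from the table on this slide. The names `DataTag`, `storage_encrypted`, `transmission_encrypted`, and `strictest`, as well as the idea of taking the strictest tag across several files, are assumptions for illustration and are not part of the DataTags system as described in the source.

```python
from enum import IntEnum


class DataTag(IntEnum):
    """The six tag levels from the slide, ordered from least to most restrictive."""
    BLUE = 1      # public
    GREEN = 2     # controlled public
    YELLOW = 3    # accountable
    ORANGE = 4    # more accountable
    RED = 5       # fully accountable
    CRIMSON = 6   # maximally restricted


def storage_encrypted(tag: DataTag) -> bool:
    # Per the table: storage is clear up to Yellow, encrypted from Orange on.
    return tag >= DataTag.ORANGE


def transmission_encrypted(tag: DataTag) -> bool:
    # Per the table: transmission is clear for Blue and Green only.
    return tag >= DataTag.YELLOW


def strictest(tags):
    """Hypothetical helper: the most restrictive tag among several files."""
    return max(tags)


files = [DataTag.GREEN, DataTag.ORANGE, DataTag.YELLOW]
overall = strictest(files)
print(overall.name, storage_encrypted(overall), transmission_encrypted(overall))
```

Using an `IntEnum` keeps comparisons such as `tag >= DataTag.ORANGE` readable and makes the ordering of the levels explicit in one place.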
  • 122. DataTags Workflow in a Dataverse Repository (under development). Diagram labels: Data File Ingestion; Sensitive Dataset; Direct Access; Privacy Preserving Access; Automatic Interview; Review Board Approval; Two-factor Authentication; Signed DUA. http://datatags.org http://privacytools.seas.harvard.edu
  • 123. Example of DataTags Interview
  • 129. Thanks! And join us at this year’s Dataverse Community Meeting
  • 130. References
      • http://dataverse.org
      • http://dataverse.harvard.edu
      • http://datatags.org
      • Sweeney L, Crosas M, Bar-Sinai M, 2015, Sharing Sensitive Data with Confidence: The DataTags System. Technology Science, http://techscience.org/a/2015101601
      • Gross, Harmon, Reidy, 2001, Communicating Science
      • Mabe, 2003, The Growth and Number of Journals
      • Friendly, 2006, A Brief History of Data Visualization