Scott Edmunds, GigaScience/BGI Hong Kong
ICG7, Hong Kong, 1st December 2012


           www.gigasciencejournal.com
The challenges of integrating papers + data:
Technical issues:
•Data volumes: (1.2 zettabytes generated globally each year)

•>Exponential growth of genomics data

•Technical challenges (VMs/cloud, compression)


Cultural issues:
•Lack of incentives (Data DOIs)

•Data licensing (CC-BY, CC0)

•Journal/funder policies
 Source: 1. Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22.
 * T-Shirts available from Graham Steel / http://www.zazzle.co.uk/steelgraham
Why is this important?
                                                                                 • Transparency
                                                                                 • Reproducibility
                                                                                 • Re-use




“Faked research
 is endemic in
     China”



Source: New Scientist, 17th Nov 2012: http://www.newscientist.com/article/mg21628910.300-fraud-fighter-faked-research-is-endemic-in-china.html
Why is this important?





“Wide distribution of information is key to scientific progress,
yet traditionally, Chinese scientists have not systematically
released data or research findings, even after publication.”

“There have been widespread complaints from scientists
inside and outside China about this lack of transparency.”

“Usually incomplete and unsystematic, [what little supporting
data released] are of little value to researchers and there is
evidence that this drives down a paper's citation numbers.”
Source: Nature 475, 267 (2011) http://www.nature.com/news/2011/110720/full/475267a.html?
Global Issue: increasing number of retractions
                                                                   >15X increase in last decade
                                                                  Strong correlation of “retraction index” with
                                                                  higher impact factor




 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
 2. Retracted Science and the Retraction Index ▿ http://iai.asm.org/content/79/10/3855.abstract?
Global Issue: unrepeatability of scientific results
                                                                 Out of 18 microarray papers, results
                                                                  from 10 could not be reproduced




Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses.
Nature Genetics 41: 149-155.
Sharing aids authors…


Sharing Detailed
Research Data Is
Associated with
Increased Citation Rate.
Piwowar HA, Day RS, Fridsma DB (2007)
PLoS ONE 2(3): e308.
doi:10.1371/journal.pone.0000308



                 Every 10 datasets collected contribute to at least 4 papers in
                 the following 3 years.
                 Piwowar, HA, Vision, TJ, & Whitlock, MC (2011). Data archiving is a good investment
                 Nature, 473 (7347), 285-285 DOI: 10.1038/473285a
Rice v Wheat: consequences of publicly available genome data.

[Chart: rice vs wheat; y-axis 0 to 700]
Our first DOI:


To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J;
Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y;
Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X;
Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482
isolate genome sequencing consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen.
doi:10.5524/100001
http://dx.doi.org/10.5524/100001

            To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
            Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
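
As a small illustration of how a dataset DOI like the one cited above maps to a resolvable link, here is a minimal Python sketch. The function name `doi_to_url` is hypothetical and not part of any GigaScience or DataCite tooling; the resolver prefix is the one shown on the slide.

```python
def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Turn 'doi:10.5524/100001' or '10.5524/100001' into a resolver URL."""
    doi = doi.strip()
    # Drop an optional "doi:" scheme prefix (case-insensitive).
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Every DOI starts with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return resolver + doi


print(doi_to_url("doi:10.5524/100001"))
```

Because the DOI itself stays fixed while the resolver handles redirection, the same citation string remains valid even if the data moves to a new host.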
Downstream consequences:

1. Citations (~100)
2. Therapeutics (primers, antimicrobials)
3. Platform Comparisons
4. Example for faster & more open science




  “Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the
  Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew
  it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements
  governing how his team could use data collected on the strain. Luckily, one team had released its data
  under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his
  colleagues to join the international research effort and publish their work without wasting time on
  legal wrangling.”
1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-
intestinal infection in Hamburg in Germany in May 2011. This
spread through several European countries and the
US, affecting about 4000 people and resulting in over 50
deaths. All tested positive for an unusual and little-known
Shiga-toxin–producing E. coli bacterium. The strain was initially
analysed by scientists at BGI-Shenzhen in China, working
together with those in Hamburg, and three days later a draft
genome was released under an open data licence. This
generated interest from bioinformaticians on four continents. 24
hours after the release of the genome it had been assembled.
Within a week two dozen reports had been filed on an open-
source site dedicated to the analysis of the strain. These
analyses provided crucial information about the strain’s
virulence and resistance genes – how it spreads and which
antibiotics are effective against it. They produced results in
time to help contain the outbreak. By July 2011, scientists
published papers based on this work. By opening up their early
sequencing results to international collaboration, researchers in
Hamburg produced results that were quickly tested by a wide
range of experts, used to produce new knowledge and
ultimately to control a public health emergency.
Not just (data) quantity, but quality
1. Lack of sufficient metadata

2. Lack of interoperability

3. Long tail of curation (“Democratization” of “Big-Data”)
Not just (data) quantity, but quality
Better handling of metadata…
Novel tools/formats for data interoperability/handling.
      Cloud
    solutions?
Not just (data) quantity, but quality

Tools making work more easily reproducible…

Interoperability/Ease of use   Workflows




 Data quality assessment
Large-Scale Data
      Journal/Database
    In conjunction with:

Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Commissioning Editor: Nicole Nogoy, PhD
Lead Curator: Tam Sneddon, DPhil
Data Platform: Peter Li, PhD
  www.gigasciencejournal.com
Addressing the reproducibility gap:
Computable methods/workflow systems
Bioinformatics
Development      Biomedical and bioinformatics research   Publishing
Redefining what is a paper in the era of big-data?

                goal: Executable Research Objects




                                        Citable DOI
Integrating workflows into papers…
Anatomy of a Publication
 Idea




Study




           Metadata


           Data
Analysis




Answer
Anatomy of a Data Publication
 Idea




Study




           Metadata


           Data
Analysis




Answer
Publication




• Background

• Methods

• Results (Data)

• Conclusions/Discussion

                           doi:10.1186/2047-217X-1-3
Data
                                     Publication




• Background

• Methods

• Results (Data)
                                   doi:10.5524/100035
• Conclusions/Discussion

                           doi:10.1186/2047-217X-1-3
Methods +
                                     Data +
                                     Publication




• Background

• Methods                          DOI for workflows?


• Results (Data)
                                   doi:10.5524/100035
• Conclusions/Discussion

                           doi:10.1186/2047-217X-1-3
Data                  Methods             Analysis


doi:10.5524/100035   +    DOI: x    =   doi:10.1186/2047-217X-1-3


   DOI: A            +    DOI: X    =         DOI: 1

   DOI: B            +    DOI: X    =         DOI: 2

  DOI: A             +    DOI: Y    =         DOI: 3

  A, B, C…               X, Y, Z…   =         4, 5, 6…
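
The combinatorial point of this slide (each pairing of a citable dataset with a citable method yields a distinct, citable analysis) can be sketched in a few lines of Python. This is only an illustration: the labels A, B, X, Y and the running numbers come from the slide, the function name is hypothetical, and real DOIs are opaque identifiers assigned at registration rather than computed like this.

```python
from itertools import product


def enumerate_analyses(data_ids, method_ids):
    """Number every (data, method) pairing in the order used on the slide."""
    analyses = {}
    # Methods vary slowest, so A+X, B+X come before A+Y, B+Y.
    for n, (method, data) in enumerate(product(method_ids, data_ids), start=1):
        analyses[(data, method)] = n
    return analyses


analyses = enumerate_analyses(["A", "B"], ["X", "Y"])
for (data, method), n in analyses.items():
    print(f"DOI: {data} + DOI: {method} = DOI: {n}")
```

With two datasets and two methods there are already four possible analyses, which is why separately citable data and methods multiply the number of publishable objects.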
Different shaped publishable objects
  Data
 Papers



Executable
(Methods)
  Papers


 Analysis
  Papers
Different shaped publishable objects
         Different levels of granularity


Experiment (e.g. ACRG project)    e.g. doi:10.5524/100001                                    Papers
Datasets (e.g. cancer type)       e.g. doi:10.5524/100001-2                                  Data/Micropubs
Sample (e.g. specimen xyz)        e.g. doi:10.5524/100001-2000 or doi:10.5524/100001_xyz
Smaller still?                    Facts/Assertions (~10^14 in literature)                    Nanopubs
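
One way to picture these levels of granularity is to split an identifier into its parent (experiment-level) part and an optional sub-identifier. The Python sketch below assumes the suffix convention shown in the slide's examples (a `-` or `_` followed by a dataset or sample label); that convention, the `ParsedDoi` type and the `parse_doi` function are illustrative assumptions, not a documented GigaDB scheme.

```python
import re
from typing import NamedTuple, Optional


class ParsedDoi(NamedTuple):
    prefix: str           # registrant prefix, e.g. "10.5524"
    parent: str           # experiment-level identifier, e.g. "100001"
    part: Optional[str]   # dataset or sample sub-identifier, if present


# Optional "doi:" scheme, prefix, numeric parent, optional -part or _part.
_PATTERN = re.compile(r"^(?:doi:)?(10\.\d+)/(\d+)(?:[-_](\w+))?$")


def parse_doi(doi: str) -> ParsedDoi:
    """Split a slide-style DOI into prefix, parent and optional sub-part."""
    match = _PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"unrecognised identifier: {doi!r}")
    return ParsedDoi(*match.groups())


for example in ("doi:10.5524/100001",
                "doi:10.5524/100001-2000",
                "doi:10.5524/100001_xyz"):
    print(parse_doi(example))
```

Note that the suffix alone does not say whether a sub-identifier is a dataset or a sample; that would need metadata attached to each record.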
Adding “value” publishing data

• Scope for different shaped publishable objects
• Scope for publishing methods/executable papers
• Peer review of data problematic
     – Post publication peer review
     – Change criteria (assess on transparency/access only)
     – Better use of workflows/cloud/VMs



DOIs are cheap*, data is precious: maximise its use
 * ish
• Transparency
• Reproducibility
• Re-use
                    }   = Credit
Thanks to:               Shaoguang Liang (BGI-SZ)
Laurie Goodman           Tin-Lap Lee (CUHK)
Tam Sneddon              Huayen Gao (CUHK)
Nicole Nogoy             Qiong Luo (HKUST)
Alexandra Basford        Senghong Wang (HKUST)
Peter Li                 Yan Zhou (HKUST)
Jesse Si Zhe             Cogini
                           editorial@gigasciencejournal.com
Contact us:                database@gigasciencejournal.com

                            @gigascience

 Follow us:                 facebook.com/GigaScience

                            blogs.openaccesscentral.com/blogs/gigablog/

                   www.gigadb.org
              www.gigasciencejournal.com

Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014
 
We need to solve more that just our access problems
We need to solve more that just our access problemsWe need to solve more that just our access problems
We need to solve more that just our access problems
 

Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in the “Big-Data” Era

  • 1. Scott Edmunds, GigaScience/BGI Hong Kong ICG7, Hong Kong, 1st December 2012 www.gigasciencejournal.com
  • 2. The challenges integrating papers + data: Technical issues: •Data volumes: (1.2 zettabytes generated globally each year) •>Exponential growth of genomics data •Technical challenges (VMs/cloud, compression) Cultural issues: •Lack of incentives (Data DOIs) •Data licensing (CC-BY, CC0) •Journal/funder policies Source: 1. Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22.
  • 3. The challenges integrating papers + data: Technical issues: •Data volumes: (1.2 zettabytes generated globally each year) •>Exponential growth of genomics data •Technical challenges (VMs/cloud, compression) Cultural issues: •Lack of incentives (Data DOIs) •Data licensing (CC-BY, CC0) •Journal/funder policies Source: 1. Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22. * T-Shirts available from Graham Steel / http://www.zazzle.co.uk/steelgraham
  • 4. Why is this important? • Transparency • Reproducibility • Re-use “Faked research is endemic in China” Source: New Scientist, 17th Nov 2012: http://www.newscientist.com/article/mg21628910.300-fraud-fighter-faked-research-is-endemic-in-china.html
  • 5. Why is this important? “Wide distribution of information is key to scientific progress, yet traditionally, Chinese scientists have not systematically released data or research findings, even after publication.” “There have been widespread complaints from scientists inside and outside China about this lack of transparency.” “Usually incomplete and unsystematic, [what little supporting data released] are of little value to researchers and there is evidence that this drives down a paper's citation numbers.” Source: Nature 475, 267 (2011) http://www.nature.com/news/2011/110720/full/475267a.html?
  • 6. Global Issue: increasing number of retractions. >15X increase in last decade. Strong correlation of “retraction index” with higher impact factor. 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?
  • 7. Global Issue: unrepeatability of scientific results Out of 18 microarray papers, results from 10 could not be reproduced Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 149-155.
  • 8. Sharing aids authors… Sharing Detailed Research Data Is Associated with Increased Citation Rate. Piwowar HA, Day RS, Fridsma DB (2007) PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308 Every 10 datasets collected contributes to at least 4 papers in the following 3-years. Piwowar, HA, Vision, TJ, & Whitlock, MC (2011). Data archiving is a good investment Nature, 473 (7347), 285-285 DOI: 10.1038/473285a
  • 9. Rice v Wheat: consequences of publicly available genome data. [Chart comparing rice and wheat; axis from 0 to 700]
  • 10. Our first DOI: To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as: Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001 To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
  • 11.
  • 12.
  • 13.
  • 14. Downstream consequences: 1. Citations (~100) 2. Therapeutics (primers, antimicrobials) 3. Platform Comparisons 4. Example for faster & more open science “Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days for the lawyers at his company, Pacific Biosciences, to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”
  • 15. 1.3 The power of intelligently open data The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
  • 16. Not just (data) quantity, but quality 1. Lack of sufficient metadata 2. Lack of interoperability 1. Long tail of curation (“Democratization” of “Big-Data”)
  • 17. Not just (data) quantity, but quality Better handling of metadata… Novel tools/formats for data interoperability/handling. Cloud solutions?
  • 18. Not just (data) quantity, but quality Tools making work more easily reproducible… Interoperability/Ease of use Workflows Data quality assessment
  • 19. Large-Scale Data Journal/Database In conjunction with: Editor-in-Chief: Laurie Goodman, PhD Editor: Scott Edmunds, PhD Commissioning Editor: Nicole Nogoy, PhD Lead Curator: Tam Sneddon D.Phil Data Platform: Peter Li, PhD www.gigasciencejournal.com
  • 20. Addressing the reproducibility gap: Computable methods/workflow systems Bioinformatics Development Biomedical and bioinformatics research Publishing
  • 21. Redefining what is a paper in the era of big-data? goal: Executable Research Objects Citable DOI
  • 23. Anatomy of a Publication Idea Study Metadata Data Analysis Answer
  • 24. Anatomy of a Data Publication Idea Study Metadata Data Analysis Answer
  • 25. Publication • Background • Methods • Results (Data) • Conclusions/Discussion doi:10.1186/2047-217X-1-3
  • 26. Data Publication • Background • Methods • Results (Data) doi:10.5524/100035 • Conclusions/Discussion doi:10.1186/2047-217X-1-3
  • 27. Methods + Data + Publication • Background • Methods DOI for workflows? • Results (Data) doi:10.5524/100035 • Conclusions/Discussion doi:10.1186/2047-217X-1-3
  • 28. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1
  • 29. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1 DOI: B + DOI: X = DOI: 2
  • 30. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1 DOI: B + DOI: X = DOI: 2 DOI: A + DOI: Y = DOI: 3
  • 31. Data Methods Analysis doi:10.5524/100035 + DOI: x = doi:10.1186/2047-217X-1-3 DOI: A + DOI: X = DOI: 1 DOI: B + DOI: X = DOI: 2 DOI: A + DOI: Y = DOI: 3 A, B, C… X, Y, Z… = 4, 5, 6…
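
The "Data + Methods = Analysis" arithmetic on slides 28 to 31 can be sketched in a few lines of Python. This is only an illustration, assuming each dataset and each method already carries its own identifier; the `Analysis` class, the `enumerate_analyses` function and the numbered labels are hypothetical stand-ins, not part of GigaDB or any DOI registry.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Analysis:
    """One citable analysis: a dataset processed with a method (hypothetical model)."""
    data_doi: str
    method_doi: str
    label: str  # placeholder for the identifier a registry would mint


def enumerate_analyses(data_dois, method_dois):
    """Pair every dataset with every method, numbering the results from 1.

    Methods vary slowest, so the order follows the slides:
    A + X, B + X, A + Y, B + Y, ...
    """
    pairs = product(method_dois, data_dois)
    return [
        Analysis(data_doi=d, method_doi=m, label=f"DOI: {i}")
        for i, (m, d) in enumerate(pairs, start=1)
    ]


analyses = enumerate_analyses(["DOI: A", "DOI: B"], ["DOI: X", "DOI: Y"])
for a in analyses:
    print(f"{a.data_doi} + {a.method_doi} = {a.label}")
```

Because every (data, method) pair yields a distinct citable object, the number of possible analyses grows as the product of the two lists, which is the point of the "4, 5, 6…" on slide 31.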
  • 32. Different shaped publishable objects Data Papers Executable (Methods) Papers Analysis Papers
  • 33. Different shaped publishable objects Different levels of granularity: Experiment (e.g. ACRG project), e.g. doi:10.5524/100001; Datasets (e.g. cancer type), e.g. doi:10.5524/100001-2; Sample (e.g. specimen xyz), e.g. doi:10.5524/100001-2000 or doi:10.5524/100001_xyz; Smaller still? Facts/Assertions (~10^14 in literature). Publication types: Papers, Data/Micropubs, Nanopubs
  • 34. Adding “value” publishing data • Scope for different shaped publishable objects • Scope for publishing methods/executable papers • Peer review of data problematic – Post publication peer review – Change criteria (assess on transparency/access only) – Better use of workflows/cloud/VMs DOIs are cheap*, data is precious: maximise its use * ish
  • 36. Thanks to: Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Huayen Gao (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST), Cogini; Laurie Goodman, Tam Sneddon, Nicole Nogoy, Alexandra Basford, Peter Li, Jesse Si Zhe. Contact us: editorial@gigasciencejournal.com, database@gigasciencejournal.com Follow us: @gigascience, facebook.com/GigaScience, blogs.openaccesscentral.com/blogs/gigablog/ www.gigadb.org www.gigasciencejournal.com

Editor's notes

  1. Leading on from that, current and future plans include collaborating with Tin-Lap Lee at the Chinese University of Hong Kong to integrate an instance of the Galaxy bioinformatics platform with GigaDB. This will let users make full use of the data in GigaDB by linking it to other resources, and will let us incorporate fully executable papers. One such submission is a new SOAPdenovo pipeline. The SOAP tools have been wrapped in Galaxy, the workflow has been defined in MyExperiment, and the data will be issued with a DOI and made accessible via GigaDB. Utilizing the BGI cloud if necessary, users will then be able to reproduce all the steps described in the GigaScience paper to test, reanalyze, compare results etc. Since we would like GigaDB to be a host for data types that have no other home, such as imaging data, we are investigating adding other tools, such as an image viewer, to support the accessibility and usability of the data. So, if you have a large-scale biological or biomedical dataset and/or a pipeline or software that you would like to submit to GigaScience, we would love to hear from you, so please come and talk to Scott or me.
  2. That just leaves me to thank the GigaScience team: Laurie, Scott, Alexandra, Peter and Jesse; BGI for their support, specifically Shaoguang for IT and bioinformatics support; our collaborators on the database, website and tools: Tin-Lap, Qiong, Senhong, Yan and the Cogini web design team; DataCite for providing the DOI service; and the ISA Commons team for their support and advocacy for best practice use of metadata reporting and sharing. Thank you for listening.