ICG-Europe Meeting, 24th May 2012                     Scott Edmunds


Data dissemination in the era of “big data”
William Gibson: “Information is the currency of the future world”

Sir Tim Berners-Lee: “Data is a precious thing and will last longer than the systems themselves”




                     www.gigasciencejournal.com
                                                                Image: s-ariga cc/Flickr
Is data “the new oil”?
1.2 zettabytes (1.2 × 10^21 bytes) of electronic data generated each year [1]




 Data
Deluge?

1. Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22.
Global Sequencing Capacity




                        Data Production
                          5.6 Tb / day
                > 1500X of human genome / day

                Multiple Supercomputing Centers
                       157 TFlops
                       20 TB Memory
                       14.7 PB Storage
BGI Sequencing Capacity




Sequencers
137   Illumina/HiSeq 2000
27    LifeTech/SOLiD 4
1     454 GS FLX+
2     Illumina iScan
1     Illumina MiSeq
1     Ion Torrent

Data Production
5.6 Tb / day
> 1500X of human genome / day

Multiple Supercomputing Centers
157 TFlops
20 TB Memory
14.7 PB Storage
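The "> 1500X of human genome / day" figure can be sanity-checked with a short calculation. This is a rough sketch only: it assumes "Tb" means terabases and that a human genome is about 3.1 gigabases, neither of which is stated on the slide, and the function name is hypothetical.

```python
# Rough check of the "> 1500X of human genome / day" claim.
# Assumptions (not from the slide): "Tb" = terabases, genome size ~3.1 Gb.
DAILY_OUTPUT_BASES = 5.6e12   # 5.6 Tb per day, as quoted on the slide
HUMAN_GENOME_BASES = 3.1e9    # assumed haploid human genome size


def genome_equivalents_per_day(daily_bases: float, genome_bases: float) -> float:
    """Return how many genome-sized units of sequence are produced per day."""
    return daily_bases / genome_bases


fold = genome_equivalents_per_day(DAILY_OUTPUT_BASES, HUMAN_GENOME_BASES)
print(f"about {fold:.0f}X of a human genome per day")
```

Under these assumptions the result comes out at roughly 1800X, which is consistent with the slide's "> 1500X" lower bound.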
Now taking submissions…




    Large-Scale Data:
Journal/Database/Platform
      In conjunction with:

Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
Lead BioCurator: Tam Sneddon, DPhil
Data Platform: Peter Li, PhD
    www.gigasciencejournal.com
Data-data everywhere?
Data Silos


                          Interoperability
               Paywalls
Metadata
           $       ©
There are many hurdles…





Technical:   too large volumes
             too heterogeneous
             no home for many data types
             too time consuming

Cultural:    inertia
             no incentives to share
             unaware of how
Technical challenges…
Better handling of metadata…
Novel tools/formats for data interoperability/handling.
Cloud solutions?
Technical challenges…
 Tools making work more easily reproducible…

Interoperability/Ease of use   Workflows




Data quality assessment
Technical challenges…
More efficient handling of data…

     Cloud?


Do we need to keep everything?

Compression?
Cultural challenges…
Data Re-use
[Figure axes: Effort ($) vs. Usability]
Need to lower the hurdles…
[Figure axes: Effort ($) vs. Usability]
Better incentives?
[Figure axes: Effort ($) vs. Usability]
Incentives/credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to
public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a
particular data set would enable appropriate attribution for those
who share.”
Nature Biotechnology 27, 579 (2009)


Prepublication data sharing
(Toronto International Data Release Workshop)
“Data producers benefit from creating a citable reference, as it can
later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)
Data citation: DataCite and DOIs
Digital Object Identifiers (DOIs) offer a solution

• Most widely used identifier for scientific articles
• Researchers, authors, publishers know how to use them
• Put datasets on the same playing field as articles

Dataset
Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA.
doi:10.1594/PANGAEA.587840

Aims to:
“increase acceptance of research data as legitimate, citable contributions to the
scholarly record”.

“data generated in the course of research are just as valuable to the ongoing academic
discourse as papers and monographs”.
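To illustrate why a DOI works as a citable identifier, here is a minimal sketch that splits a DOI into its registrant prefix and item suffix and turns it into a resolvable link. The function names `split_doi` and `doi_to_url` are hypothetical, and the `https://doi.org/` resolver is an assumption of this sketch (the deck itself uses the older `http://dx.doi.org/` form).

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI such as 'doi:10.1594/PANGAEA.587840' into (prefix, suffix)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    # Every DOI prefix starts with the directory indicator "10."
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolver URL for a DOI (resolver base is an assumption)."""
    prefix, suffix = split_doi(doi)
    return f"{resolver}{prefix}/{quote(suffix, safe='/')}"


print(doi_to_url("doi:10.1594/PANGAEA.587840"))
```

The same identifier string serves both as the citation text and as the key for a stable link, which is the property that puts datasets on the same footing as articles.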
Data citation: DataCite and DOIs
       Central metadata repository:
• >1 million entries to date

• Stability

• Data discoverability

• Open & harvestable

• Potential to track &
  credit use
Data publishing/DOI
        New journal format combines standard manuscript
        publication with an extensive database to host all
        associated data, and integrated tools.
         Data hosting will follow standard funding agency
        and community guidelines.
        DOI assignment available for submitted data to
        allow ease of finding and citing datasets, as well as for
        citation tracking.
Data Publishing




www.gigaDB.org
BGI Datasets Get DOI®s
Many released pre-publication…

Invertebrate
Ant
- Florida carpenter ant
- Jerdon’s jumping ant
- Leaf-cutter ant
Roundworm
Schistosoma
Silkworm

Human
Asian individual (YH)
- DNA Methylome
- Genome Assembly
- Transcriptome
Cancer (14TB)
Ancient DNA
- Saqqaq Eskimo
- Aboriginal Australian

Vertebrates
Giant panda
Macaque
- Chinese rhesus
- Crab-eating
Mini-Pig
Naked mole rat
Penguin
- Emperor penguin
- Adelie penguin
Pigeon, domestic
Polar bear
Sheep
Tibetan antelope

Microbe
E. coli O104:H4 TY-2482

Cell-Line
Chinese Hamster Ovary

Plants
Chinese cabbage
Cucumber
Foxtail millet
Pigeonpea
Potato
Sorghum

doi:10.5524/100004
For data citation to work, needs:

• Proven utility/potential user base.

• Acceptance/inclusion by journals.

• Data+Citation: inclusion in the references.

• Tracking by citation indexes.

• Usage of the metrics by the community…
Data+Citation: inclusion in the references
• Data submitted to NCBI databases:
-   Raw data                      SRA:SRA046843
-   Assemblies of 3 strains       Genbank:AHAO00000000-AHAQ00000000
-   SNPs                          dbSNP:1056306
-   CNVs, InDels, SV              dbGAP:nstd63


• Submission to public databases complemented by
  its citable form in GigaDB.
In the references…
Is the DOI…
And now in Nature Biotech…
Data citation: tracking?
          DataCite metadata in harvestable form (OAI-PMH)

Plans in 2012 to link central metadata repository with WoS

            - Will finally track and credit use!
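As a sketch of what "harvestable form (OAI-PMH)" means in practice, the snippet below builds a standard OAI-PMH `ListRecords` request. The `verb`, `metadataPrefix`, `from` and `set` parameters and the `oai_dc` format come from the OAI-PMH protocol itself; the base URL and the function name are assumptions of this example, not taken from the slides, and no network request is made.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint for illustration; check the repository's own documentation.
OAI_BASE_URL = "https://oai.datacite.org/oai"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec     # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(OAI_BASE_URL, from_date="2012-01-01"))
```

A citation index could poll such a URL periodically and follow the protocol's resumption tokens to collect new dataset records.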




                                 To be continued…
Final step: open licensing
Our first DOI:


To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J;
Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y;
Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X;
Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4
TY-2482 isolate genome sequencing consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen.
doi:10.5524/100001
http://dx.doi.org/10.5524/100001

            To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
            Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
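The citation above follows a regular pattern (authors, year, title, publisher, DOI), so it can be generated from dataset metadata. The sketch below is one hypothetical way to do that; the function name and the shortened author list are for illustration only and are not GigaDB's actual implementation.

```python
from typing import Sequence


def format_data_citation(
    authors: Sequence[str], year: int, title: str, publisher: str, doi: str
) -> str:
    """Format a dataset citation as: Authors (Year) Title. Publisher. doi:DOI"""
    author_text = "; ".join(authors)
    return f"{author_text} ({year}) {title}. {publisher}. doi:{doi}"


# Author list truncated to two names purely to keep the example short.
citation = format_data_citation(
    authors=["Li, D", "Xi, F"],
    year=2011,
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    doi="10.5524/100001",
)
print(citation)
```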
Other consequences: speed/legal-freedom




“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin
of the Escherichia coli strain that infected roughly 4,000 people in Germany between
May and July. But he knew that it might take days for the lawyers at his company —
Pacific Biosciences — to parse the agreements governing how his team could use data
collected on the strain. Luckily, one team had released its data under a Creative
Commons licence that allowed free use of the data, allowing Kasarskis and his
colleagues to join the international research effort and publish their work without
wasting time on legal wrangling.”
The era of the data consumer?
Free access to data, but analysis hubs/nodes will form around it
GDSAP: Genomic Data Submission and Analytical Platform
Data, Data, Data… Big data from the “Sequencing Oil Field”
[Diagram: Data Modeling, Pipeline design, Validation, Commercial applications, “Apps”]
Tin-Lap Lee, CUHK
   mirror/open platform
Papers in the era of big-data
        $1000 genome = million $ peer-review?

To review: (>6TBp, >1500 datasets)
S3 = $15,000
EC2 (BLASTx) = $500,000
Source: Folker Meyer/Wilkening et al. 2009, CLUSTER'09. IEEE International Conference on Cluster Computing and Workshops
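The arithmetic behind the "million $ peer-review" question can be spelled out from the two figures above. This is a back-of-the-envelope sketch: it simply adds the quoted S3 and EC2 costs and divides by the slide's lower bound of 1500 datasets, so the per-dataset value is an upper estimate. The variable names are illustrative.

```python
# Figures quoted on the slide (Wilkening et al. 2009).
S3_STORAGE_COST_USD = 15_000
EC2_BLASTX_COST_USD = 500_000
DATASET_COUNT = 1_500  # slide says ">1500", so this is a lower bound

total_usd = S3_STORAGE_COST_USD + EC2_BLASTX_COST_USD
# Dividing by a lower bound on the dataset count gives an upper bound per dataset.
per_dataset_usd = total_usd / DATASET_COUNT

print(f"total: ${total_usd:,}")
print(f"per dataset: at most about ${per_dataset_usd:,.0f}")
```

On these numbers, re-running the analysis for review costs on the order of half a million dollars in total.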
Papers in the era of big-data
       goal: Executable Research Objects




                              Citable DOI
Papers in the era of big-data
   Interested in Reproducible Research?
Take part in our session on: “Cloud and workflows for reproducible bioinformatics”




Submit to:
• Rapid review/Open Access/High-visibility
• Article Processing Charge covered by BGI
• Hosting of any test datasets/workflows in GigaDB
Thanks to:
Laurie Goodman         Alexandra Basford
Tam Sneddon/Peter Li   Shaoguang Liang
Tin-Lap Lee (CUHK)     Qiong Luo (HKUST)
Contact us:
scott@gigasciencejournal.com
editorial@gigasciencejournal.com

Follow us:
@gigascience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog/

www.gigasciencejournal.com

Contenu connexe

Tendances

Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysisTin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysisGigaScience, BGI Hong Kong
 
Cassava genome hub
Cassava genome hubCassava genome hub
Cassava genome hubCIAT
 
Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...
Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...
Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...GigaScience, BGI Hong Kong
 
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Robert Grossman
 
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Dag Endresen
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBryan Heidorn
 
L&P Eric Celeste - SHARE
L&P Eric Celeste -  SHAREL&P Eric Celeste -  SHARE
L&P Eric Celeste - SHARECASRAI
 
SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008Andrew Walkingshaw
 
Jonathan Izant AAAS Annual Meeting 2012-02-18
Jonathan Izant AAAS Annual Meeting 2012-02-18Jonathan Izant AAAS Annual Meeting 2012-02-18
Jonathan Izant AAAS Annual Meeting 2012-02-18Sage Base
 
How to share useful data
How to share useful dataHow to share useful data
How to share useful dataPeter McQuilton
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingGigaScience, BGI Hong Kong
 
2015 aem-grs-keynote
2015 aem-grs-keynote2015 aem-grs-keynote
2015 aem-grs-keynotec.titus.brown
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenomec.titus.brown
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsJohn Kunze
 

Tendances (20)

Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysisTin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
 
Cassava genome hub
Cassava genome hubCassava genome hub
Cassava genome hub
 
Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...
Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...
Tin-Lap Lee: CBIIT GigaGalaxy: A Galaxy-based platform for large-scale genomi...
 
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
 
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary Challenge
 
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
 
L&P Eric Celeste - SHARE
L&P Eric Celeste -  SHAREL&P Eric Celeste -  SHARE
L&P Eric Celeste - SHARE
 
SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008SemanticCampLondon, 16th February 2008
SemanticCampLondon, 16th February 2008
 
Jonathan Izant AAAS Annual Meeting 2012-02-18
Jonathan Izant AAAS Annual Meeting 2012-02-18Jonathan Izant AAAS Annual Meeting 2012-02-18
Jonathan Izant AAAS Annual Meeting 2012-02-18
 
EZID: Easy Persistent Identifiers and Data Citation
EZID: Easy Persistent Identifiers and Data CitationEZID: Easy Persistent Identifiers and Data Citation
EZID: Easy Persistent Identifiers and Data Citation
 
How to share useful data
How to share useful dataHow to share useful data
How to share useful data
 
Building Data
Building DataBuilding Data
Building Data
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data Publishing
 
2015 aem-grs-keynote
2015 aem-grs-keynote2015 aem-grs-keynote
2015 aem-grs-keynote
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History Collections
 

En vedette

Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing	for the Big-Data EraScott Edmunds at Tech4Dev on Open Publishing	for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data EraGigaScience, BGI Hong Kong
 
Three Examples of MSP's Making Money with Autotask
Three Examples of MSP's Making Money with AutotaskThree Examples of MSP's Making Money with Autotask
Three Examples of MSP's Making Money with AutotaskRichard Tubb
 
Padmavati Creations, Jaipur, Fancy Diamond Jewellery
Padmavati Creations, Jaipur, Fancy Diamond JewelleryPadmavati Creations, Jaipur, Fancy Diamond Jewellery
Padmavati Creations, Jaipur, Fancy Diamond JewelleryIndiaMART InterMESH Limited
 
Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...
Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...
Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...IndiaMART InterMESH Limited
 
Projectline Materials, Vadodara, Electrical, Earthing and Aluminium Products
Projectline Materials, Vadodara, Electrical, Earthing and Aluminium ProductsProjectline Materials, Vadodara, Electrical, Earthing and Aluminium Products
Projectline Materials, Vadodara, Electrical, Earthing and Aluminium ProductsIndiaMART InterMESH Limited
 
Surface International, Jodhpur, Blasting Machines and Construction Equipment
Surface International, Jodhpur, Blasting Machines and Construction EquipmentSurface International, Jodhpur, Blasting Machines and Construction Equipment
Surface International, Jodhpur, Blasting Machines and Construction EquipmentIndiaMART InterMESH Limited
 
Vedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure Cleaner
Vedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure CleanerVedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure Cleaner
Vedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure CleanerIndiaMART InterMESH Limited
 
Nicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to Know
Nicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to KnowNicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to Know
Nicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to KnowGigaScience, BGI Hong Kong
 
VS Metals, Mumbai, Electrical Installations & Equipments
VS Metals, Mumbai, Electrical Installations & EquipmentsVS Metals, Mumbai, Electrical Installations & Equipments
VS Metals, Mumbai, Electrical Installations & EquipmentsIndiaMART InterMESH Limited
 
Parjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated Products
Parjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated ProductsParjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated Products
Parjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated ProductsIndiaMART InterMESH Limited
 
K Tek Analytics, Mumbai, Balances & Analytical Instruments
K Tek Analytics, Mumbai, Balances & Analytical InstrumentsK Tek Analytics, Mumbai, Balances & Analytical Instruments
K Tek Analytics, Mumbai, Balances & Analytical InstrumentsIndiaMART InterMESH Limited
 

En vedette (11)

Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing	for the Big-Data EraScott Edmunds at Tech4Dev on Open Publishing	for the Big-Data Era
Scott Edmunds at Tech4Dev on Open Publishing for the Big-Data Era
 
Three Examples of MSP's Making Money with Autotask
Three Examples of MSP's Making Money with AutotaskThree Examples of MSP's Making Money with Autotask
Three Examples of MSP's Making Money with Autotask
 
Padmavati Creations, Jaipur, Fancy Diamond Jewellery
Padmavati Creations, Jaipur, Fancy Diamond JewelleryPadmavati Creations, Jaipur, Fancy Diamond Jewellery
Padmavati Creations, Jaipur, Fancy Diamond Jewellery
 
Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...
Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...
Ancalima Lifesciences Ltd., Sonipat, Cosmetic/ Pharmaceutical/ Veterinary For...
 
Projectline Materials, Vadodara, Electrical, Earthing and Aluminium Products
Projectline Materials, Vadodara, Electrical, Earthing and Aluminium ProductsProjectline Materials, Vadodara, Electrical, Earthing and Aluminium Products
Projectline Materials, Vadodara, Electrical, Earthing and Aluminium Products
 
Surface International, Jodhpur, Blasting Machines and Construction Equipment
Surface International, Jodhpur, Blasting Machines and Construction EquipmentSurface International, Jodhpur, Blasting Machines and Construction Equipment
Surface International, Jodhpur, Blasting Machines and Construction Equipment
 
Vedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure Cleaner
Vedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure CleanerVedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure Cleaner
Vedh Techno Engineers Pvt. Ltd., Thane, Ehrle Cold Water High Pressure Cleaner
 
Nicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to Know
Nicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to KnowNicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to Know
Nicole Nogoy at the G3 Workshop: Open Access Publishing - What you need to Know
 
VS Metals, Mumbai, Electrical Installations & Equipments
VS Metals, Mumbai, Electrical Installations & EquipmentsVS Metals, Mumbai, Electrical Installations & Equipments
VS Metals, Mumbai, Electrical Installations & Equipments
 
Parjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated Products
Parjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated ProductsParjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated Products
Parjapati Steel Udyog Pvt. Ltd., Sirsa, Fabricated Products
 
K Tek Analytics, Mumbai, Balances & Analytical Instruments
K Tek Analytics, Mumbai, Balances & Analytical InstrumentsK Tek Analytics, Mumbai, Balances & Analytical Instruments
K Tek Analytics, Mumbai, Balances & Analytical Instruments
 

Similaire à Scott Edmunds: Data Dissemination in the era of "Big-Data"

Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceGigaScience, BGI Hong Kong
 
Scott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationScott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationGigaScience, BGI Hong Kong
 
GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience, BGI Hong Kong
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeGigaScience, BGI Hong Kong
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Making your data work for you: Scratchpads, publishing & the biodiversity dat...
Making your data work for you: Scratchpads, publishing & the biodiversity dat...Making your data work for you: Scratchpads, publishing & the biodiversity dat...
Making your data work for you: Scratchpads, publishing & the biodiversity dat...Vince Smith
 
Health Sciences Driving UCSD Research Cyberinfrastructure
Health Sciences Driving UCSD Research CyberinfrastructureHealth Sciences Driving UCSD Research Cyberinfrastructure
Health Sciences Driving UCSD Research CyberinfrastructureLarry Smarr
 
Foundations for the Future of Science
Foundations for the Future of ScienceFoundations for the Future of Science
Foundations for the Future of ScienceGlobus
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research dataHerbert Gruttemeier
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016Jisc
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016 Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016 TheContentMine
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumMerce Crosas
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
Introduction to Biodiversity Informatics
Introduction to Biodiversity Informatics Introduction to Biodiversity Informatics
Introduction to Biodiversity Informatics David Shorthouse
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds
 

Similaire à Scott Edmunds: Data Dissemination in the era of "Big-Data" (20)

Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
 
Scott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data CitationScott Edmunds at DataCite 2012: Adventures in Data Citation
Scott Edmunds at DataCite 2012: Adventures in Data Citation
 
GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDB
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data deluge
 
Cifar
CifarCifar
Cifar
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Making your data work for you: Scratchpads, publishing & the biodiversity dat...



Scott Edmunds: Data Dissemination in the era of "Big-Data"

  • 1. ICG-Europe Meeting, 24th May 2012 Scott Edmunds Data dissemination in the era of “big data” William Gibson: "Information is the currency of the future world” Sir Tim Berners-Lee: "Data is a precious thing and will last longer than the systems themselves” www.gigasciencejournal.com Image: s-ariga cc/Flickr
  • 2. Is data “the new oil”? 1.2 zettabytes (10^21 bytes) of electronic data generated each year [1]. Data Deluge? [1] Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22.
  • 3. Global Sequencing Capacity. Data Production: 5.6 Tb / day (> 1500X of human genome / day). Multiple Supercomputing Centers: 157 TFlops; 20 TB Memory; 14.7 PB Storage
  • 4. BGI Sequencing Capacity. Sequencers: 137 Illumina/HiSeq 2000; 27 LifeTech/SOLiD 4; 1 454 GS FLX+; 2 Illumina iScan; 1 Illumina MiSeq; 1 Ion Torrent. Data Production: 5.6 Tb / day (> 1500X of human genome / day). Multiple Supercomputing Centers: 157 TFlops; 20 TB Memory; 14.7 PB Storage
  • 5. Now taking submissions… Large-Scale Data: Journal/Database/Platform In conjunction with: Editor-in-Chief: Laurie Goodman, PhD Editor: Scott Edmunds, PhD Assistant Editor: Alexandra Basford, PhD Lead BioCurator: Tam Sneddon, DPhil Data Platform: Peter Li, PhD www.gigasciencejournal.com
  • 7. Data Silos Interoperability Paywalls Metadata $ ©
  • 8. There are many hurdles… ?
  • 9. There are many hurdles… Technical: too large volumes; too heterogeneous; no home for many data types; too time consuming. Cultural: inertia; no incentives to share; unaware of how. ?
  • 10. Technical challenges… Better handling of metadata… Novel tools/formats for data interoperability/handling. Cloud solutions?
  • 11. Technical challenges… Tools making work more easily reproducible… Interoperability/Ease of use Workflows Data quality assessment
  • 12. Technical challenges… More efficient handling of data… Cloud? Do we need to keep everything? Compression?
  • 17. Need to lower the hurdles… Effort ($) Usability
  • 19. Incentives/credit Credit where credit is overdue: “One option would be to provide researchers who release data to public repositories with a means of accreditation.” “An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.” Nature Biotechnology 27, 579 (2009) Prepublication data sharing (Toronto International Data Release Workshop) “Data producers benefit from creating a citable reference, as it can later be used to reflect impact of the data sets.” Nature 461, 168-170 (2009)
  • 20. Datacitation: Datacite and DOIs Digital Object Identifiers (DOIs) offer a solution: • Most widely used identifier for scientific articles • Researchers, authors, publishers know how to use them • Put datasets on the same playing field as articles. Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840 Aims to: “increase acceptance of research data as legitimate, citable contributions to the scholarly record”. “data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”.
  • 21. Datacitation: Datacite and DOIs Central metadata repository: • >1 million entries to date • Stability • Data discoverability • Open & harvestable • Potential to track & credit use
  • 22. Data publishing/DOI New journal format combines standard manuscript publication with an extensive database to host all associated data, and integrated tools.  Data hosting will follow standard funding agency and community guidelines. DOI assignment available for submitted data to allow ease of finding and citing datasets, as well as for citation tracking. www.gigasciencejournal.com
  • 24. BGI Datasets Get DOI®s Many released pre-publication… Invertebrate: Ant (Florida carpenter ant, Jerdon’s jumping ant, Leaf-cutter ant); Roundworm; Schistosoma; Silkworm. Plants: Chinese cabbage; Cucumber; Foxtail millet; Pigeonpea; Potato; Sorghum. Vertebrates: Giant panda; Macaque (Chinese rhesus, Crab-eating); Mini-Pig; Naked mole rat; Penguin (Emperor penguin, Adelie penguin); Pigeon, domestic; Polar bear; Sheep; Tibetan antelope. Human: Asian individual (YH) (DNA Methylome, Genome Assembly, Transcriptome); Cancer (14TB); Ancient DNA (Saqqaq Eskimo, Aboriginal Australian). Microbe: E. coli O104:H4 TY-2482. Cell-Line: Chinese Hamster Ovary. doi:10.5524/100004
  • 25. For data citation to work, needs: • Proven utility/potential user base. • Acceptance/inclusion by journals. • Data+Citation: inclusion in the references. • Tracking by citation indexes. • Usage of the metrics by the community…
  • 26. Data+Citation: inclusion in the references
  • 27. • Data submitted to NCBI databases: - Raw data SRA:SRA046843 - Assemblies of 3 strains GenBank:AHAO00000000-AHAQ00000000 - SNPs dbSNP:1056306 - CNVs, InDels, SV dbGAP:nstd63 • Submission to public databases complemented by its citable form in GigaDB.
  • 32. And now in Nature Biotech…
  • 33. Datacitation: tracking? DataCite metadata in harvestable form (OAI-PMH) Plans in 2012 to link central metadata repository with WoS - Will finally track and credit use! To be continued…
  • 35. Final step: open licensing
  • 36. Our first DOI: To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as: Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001 To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
  • 40. Other consequences: speed/legal-freedom “Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”
  • 41. The era of the data consumer?
  • 42. The era of the data consumer? ?
  • 43. The era of the data consumer? Free access to data – but analysis hubs/nodes will form around it ?
  • 44. GDSAP: Genomic Data Submission and Analytical platform. Data, Data, Data… Big data from the “Sequencing Oil Field”. Data Modeling; Pipeline design; Validation; Commercial applications; “Apps”. Tin-Lap Lee, CUHK
  • 45. GDSAP: Genomic Data Submission and Analytical platform
  • 46. GDSAP: Genomic Data Submission and Analytical platform mirror/open platform
  • 47. Papers in the era of big-data $1000 genome = million $ peer-review? To review: (>6TBp, >1500 datasets) S3 = $15,000 EC2 (BLASTx) = $500,000 Source: Folker Meyer/Wilkening et al. 2009, CLUSTER'09. IEEE International Conference on Cluster Computing and Workshops
  • 48. Papers in the era of big-data goal: Executable Research Objects Citable DOI
  • 49. Papers in the era of big-data Interested in Reproducible Research? Take part in our session on: “Cloud and workflows for reproducible bioinformatics” Submit to: • Rapid review/Open Access/High-visibility • Article Processing Charge covered by BGI • Hosting of any test datasets/workflows in GigaDB
  • 50. Thanks to: Laurie Goodman; Alexandra Basford; Tam Sneddon/Peter Li; Shaoguang Liang; Tin-Lap Lee (CUHK); Qiong Luo (HKUST). Contact us: scott@gigasciencejournal.com, editorial@gigasciencejournal.com. Follow us: @gigascience, facebook.com/GigaScience, blogs.openaccesscentral.com/blogs/gigablog/, www.gigasciencejournal.com
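
The dataset citation pattern shown on slide 36 (creators, year, title, publisher, DOI, plus a resolver link) can be sketched in a few lines of Python. This is only an illustration: the `DatasetCitation` class and its field names are hypothetical and are not part of GigaDB or DataCite, and the creator list is abbreviated here for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    """Hypothetical container for the fields of a dataset citation."""
    creators: str
    year: int
    title: str
    publisher: str
    doi: str

    def resolver_url(self) -> str:
        # Slide 36 gives the resolver form http://dx.doi.org/<DOI>.
        return f"http://dx.doi.org/{self.doi}"

    def cite(self) -> str:
        # Field order follows the slide: creators (year) title. publisher. doi
        return (f"{self.creators} ({self.year}) {self.title}. "
                f"{self.publisher}. doi:{self.doi}")


ecoli = DatasetCitation(
    creators="Li, D; Xi, F; Zhao, M; et al.",  # abbreviated for illustration
    year=2011,
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    doi="10.5524/100001",
)
print(ecoli.cite())
print(ecoli.resolver_url())
```

Keeping the DOI as its own field, rather than embedding it in free text, is what would let a citation index match dataset references mechanically.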

Editor's notes

  1. Our facilities feature Sanger and next-generation sequencing technologies, providing the highest throughput sequencing capacity in the world. Powered by 137 Illumina HiSeq 2000 instruments and 27 Applied Biosystems SOLiD™ 4 Systems, we provide high-quality sequencing results with industry-leading turnaround time. As of December 2010, our sequencing capacity is 5 Tb raw data per day, supported by several supercomputing centers with a total peak performance up to 102 Tflops, 20 TB of memory, and 10 PB storage. We provide stable and efficient resources to store and analyze massive amounts of data generated by next-generation sequencing.
  2. Helps reproducibility, but there is some debate over whether it can help that much with scaling.
  3. Raw data has been submitted to the SRA, the assembly to GenBank (no number), and SV data to dbVar (it’s the first plant data they’ve received). This complements the traditional public databases by having all these “extra” data types; it’s all in one place, and it’s citable.
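
As a rough sanity check on the throughput figures (5.6 Tb / day on slide 3, 5 Tb / day in note 1, against the "> 1500X of human genome / day" claim), the arithmetic can be sketched as below. Two assumptions are not stated in the slides: "Tb" is read as terabases, and the human genome is taken as roughly 3.1 billion bases.

```python
# Assumption: approximate haploid human genome size in bases.
HUMAN_GENOME_BP = 3.1e9


def genome_equivalents_per_day(bases_per_day: float,
                               genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Return how many genome-sized units of sequence are produced per day."""
    return bases_per_day / genome_bp


# Assumption: "Tb" means terabases (1e12 bases).
slide_rate = genome_equivalents_per_day(5.6e12)  # slide 3 figure
note_rate = genome_equivalents_per_day(5.0e12)   # editor's note figure

print(f"slide 3: about {slide_rate:.0f} human genome equivalents per day")
print(f"note 1:  about {note_rate:.0f} human genome equivalents per day")
```

Under these assumptions both figures come out above 1500, which is consistent with the claim on the slide.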