GigaScience Editor-in-Chief Laurie Goodman's talk at the International Conference on Genomics pre-conference press session on the release of new unpublished datasets and a new-look beta version of the journal's database, GigaDB.org
GigaScience: data and beta-database launch. Announcing GigaDB
1. GigaScience announces the launch of GigaDB
With the release of seventeen new genomic
datasets from both plants and animals
2. An upcoming open-access open-data journal and database
Innovative article publishing and
data hosting
… “big and sharable”
www.gigasciencejournal.com
Published by BGI in
partnership with BioMed Central
3. About The Journal
Open access, open-data online journal optimized for the publication of all
types of biological studies that use or create large-scale data sets.
Novel publication format that combines standard manuscript publication
with an extensive database that hosts all associated data.
Scope includes studies from the entire spectrum of life and biomedical sciences,
including imaging, neuroscience, ecology, medicine, ‘omics, and other types of
large-scale shareable data.
Data are released under a CC0 license, which places them, as far as possible under
law, in the public domain so that others may freely use them for any purpose
without restriction under copyright or database law.
Editorial interaction with the different biological communities to determine
the best means of hosting and accessing their type of data.
Integrated tools to promote more widespread access, viewing, and analysis
of the stored data.
BGI Cloud Computing resources for handling and analyzing large-scale data.
All data are given a DOI to make datasets easy to find and cite, and to allow
citation tracking.
4. Why DOI®s?
– Clear method for data tracking and data citation, allowing:
• Increased searchability (and use) of data
• Credit for data production, making it clear who produced
the data and when
• The ability to track and receive feedback on data usage
• Credit to original authors for their data’s use
• A data citation metric potentially rivaling and
complementary to the impact factor
• The potential to publish papers relating to a dataset, while
making the data available and receiving credit for it earlier
5. Our first DOI®:
To maximize its utility to the research community and aid those fighting the current
epidemic, genomic data is released here into the public domain under a CC0
license. Until the publication of research papers on the assembly and whole-
genome analysis of this isolate we would ask you to cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J;
Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y;
Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X;
Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4
TY-2482 isolate genome sequencing consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen.
doi:10.5524/100001
http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
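The citation above gives the same identifier twice, once as a bare DOI and once as a resolver link. As a minimal sketch of how the two forms relate, assuming the http://dx.doi.org/ resolver prefix shown in the citation (the function and constant names are illustrative and not part of GigaDB):

```python
# Sketch only: `doi_to_url` and `RESOLVER` are hypothetical names used for
# illustration; the resolver prefix is the one shown in the citation above.
RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn 'doi:10.5524/100001' or '10.5524/100001' into a resolver link."""
    doi = doi.strip()
    # Drop an optional leading "doi:" label.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Very light sanity check: DOIs start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return RESOLVER + doi


print(doi_to_url("doi:10.5524/100001"))  # prints http://dx.doi.org/10.5524/100001
```

Because the link is derived from the DOI alone, a reader who has only the citation can still reach the dataset.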
6. Nine Previously Available Datasets with DOIs
Animals
Giant panda (Ailuropoda melanoleuca)
Macaque
Chinese rhesus macaque (Macaca mulatta lasiota)
Crab-eating macaque (Macaca fascicularis)
Penguin
Emperor penguin (Aptenodytes forsteri)
Adelie penguin (Pygoscelis adeliae)
Pigeon, domestic (Columba livia domestica)
Polar bear (Ursus maritimus)
Microbes
E. coli (Escherichia coli) O104:H4 strain TY-2482
Cell Lines
CHO-K1 - Chinese hamster (Cricetulus griseus) ovary cell line k1
8. Releasing During ICG-VI
Animals: Both Vertebrates and Invertebrates
Ant:
Florida carpenter ant (Camponotus floridanus)
Jerdon’s jumping ant (Harpegnathos saltator)
Leaf-cutter ant (Acromyrmex echinatior)
Human (Homo sapiens):
Asian individual (YH):
Genome Assembly Data
DNA Methylome of Blood Cells Data
Lymphoblastoid cell Transcriptome Data
Naked mole rat (Heterocephalus glaber)
Roundworm (Ascaris suum)
Sheep, domestic (Ovis aries)
Silkworm:
Domestic (Bombyx mori) and wild (Bombyx mandarina)
Multiple strains
Tibetan antelope (Pantholops hodgsonii)
9. Releasing During ICG-VI
Plants
Chinese cabbage (Brassica rapa)
Cucumber, domestic (Cucumis sativus var. sativus L.)
Foxtail millet (Setaria italica)
Pigeonpea (Cajanus cajan)
Potato (Solanum tuberosum L.)
Sorghum (Sorghum bicolor):
Two Strains: sweet and grain
Coming:
Additional Human Individuals
Aboriginal Australian
Saqqaq Palaeo-Eskimo
And others that are currently under review
10. Datasets without published analysis papers
• Five of these datasets illustrate the future of early data release:
they are being released before their analysis papers are
published.
• The community can use these data now and cite them with a DOI.
• This promotes very rapid data release, because data producers
receive citable credit, the primary means by which most
academics gain career advancement.
• Thus, DOIs and data citation reduce the need to delay data release
until after publication of the more detailed data analysis paper.
The five datasets: (1) Foxtail millet; (2) Sorghum; (3) Human Asian
individual lymphoblastoid cell transcriptome data;
(4) Domestic sheep; (5) Tibetan antelope
12. GDSAP: Genomic Data Submission and Analytical Platform
Tin-Lap Lee, CUHK
[Diagram labels: Data, Data, Data…; big data from the “Sequencing Farm”; data modeling; pipeline design; validation; commercial applications (“Apps”)]
13. First demonstration of New Gold
Standard for Data Citation
Dr. Clare Garvey, Editor of Genome Biology, has informed
us, and agreed that we may announce, that the sorghum genome
analysis paper has just been accepted by Genome Biology. It
will be published later this month and will include
the data citation in its references, where Thomson ISI can
easily track it. This is currently the easiest way
for readers to find and use the data.
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F;
Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C: Genome data
from sweet and grain sorghum (Sorghum bicolor). GigaScience
(2011). http://dx.doi.org/10.5524/100012
14. Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
Contact: editorial@gigasciencejournal.com
Follow GigaScience on Twitter @GigaScience
www.gigasciencejournal.com
www.gigaDB.org