The document discusses the Barcode of Life project and database. It began in 2003 with the goal of creating a global reference library of DNA barcode sequences linked to specimens and species names. The Barcode of Life Database was established as a community workbench. Sequence data is submitted to GenBank and undergoes quality checks before being added to the barcode database. Challenges include ensuring high quality data with links to specimens and taxonomic identifications.
Fourth International Barcode of Life Conference Summary
1. Ilene Mizrachi
November 30, 2011
Fourth International Barcode
of Life Conference
National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
2. Barcode Project: 2003 and beyond
The Barcode of Life project was initiated in 2003
INSDC would be the repository for raw and assembled
sequence data
INSDC adopts new source fields to accommodate
Barcode metadata requirements
Barcode of Life Database (BOLD) established as a
community workbench and sequencing center
3. What is a Barcode?
A global reference library of DNA barcode sequences
that is integrated with other systems of biodiversity
information (e.g., databases of
specimens, species, biogeographic information).
Mechanism to link DNA sequences to vouchered
specimens and valid species names.
A reserved BARCODE keyword was adopted for data
that met strict barcode standards
4. Barcode Standard
Formally described species or a provisional label for an unpublished species
Voucher specimen identifier, preferably in a biorepository using a structured
field
Country code using the controlled vocabulary used by GenBank
Sequence from a gene region specified by CBOL:
COI for animals
matK and rbcL for plants
ITS for fungi
Contain at least 75% contiguous, high quality bases from within the approved
region
Electropherogram trace files for bidirectional sequencing runs
Sequences of all forward and reverse primers
Strongly recommended data elements
GPS coordinates
Name of the identifier
Name of the collector
Date of collection
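The data elements above can be read as a checklist. The following is a minimal sketch of such a check, not GenBank's or BOLD's actual validation: the field names (`species`, `voucher`, `lat_lon`, and so on) and the group labels are hypothetical, since the slide lists the elements but not a schema. The 75% contiguous high-quality base requirement is not covered here.

```python
# Hypothetical field names; the slide names the elements but defines no schema.
REQUIRED_FIELDS = (
    "species", "voucher", "country", "gene",
    "sequence", "trace_files", "primers",
)
RECOMMENDED_FIELDS = (
    "lat_lon", "identified_by", "collected_by", "collection_date",
)

# Marker genes named on the slide, keyed by (hypothetical) group labels.
APPROVED_MARKERS = {
    "animal": {"COI"},
    "plant": {"matK", "rbcL"},
    "fungus": {"ITS"},
}


def check_barcode_record(record, group):
    """Return (errors, warnings) for one record given as a dict."""
    errors = [
        f"missing required element: {name}"
        for name in REQUIRED_FIELDS
        if not record.get(name)
    ]
    warnings = [
        f"missing recommended element: {name}"
        for name in RECOMMENDED_FIELDS
        if not record.get(name)
    ]
    gene = record.get("gene")
    if gene and gene not in APPROVED_MARKERS.get(group, set()):
        errors.append(f"{gene} is not an approved marker for {group}")
    return errors, warnings
```

Missing required elements are reported as errors and missing recommended elements as warnings, mirroring the slide's split between the two groups.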
11. QA checks at GenBank
To ensure that the sequence data is of high quality, the
following checks are run:
Barcode data element compliance
Consistency checks such as:
reported latitude-longitude falls within cited country
collection date has already occurred
Sequence quality checks
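The two consistency checks named above can be sketched as follows. This is an illustration under stated assumptions, not GenBank's implementation: the country is approximated by a caller-supplied bounding box (a real check would need actual country boundaries), and the function names are hypothetical.

```python
from datetime import date


def date_has_occurred(collection_date, today=None):
    """True if the collection date is not in the future."""
    if today is None:
        today = date.today()
    return collection_date <= today


def within_bounding_box(lat, lon, box):
    """True if (lat, lon) lies inside box = (min_lat, max_lat, min_lon, max_lon).

    Assumption: a rectangular box stands in for the country outline.
    Boxes that cross the antimeridian are not handled.
    """
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
```

Passing `today` explicitly keeps the date check reproducible, for example `date_has_occurred(date(2011, 11, 30), today=date(2011, 12, 1))`.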
13. Checking Sequence Quality
• Trim primer sequences
• Check congruence between
fwd and reverse reads
• Align sequences to check for
gaps
• Translate sequences to check
for internal stops
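Three of the four steps above (primer trimming, forward/reverse congruence, and the stop-codon check) can be sketched in a few lines. This is a simplified illustration, not the pipeline GenBank runs: primers are matched exactly at the start of the read, the two reads are assumed to cover the same span, and the alignment step is omitted. All function names are hypothetical. The stop-codon sets are those of the standard, vertebrate mitochondrial, and invertebrate mitochondrial genetic codes; which one applies depends on the organism.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# In-frame stop codons for three genetic codes.
STOP_CODONS = {
    "standard": {"TAA", "TAG", "TGA"},
    "vertebrate_mito": {"TAA", "TAG", "AGA", "AGG"},
    "invertebrate_mito": {"TAA", "TAG"},
}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_primer(read, primer):
    """Strip the primer if the read starts with it exactly (case-insensitive)."""
    if read.upper().startswith(primer.upper()):
        return read[len(primer):]
    return read


def reads_agree(forward, reverse):
    """True if the forward read equals the reverse complement of the reverse read.

    Assumption: both reads have already been trimmed to the same span.
    """
    return forward.upper() == reverse_complement(reverse).upper()


def in_frame_stops(seq, frame=0, code="invertebrate_mito"):
    """Return 0-based start positions of stop codons in the given frame."""
    seq = seq.upper()
    stops = STOP_CODONS[code]
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in stops]
```

Because a barcode fragment lies inside the coding region, any in-frame stop it reports is worth a closer look. Note that `TGA` is a stop in the standard code but not in the mitochondrial codes, so `in_frame_stops("ATGTGATTT", code="standard")` flags position 3 while the same call with `code="invertebrate_mito"` flags nothing.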
14. Updates Are Critical
Primary data repository – sequence records owned by
submitter
Submitter is responsible for providing additional data
and metadata as it becomes available:
Publication
Sequence
Taxonomy
Voucher
Third party updates are welcome!
15. Challenges
If Reference Barcodes are to be used for species
identification, phylogenetics, ecological
forensics, conservation, and macro-analysis of
biodiversity patterns, then the minimal requirement
should be (a) high quality sequence (b) link to
specimen and (c) taxonomic identification
Need to support rapid data release including
preliminary taxonomic classifications similar to “Fort
Lauderdale Principles” of genomics community
Data updated asynchronously at BOLD and in
GenBank. Need to continue work on update channel
Need to work with communities to devise strict QA
tests for plant and fungal Barcodes
16. Acknowledgements
Taxonomy Group
Scott Federhen
Conrad Schoch
Lu Sun
Carol Hotton
Detlef Leipe
GenBank Group
Susan Schafer
Michael Fetchko
Software Support
Colleen Bollin
Kamen Todorov
Vasuki Gobu