Tracking progress through the laboratory pipeline, keeping all required products together, consistent data assessment, the analysis-lab feedback loop, and key elements of a data management database (LIMS)
Amy Driskell - Information Management and Data Quality
1. Managing Data Flow Through the Barcoding Pipeline
Amy Driskell
Laboratories of Analytical Biology (LAB)
National Museum of Natural History
Smithsonian Institution
2. What is the “pipeline”?
[Diagram: Specimen → LIMS → Data QC → Data Deposition]
3. Outline
1. BEFORE the LIMS
2. LIMS
– Data recorded
– Exploring laboratory success/failure
– Tracking project completion
3. Data QC
– Criteria and data requirements
– Checking for contamination and validity
4. Critical data management BEFORE the specimen enters the laboratory pipeline
• Data elements (“metadata”) necessary for laboratory processing:
– Taxonomy, collection information, etc.
IMPORTANT!
• Assess laboratory successes/failures in light of this information
• Tailor/change lab protocols
5. Careful Metadata Collection at Specimen Collection or Harvest
• Metadata can be formatted at the beginning of a project (e.g., at specimen collection) to ensure a smooth transfer of information into the LIMS
• Multiple sources for metadata:
– Spreadsheets
– Field Information Management Systems (FIMS)
– Museum databases
– Fusion tables
6. Rockin’ It “Old School”: Spreadsheets
• Modified BOLD specimen spreadsheet for use in the field/museum
• Additional fields requested by PIs
• Easily modified to interface with multiple kinds of databases
• 96-well format: 2D-barcoded tubes, extraction plates
• NOT directly connected to other databases, including the LIMS
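Because the spreadsheet follows the 96-well format, plate and well columns can be generated rather than typed by hand. A minimal sketch, assuming wells are labelled A01 to H12; the specimen IDs and the `assign_to_plate` helper are hypothetical, and whether plates are filled row-wise or column-wise is a local lab convention.

```python
from string import ascii_uppercase

# A standard 96-well plate: 8 rows (A-H) x 12 columns (1-12).
ROWS = ascii_uppercase[:8]
COLS = range(1, 13)


def well_positions(column_major=False):
    """Return the 96 well labels (A01 ... H12) in fill order.

    Row-major:    A01, A02, ..., A12, B01, ...
    Column-major: A01, B01, ..., H01, A02, ...
    """
    if column_major:
        return [f"{r}{c:02d}" for c in COLS for r in ROWS]
    return [f"{r}{c:02d}" for r in ROWS for c in COLS]


def assign_to_plate(specimen_ids, plate_name, column_major=False):
    """Hypothetical helper: one spreadsheet row per specimen, with plate and well."""
    wells = well_positions(column_major)
    if len(specimen_ids) > len(wells):
        raise ValueError("more than 96 specimens for a single plate")
    return [
        {"plate": plate_name, "well": well, "specimen_id": sid}
        for well, sid in zip(wells, specimen_ids)
    ]


if __name__ == "__main__":
    # Hypothetical specimen IDs, for illustration only.
    for row in assign_to_plate(["SPEC-001", "SPEC-002", "SPEC-003"], "PLATE-01"):
        print(row)
```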
7. An Elegant Solution: Moorea Biocode FIMS
Actively connected to their LIMS
http://biocode.berkeley.edu/
8. bioValidator: cleaning up the collection of metadata
• Many aspects of metadata require specific formats: decimal lat/long, meters, names
• bioValidator enforces adherence to formatting and other rules
• Photo matcher
http://biovalidator.sourceforge.net/
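The kind of checks bioValidator performs can be illustrated with a small validator. This is a hypothetical sketch, not bioValidator's code or rule set: the field names (`latitude`, `longitude`, `elevation_m`, `scientific_name`) and the rules are assumptions chosen to mirror the formats listed above.

```python
import re

# Hypothetical rules in the spirit of bioValidator; NOT its actual rule set.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+( [a-z]+)?$")  # "Genus" or "Genus species"


def _as_float(value):
    """Parse a number, returning None for blanks or text such as 17°30'S."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def validate_record(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []

    lat = _as_float(record.get("latitude"))
    if lat is None or not -90 <= lat <= 90:
        problems.append("latitude must be decimal degrees between -90 and 90")

    lon = _as_float(record.get("longitude"))
    if lon is None or not -180 <= lon <= 180:
        problems.append("longitude must be decimal degrees between -180 and 180")

    if _as_float(record.get("elevation_m")) is None:
        problems.append("elevation_m must be a number in meters")

    name = str(record.get("scientific_name") or "")
    if not NAME_PATTERN.match(name):
        problems.append("scientific_name must look like 'Genus' or 'Genus species'")

    return problems
```

A record with degrees-minutes notation or a lowercase genus would come back with one message per failed rule, which can be fixed before the data ever reach the LIMS.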
9. Museum Collection Databases
• Sampling directly from existing collections?
• Some museum databases cannot link directly to lab-based information systems (LIMS)
• Requires exporting from the collection database and importing into the lab database; no automatic updates
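When records must be exported from a collection database and imported into the lab database, a small mapping script keeps that transfer repeatable. A sketch assuming a CSV export; the museum column names and LIMS field names in `COLUMN_MAP` are hypothetical and depend on both systems.

```python
import csv
import io

# Hypothetical mapping: museum export column -> LIMS field name.
COLUMN_MAP = {
    "Catalog Number": "specimen_id",
    "Scientific Name": "taxon",
    "Decimal Latitude": "latitude",
    "Decimal Longitude": "longitude",
    "Collection Date": "collection_date",
}


def museum_export_to_lims_rows(csv_text):
    """Translate a museum CSV export into rows ready for a LIMS import.

    Columns not listed in COLUMN_MAP are dropped. Nothing updates
    automatically, so rerun this whenever the museum record changes.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in COLUMN_MAP if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"export is missing columns: {missing}")
    return [
        {lims: row[museum].strip() for museum, lims in COLUMN_MAP.items()}
        for row in reader
    ]
```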
10. Why?
1. Simplifies downstream insertion of data into other databases
2. Metadata has important uses in the lab:
• Determine possible causes of failure: taxonomy, collection event, specimen age
• Adjust extraction or amplification protocols
• Design new primers, e.g., for smaller fragments
11. Specimens enter the lab
Metadata enters the LIMS
[Diagram: Specimen & Metadata → LIMS → Data QC → Data Deposition]
12. What is a LIMS?
• An electronic lab “notebook” (aka database) to replace our traditional paper lab notebooks.
• Tracks a specimen through lab processes from extraction to barcode sequence completion (data QC may use external software).
• Records every lab procedure.
• Provides information to guide further lab efforts: success rates, “redo” lists
• Records the physical location of extracts, etc.
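A LIMS of this kind can be a fairly simple relational database. The sketch below uses Python's built-in `sqlite3`; the three tables and their columns are hypothetical, chosen only to show how a specimen links to its extracts, where those extracts sit, and how a “redo” list falls out of a query.

```python
import sqlite3

# Hypothetical minimal schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE specimen (
    specimen_id      TEXT PRIMARY KEY,
    taxon            TEXT,
    collection_event TEXT
);
CREATE TABLE extraction (
    extraction_id TEXT PRIMARY KEY,
    specimen_id   TEXT NOT NULL REFERENCES specimen(specimen_id),
    protocol      TEXT,
    plate         TEXT,  -- physical location of the extract
    well          TEXT
);
CREATE TABLE pcr (
    pcr_id        INTEGER PRIMARY KEY,
    extraction_id TEXT NOT NULL REFERENCES extraction(extraction_id),
    primer_pair   TEXT,
    thermocycler  TEXT,
    enzyme_lot    TEXT,
    success       INTEGER  -- 1 = product on gel, 0 = failed
);
"""


def open_lims(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def extract_location(conn, specimen_id):
    """Return (extraction_id, plate, well) for each extract of a specimen."""
    return conn.execute(
        "SELECT extraction_id, plate, well FROM extraction "
        "WHERE specimen_id = ? ORDER BY extraction_id",
        (specimen_id,),
    ).fetchall()


def redo_list(conn):
    """Specimens with no successful PCR yet."""
    rows = conn.execute(
        """SELECT s.specimen_id FROM specimen s
           WHERE NOT EXISTS (
               SELECT 1 FROM extraction e
               JOIN pcr p ON p.extraction_id = e.extraction_id
               WHERE e.specimen_id = s.specimen_id AND p.success = 1)
           ORDER BY s.specimen_id"""
    ).fetchall()
    return [r[0] for r in rows]
```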
13. My requirements for a LIMS
• I want a system that records every piece of information about each specimen/extract for which I produce a barcode sequence.
• I want my procedures and protocols to be transparent enough that anyone can reproduce my results.
• This includes my QC procedures.
• Currently there is no good place to publish these data.
14. Data to be recorded
• Extraction: protocol, digestion time, etc.
• PCR: recipes, DNA concentration, cycling parameters, clean-up method, PCR machine, brand of enzyme, lot #
• Gel photos
• Sequencing: recipe, clean-up, machine, etc.
• Bonus: success or failure can be mapped back to any of these recorded values. Maybe the Taq was bad? Or maybe the PCR machine needs repair?
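Mapping success or failure back to recorded values amounts to grouping reactions by one field and comparing success rates. A minimal sketch, assuming each reaction is a dict with a boolean `success` key; the field names `taq_lot` and `thermocycler` are hypothetical.

```python
from collections import defaultdict


def success_rate_by(reactions, field):
    """Group reactions by one recorded value: {value: (successes, total, rate)}."""
    counts = defaultdict(lambda: [0, 0])  # value -> [successes, total]
    for rxn in reactions:
        tally = counts[rxn[field]]
        tally[1] += 1
        if rxn["success"]:
            tally[0] += 1
    return {value: (s, n, s / n) for value, (s, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical reactions: every failure shares Taq lot "B".
    reactions = [
        {"taq_lot": "A", "thermocycler": "TC1", "success": True},
        {"taq_lot": "A", "thermocycler": "TC2", "success": True},
        {"taq_lot": "B", "thermocycler": "TC1", "success": False},
        {"taq_lot": "B", "thermocycler": "TC2", "success": False},
    ]
    print(success_rate_by(reactions, "taq_lot"))
    print(success_rate_by(reactions, "thermocycler"))
```

Here the failures line up with the Taq lot rather than the machine. A low rate concentrated in one group is a lead worth checking, not proof of the cause.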
15.
• A LIMS can be homegrown (like LAB’s barcoding LIMS or SI’s plant barcoding LIMS): relatively simple relational databases
• Or sophisticated and commercially produced: the Moorea Biocode LIMS plug-in for Geneious (the plug-in is free)
– Software updated and maintained
– Plugs into the Geneious data analysis software
http://software.mooreabiocode.org
18. Tracking project progress & identifying next steps
• Which specimens have completed barcodes?
• Which specimens need additional labwork?
• Which specimens should be abandoned?
• Where are the original DNA extracts or tissue samples?
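These questions can be answered with simple rules over LIMS records. A sketch under stated assumptions: the `SpecimenStatus` fields and the `MAX_ATTEMPTS` cut-off of 3 are hypothetical, and each project would set its own criteria for abandoning a specimen.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical cut-off for abandoning a specimen


@dataclass
class SpecimenStatus:
    specimen_id: str
    has_passed_barcode: bool
    failed_attempts: int
    extract_location: str  # hypothetical format, e.g. "PLATE-01:A03"


def next_step(s):
    """Classify a specimen as complete, redo, or abandon."""
    if s.has_passed_barcode:
        return "complete"
    if s.failed_attempts >= MAX_ATTEMPTS:
        return "abandon"
    return "redo"


def progress_report(statuses):
    """Group (specimen_id, extract_location) pairs by next step."""
    report = {"complete": [], "redo": [], "abandon": []}
    for s in statuses:
        report[next_step(s)].append((s.specimen_id, s.extract_location))
    return report
```

Carrying the extract location into the report answers the last question at the same time: the redo list doubles as a pick list for pulling extracts.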
20. Raw data enters the QC process
[Diagram: Specimen → LIMS → Data QC → Data Deposition]
21. Data QC
• OUTSIDE the LIMS database
• “Clean up” raw data: trim, examine quality
• Assemble passed traces into a contig for each specimen
• Examine/edit contigs
• Check validity of resulting sequences
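Quality trimming is often done with the modified Mott algorithm, used here as one common approach; Sequencher and Geneious each have their own trimming options, and this is not their code. Each base scores `error_limit - P(error)`, where `P(error) = 10 ** (-Q / 10)` for Phred quality `Q`, and the highest-scoring contiguous stretch is kept. The defaults `error_limit=0.05` and `min_length=100` are placeholders.

```python
def mott_trim(quals, error_limit=0.05):
    """Return (start, end) of the best-scoring segment; end is exclusive.

    Good bases score positive and poor bases negative; the maximum-sum
    contiguous run (Kadane's algorithm) is the region to keep.
    Returns (0, 0) if no base beats the error limit.
    """
    best_start = best_end = 0
    best_sum = 0.0
    run_start, run_sum = 0, 0.0
    for i, q in enumerate(quals):
        run_sum += error_limit - 10 ** (-q / 10)
        if run_sum <= 0:
            run_start, run_sum = i + 1, 0.0  # restart after a poor stretch
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


def trim_read(seq, quals, error_limit=0.05, min_length=100):
    """Trim a read; return None if what remains is too short to keep."""
    start, end = mott_trim(quals, error_limit)
    if end - start < min_length:
        return None
    return seq[start:end], quals[start:end]
```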
22. My data QC ethos
• All criteria for each step of data analysis are recorded
– For raw trace processing: trimming criteria, length and quality requirements, binning criteria
– For assembly: assembly parameters, product length, etc.
• Hand editing is minimized*
• Anyone could recreate the barcode sequence
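Recording every criterion is easiest when the parameters live in one structured object saved alongside the results. A sketch with hypothetical field names and default values; serializing to JSON gives a plain-text record that anyone could reload to repeat the analysis.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QCParameters:
    # Raw trace processing (hypothetical names and defaults)
    trim_error_limit: float = 0.05
    min_trimmed_length: int = 100
    min_mean_quality: float = 30.0
    # Assembly (hypothetical names and defaults)
    min_overlap: int = 20
    min_overlap_identity: float = 0.95
    expected_product_length: int = 658


def to_json(params):
    """Serialize parameters as stable, human-readable JSON."""
    return json.dumps(asdict(params), indent=2, sort_keys=True)


def from_json(text):
    """Rebuild the exact parameter set used for an earlier analysis."""
    return QCParameters(**json.loads(text))
```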
23. Any DNA sequence analysis software can be used for data QC
• Sequencher (Gene Codes) & Geneious (Biomatters)
– Trim ends of raw sequences with adjustable criteria; explore the effects of trim criteria
– Discard short or poor sequences
– Assemble trimmed reads with stringent but adjustable criteria
– Output completed sequences
• The Geneious LIMS is plugged into the data analysis software
– Direct communication
– Binning*
• Sequencher data must be exported from Sequencher and imported into the LIMS
24. Data analysis
Here are the traces. You can see some FIMS data in the document fields (e.g., identified by, tissue ID). You will also notice a binning column (see the following slide).
25. Binning
Automatic categorization of reads and assemblies
• Change binning parameters and examine the effects
• Trimming and assembly dialog boxes are similar
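Binning can be pictured as sorting each read into a quality category using thresholds the user can change. A hypothetical sketch: the three bins and all threshold values are assumptions, not Geneious's actual binning rules. Re-running `bin_counts` with different `BinParams` shows the effect of a parameter change.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BinParams:
    # All thresholds are hypothetical placeholders.
    hq_cutoff: int = 20            # Phred score counted as "high quality"
    high_min_length: int = 500
    high_min_fraction: float = 0.9
    medium_min_length: int = 300
    medium_min_fraction: float = 0.7


def bin_read(quals, p=BinParams()):
    """Assign one read (its list of Phred scores) to 'high', 'medium', or 'low'."""
    n = len(quals)
    if n == 0:
        return "low"
    hq_fraction = sum(q >= p.hq_cutoff for q in quals) / n
    if n >= p.high_min_length and hq_fraction >= p.high_min_fraction:
        return "high"
    if n >= p.medium_min_length and hq_fraction >= p.medium_min_fraction:
        return "medium"
    return "low"


def bin_counts(reads, p=BinParams()):
    """Count reads per bin under one parameter set."""
    return Counter(bin_read(q, p) for q in reads)
```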
26. Final Steps:
Is it a contaminant? Is it identified correctly?
• A number of procedures for identifying contamination or incorrect identification:
– BLASTing against a database of known contaminants, GenBank, or BOLD
– Quick-and-dirty assembly tests
– Neighbor-joining (NJ) trees
– Geneious taxonomy verification tool
27. Verify Taxonomy
• BLASTs your sequences
• Gets the NCBI taxonomy for the best hit(s)
• Compares it to the taxonomy from the FIMS
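The comparison step can be sketched as walking down the ranks of both lineages. This is not the Geneious tool's implementation; the rank list and dict inputs are assumptions, and the BLAST search and NCBI taxonomy lookup are left out. One plausible reading of the result: a conflict at a high rank (e.g., phylum) suggests contamination, while a conflict only at genus or species points to a possible misidentification or a gap in the reference database.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def compare_taxonomy(fims, blast_hit):
    """Return (deepest_agreeing_rank, first_conflicting_rank); either may be None.

    Both inputs are dicts of rank -> name. Ranks missing from either
    source are skipped, since they cannot be judged.
    """
    deepest = None
    for rank in RANKS:
        a, b = fims.get(rank), blast_hit.get(rank)
        if not a or not b:
            continue
        if a.strip().lower() == b.strip().lower():
            deepest = rank
        else:
            return deepest, rank
    return deepest, None
```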
28. Good, clean barcode sequences
• Feed back into the LIMS*
– Monitor progress
– Connect sequences and traces to specimen data
• Prepare upload packages for output to databases such as GenBank or BOLD
[Diagram: Specimen → LIMS ⇄ Data QC → Data Deposition]
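An upload package usually starts from a FASTA file of the finished sequences. A minimal sketch: the record keys (`seq_id`, `organism`, `sequence`) are hypothetical, and the `[organism=...]` tag follows the NCBI source-modifier style for FASTA definition lines. GenBank and BOLD each have their own submission requirements, which should be checked before upload.

```python
def to_fasta(records, width=60):
    """Format QC-passed sequences as FASTA text.

    records: iterable of dicts with 'seq_id', 'sequence', and optional
    'organism' (hypothetical keys). Sequence lines wrap at `width`.
    """
    lines = []
    for r in records:
        header = f">{r['seq_id']}"
        if r.get("organism"):
            header += f" [organism={r['organism']}]"
        lines.append(header)
        seq = r["sequence"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```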
29. Positive Information Flow from field or museum to final data deposition
1. Collect metadata so that it flows easily into the LIMS and other databases
2. Record all aspects of all laboratory procedures (LIMS)
3. Use the LIMS for reporting, protocol investigation, and monitoring of project progress
4. Input information and data from QC procedures into the LIMS*
5. Output upload packages for public databases from the LIMS
Speaker notes
An example workflow. This workflow was very straightforward: everything worked the first time, so we didn’t have to rerun anything. Reaction templates and cocktails are on the left, reaction thermocycles on the right.