Why and how to clean Illumina genome sequencing reads. Includes illustrative examples, and a case where a project was saved by using Nesoni clip: to discover the cause of non-mapping reads.
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Torsten Seemann
Using Snippy to call variants in bacterial short read datasets via alignment to reference, and then using these alignments to produce core SNP alignments for phylogenomics.
Introducing data analysis: reads to resultsAGRF_Ltd
AGRF in conjunction with EMBL Australia recently organised a workshop at Monash University Clayton. This workshop was targeted at beginners and biologists who are new to analysing Next-Gen Sequencing data. The workshop also aimed to provide users with a snapshot of bioinformatics and data analysis tips on how to begin to analyse project data. Introduction data analysis: reads to results was presented by Dr. Torsten Seemann from the Victorian Bioinformatics Consortium at Monash University, Clayton.
Presented: 1st August 2012
Part 2 of RNA-seq for DE analysis: Investigating raw dataJoachim Jacob
Second part of the training session 'RNA-seq for Differential expression' analysis. We explain the characteristics of RNA-seq data that allow us to detect differential expression. Interested in following this session? Please contact http://www.jakonix.be/contact.html
How to cluster and sequence an ngs library (james hadfield160416)James Hadfield
A presentation for people intersted in understanding how Illumina adapter ligation, clustering ands SBS sequencing work. Follow core-genomics http://core-genomics.blogspot.co.uk/
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Torsten Seemann
Using Snippy to call variants in bacterial short read datasets via alignment to reference, and then using these alignments to produce core SNP alignments for phylogenomics.
Introducing data analysis: reads to resultsAGRF_Ltd
AGRF in conjunction with EMBL Australia recently organised a workshop at Monash University Clayton. This workshop was targeted at beginners and biologists who are new to analysing Next-Gen Sequencing data. The workshop also aimed to provide users with a snapshot of bioinformatics and data analysis tips on how to begin to analyse project data. Introduction data analysis: reads to results was presented by Dr. Torsten Seemann from the Victorian Bioinformatics Consortium at Monash University, Clayton.
Presented: 1st August 2012
Part 2 of RNA-seq for DE analysis: Investigating raw dataJoachim Jacob
Second part of the training session 'RNA-seq for Differential expression' analysis. We explain the characteristics of RNA-seq data that allow us to detect differential expression. Interested in following this session? Please contact http://www.jakonix.be/contact.html
How to cluster and sequence an ngs library (james hadfield160416)James Hadfield
A presentation for people intersted in understanding how Illumina adapter ligation, clustering ands SBS sequencing work. Follow core-genomics http://core-genomics.blogspot.co.uk/
High data quality and accuracy are recognized characteristics of Sanger re-sequencing projects and are primary reasons that next generation sequencing projects compliment their results by capillary electrophoresis data validation. We have developed an on-line tool called Primer Designer™ to streamline the NGS-to-Sanger sequencing workflow by taking the laborious task of PCR primer design out of the hands of the researcher by providing pre-designed assays for the human exome. The primer design tool has been created to enable scientists using next generation sequencing to quickly confirm variants discovered in their work by providing the means to quickly search, order and receive suitable pre-designed PCR primers for Sanger sequencing. Using the Primer Designer™ tool to design M13-tailed and non-tailed PCR primers for Sanger sequencing we will demonstrate validation of 28-variants across 24-amplicons and 19-genes using the BDD, BDTv1.1 and BDTv3.1 sequencing chemistries on the 3500xl Genetic Analyzer capillary electrophoresis platform.
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
This slidedeck provides a technical overview of DNA/RNA preprocessing, template preparation, sequencing and data analysis. It covers the applications for NGS technologies, including guidelines for how to select the technology that will best address your biological question.
RNA Sequence data analysis,Transcriptome sequencing, Sequencing steady state RNA in a sample is known as RNA-Seq. It is free of limitations such as prior knowledge about the organism is not required.
RNA-Seq is useful to unravel inaccessible complexities of transcriptomics such as finding novel transcripts and isoforms.
Data set produced is large and complex; interpretation is not straight forward.
Struggling with low editing efficiency or delivery problems? IDT has developed a simple and affordable CRISPR-Cas9 solution that outperforms other methods. In this presentation we present the advantages of using a Cas9:tracrRNA:crRNA ribonucleoprotein (RNP) complex in genome editing experiments, and explain why it is the most efficient driver for genome editing compared to alternative methods, such as expression plasmids or the use of sgRNAs. We also review RNP delivery using cationic lipids and electroporation, and provide tips for optimized transfection in your system.
Introduction to real-Time Quantitative PCR (qPCR) - Download the slidesQIAGEN
This slidedeck introduces the concepts of real-time PCR and how to conduct a real-time PCR assay. The topics that are covered include an overview of real-time PCR chemistries, protocols, quantification methods, real-time PCR applications and factors for success.
High data quality and accuracy are recognized characteristics of Sanger re-sequencing projects and are primary reasons that next generation sequencing projects compliment their results by capillary electrophoresis data validation. We have developed an on-line tool called Primer Designer™ to streamline the NGS-to-Sanger sequencing workflow by taking the laborious task of PCR primer design out of the hands of the researcher by providing pre-designed assays for the human exome. The primer design tool has been created to enable scientists using next generation sequencing to quickly confirm variants discovered in their work by providing the means to quickly search, order and receive suitable pre-designed PCR primers for Sanger sequencing. Using the Primer Designer™ tool to design M13-tailed and non-tailed PCR primers for Sanger sequencing we will demonstrate validation of 28-variants across 24-amplicons and 19-genes using the BDD, BDTv1.1 and BDTv3.1 sequencing chemistries on the 3500xl Genetic Analyzer capillary electrophoresis platform.
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
This slidedeck provides a technical overview of DNA/RNA preprocessing, template preparation, sequencing and data analysis. It covers the applications for NGS technologies, including guidelines for how to select the technology that will best address your biological question.
RNA Sequence data analysis,Transcriptome sequencing, Sequencing steady state RNA in a sample is known as RNA-Seq. It is free of limitations such as prior knowledge about the organism is not required.
RNA-Seq is useful to unravel inaccessible complexities of transcriptomics such as finding novel transcripts and isoforms.
Data set produced is large and complex; interpretation is not straight forward.
Struggling with low editing efficiency or delivery problems? IDT has developed a simple and affordable CRISPR-Cas9 solution that outperforms other methods. In this presentation we present the advantages of using a Cas9:tracrRNA:crRNA ribonucleoprotein (RNP) complex in genome editing experiments, and explain why it is the most efficient driver for genome editing compared to alternative methods, such as expression plasmids or the use of sgRNAs. We also review RNP delivery using cationic lipids and electroporation, and provide tips for optimized transfection in your system.
Introduction to real-Time Quantitative PCR (qPCR) - Download the slidesQIAGEN
This slidedeck introduces the concepts of real-time PCR and how to conduct a real-time PCR assay. The topics that are covered include an overview of real-time PCR chemistries, protocols, quantification methods, real-time PCR applications and factors for success.
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015Torsten Seemann
How genomics is changing the practice of public health microbiology. The role of whole genome sequencing as the "one true assay". Another powerful tool for the epidemiologist.
A presentation to a lay audience at Melbourne Knowledge Week on how bacteria are a part of our life and what we are doing with genomics to manage them.
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015Torsten Seemann
An introduction to basic genomics bioinformatics concepts in 20 minutes for an audience of clinicians, epidemiologists and other public health officials.
Presentation from the 3rd Joint Meeting of the Antimicrobial Resistance and Healthcare-Associated Infections (ARHAI) Networks, organised by the European Centre of Disease Prevention and Control - Stockholm, 11-13 February 2015
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...Torsten Seemann
I describe the three levels of parallelism that can be exploited in bioinformatics software (1) clusters of multiple computers; (2) multiple cores on each computer; and (3) vector machine code instructions.
Antimicrobial resistance (AMR) in N. gonorrhoeae (GC) - global problem but v...Игорь Шадеркин
Antimicrobial resistance (AMR) in N. gonorrhoeae (GC) - global problem but valid data are lacking in many geographic areas
Magnus Unemo, PhD, Assoc. Professor
Reference Laboratory for Pathogenic Neisseria
Department of Clinical Microbiology
Örebro University Hospital
Sweden
this is the project regarding the detection and analysis of DNA sequences,it provide the fascility to find the repets from the hudge data set.we can find tha all repeats which is occured in human body.
Polymerase chain reaction (PCR) is a technique used in molecular biology to amplify a single copy or a few copies of a segment of DNA across several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence. It is an easy, cheap, and reliable way to repeatedly replicate a focused segment of DNA, a concept which is applicable to numerous fields in modern biology and related sciences.
OpenFOAM benchmark for EPYC server -Influence of coarsestLevelCorr in GAMG so...takuyayamamoto1800
Parallel efficiency of GAMG solver in OpenFOAM is evaluated for EPYC server. Especially, in this study, the influence of coarsestLevelCorr on the calculation time is evaluated in lid driven cavity flow.
RNase H2-dependent PCR (rhPCR) is a powerful method for increasing PCR specificity and eliminating primer-dimers by using blocked primers and a thermostable RNase H2 from Pyrococcus abyssi (P. abyssi). Primers will only support extension and replication after the blocked portion is cleaved. Cleavage by the RNase H2 enzyme occurs only when primers are bound to their complementary target sequence, thus providing increased specificity. Also, the thermostability of P. abyssi RNase H2 provides a “hot start” capability to the reaction. In this presentation, Dr Joseph Dobosy (senior research scientist in the molecular genetics research division of IDT) gives a detailed explanation of the rhPCR mechanism, offer tips on how to design assays using this powerful technology, and discuss examples of applications that benefit from rhPCR.
Generating random numbers in a highly parallel program is surprising non-trivial. A lot of good generators have lots of state and is purely serial. Simple generators like LCG can leapfrog ahead but of limited quality and depends on #cores. We want our code to be independent of the degree of parallelism.
Bioinformatics tools for the diagnostic laboratory - T.Seemann - Antimicrobi...Torsten Seemann
"Bioinformatics tools for the diagnostic laboratory" presented at the Australian Society for Antimicrobials 2016 annual conference in Melbourne Australia. Slides are aimed at a biological / pathology / clinican audience. Some material has been re-imagined from Nick Loman's ECCMID 2015 talk.
Sequencing your poo with a usb stick - Linux.conf.au 2016 miniconf - mon 1 ...Torsten Seemann
This talk introduces a Linux Professional audience to bacterial genomics and modern sequencing technology. The title is slightly misleading and is a bit of clickbait. The diagrams are good.
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015Torsten Seemann
Long read sequencing - the good, the bad, and the really cool. Covers Illumina SLR, Pacbio RSII and Oxford Nanopore as of June 2015. Discusses bioinformatics differences of long reads over short reads.
Visualizing the pan genome - Australian Society for Microbiology - tue 8 jul ...Torsten Seemann
Invited talk at the Australian Society for Microbiology Annual Conference 2014 on "FriPan" our tool for visualizing bacterial pan genomes across 10-100s of isolates.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/ velocity and then from this we derive the Pouiselle flow equation, the transition flow equation and the turbulent flow equation. In the situations where there are no viscous effects , the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes equation of terminal velocity and turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
The Evolution of Science Education PraxiLabs’ Vision- Presentation (2).pdfmediapraxi
The rise of virtual labs has been a key tool in universities and schools, enhancing active learning and student engagement.
💥 Let’s dive into the future of science and shed light on PraxiLabs’ crucial role in transforming this field!
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
BREEDING METHODS FOR DISEASE RESISTANCE.pptxRASHMI M G
Plant breeding for disease resistance is a strategy to reduce crop losses caused by disease. Plants have an innate immune system that allows them to recognize pathogens and provide resistance. However, breeding for long-lasting resistance often involves combining multiple resistance genes
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills MN
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
Toxic effects of heavy metals : Lead and Arsenicsanjana502982
Heavy metals are naturally occuring metallic chemical elements that have relatively high density, and are toxic at even low concentrations. All toxic metals are termed as heavy metals irrespective of their atomic mass and density, eg. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
3. Illumina reads
● Usually 100 or 150 bp
○ 250bp rolling out now, 400bp next year
● Indel errors rare
● Homopolymer errors rare
● Substitution errors < 1%
○ Error rate higher at 3' end
● Adaptor issues
○ rare in HiSeq (TruSeq prep)
○ more common in MiSeq (Nextera prep)
● Very high quality overall
4. Illumina libraries
●"single end"
○just a shotgun read sequenced from one end
●"paired end"
○~500bp fragments sequenced at both ends
○very reliable
●"mate pair"
○circularized 2-10 kbp fragments sequencing
○then "paired end" protocol
○reliability varies
5. ● garbage reads
○ instrument weirdness
● duplicate reads
○ low complexity library, PCR duplicate
● adaptor read-through
○ fragment too short
● indel errors
○ skipping bases, inserting extra bases
● uncalled base
○ couldn't reliably estimate, replace with "N"
● substitution errors
○ reading wrong base
Sequences have errors
More common
Less common
6. Why clean reads?
● Erroneous data may cause software to:
○ run more slowly
○ use more RAM
○ produce poor / biased / incorrect results
● Cleaning can:
○ improve overall average quality of the reads
■ hopefully giving a reliable result
○ reduce the volume of reads
■ some algorithms are O(N.logN) or O(N2
)
■ enable processing when otherwise couldn't
● (some software does handle them appropriately)
8. DNA sequence quality
● DNA sequences often have a
quality value associated with each
nucleotide
● A measure of reliability for each base
○ as it is derived from physical process
■ chromatogram (Sanger sequencing)
■ pH reading (Ion Torrent sequencing)
● Formalised by the Phred software for the
Human Genome Project
9. Phred qualities
● Q = -10 log10
P <=> P = 10- Q / 10
○ Q = Phred quality score
○ P = probability of base call being incorrect
Quality value Chance it is wrong Accuracy
10 1 in 10 90%
20 1 in 100 99%
30 1 in 1000 99.9%
40 1 in 10,000 99.99%
50 1 in 100,000 99.999%
12. Anatomy of a FASTQ entry
@read00179
AGTCTGATATGCTGTACCTATTATAATTCTAGGCGCTCAT
GCCCGCGGATATCGTAGCTATATGCTTCA
+
8;ACCCD?DD???@B9<9<CAC@=AAAA8A;B<A@882,+
495;;3990,02..-&-&-*,,,,(0**#
Start symbol
Sequence ID
Sequence
Separator line
Encoded quality values,
one symbol per nucleotide
16. Ambiguous bases
● If there is ambiguity in the base call, an "N" is used
@ILLUMINA:6:1:964:115#GATCAG/1
GGACCTGAGAGTGTGCATGAAGAGGGCAGCGCGCACNGCATCA
+
HHHGFGEEECDEBA@BBBA<=<;:98743720&,+**_%$#"!
● Possible software responses:
○ Crash!
○ Ignore it
○ Silently convert to fixed or random base (Velvet > 'A')
○ Handle it appropriately (Bowtie2)
● Small proportion overall, safer to discard
17. Homopolymers
● A read consisting of all the same base
@ILLUMINA:6:1:964:115#GATCAG/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
HHHGFGEEECDEBA@BBBA<=<;:98743720&,+**_%$#"!
● Often occur from clusters at edge of flowcell lane
● Early Illumina software called '?' as 'A' rather than N
● Unlikely to be present in real DNA
● Best to discard
● Less common with newer Illumina OLB software
18. Quality trimming
●Remove low quality sequence
○Q=13 corresponds to 5% error (p=0.05)
○Q=0..13 encoded by: !"#@%&'()*+,-/
@ILLUMINA:6:1:9646:1115#GATCAG/1
GGACCTGAGAGTGTGCATGAAGAGGGCAGCCCCGCACTGCATG
+
HHHGFGEEECDEBA@BBBA<=<;:98743720&,+**_%$#"!
●Can trim per
○each base
○window moving average eg. 3 base mean
○minimum % good per window eg. need 4 of 5
19. Illumina Adaptors
● Used in the sequencing chemistry
● Can appear at ends of read sequences
● Worse for mate-pair than for paired-end reads
● PCR Primer
CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
● Genomic DNA Sequencing Primer
CACTCTTTCCCTACACGACGCTCTTCCGATCT
● All other TruSeq & Nextera Adaptors
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
20. Adaptor clipping
● Method
○ Align 3' and 5' read end against all adaptor sequences
○ If there is an anchored "match", trim the read
● Minimum length of match?
○ want to remove adaptor, but not real sequence [10 bp]
● Allow substitutions in match?
○ as reads have errors, need some tolerance [1 sub]
● Allow gaps/indels in match?
○ indels are unlikely in Illumina reads [no]
● Slow to perform compared to other preprocessing steps
21. Decloning
●Illumina "mate pair" sequencing
○Requires a lot of starting DNA
○Challenging protocol to implement reliably
○Not enough final DNA leads to PCR duplicates
○Coverage is highly non-uniform and sporadic
○Causes bias in analyses
●Decloning
○Replace clones with a single representative
○Choose representative with highest quality
○Helps salvage usable information content
22. Read length
●Enforce a minimum read length L
●k-mer based tools
○break reads into k-mers, so L < k is pointless
○Velvet, Trinity, Gossamer, ...
●Uniqueness
○desire reasonable uniqueness of sequence,
otherwise multiple mapping will take forever!
○L=24 is bare minimum (I reckon)
○BWA, Bowtie, Shrimp, ...
23. Other strategies
●Digital normalisation
○remove low frequency k-mers
○remove reads w/ too many low freq k-mers
●Error correction
○replace low frequency k-mers with their
"closest" high-frequency k-mer
○other methods I don't understand yet
●Just use local alignment to "solve" the problem
28. The GKP FFPE BWA PE mystery
●FFPE sample
○formalin-fixed, paraffin embedded
○long term tissue archival storage method
●Sequencing
○HiSeq-2000
○22 million x 100bp paired-end reads
○quality looks good ... but not mapping!
32. Nesoni
●Implemented primarily by Paul Harrison @ VBC
○swiss army knife
○snp calling, phylogenetics, DGE, ....
○extendible pipeline system (Python)
●We used the "nesoni clip:" module
○adaptor clipping = on
○min quality = Q10
○allow Ns = no
○min length = 2
38. Nesoni clip: summary
nesoni 0.92
FASTQ offset seems to be 33
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 22,611,994 read-pairs
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 121,236 read-1 too short after quality clip
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 483,729 read-1 too short after adaptor clip
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 22,007,029 read-1 kept
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 100.000 read-1 average input length
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 75.426 read-1 average output length
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 673,175 read-2 too short after quality clip
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 419,343 read-2 too short after adaptor clip
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 21,519,476 read-2 kept
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 100.000 read-2 average input length
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 77.201 read-2 average output length
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 21,147,301 pairs kept after clipping
(> SM_AdMP1_ID_07B26948_L004_CLIPPED 1,231,903 reads kept after clipping
started 21 November 2012 11:02 AM
finished 21 November 2012 11:43 AM
run time 0:40:06