Hospital Universitari Vall d’Hebron
Institut de Recerca - VHIR
Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)
Bioinformatics for
Biomedical Research
http://ueb.vhir.org/2014BRB
Ferran Briansó
ferran.brianso@vhir.org
15/05/2014
INTRODUCTION TO NGS
VARIANT CALLING ANALYSIS
PRESENTATION OUTLINE
1. NGS WORKFLOW OVERVIEW
2. WET LAB STEPS
3. IMPORTANT SEQUENCING CONCEPTS
4. NGS ANALYSIS WORKFLOW
   1. Primary analysis: de-multiplexing, QC
   2. Secondary analysis: read mapping and variant calling
   3. Tertiary analysis: annotation, filtering...
5. VISUALIZATION
6. COMMON PIPELINES AND FORMATS
7. CONCLUSIONS
NGS WORKFLOW OVERVIEW
Extracted from Dr Kassahn's publicly shared slides (2013)
LIBRARY PREPARATION
Select target
Hybridization-based capture or PCR
Add adapters
Contain binding sequences
Barcodes
Primer sequences
Amplify material
A) Fragment DNA
B) End-repair
C) A-tailing, adapter ligation and PCR
D) Final library contains
• sample insert
• indices (barcodes)
• flowcell binding sequences
• primer binding sequences
TEMPLATE PREPARATION
Attachment of library
e.g. to an Illumina flow cell
Amplification of library molecules
e.g. bridge amplification
BRIDGE AMPLIFICATION
SEQUENCING
Sequencing-by-Synthesis
Detection by:
• Illumina – fluorescence
• Ion Torrent – pH
• Roche 454 – pyrophosphate (PPi) release, detected as light
SEQUENCING-BY-SYNTHESIS (ILLUMINA)
IMPORTANT SEQUENCING CONCEPTS
Barcoding/Indexing:
allows multiplexing of different samples
Single-end vs paired-end sequencing
Coverage: average number of reads covering each target base
Quality scores (Q score): logarithmic scale, Q = -10 log10 P[base call is wrong]
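To make the logarithmic scale and the coverage idea concrete, here is a minimal sketch. It uses the standard Phred definition and the usual back-of-the-envelope coverage estimate (reads × read length / target size). The function names and the example numbers are illustrative assumptions, not part of the slides.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality corresponding to an error probability p."""
    return -10 * math.log10(p)


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Rough average depth: total sequenced bases divided by target size."""
    return n_reads * read_length / target_size


if __name__ == "__main__":
    # Each step of 10 in Q is a tenfold drop in error probability.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
    # Hypothetical run: 1 million reads of 100 bp over a 50 Mb target.
    print(mean_coverage(1_000_000, 100, 50_000_000))
```

The estimate ignores duplicates, off-target reads and uneven capture, so real per-base depth will vary around this average.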
NGS DATA ANALYSIS WORKFLOW
DE-MULTIPLEXING (BARCODE SPLITTING)
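As an illustration of barcode splitting, the sketch below assigns each read to a sample by comparing its index sequence with a table of known barcodes. The sample names, barcodes and the one-mismatch tolerance are made up for the example. Production tools such as bcl2fastq work from the sequencer's own files and a sample sheet, so this shows only the idea.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: dict, max_mismatches: int = 1):
    """Return the single sample whose barcode matches, or None if none or ambiguous."""
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes: dict, max_mismatches: int = 1) -> dict:
    """Group (index_read, sequence) pairs by sample; unmatched reads go to 'undetermined'."""
    bins = defaultdict(list)
    for index_read, sequence in reads:
        sample = assign_sample(index_read, barcodes, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append(sequence)
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}  # hypothetical barcodes
    reads = [("ACGTAC", "GATTACA"), ("TGCATC", "CCCC"), ("GGGGGG", "TTTT")]
    print(demultiplex(reads, barcodes))
```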
FASTQ FORMAT
see en.wikipedia.org/wiki/FASTQ_format
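A FASTQ record has four lines: an `@` header, the sequence, a `+` separator and a quality string with one character per base. The following sketch reads such records and decodes the qualities, assuming the Sanger/Illumina 1.8+ encoding (ASCII offset 33). The record in the example is invented.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


if __name__ == "__main__":
    example = io.StringIO("@read1\nGATTACA\n+\nIIIII#5\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.qualities)
```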
SEQUENCE QUALITY: FastQC
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Details of the output https://docs.google.com/document/pub?id=16GwPmwYW7o_r-ZUgCu8-oSBBY1gC97TfTTinGDk98Ws
READ MAPPING (BASIC ALIGNMENT)
Comparison against
reference genome
(! not assembly !)
Many aligners
(short reads, longer reads, RNAseq...)
Examples: BWA, Bowtie
SAM/BAM files
BURROWS-WHEELER ALIGNMENT TOOL (BWA)
Popular tool for genomic sequence data (not RNA-Seq!)
Li and Durbin 2009 Bioinformatics
Challenge:
compare billions of short sequence reads (.fastq file) against the human genome (3 Gb)
Burrows-Wheeler Transform to “index” the human genome and allow
memory-efficient and fast string matching between sequence reads and
the reference genome
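The idea behind this index can be shown on a toy scale. The sketch below builds the Burrows-Wheeler Transform naively from sorted rotations and counts exact occurrences of a read with backward search. It is not BWA's implementation: real aligners build the index with suffix arrays, store sampled occurrence tables and handle mismatches and gaps. The function names are illustrative.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler Transform via sorted rotations (fine for toy inputs only)."""
    text += "$"  # end marker, sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_matches(bwt_string: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the original text by backward search."""
    # first_row[c]: number of characters in the text that sort before c
    counts = Counter(bwt_string)
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_string[:i]; real indexes precompute this.
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACAGATTACA")  # stand-in for a reference genome
    print(count_matches(index, "ATTA"))
```

Each step of the search narrows a range of sorted suffixes, so the work depends on the read length rather than on the genome size, which is what makes the approach attractive for billions of reads.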
SAM/BAM FILES
see http://samtools.sourceforge.net/SAMv1.pdf
@ Header (information regarding reference genome, alignment method...)
1) Read ID (QNAME)
2) Bitwise FLAG (first/second read in pair, both reads mapped...)
3) Reference sequence name (RNAME)
4) Position (POS, 1-based leftmost mapping coordinate)
5) Mapping quality (MAPQ = -10 log10 P[wrong mapping position])
6) CIGAR (describes alignment: matches, skipped regions, insertions...)
7) Reference sequence name of the mate (RNEXT)
8) Position of the mate (PNEXT)
9) Template length (TLEN)
10) Read sequence (SEQ)
11) Base qualities (QUAL, encoded as in FASTQ; '*' if not available)
...
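The eleven mandatory columns above can be read with a few lines of code. This sketch splits one tab-separated alignment line, decodes the bitwise FLAG using the bit meanings from the SAM specification, and converts MAPQ back to a probability. The example alignment line is invented, and the short flag labels are this sketch's own names.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

# FLAG bits as defined in the SAM specification (labels are shorthand).
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse_strand", 0x20: "mate_reverse_strand",
    0x40: "first_in_pair", 0x80: "second_in_pair",
    0x100: "secondary", 0x200: "qc_fail", 0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> dict:
    """Return the 11 mandatory SAM fields of one alignment line as a dict."""
    values = line.rstrip("\n").split("\t")[:11]
    record = dict(zip(SAM_FIELDS, values))
    for name in INTEGER_FIELDS:
        record[name] = int(record[name])
    return record


def decode_flag(flag: int) -> list:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def mapq_to_error_prob(mapq: int) -> float:
    """Invert MAPQ = -10 log10 P[wrong mapping position]."""
    return 10 ** (-mapq / 10)


if __name__ == "__main__":
    line = "read1\t99\tchr1\t100\t60\t7M\t=\t250\t157\tGATTACA\tIIIIIII"
    record = parse_sam_line(line)
    print(record["RNAME"], record["POS"], decode_flag(record["FLAG"]))
    print(mapq_to_error_prob(record["MAPQ"]))
```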
VARIANT CALLING
Identify sequence variants
Distinguish signal vs noise
VCF files
Examples: SAMtools, SNVMix
SEQUENCE VARIANTS
Differences to the reference
Sanger: is it real??
NGS: read count
Provides confidence (statistics!)
Sensitivity is a tunable parameter
(dependent on coverage)
VARIANT CALLING: GATK
Genome Analysis Toolkit (Broad Institute)
• Initially developed for 1000 Genomes Project
• Single or multiple sample analysis (cohort)
• Popular tool for germline variant calling
• Evaluates probability of genotype given read data
see http://www.broadinstitute.org/gatk/
and McKenna et al. Genome Research 2010
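What "probability of genotype given read data" means can be shown with a deliberately simplified diploid model. Each read independently shows the alternate allele with a probability fixed by the genotype and a single sequencing error rate, and the three genotypes get a flat prior. This is a teaching sketch, not GATK's actual model, which works with per-base qualities, priors and much more. The function name and the default error rate are assumptions.

```python
from math import comb


def genotype_posteriors(ref_reads: int, alt_reads: int, error_rate: float = 0.01) -> dict:
    """Posterior probability of 0/0, 0/1 and 1/1 under a binomial read model."""
    n = ref_reads + alt_reads
    likelihoods = {}
    for genotype, alt_fraction in (("0/0", 0.0), ("0/1", 0.5), ("1/1", 1.0)):
        # Chance that a single read shows the alternate allele under this genotype.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        likelihoods[genotype] = (
            comb(n, alt_reads) * p_alt ** alt_reads * (1 - p_alt) ** ref_reads
        )
    total = sum(likelihoods.values())  # flat prior: normalise the likelihoods
    return {genotype: value / total for genotype, value in likelihoods.items()}


if __name__ == "__main__":
    for ref, alt in ((20, 0), (11, 9), (1, 19)):
        posteriors = genotype_posteriors(ref, alt)
        best = max(posteriors, key=posteriors.get)
        print(f"ref={ref} alt={alt} -> {best}")
```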
SOMATIC VARIANT CALLING
Somatic mutations can occur at low frequency (<10%) due to:
• Tumor heterogeneity (multiple clones)
• Low tumor purity (normal cells present in the tumor sample)
Requires different thresholds than
germline variant calling when
evaluating signal vs noise
Trade-off between sensitivity
(ability to detect mutations) and
specificity (ability to avoid false positives)
Nature Reviews Cancer 12, 323-334 (May 2012)
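How depth limits the sensitivity for low-frequency variants can be estimated with a binomial model: if a variant is present at allele fraction f and a position is covered by n reads, the number of supporting reads is roughly Binomial(n, f). The sketch computes the chance of seeing at least a required number of supporting reads. The threshold of 5 reads and the depths are arbitrary illustrative values, and sequencing errors are ignored.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under Binomial(depth, allele_fraction)."""
    p_too_few = sum(
        comb(depth, i) * allele_fraction ** i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1 - p_too_few


if __name__ == "__main__":
    # A 5% variant, requiring at least 5 supporting reads (illustrative threshold).
    for depth in (50, 100, 200, 500, 1000):
        print(depth, round(detection_probability(depth, 0.05, 5), 3))
```

Raising the read threshold improves specificity but costs sensitivity at a given depth, which is the trade-off described above.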
INDEL DETECTION
Small insertions/deletions
The trouble with mapping approaches
modified from Heng Li (Broad Institute)
RE-ALIGNMENT
Re-align reads, taking into account multi-read context and previously known SNPs and indels...
adapted from Andreas Schreiber
EVALUATING VARIANT QUALITY
TAKING INTO ACCOUNT:
• Coverage at position
• Number of independent reads supporting variant
• Observed allele fraction vs expected (somatic / germline)
• Strand bias
• Base qualities at variant position
• Mapping qualities of reads supporting variant
• Variant position within reads (near ends or at centre)
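Several of these criteria can be computed directly from per-strand read counts. The sketch below derives depth and allele fraction, and tests for strand bias with a two-sided Fisher's exact test on the 2×2 table of reference/alternate counts by forward/reverse strand. Fisher's test is one common choice for strand bias; it is used here as an assumption, not as the method of any particular caller, and the counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


def variant_evidence(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> dict:
    """Summarise depth, allele fraction and strand bias at one position."""
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt_reads = alt_fwd + alt_rev
    return {
        "depth": depth,
        "alt_reads": alt_reads,
        "allele_fraction": alt_reads / depth if depth else 0.0,
        "strand_bias_p": fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev),
    }


if __name__ == "__main__":
    print(variant_evidence(25, 25, 12, 13))  # variant seen on both strands
    print(variant_evidence(25, 25, 20, 0))   # variant seen on one strand only
```

A very small strand-bias p-value means the variant reads come almost entirely from one strand, which is a typical signature of an artifact.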
VCF FILES
Variant Call Format
Standard for reporting variants from NGS
Describes metadata of analysis and variant calls
Text file format (open in Text Editor or Excel)
!!! Not a MS Office vCard !!!
see http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
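Because VCF is tab-separated text, a data line can be taken apart with plain string handling. The sketch parses the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and splits INFO into key/value pairs. The example line and its INFO keys are invented, and header lines and per-sample genotype columns are not handled.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line."""
    chrom, pos, variant_id, ref, alt, qual, filt, info = (
        line.rstrip("\n").split("\t")[:8]
    )
    info_fields = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, separator, value = entry.partition("=")
        # Entries without '=' are presence flags.
        info_fields[key] = value if separator else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if variant_id == "." else variant_id,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_fields,
    }


if __name__ == "__main__":
    line = "chr1\t12345\t.\tA\tG,T\t50.0\tPASS\tDP=100;AF=0.5;DB"
    print(parse_vcf_line(line))
```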
VARIANT ANNOTATION
Provide biological & clinical context
Identify disease-causing mutations
(among 1000s of variants)
ANNOTATION OVERVIEW
VARIANT FILTERING AND PRIORITIZATION
PURPOSE:
Identify pathogenic or
disease-associated mutation(s)
Reduce candidate variants
to reportable set
COMMON STEPS:
• Remove poor quality variant calls
• Remove common polymorphisms
• Prioritize variants with high functional impact
• Compare against known disease genes
• Consider mode of inheritance (autosomal recessive, X-linked...)
• Consider segregation in family (where multiple samples available)
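The first four steps above can be sketched as a simple filter over annotated variants. Every field name (`qual`, `depth`, `population_af`, `impact`, `gene`), the impact labels and all thresholds are hypothetical placeholders. Real pipelines take these from their annotation tool and tune them per assay. Inheritance mode and family segregation are left out.

```python
def prioritize(variants, known_disease_genes, min_qual=30.0, min_depth=20,
               max_population_af=0.01):
    """Filter annotated variants and rank the survivors (illustrative thresholds)."""
    high_impact = {"HIGH", "MODERATE"}  # hypothetical impact labels
    kept = []
    for variant in variants:
        # 1. Remove poor-quality calls.
        if variant["qual"] < min_qual or variant["depth"] < min_depth:
            continue
        # 2. Remove common polymorphisms.
        if variant["population_af"] > max_population_af:
            continue
        # 3. Keep only variants with high predicted functional impact.
        if variant["impact"] not in high_impact:
            continue
        kept.append(variant)
    # 4. Known disease genes first, then by descending call quality.
    return sorted(
        kept,
        key=lambda v: (v["gene"] not in known_disease_genes, -v["qual"]),
    )


if __name__ == "__main__":
    variants = [  # invented records
        {"gene": "GENE1", "qual": 99.0, "depth": 80, "population_af": 0.0001, "impact": "HIGH"},
        {"gene": "GENE2", "qual": 15.0, "depth": 80, "population_af": 0.0, "impact": "HIGH"},
        {"gene": "GENE3", "qual": 90.0, "depth": 60, "population_af": 0.25, "impact": "MODERATE"},
        {"gene": "GENE4", "qual": 120.0, "depth": 50, "population_af": 0.0, "impact": "MODERATE"},
    ]
    for variant in prioritize(variants, known_disease_genes={"GENE1"}):
        print(variant["gene"], variant["qual"])
```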
VISUALIZATION – IGV (or Genome Browser, Circos...)
provided by Katherine Pillman
COMMON PIPELINE
bcl2fastq (Illumina)
FastQC (open-source)
Exomes (HiSeq):
BWA (open-source), GATK (Broad)
Gene panels (MiSeq, PGM):
MiSeq Reporter (Illumina)
Torrent Suite (Ion Torrent)
Custom scripts and third party tools
(ANNOVAR, SnpEff, PolyPhen, SIFT...)
Commercial annotation software
(GeneticistAssistant, VariantStudio...)
COMMON DATA FORMATS
.bcl
.fastq
.BAM
.VCF
.csv
.txt
.xls
.html
...
CONCLUSIONS
NGS data - the new currency of (molecular) biology
Broad applications (ecology, evolution, ag sciences, medical research and
clinical diagnostics...).
Rapidly evolving (sequencing technologies, library preparation methods,
analysis approaches, software).
Different tools, pipelines and parameter settings give different results
(more standards needed).
Bioinformatics pipelines typically combine vendor software, third-party
tools and custom scripts.
Requires skills in scripting, Linux/Unix, HPC.
Requires advanced hardware (not always available).
Understanding of the data (single-end, paired-end, RNA-Seq) is important for successful analysis.

Contenu connexe

Tendances

genome sequencing, types by kk sahu sir
genome sequencing, types by kk sahu sirgenome sequencing, types by kk sahu sir
genome sequencing, types by kk sahu sirKAUSHAL SAHU
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Manikhandan Mudaliar
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyQIAGEN
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
Next generation sequencing
Next  generation  sequencingNext  generation  sequencing
Next generation sequencingNidhi Singh
 
Whole exome sequencing data analysis.pptx
Whole exome sequencing data analysis.pptxWhole exome sequencing data analysis.pptx
Whole exome sequencing data analysis.pptxHaibo Liu
 
Assembly and gene_prediction
Assembly and gene_predictionAssembly and gene_prediction
Assembly and gene_predictionBas van Breukelen
 
20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slideshare20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slidesharehansjansen9999
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingSwathi Prabakar
 

Tendances (20)

Qpcr
QpcrQpcr
Qpcr
 
genome sequencing, types by kk sahu sir
genome sequencing, types by kk sahu sirgenome sequencing, types by kk sahu sir
genome sequencing, types by kk sahu sir
 
NGS: Mapping and de novo assembly
NGS: Mapping and de novo assemblyNGS: Mapping and de novo assembly
NGS: Mapping and de novo assembly
 
Ngs introduction
Ngs introductionNgs introduction
Ngs introduction
 
RNA-seq Analysis
RNA-seq AnalysisRNA-seq Analysis
RNA-seq Analysis
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
 
222397 lecture 16 17
222397 lecture 16 17222397 lecture 16 17
222397 lecture 16 17
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
qRT PCR
qRT PCRqRT PCR
qRT PCR
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) Technology
 
Whole exome sequencing(wes)
Whole exome sequencing(wes)Whole exome sequencing(wes)
Whole exome sequencing(wes)
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
Genome analysis2
Genome analysis2Genome analysis2
Genome analysis2
 
Next generation sequencing
Next  generation  sequencingNext  generation  sequencing
Next generation sequencing
 
Whole exome sequencing data analysis.pptx
Whole exome sequencing data analysis.pptxWhole exome sequencing data analysis.pptx
Whole exome sequencing data analysis.pptx
 
Assembly and gene_prediction
Assembly and gene_predictionAssembly and gene_prediction
Assembly and gene_prediction
 
Ion Torrent Sequencing
Ion Torrent SequencingIon Torrent Sequencing
Ion Torrent Sequencing
 
20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slideshare20160308 dtl ngs_focus_group_meeting_slideshare
20160308 dtl ngs_focus_group_meeting_slideshare
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 

Similaire à Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course - Session 2.3 - VHIR, Barcelona)

GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGenomeInABottle
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposalGenomeInABottle
 
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...Mark Evans
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveGolden Helix
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveGolden Helix
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012Dan Gaston
 
Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsDelaina Hawkins
 
Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsGolden Helix Inc
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GenomeInABottle
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansGenomeInABottle
 
Best Practices for Validating a Next-Gen Sequencing Workflow
Best Practices for Validating a Next-Gen Sequencing WorkflowBest Practices for Validating a Next-Gen Sequencing Workflow
Best Practices for Validating a Next-Gen Sequencing WorkflowGolden Helix
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataThomas Keane
 
Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030GenomeInABottle
 
Compact Genome Format
Compact Genome FormatCompact Genome Format
Compact Genome FormatArvados
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2Thomas Keane
 
Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...OECD Environment
 
Giab poster structural variants ashg 2018
Giab poster structural variants ashg 2018Giab poster structural variants ashg 2018
Giab poster structural variants ashg 2018GenomeInABottle
 

Similaire à Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course - Session 2.3 - VHIR, Barcelona) (20)

GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal
 
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
 
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User PerspectiveVarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 
Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research Workflows
 
Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research Workflows
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plans
 
Best Practices for Validating a Next-Gen Sequencing Workflow
Best Practices for Validating a Next-Gen Sequencing WorkflowBest Practices for Validating a Next-Gen Sequencing Workflow
Best Practices for Validating a Next-Gen Sequencing Workflow
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030
 
Compact Genome Format
Compact Genome FormatCompact Genome Format
Compact Genome Format
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture22014 Wellcome Trust Advances Course: NGS Course - Lecture2
2014 Wellcome Trust Advances Course: NGS Course - Lecture2
 
Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...Overview of the commonly used sequencing platforms, bioinformatic search tool...
Overview of the commonly used sequencing platforms, bioinformatic search tool...
 
Giab poster structural variants ashg 2018
Giab poster structural variants ashg 2018Giab poster structural variants ashg 2018
Giab poster structural variants ashg 2018
 
Eccmid meet the expert 2015
Eccmid meet the expert 2015Eccmid meet the expert 2015
Eccmid meet the expert 2015
 

Plus de VHIR Vall d’Hebron Institut de Recerca

Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...VHIR Vall d’Hebron Institut de Recerca
 
Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...
Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...
Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...VHIR Vall d’Hebron Institut de Recerca
 
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...VHIR Vall d’Hebron Institut de Recerca
 
Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...
Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...
Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...VHIR Vall d’Hebron Institut de Recerca
 
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...VHIR Vall d’Hebron Institut de Recerca
 
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...VHIR Vall d’Hebron Institut de Recerca
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...VHIR Vall d’Hebron Institut de Recerca
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...VHIR Vall d’Hebron Institut de Recerca
 
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...VHIR Vall d’Hebron Institut de Recerca
 
Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...
Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...
Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...VHIR Vall d’Hebron Institut de Recerca
 
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...VHIR Vall d’Hebron Institut de Recerca
 
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de expression génica
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de expression génicaCurso de Genómica - UAT (VHIR) 2012 - Análisis de datos de expression génica
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de expression génicaVHIR Vall d’Hebron Institut de Recerca
 
Curso de Genómica - UAT (VHIR) 2012 - Tecnologías de Ultrasecuenciación y de ...
Curso de Genómica - UAT (VHIR) 2012 - Tecnologías de Ultrasecuenciación y de ...Curso de Genómica - UAT (VHIR) 2012 - Tecnologías de Ultrasecuenciación y de ...
Curso de Genómica - UAT (VHIR) 2012 - Tecnologías de Ultrasecuenciación y de ...VHIR Vall d’Hebron Institut de Recerca
 

Plus de VHIR Vall d’Hebron Institut de Recerca (20)

Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformat...
 
Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...
Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...
Introduction to Functional Analysis with IPA (UEB-UAT Bioinformatics Course -...
 
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
 
Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...
Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...
Brief Overview to Amplicon Variant Analysis (UEB-UAT Bioinformatics Course - ...
 
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
 
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
 
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
 
Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...
Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...
Introduction to Bioinformatics (UEB-UAT Bioinformatics Course - Session 1.1 -...
 
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
 
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course - Session 2.3 - VHIR, Barcelona)

  • 1. Hospital Universitari Vall d’Hebron Institut de Recerca - VHIR Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII) Bioinformàtica per la Recerca Biomèdica http://ueb.vhir.org/2014BRB Ferran Briansó ferran.brianso@vhir.org 15/05/2014 INTRODUCTION TO NGS VARIANT CALLING ANALYSIS
  • 2. PRESENTATION OUTLINE: 1. NGS WORKFLOW OVERVIEW; 2. WET LAB STEPS; 3. IMPORTANT SEQUENCING CONCEPTS; 4. NGS ANALYSIS WORKFLOW (4.1 Primary analysis: de-multiplexing, QC; 4.2 Secondary analysis: read mapping and variant calling; 4.3 Tertiary analysis: annotation, filtering...); 5. VISUALIZATION; 6. COMMON PIPELINES AND FORMATS; 7. CONCLUSIONS
  • 3. NGS WORKFLOW OVERVIEW. Extracted from Dr Kassahn's publicly shared slides (2013).
  • 4. LIBRARY PREPARATION. Select target: hybridization-based capture or PCR. Add adapters, which contain binding sequences, barcodes and primer sequences. Amplify material.
  • 5. LIBRARY PREPARATION. Select target: hybridization-based capture or PCR. Add adapters, which contain binding sequences, barcodes and primer sequences. Amplify material. Steps: A) Fragment DNA; B) End-repair; C) A-tailing, adapter ligation and PCR; D) Final library contains: sample insert, indices (barcodes), flowcell binding sequences, primer binding sequences.
  • 6. LIBRARY PREPARATION. Select target: hybridization-based capture or PCR. Add adapters, which contain binding sequences, barcodes and primer sequences. Amplify material.
  • 7. TEMPLATE PREPARATION. Attachment of library, e.g. to Illumina flowcell. Amplification of library molecules, e.g. bridge amplification.
  • 9. SEQUENCING. Sequencing-by-synthesis. Detection by: Illumina – fluorescence; Ion Torrent – pH; Roche 454 – pyrophosphate (PPi) release and light.
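  • 11. IMPORTANT SEQUENCING CONCEPTS. Barcoding/indexing: allows multiplexing of different samples. Single-end vs paired-end sequencing. Coverage: average number of reads covering each target position. Quality scores (Q score): logarithmic scale! Q = -10 log10 of the error probability, so Q20 means a 1-in-100 chance of a wrong base call and Q30 means 1 in 1000.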
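The logarithmic quality scale and the notion of average coverage on the slide above can be made concrete with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, the quality string is assumed to use the common Phred+33 ASCII encoding, and the coverage formula (reads × read length / target size) is the usual back-of-the-envelope estimate that ignores unmapped and off-target reads.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score Q = -10 * log10(p) for an error probability p."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33) unless told otherwise.
    """
    return [ord(char) - offset for char in qual]


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Rough expected coverage: total sequenced bases divided by target size."""
    return n_reads * read_length / target_size


if __name__ == "__main__":
    # Each step of 10 on the Q scale is a 10-fold drop in error probability.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
```

Because the scale is logarithmic, averaging Q scores directly is not the same as averaging error probabilities; which one is appropriate depends on the question being asked.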
  • 12. NGS DATA ANALYSIS WORKFLOW
  • 15. SEQUENCE QUALITY: FastQC. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ Details of the output: https://docs.google.com/document/pub?id=16GwPmwYW7o_r-ZUgCu8-oSBBY1gC97TfTTinGDk98Ws
  • 16. NGS DATA ANALYSIS WORKFLOW
  • 17. READ MAPPING (BASIC ALIGNMENT). Comparison against a reference genome (not assembly!). Many aligners exist (short reads, longer reads, RNA-Seq...). Examples: BWA, Bowtie. Output: SAM/BAM files.
  • 18. BURROWS-WHEELER ALIGNMENT TOOL (BWA). Popular tool for genomic sequence data (not RNA-Seq!). Challenge: compare billions of short sequence reads (.fastq file) against the human genome (3 Gb). Uses the Burrows-Wheeler Transform to “index” the human genome and allow memory-efficient and fast string matching between sequence read and reference genome. Li & Durbin 2009, Bioinformatics.
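To illustrate how a Burrows-Wheeler index supports fast string matching, here is a toy sketch of exact-match backward search. It is not BWA's implementation: BWA additionally handles mismatches and gaps and uses compressed, sampled data structures, whereas this version builds a naive suffix array and full occurrence tables, which only works for tiny references. The function names and the example sequence are invented for illustration.

```python
def build_index(reference: str):
    """Build a toy BWT index: suffix array, first-column offsets, occurrence table."""
    text = reference + "$"  # '$' is a terminator that sorts before A, C, G, T
    # Naive suffix array: fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (wraps around to '$').
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c] = number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, first, occ


def find_exact(read: str, index) -> list:
    """Return sorted 0-based positions where `read` occurs exactly in the reference."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows still matching
    for c in reversed(read):  # backward search: extend the match leftwards
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    index = build_index("ACAACG")
    print(find_exact("AC", index))
```

The point of the design is that each read character costs a constant number of table lookups, so the search time depends on the read length and not on the size of the reference.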
  • 20. SAM/BAM FILES. @ Header (information regarding reference genome, alignment method...). Alignment fields: 1) Read ID (QNAME); 2) Bitwise FLAG (first/second read in pair, both reads mapped...); 3) Reference sequence name (RNAME); 4) Position (POS, 1-based leftmost coordinate); 5) Mapping quality (MAPQ = -10 log10 P[wrong mapping position]); 6) CIGAR (describes alignment: matches, skipped regions, insertions...); 7) Reference sequence name of the mate (RNEXT); 8) Position of the mate (PNEXT); 9) Template length (TLEN); 10) Read sequence (SEQ); 11) Base qualities (QUAL, encoded as in FASTQ, '*' if not available); ...
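The eleven mandatory SAM fields listed above can be read with plain string handling. The following is a minimal sketch rather than a replacement for a proper library: it ignores header lines and optional tags, the `SamRecord` class and helper names are made up for illustration, and the example alignment line is invented.

```python
from dataclasses import dataclass

# Meaning of each FLAG bit, following the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse the 11 mandatory tab-separated fields of one alignment line."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def decode_flag(flag: int) -> list:
    """List the names of all bits that are set in the bitwise FLAG."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def mapq_to_error_prob(mapq: int) -> float:
    """Invert MAPQ = -10 log10 P[wrong position]. MAPQ 255 means 'not available'."""
    return 10 ** (-mapq / 10)


# Invented example record (not taken from a real data set).
EXAMPLE_LINE = "read1\t99\tchr1\t100\t60\t8M\t=\t150\t58\tACGTACGT\tIIIIIIII"

if __name__ == "__main__":
    record = parse_sam_line(EXAMPLE_LINE)
    print(record.rname, record.pos, decode_flag(record.flag))
```

In the example, FLAG 99 = 1 + 2 + 32 + 64, i.e. a properly paired first read whose mate maps to the reverse strand.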
  • 21. VARIANT CALLING. Identify sequence variants. Distinguish signal vs noise. Output: VCF files. Examples: SAMtools, SNVmix.
  • 23. SEQUENCE VARIANTS. Sanger: is it real? NGS: read count provides confidence (statistics!). Sensitivity is a tunable parameter (dependent on coverage).
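One simple way to see why read counts provide statistical confidence is to ask how likely it is that sequencing errors alone would produce the observed number of variant-supporting reads. The sketch below uses a plain binomial model. It assumes independent reads and a single per-base error rate, which real callers refine considerably, and the function name is hypothetical.

```python
from math import comb


def prob_at_least_k_errors(n: int, k: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(n, error_rate).

    n: read depth at the position
    k: number of reads showing the variant base
    error_rate: assumed probability that a single read shows the variant
                base purely by sequencing error (a simplifying assumption)
    """
    return sum(
        comb(n, i) * error_rate ** i * (1 - error_rate) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Same 10% variant fraction, but the evidence gets stronger with depth.
    for depth, alt_reads in ((20, 2), (100, 10), (500, 50)):
        p = prob_at_least_k_errors(depth, alt_reads, 0.01)
        print(f"depth={depth} alt={alt_reads} P(errors alone)={p:.3g}")
```

Choosing the probability threshold below which a variant is reported is what makes sensitivity tunable, and the depth available at a position limits how small an allele fraction can be distinguished from noise.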
  • 24. VARIANT CALLING: GATK. Genome Analysis Toolkit (Broad Institute). Initially developed for the 1000 Genomes Project. Single or multiple sample analysis (cohort). Popular tool for germline variant calling. Evaluates probability of genotype given read data. See http://www.broadinstitute.org/gatk/ and McKenna et al., Genome Research 2010.
  • 25. SOMATIC VARIANT CALLING. Somatic mutations can occur at low frequency (<10%) due to: tumor heterogeneity (multiple clones); low tumor purity (% normal cells in tumor sample). Requires different thresholds than germline variant calling when evaluating signal vs noise. Trade-off between sensitivity (ability to detect mutations) and specificity (ability to avoid false positives). Nature Reviews Cancer 12, 323-334 (May 2012).
  • 26. INDELS DETECTION. Small insertions/deletions. The trouble with mapping approaches. Modified from Heng Li (Broad Institute).
  • 27. INDELS DETECTION. Small insertions/deletions. The trouble with mapping approaches.
  • 28. INDELS DETECTION. Small insertions/deletions. The trouble with mapping approaches.
  • 29. RE-ALIGNMENT. Re-align reads considering multi-read context and previous information on SNPs and indels. Adapted from Andreas Schreiber.
  • 30. EVALUATING VARIANT QUALITY. Taking into account: coverage at position; number of independent reads supporting variant; observed allele fraction vs expected (somatic / germline); strand bias; base qualities at variant position; mapping qualities of reads supporting variant; variant position within reads (near ends or at centre).
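Two of the criteria above, allele fraction and strand bias, are easy to compute from per-strand read counts. The sketch below is one possible approach, not the method of any particular caller: strand bias is assessed with a two-sided Fisher's exact test on the 2×2 table of reference/variant reads by forward/reverse strand, implemented directly from the hypergeometric distribution. All function names are invented for illustration.

```python
from math import comb


def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at the position that support the variant allele."""
    return alt_reads / (ref_reads + alt_reads)


def strand_bias_pvalue(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Two-sided Fisher's exact test on the table [[ref_fwd, ref_rev], [alt_fwd, alt_rev]].

    A small p-value suggests the variant reads come disproportionately from
    one strand, which is a common signature of artefacts.
    """
    row1 = ref_fwd + ref_rev
    row2 = alt_fwd + alt_rev
    col1 = ref_fwd + alt_fwd
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_prob(ref_fwd)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(
        p for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    print(allele_fraction(15, 5))
    print(strand_bias_pvalue(20, 20, 10, 0))  # all variant reads on one strand
```

The p-value alone is rarely used as a hard cut-off; in practice it is one annotation among several that are weighed together when judging a call.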
  • 31. VCF FILES. Variant Call Format. Standard for reporting variants from NGS. Describes metadata of analysis and variant calls. Text file format (open in a text editor or Excel). Not an MS Office vCard! See http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
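Since VCF is a tab-separated text format, a data line can be split into its fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and optional per-sample columns with a few lines of code. The sketch below is deliberately minimal and makes assumptions: it does not interpret the `##` metadata header, does not convert INFO values to their declared types, and uses invented function names and an invented example line.

```python
def parse_info(info: str) -> dict:
    """Split the INFO column into a dict; flags without '=' become True."""
    result = {}
    if info == ".":
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def parse_vcf_line(line: str) -> dict:
    """Parse one VCF data line (not a header line) into a dictionary."""
    f = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = f[:8]
    record = {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": parse_info(info),
        "samples": [],
    }
    if len(f) > 9:  # FORMAT column followed by one column per sample
        keys = f[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in f[9:]]
    return record


def read_vcf(lines):
    """Yield parsed records, skipping '##' metadata and the '#CHROM' header."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Invented example content (not from a real call set).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t12345\t.\tA\tG,T\t50.0\tPASS\tDP=30;DB\tGT:DP\t0/1:30\n",
]

if __name__ == "__main__":
    for rec in read_vcf(EXAMPLE_VCF):
        print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["info"])
```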
  • 33. NGS DATA ANALYSIS WORKFLOW
  • 34. VARIANT ANNOTATION. Provide biological & clinical context. Identify disease-causing mutations (among 1000s of variants).
  • 36. VARIANT FILTERING AND PRIORITIZATION. PURPOSE: identify pathogenic or disease-associated mutation(s); reduce candidate variants to a reportable set. COMMON STEPS: remove poor quality variant calls; remove common polymorphisms; prioritize variants with high functional impact; compare against known disease genes; consider mode of inheritance (autosomal recessive, X-linked...); consider segregation in family (where multiple samples available).
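The first four filtering steps above can be sketched as a simple filter-then-sort over annotated variants. Everything concrete here is an assumption made for illustration: the `Variant` fields, the impact categories (loosely modeled on the HIGH/MODERATE/LOW/MODIFIER scheme used by some annotation tools), the gene names and all thresholds. Inheritance mode and family segregation are not covered.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    qual: float    # variant call quality
    depth: int     # read depth at the position
    pop_af: float  # allele frequency in a reference population
    impact: str    # predicted functional impact category


# Lower rank = higher priority. Categories are an illustrative assumption.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def prioritize(variants, disease_genes, min_qual=30.0, min_depth=10, max_pop_af=0.01):
    """Filter and rank variants; all thresholds are placeholder values."""
    kept = [
        v for v in variants
        if v.qual >= min_qual and v.depth >= min_depth  # remove poor quality calls
        and v.pop_af <= max_pop_af                      # remove common polymorphisms
    ]
    # Known disease genes first, then by predicted functional impact.
    return sorted(
        kept,
        key=lambda v: (v.gene not in disease_genes,
                       IMPACT_RANK.get(v.impact, len(IMPACT_RANK))),
    )


if __name__ == "__main__":
    # Hypothetical variants and gene names.
    candidates = [
        Variant("GENE_A", 50, 40, 0.0, "LOW"),
        Variant("GENE_B", 10, 40, 0.0, "HIGH"),      # fails the quality filter
        Variant("GENE_C", 60, 40, 0.2, "HIGH"),      # common polymorphism
        Variant("GENE_D", 60, 40, 0.001, "HIGH"),
        Variant("GENE_E", 60, 40, 0.0, "MODERATE"),
    ]
    for v in prioritize(candidates, disease_genes={"GENE_E"}):
        print(v.gene, v.impact)
```

Suitable thresholds depend on the assay, the caller and the disease model, so in a real pipeline they would be configuration choices and not constants.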
  • 37. NGS DATA ANALYSIS WORKFLOW
  • 38. VISUALIZATION – IGV (or Genome Browser, Circos...). Provided by Katherine Pillman.
  • 39. COMMON PIPELINE. bcl2fastq (Illumina). FastQC (open-source). Exomes (HiSeq): BWA (open-source), GATK (Broad). Gene panels (MiSeq, PGM): MiSeq Reporter (Illumina), Torrent Suite (Ion Torrent). Custom scripts and third-party tools (Annovar, snpEff, PolyPhen, SIFT...). Commercial annotation software (GeneticistAssistant, VariantStudio...).
  • 41. CONCLUSIONS. NGS data: the new currency of (molecular) biology. Broad applications (ecology, evolution, ag sciences, medical research and clinical diagnostics...). Rapidly evolving (sequencing technologies, library preparation methods, analysis approaches, software). Different tools, pipelines and parameter settings give different results (more standards needed). Bioinformatics pipelines typically combine vendor software, third-party tools and custom scripts. Requires skills in scripting, Linux/Unix, HPC. Requires advanced hardware (not always available). Understanding of the data (SE, PE, RNA-Seq) is important for successful analysis.