Sequencing data analysis
Workshop – part 3 / peak calling and annotation



                     Outline

           Previously in this workshop…

      Peak calling and annotation – the steps

    Peak calling and annotation – the workshop




                 Maté Ongenaert
Previously in this workshop…
Introduction – the real cost of sequencing
Previously in this workshop…
  The workflow of NGS data analysis
                            Data analysis

                 Raw machine reads… What’s next?

                Preprocessing (machine/technology)
                 - adaptors, indexes, conversions,…
                 - machine/technology dependent

              Reads with associated qualities (universal)
                              - FASTQ
                            - QC check

            Depending on application (generally applicable)
        - ‘de novo’ assembly of genome (bacterial genomes,…)
         - Mapping to a reference genome → mapped reads
                          - SAM/BAM/…

             High-level analysis (specific for application)
                            - SNP calling
                           - Peak calling
Previously in this workshop…
                                     Main data formats
                                     Raw sequence reads:

- Represent the sequence ~ FASTA
  >SEQUENCE_IDENTIFIER
  GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT


- Extension: represent the quality, per base ~ FASTQ (Q for quality)
Score: Phred quality, stored as one ASCII character per base (Phred score + 33 = Sanger encoding)
  @SEQUENCE_IDENTIFIER
  GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
  +
  !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65



- Machine- and platform-independent, compressed: SRA (Sequence Read Archive, NCBI)
Get the original FASTQ file using the SRA Toolkit (NCBI)
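
To make the Phred + 33 encoding concrete, here is a minimal Python sketch that decodes the quality line of the FASTQ record above. The function names are illustrative only and do not come from any library.

```python
def phred33_to_scores(quality: str) -> list[int]:
    """Convert a Sanger (Phred+33) quality string to integer Phred scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Quality line of the FASTQ example record
quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65"
scores = phred33_to_scores(quality)

print(scores[:4])                      # Phred scores of the first four bases
print(error_probability(scores[-1]))   # error probability of the last base
```

The first character `!` is ASCII 33 and therefore encodes the lowest possible score, Phred 0.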
Previously in this workshop…
                                Main data formats
- Now moving to a common file format → SAM / BAM (Sequence Alignment/Map)
- BAM: binary (i.e. machine-readable, indexed, compressed) form of SAM

DESCRIPTION OF THE 11 FIELDS IN THE ALIGNMENT SECTION

# QNAME: template name
# FLAG: bitwise flag
# RNAME: reference name
# POS: mapping position
# MAPQ: mapping quality
# CIGAR: CIGAR string
# RNEXT: reference name of the mate/next fragment
# PNEXT: position of the mate/next fragment
# TLEN: observed template length
# SEQ: fragment sequence
# QUAL: ASCII of Phred-scaled base quality + 33

#Headers
@HD VN:1.3 SO:coordinate
@SQ SN:ref LN:45

#Alignment block
r001 163 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
r002 0 ref 9 30 3S6M1P1I4M * 0 0 AAAAGATAAGGATA *
r003 0 ref 9 30 5H6M * 0 0 AGCTAA * NM:i:1
r004 0 ref 16 30 6M14N5M * 0 0 ATAGCTTCAGC *
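
As a hedged illustration of the 11 mandatory fields, the sketch below splits one alignment line into named fields and decodes the bitwise FLAG. The helper names are hypothetical. Real SAM files are tab-delimited; the parser splits on any whitespace only so that it also accepts the space-separated slide example.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Lower eight FLAG bits as defined in the SAM specification
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def parse_sam_line(line: str) -> dict:
    """Split one alignment line into the 11 mandatory fields plus optional tags."""
    cols = line.split()  # assumption: whitespace split, to accept the slide example
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = cols[11:]
    return record


def decode_flag(flag: int) -> list[str]:
    """List the names of all bits set in FLAG."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


rec = parse_sam_line("r001 163 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *")
print(rec["POS"], rec["CIGAR"], decode_flag(rec["FLAG"]))
```

FLAG 163 is 1 + 2 + 32 + 128, so r001 is the second read of a properly mapped pair whose mate maps to the reverse strand.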
Previously in this workshop…
                                         Main data formats
- BED files (location / annotation / scores): Browser Extensible Data
Used for mapping / annotation / peak locations; extension: bigBed (binary)
FIELDS USED:
# chr
# start
# end
# name
# score
# strand

track   name=pairedReads description="Clone Paired Reads" useScore=1
#chr    start end name score strand
chr22   1000 5000 cloneA 960 +
chr22   2000 6000 cloneB 900 -


- BEDGraph files (location, combined with score)
Used to represent peak scores
track type=bedGraph name="BedGraph Format" description="BedGraph format"
visibility=full color=200,100,0 altColor=0,100,200 priority=20
#chr start    end      score
chr19 59302000 59302300 -1.0
chr19 59302300 59302600 -0.75
chr19 59302600 59302900 -0.50
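
A minimal sketch of reading BED records in Python, assuming whitespace-separated columns and skipping track, browser and comment lines. BED start coordinates are 0-based and the end is exclusive, so the feature length is simply end minus start. The function names are illustrative only.

```python
def parse_bed_line(line: str) -> dict:
    """Parse the three required BED columns plus optional name, score and strand."""
    chrom, start, end, *rest = line.split()
    record = {"chrom": chrom, "start": int(start), "end": int(end)}
    for key, value in zip(("name", "score", "strand"), rest):
        record[key] = value
    # 0-based start, exclusive end: length needs no +1 correction
    record["length"] = record["end"] - record["start"]
    return record


def read_bed(lines):
    """Yield one record per data line, skipping headers and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        yield parse_bed_line(line)


example = [
    'track name=pairedReads description="Clone Paired Reads" useScore=1',
    "#chr start end name score strand",
    "chr22 1000 5000 cloneA 960 +",
    "chr22 2000 6000 cloneB 900 -",
]
records = list(read_bed(example))
for r in records:
    print(r["name"], r["strand"], r["length"])
```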
Previously in this workshop…
                                       Main data formats
- WIG files (location / annotation / scores): wiggle
Used for visualization or to summarize data, in most cases count data or normalized count
data (RPKM); extension: bigWig, the binary version (often used in GEO for ChIP-seq peaks)




browser position chr19:59304200-59310700
browser hide all

#150 base wide bar graph at arbitrarily spaced positions,
#threshold line drawn at y=11.76
#autoScale off viewing range set to [0:25]
#priority = 10 positions this as the first graph

track type=wiggle_0 name="variableStep" description="variableStep format"
visibility=full autoScale=off viewLimits=0.0:25.0 color=50,150,255
yLineMark=11.76 yLineOnOff=on priority=10
variableStep chrom=chr19 span=150
59304701 10.0
59304901 12.5
59305401 15.0
59305601 17.5
59305901 20.0
59306081 17.5
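
The variableStep block can be expanded into explicit intervals. The sketch below is a simplified parser, not a full WIG reader: it assumes the track line sits on a single line and handles variableStep only. WIG positions are 1-based, so they are shifted by one to give bedGraph-style 0-based, end-exclusive intervals.

```python
def parse_variable_step(lines):
    """Yield (chrom, start, end, value) tuples with 0-based, end-exclusive coordinates."""
    chrom, span = None, 1
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("variableStep"):
            # Declaration line, e.g. "variableStep chrom=chr19 span=150"
            params = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            continue
        pos, value = line.split()
        start = int(pos) - 1  # WIG is 1-based; convert to a 0-based start
        yield chrom, start, start + span, float(value)


wig = [
    "variableStep chrom=chr19 span=150",
    "59304701 10.0",
    "59304901 12.5",
]
intervals = list(parse_variable_step(wig))
for interval in intervals:
    print(*interval)
```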
Previously in this workshop…
                                    Main data formats
- GFF format (General Feature Format) or GTF
Used for annotation of genetic / genomic features – such as all coding genes in Ensembl
Often used in downstream analysis to assign annotation to regions / peaks / …
FIELDS USED:

# seqname (the name of the sequence)
# source (the program that generated this feature)
# feature (the name of this type of feature – for example: exon)
# start (the starting position of the feature in the sequence)
# end (the ending position of the feature)
# score (a score between 0 and 1000)
# strand (valid entries include '+', '-', or '.')
# frame (if the feature is a coding exon, frame should be a number between
0-2 that represents the reading frame of the first base. If the feature is
not a coding exon, the value should be '.'.)
# group (all lines with the same group are linked together into a single
item)

track name=regulatory description="TeleGene(tm)    Regulatory Regions"
#chr   source   feature   start    end   score    strand frame group
chr22 TeleGene enhancer 1000000 1001000 500        + . touch1
chr22 TeleGene promoter 1010000 1010100 900        + . touch1
chr22 TeleGene promoter 1020000 1020000 800        - . touch2
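
Unlike BED, GFF coordinates are 1-based and include the end position. The following sketch (hypothetical helper names, whitespace split to accept the slide example; real GFF is tab-delimited) parses a GFF line and converts it to a BED-style interval.

```python
GFF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "group")


def parse_gff_line(line: str) -> dict:
    """Parse the nine GFF columns; the group column may itself contain spaces."""
    values = line.split(None, 8)
    record = dict(zip(GFF_COLUMNS, values))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


def gff_to_bed_interval(record: dict) -> tuple:
    """1-based inclusive (GFF) to 0-based end-exclusive (BED): only the start shifts."""
    return record["seqname"], record["start"] - 1, record["end"]


gff_record = parse_gff_line("chr22 TeleGene enhancer 1000000 1001000 500 + . touch1")
print(gff_record["feature"], gff_to_bed_interval(gff_record))
```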
Peak calling
                                      The workflow
Peak calling:

Identify genomic regions where the number of sequenced reads (coverage) in the IP sample
is higher than expected from the input (control) sample >> enriched
regions >> possibly captured by the IP and thus sequenced with more coverage

Peak annotation:

When such enriched regions are identified, where are they located (intron/exon/…) ?
What is the closest gene or the closest promoter region?
Peak calling
                                        The workflow
Peak calling:

Coverage

From the BAM file: mapping against the reference genome
Both the IP-sample and the control (Input) must be mapped, duplicates will be ignored by
most peak callers

Peak caller will determine coverage for both samples
- Store them for visualisation (WIG files; BIGWIG files or similar)

Enriched

Find out which regions are enriched (either within the sample or versus a control (Input)
sample) → statistics ~ model of tag distributions and normalisation strategy
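
To illustrate the idea of testing enrichment against a control, here is a deliberately simple sketch: the control count in a window is scaled by library size and the IP count is tested against a Poisson distribution with that expectation. This is an assumption-laden toy model (the pseudocount and the scaling rule are choices made here), not the implementation of any particular peak caller.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    # Very small tail probabilities are limited by floating-point precision
    return max(0.0, 1.0 - cdf)


def window_enrichment(ip_count, control_count, ip_total, control_total,
                      pseudocount=1.0):
    """Return (fold enrichment, Poisson p-value) for one genomic window."""
    scale = ip_total / control_total            # library-size normalisation
    expected = (control_count + pseudocount) * scale
    fold = ip_count / expected
    return fold, poisson_sf(ip_count, expected)


fold, p_value = window_enrichment(ip_count=50, control_count=9,
                                  ip_total=2_000_000, control_total=1_000_000)
print(fold, p_value)
```

Real peak callers differ mainly in how they estimate the expected count (local versus chromosome-wide background) and in how they correct for multiple testing.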
Peak calling
                                       The workflow
Peak calling: methods used by different peak callers to find enriched regions

Program                      Reference  Density profile  Peak assignment    Control data adjustment                      Significance relative to control data             Statistical model / test
Cisgenome                    [73]       Window-based     Peak height or FE  Background subtract                          FDR; statistical model on control                 Conditional binomial
Minimal ChipSeq Peak Finder  [74]       Tag clustering   Peak height or FE  -                                            Normalized control                                -
E-range                      [75]       Tag clustering   Peak height or FE  -                                            Normalized control; statistical model on control  Chromosome Poisson
MACS                         [76]       Window-based     Peak height or FE  -                                            FDR; statistical model on control                 Local Poisson
QuEST                        [77]       Gaussian kernel  Peak height or FE  -                                            FDR; statistical model on control                 Chromosome Poisson
Hpeak                        [78]       Window-based     Peak height or FE  -                                            Statistical model on control                      HMM
Sole-Search                  [79]       Window-based     Peak height or FE  Genomic dupl/deletions                       Statistical model on control                      T-test
PeakSeq                      [80]       Tag clustering   Peak height or FE  -                                            Statistical model on control                      Conditional binomial
SISSRS                       [81]       Window-based     Strand-specific    -                                            Normalized control                                -
spp package                  [82]       Window-based     Strand-specific    Background subtract; genomic dupl/deletions  FDR                                               -
Peak calling
                                       The workflow
Usage: macs14 <-t tfile> [-n name] [-g genomesize] [options]

Example: macs14 -t ChIP.bam -c Control.bam -f BAM -g h -n test -w --call-subpeaks


macs14 -- Model-based Analysis for ChIP-Sequencing

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit.
  -t TFILE, --treatment=TFILE
                        ChIP-seq treatment files. REQUIRED. When ELANDMULTIPET
                        is selected, you must provide two files separated by
                        comma, e.g.
                        s_1_1_eland_multi.txt,s_1_2_eland_multi.txt
  -c CFILE, --control=CFILE
                        Control files. When ELANDMULTIPET is selected, you
                        must provide two files separated by comma, e.g.
                        s_2_1_eland_multi.txt,s_2_2_eland_multi.txt
  -n NAME, --name=NAME Experiment name, which will be used to generate output
                        file names. DEFAULT: "NA"
  -f FORMAT, --format=FORMAT
                        Format of tag file, "AUTO", "BED" or "ELAND" or
                        "ELANDMULTI" or "ELANDMULTIPET" or "ELANDEXPORT" or
                        "SAM" or "BAM" or "BOWTIE". The default AUTO option
                        will let MACS decide which format the file is. Please
                        check the definition in 00README file if you choose
                        ELAND/ELANDMULTI/ELANDMULTIPET/ELANDEXPORT/SAM/BAM/
                        BOWTIE. DEFAULT: "AUTO"
Peak calling
                                        The workflow
Peak annotation

Enriched

Peak locations > in which features are my peaks located; are they close to a gene; provide
some statistics on how far my peaks are from annotated TSSs

R/Bioconductor
ChIPpeakAnno package

PeakAnalyzer
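
The core of "closest gene" annotation can be sketched in a few lines of Python. This is only an illustration of the idea and not the API of ChIPpeakAnno or PeakAnalyzer; the gene names and coordinates are made up, and all TSSs are assumed to lie on the same chromosome as the peak.

```python
import bisect


def nearest_tss(peak_summit: int, tss_list: list) -> tuple:
    """Return (gene, signed distance) for the TSS closest to a peak summit.

    tss_list: list of (position, gene, strand), sorted by position.
    Negative distances mean the peak lies upstream of the TSS.
    """
    positions = [t[0] for t in tss_list]
    i = bisect.bisect_left(positions, peak_summit)
    # Only the neighbours directly left and right of the summit can be closest
    candidates = tss_list[max(0, i - 1): i + 1]
    pos, gene, strand = min(candidates, key=lambda t: abs(t[0] - peak_summit))
    distance = peak_summit - pos
    if strand == "-":
        distance = -distance  # upstream is towards higher coordinates on the minus strand
    return gene, distance


# Hypothetical annotation: two genes on one chromosome
tss = [(1000, "geneA", "+"), (5000, "geneB", "-")]
for summit in (500, 1200, 5300):
    print(summit, nearest_tss(summit, tss))
```

Collecting these distances over all peaks gives the kind of "distance to TSS" statistics mentioned above.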
Peak calling
                                 The workflow
Further downstream processing
Peak overlaps

                                                Is this observed overlap larger
                                                than one would expect if the
                                                datasets were random?

                                                - The peak caller gives each peak a
                                                  score
                                                - Randomly distribute these scores
                                                  across the peaks of the same
                                                  peak set (factor) and, for a
                                                  percentage of top peaks,
                                                  count the overlapping peaks in
                                                  the real dataset and with
                                                  randomly distributed scores
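
The randomisation described above can be sketched as follows. This is one possible reading of the procedure, written under assumptions: peaks are (chrom, start, end) tuples, scores are shuffled within each peak set, and the overlap count uses a simple quadratic comparison that is only suitable for small examples.

```python
import random


def overlaps(a, b) -> bool:
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def top_fraction(peaks, scores, fraction):
    """Return the peaks belonging to the top-scoring fraction."""
    n = max(1, int(len(peaks) * fraction))
    ranked = sorted(range(len(peaks)), key=lambda i: scores[i], reverse=True)
    return [peaks[i] for i in ranked[:n]]


def count_overlaps(set_a, set_b) -> int:
    """Number of peaks in set_a that overlap at least one peak in set_b."""
    return sum(any(overlaps(a, b) for b in set_b) for a in set_a)


def overlap_permutation_test(peaks_a, scores_a, peaks_b, scores_b,
                             fraction, n_perm=1000, seed=0):
    """Return (real overlap, mean random overlap, fraction of permutations >= real)."""
    rng = random.Random(seed)
    real = count_overlaps(top_fraction(peaks_a, scores_a, fraction),
                          top_fraction(peaks_b, scores_b, fraction))
    random_counts = []
    for _ in range(n_perm):
        shuffled_a, shuffled_b = scores_a[:], scores_b[:]
        rng.shuffle(shuffled_a)  # redistribute scores within the same peak set
        rng.shuffle(shuffled_b)
        random_counts.append(
            count_overlaps(top_fraction(peaks_a, shuffled_a, fraction),
                           top_fraction(peaks_b, shuffled_b, fraction)))
    mean_random = sum(random_counts) / n_perm
    return real, mean_random, sum(c >= real for c in random_counts) / n_perm


# Toy data: 20 peaks per factor; the highest-scoring peaks of both factors coincide
peaks_a = [("chr1", i * 100, i * 100 + 50) for i in range(20)]
peaks_b = [("chr1", i * 100 + 10, i * 100 + 60) for i in range(20)]
scores = list(range(20))
print(overlap_permutation_test(peaks_a, scores, peaks_b, scores, 0.25, n_perm=200))
```

The ratio of the real overlap to the mean random overlap corresponds to a fold difference between observed and expected overlap.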
Peak calling
                                       The workflow
Further downstream processing
Identify sequence motifs (region around ‘peak’, searched for motifs)




 Further downstream processing
 Identify differentially bound regions between conditions/factors/…
Peak calling
                                                                 The workflow
Further downstream processing
Peak overlaps

                       10%           15%           20%           30%           50%           75%
Real                     7            18            25            52           102           201
Means                0.347         1.153         2.699         9.297        42.377       140.888
Factor diff    20.17291066   15.61144484   9.262689885   5.593202108   2.406966043   1.426665152
FDR                      0             0             0             0             0             0

                       10%           15%           20%           30%           50%           75%
10%                    282           333           506           907          1000          1000
20%                     59            33           125           332          1000          1000
30%                      4             2             9            27           981          1000
50%                      2             0             0             0            95          1000
75%                      0             0             0             0             0           148
Sequencing data analysis
Workshop – part 3 / peak calling and annotation



                     Outline

           Previously in this workshop…

      Peak calling and annotation – the steps

    Peak calling and annotation – the workshop




                 Maté Ongenaert
Blok
de   Van…
       ETER

Contenu connexe

Tendances

Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slidessmpant
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference acceleratorsDarshanG13
 
Simulation and Performance Analysis of AODV using NS-2.34
Simulation and Performance Analysis of AODV using NS-2.34Simulation and Performance Analysis of AODV using NS-2.34
Simulation and Performance Analysis of AODV using NS-2.34Shaikhul Islam Chowdhury
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictabilityRichardWarburton
 
A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.Jason Hearne-McGuiness
 
Protocol implementation on NS2
Protocol implementation on NS2Protocol implementation on NS2
Protocol implementation on NS2amreshrai02
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflowKeon Kim
 
Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...
Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...
Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...InfinIT - Innovationsnetværket for it
 
2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduho
2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduho2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduho
2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduhoKim Du-Ho
 
Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Itzik Kotler
 
Fann tool users_guide
Fann tool users_guideFann tool users_guide
Fann tool users_guideBirol Kuyumcu
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Marina Kolpakova
 
High-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation ConsolesHigh-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation ConsolesSlide_N
 
eBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureeBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureNetronome
 

Tendances (19)

Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slides
 
3rd 3DDRESD: ReCPU 4 NIDS
3rd 3DDRESD: ReCPU 4 NIDS3rd 3DDRESD: ReCPU 4 NIDS
3rd 3DDRESD: ReCPU 4 NIDS
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference accelerators
 
Simulation and Performance Analysis of AODV using NS-2.34
Simulation and Performance Analysis of AODV using NS-2.34Simulation and Performance Analysis of AODV using NS-2.34
Simulation and Performance Analysis of AODV using NS-2.34
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.
 
Protocol implementation on NS2
Protocol implementation on NS2Protocol implementation on NS2
Protocol implementation on NS2
 
Tma ph d_school_2011
Tma ph d_school_2011Tma ph d_school_2011
Tma ph d_school_2011
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflow
 
Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...
Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...
Model Based Schedulability Analysis of Java Bytecode Programs Executed on Com...
 
2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduho
2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduho2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduho
2010 nephee 01_smart_grid과제진행및이슈사항_20100630_kimduho
 
Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)
 
Fann tool users_guide
Fann tool users_guideFann tool users_guide
Fann tool users_guide
 
Tridiagonal solver in gpu
Tridiagonal solver in gpuTridiagonal solver in gpu
Tridiagonal solver in gpu
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
 
High-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation ConsolesHigh-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation Consoles
 
eBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureeBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging Infrastructure
 
BioMake BOSC 2004
BioMake BOSC 2004BioMake BOSC 2004
BioMake BOSC 2004
 
Ns2
Ns2Ns2
Ns2
 

En vedette

New insights into the human genome by ENCODE project
New insights into the human genome by ENCODE project New insights into the human genome by ENCODE project
New insights into the human genome by ENCODE project Senthil Natesan
 
Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Xing Xu
 
Ecobouwers opendeur passiefhuis Lokeren
Ecobouwers opendeur passiefhuis LokerenEcobouwers opendeur passiefhuis Lokeren
Ecobouwers opendeur passiefhuis LokerenMaté Ongenaert
 
Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...
Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...
Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...Maté Ongenaert
 
Genome voyager-beta-brochure
Genome voyager-beta-brochureGenome voyager-beta-brochure
Genome voyager-beta-brochureXing Xu
 
ENCODE project: brief summary of main findings
ENCODE project: brief summary of main findingsENCODE project: brief summary of main findings
ENCODE project: brief summary of main findingsMaté Ongenaert
 
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataBest pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataXing Xu
 
A Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisA Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisMatthieu Schapranow
 
New Progress in Pyrosequencing for DNA Methylation
New Progress in Pyrosequencing for DNA MethylationNew Progress in Pyrosequencing for DNA Methylation
New Progress in Pyrosequencing for DNA MethylationQIAGEN
 
PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...
PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...
PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...QIAGEN
 
Microbiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteMicrobiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteQIAGEN
 
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...QIAGEN
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Maté Ongenaert
 
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...QIAGEN
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...QIAGEN
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
 

En vedette (17)

New insights into the human genome by ENCODE project
New insights into the human genome by ENCODE project New insights into the human genome by ENCODE project
New insights into the human genome by ENCODE project
 
Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012
 
Ecobouwers opendeur passiefhuis Lokeren
Ecobouwers opendeur passiefhuis LokerenEcobouwers opendeur passiefhuis Lokeren
Ecobouwers opendeur passiefhuis Lokeren
 
Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...
Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...
Strong reversal of the lung fibrosis disease signature by autotaxin inhibitor...
 
Genome voyager-beta-brochure
Genome voyager-beta-brochureGenome voyager-beta-brochure
Genome voyager-beta-brochure
 
ENCODE project: brief summary of main findings
ENCODE project: brief summary of main findingsENCODE project: brief summary of main findings
ENCODE project: brief summary of main findings
 
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataBest pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
 
Open human genome data
Open human genome dataOpen human genome data
Open human genome data
 
A Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisA Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data Analysis
 
New Progress in Pyrosequencing for DNA Methylation
New Progress in Pyrosequencing for DNA MethylationNew Progress in Pyrosequencing for DNA Methylation
New Progress in Pyrosequencing for DNA Methylation
 
PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...
PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...
PCR - From Setup to Cleanup: A Beginner`s Guide with Useful Tips and Tricks -...
 
Microbiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteMicrobiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro Suite
 
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
Clinical Metagenomics for Rapid Detection of Enteric Pathogens and Characteri...
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
 
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
 

Similaire à Workshop NGS data analysis - 3

General pipeline of transcriptomics analysis
General pipeline of transcriptomics analysisGeneral pipeline of transcriptomics analysis
General pipeline of transcriptomics analysisSanty Marques-Ladeira
 
Workshop NGS data analysis - 3

  • 1. Sequencing data analysis Workshop – part 3 / peak calling and annotation Outline Previously in this workshop… Peak calling and annotation – the steps Peak calling and annotation – the workshop Maté Ongenaert
  • 2. Previously in this workshop… Introduction – the real cost of sequencing
  • 3. Previously in this workshop… Introduction – the real cost of sequencing
  • 4. Previously in this workshop… The workflow of NGS data analysis
       Data analysis: raw machine reads… What's next?
       Preprocessing (machine/technology dependent)
         - adaptors, indexes, conversions,…
       Reads with associated qualities (universal)
         - FASTQ
         - QC check
       Depending on application (generally applicable)
         - 'de novo' assembly of genome (bacterial genomes,…)
         - Mapping to a reference genome → mapped reads
         - SAM/BAM/…
       High-level analysis (specific for application)
         - SNP calling
         - Peak calling
  • 5. Previously in this workshop… The workflow of NGS data analysis
  • 6. Previously in this workshop… Main data formats
       Raw sequence reads:
       - Represent the sequence ~ FASTA
           >SEQUENCE_IDENTIFIER
           GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
       - Extension: represent the quality, per base ~ FASTQ (Q for quality)
         Score ~ Phred, stored as an ASCII character: Phred + 33 = Sanger encoding
           @SEQUENCE_IDENTIFIER
           GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
           +
           !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
       - Machine and platform independent and compressed: SRA (NCBI)
         Get the original FASTQ file using SRATools (NCBI)
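The "Phred + 33" encoding on slide 6 can be made concrete with a small sketch. The function names below are illustrative and not part of the workshop material; the only facts used are that each quality character is the Phred score plus 33 and that a Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
def phred33_to_scores(quality: str) -> list[int]:
    """Decode a Sanger (Phred+33) quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Quality line of the FASTQ record shown on the slide.
quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65"
scores = phred33_to_scores(quality)

print(scores[:4])                     # first four per-base scores
print(error_probability(scores[-1]))  # error probability of the last base
```

A character such as `!` (ASCII 33) therefore means Phred 0, the lowest possible quality, which is why reads often start or end with it.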
  • 7. Previously in this workshop… Main data formats
       - Now moving to a common file format → SAM / BAM (Sequence Alignment/Map)
       - BAM: binary (read: computer-readable, indexed, compressed) 'form' of SAM
       DESCRIPTION OF THE 11 FIELDS IN THE ALIGNMENT SECTION
         # QNAME: template name
         # FLAG: bitwise flag
         # RNAME: reference name
         # POS: mapping position
         # MAPQ: mapping quality
         # CIGAR: CIGAR string
         # RNEXT: reference name of the mate/next fragment
         # PNEXT: position of the mate/next fragment
         # TLEN: observed template length
         # SEQ: fragment sequence
         # QUAL: ASCII of Phred-scale base quality + 33
       #Headers
         @HD VN:1.3 SO:coordinate
         @SQ SN:ref LN:45
       #Alignment block
         r001 163 ref  7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
         r002   0 ref  9 30 3S6M1P1I4M *  0  0 AAAAGATAAGGATA    *
         r003   0 ref  9 30 5H6M       *  0  0 AGCTAA            * NM:i:1
         r004   0 ref 16 30 6M14N5M    *  0  0 ATAGCTTCAGC       *
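As a rough illustration of the 11 mandatory SAM fields on slide 7, the sketch below splits one alignment line into named fields. The `SamRecord` type and `parse_sam_line` helper are hypothetical names chosen for this example; real pipelines would normally rely on samtools or a dedicated library. SAM files are tab-delimited, so the example line uses tabs.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: tuple  # optional fields such as NM:i:1


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into the 11 mandatory fields plus optional tags."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=parts[0], flag=int(parts[1]), rname=parts[2], pos=int(parts[3]),
        mapq=int(parts[4]), cigar=parts[5], rnext=parts[6], pnext=int(parts[7]),
        tlen=int(parts[8]), seq=parts[9], qual=parts[10], tags=tuple(parts[11:]),
    )


# First alignment from the slide, with tab separators restored.
line = "r001\t163\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
rec = parse_sam_line(line)
print(rec.qname, rec.pos, rec.cigar)
print("reverse strand:", bool(rec.flag & 0x10))  # bit 0x10 marks a reverse-strand read
```

The FLAG value is a bit field, so individual properties are tested with a bitwise AND rather than by comparing the whole number.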
  • 8. Previously in this workshop… Main data formats
       - BED files (location / annotation / scores): Browser Extensible Data
         Used for mapping / annotation / peak locations /…
         Extension: bigBed (binary)
         FIELDS USED: # chr # start # end # name # score # strand
           track name=pairedReads description="Clone Paired Reads" useScore=1
           #chr  start end  name   score strand
           chr22 1000  5000 cloneA 960   +
           chr22 2000  6000 cloneB 900   -
       - BEDGraph files (location, combined with score)
         Used to represent peak scores
           track type=bedGraph name="BedGraph Format" description="BedGraph format" visibility=full color=200,100,0 altColor=0,100,200 priority=20
           #chr  start    end      score
           chr19 59302000 59302300 -1.0
           chr19 59302300 59302600 -0.75
           chr19 59302600 59302900 -0.50
  • 9. Previously in this workshop… Main data formats
       - WIG files (location / annotation / scores): wiggle
         Used to visualize or summarize data, in most cases count data or normalized count data (RPKM)
         Extension: BigWig, the binary version (often used in GEO for ChIP-seq peaks)
           browser position chr19:59304200-59310700
           browser hide all
           #150 base wide bar graph at arbitrarily spaced positions,
           #threshold line drawn at y=11.76
           #autoScale off viewing range set to [0:25]
           #priority = 10 positions this as the first graph
           track type=wiggle_0 name="variableStep" description="variableStep format" visibility=full autoScale=off viewLimits=0.0:25.0 color=50,150,255 yLineMark=11.76 yLineOnOff=on priority=10
           variableStep chrom=chr19 span=150
           59304701 10.0
           59304901 12.5
           59305401 15.0
           59305601 17.5
           59305901 20.0
           59306081 17.5
  • 10. Previously in this workshop… Main data formats
       - GFF format (General Feature Format) or GTF
         Used for annotation of genetic / genomic features, such as all coding genes in Ensembl
         Often used in downstream analysis to assign annotation to regions / peaks / …
         FIELDS USED:
           # seqname (the name of the sequence)
           # source (the program that generated this feature)
           # feature (the name of this type of feature, for example: exon)
           # start (the starting position of the feature in the sequence)
           # end (the ending position of the feature)
           # score (a score between 0 and 1000)
           # strand (valid entries include '+', '-', or '.')
           # frame (if the feature is a coding exon, frame should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.)
           # group (all lines with the same group are linked together into a single item)
         track name=regulatory description="TeleGene(tm) Regulatory Regions"
         #chr  source   feature  start   end     score strand frame group
         chr22 TeleGene enhancer 1000000 1001000 500   +      .     touch1
         chr22 TeleGene promoter 1010000 1010100 900   +      .     touch1
         chr22 TeleGene promoter 1020000 1020000 800   -      .     touch2
  • 11. Peak calling The workflow
       Peak calling: identify genomic regions where the number of sequenced reads (coverage) in the IP sample is higher than expected from the input (control) samples. These enriched regions were possibly captured by the IP and thus sequenced with more coverage.
       Peak annotation: once such enriched regions are identified, where are they located (intron/exon/…)? What is the closest gene or the closest promoter region?
  • 12. Peak calling The workflow Peak calling:
       Coverage
         - Taken from the BAM file: mapping against the reference genome
         - Both the IP sample and the control (Input) must be mapped; duplicates will be ignored by most peak callers
         - The peak caller determines coverage for both samples
         - Store the coverage for visualisation (WIG files, BigWig files or similar)
       Enriched
         - Find out which regions are enriched, either within the sample or versus a control (Input) sample → statistics ~ model of tag distributions and normalisation strategy
  • 13. Peak calling The workflow Peak calling:
       Enriched
         - Find out which regions are enriched, either within the sample or versus a control (Input) sample → statistics ~ model of tag distributions and normalisation strategy
       [Table: features of peak-calling programs. Programs (reference): Cisgenome [73], Minimal ChipSeq Peak Finder [74], E-range [75], MACS [76], QuEST [77], Hpeak [78], Sole-Search [79], PeakSeq [80], SISSRS [81], spp package [82]. Column groups: Density profiles (window-based, tag clustering, Gaussian kernel, strand-specific); Peak assignment (peak height or FE); Control data adjustment (background subtract, genomic dupl/deletions, normalized control); Significance relative to control (statistical model / test: conditional binomial, local Poisson, chromosome Poisson, HMM, t-test, FDR).]
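The idea of "enriched versus a control" on slides 12 and 13 can be sketched with a deliberately simple model. This is not the method of MACS or of any other program in the table: it assumes fixed windows, scales the control to the IP library size, and applies a one-sided Poisson test per window. The function names, the floor of one expected read and the threshold `alpha` are all assumptions made for illustration.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(ip_counts, control_counts, alpha=1e-3):
    """Return (window index, IP count, expected count, p-value) for enriched windows."""
    # Normalisation: scale the control to the sequencing depth of the IP sample.
    scale = sum(ip_counts) / sum(control_counts)
    hits = []
    for idx, (ip, ctrl) in enumerate(zip(ip_counts, control_counts)):
        # Assumed floor of one expected read, so empty control windows do not give lam = 0.
        lam = max(ctrl * scale, 1.0)
        p = poisson_sf(ip, lam)
        if p < alpha:
            hits.append((idx, ip, lam, p))
    return hits


# Hypothetical read counts per window for an IP sample and its input control.
ip = [5, 6, 40, 5, 4]
control = [5, 5, 6, 4, 5]
hits = enriched_windows(ip, control)
print([h[0] for h in hits])  # indices of windows called enriched
```

Real peak callers differ mainly in the choices this sketch fixes by hand: how the density profile is built, how the control is normalised, and which statistical model gives the significance.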
  • 14. Peak calling The workflow
       Usage: macs14 <-t tfile> [-n name] [-g genomesize] [options]
       Example: macs14 -t ChIP.bam -c Control.bam -f BAM -g h -n test -w --call-subpeaks
       macs14 -- Model-based Analysis for ChIP-Sequencing
       Options:
         --version   show program's version number and exit
         -h, --help  show this help message and exit.
         -t TFILE, --treatment=TFILE
                     ChIP-seq treatment files. REQUIRED. When ELANDMULTIPET is selected, you must provide two files separated by comma, e.g. s_1_1_eland_multi.txt,s_1_2_eland_multi.txt
         -c CFILE, --control=CFILE
                     Control files. When ELANDMULTIPET is selected, you must provide two files separated by comma, e.g. s_2_1_eland_multi.txt,s_2_2_eland_multi.txt
         -n NAME, --name=NAME
                     Experiment name, which will be used to generate output file names. DEFAULT: "NA"
         -f FORMAT, --format=FORMAT
                     Format of tag file, "AUTO", "BED" or "ELAND" or "ELANDMULTI" or "ELANDMULTIPET" or "ELANDEXPORT" or "SAM" or "BAM" or "BOWTIE". The default AUTO option will let MACS decide which format the file is. Please check the definition in 00README file if you choose ELAND/ELANDMULTI/ELANDMULTIPET/ELANDEXPORT/SAM/BAM/BOWTIE. DEFAULT: "AUTO"
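When many samples have to be processed, the command line on slide 14 is usually assembled by a script. The sketch below only builds the argument list and prints it; it uses just the options shown on the slide (`-t`, `-c`, `-f`, `-n`, `-g`). The helper name `build_macs14_command` is hypothetical, and the file names are the ones from the slide's example.

```python
import shlex


def build_macs14_command(treatment, control=None, name="NA",
                         file_format="AUTO", genome_size=None):
    """Assemble a macs14 argument list from the options listed in the help text."""
    cmd = ["macs14", "-t", treatment]
    if control:
        cmd += ["-c", control]          # control (Input) sample is optional
    cmd += ["-f", file_format, "-n", name]
    if genome_size:
        cmd += ["-g", genome_size]
    return cmd


cmd = build_macs14_command("ChIP.bam", control="Control.bam", name="test",
                           file_format="BAM", genome_size="h")
print(shlex.join(cmd))
```

On a machine where macs14 is installed, the same list could be handed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with unusual file names.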
  • 15. Peak calling The workflow Peak annotation
       Starting from the enriched peak locations: in which features is my peak located? Is it close to a gene? Provide some statistics on how far my peaks are from annotated TSSs.
       Tools:
         - R/Bioconductor ChIPpeakAnno package
         - PeakAnalyzer
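The "how far is my peak from the nearest TSS" question on slide 15 boils down to a nearest-neighbour lookup in a sorted list of positions. The sketch below is not how ChIPpeakAnno or PeakAnalyzer work internally; it handles a single chromosome, ignores strand, and uses made-up gene names and coordinates.

```python
from bisect import bisect_left


def nearest_tss(peak_summit, tss_list):
    """Return (gene, signed distance) of the TSS closest to a peak summit.

    tss_list must be a non-empty list of (position, gene) tuples sorted by
    position, all on the same chromosome as the peak. Strand is ignored.
    """
    if not tss_list:
        raise ValueError("no TSS annotation for this chromosome")
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, peak_summit)
    # Only the neighbours directly left and right of the summit can be closest.
    candidates = tss_list[max(i - 1, 0):i + 1]
    pos, gene = min(candidates, key=lambda t: abs(t[0] - peak_summit))
    return gene, peak_summit - pos


# Hypothetical annotation: TSS positions on one chromosome.
tss = [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")]

for summit in (4800, 2900, 12000):
    print(summit, nearest_tss(summit, tss))
```

Collecting the distances for all peaks gives the distribution of peak-to-TSS distances that the slide asks for.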
  • 16. Sequencing data analysis Workshop – part 3 / peak calling and annotation Outline Previously in this workshop… Peak calling and annotation – the steps Peak calling and annotation – the workshop Maté Ongenaert
  • 17. Peak calling The workflow Further downstream processing
       Peak overlaps: is the observed overlap larger than one would expect if the datasets were random?
         → The peak caller gives each peak a score
         → Randomly redistribute these scores across the peaks of the same peak set (factor) and, for a given percentage of top peaks, calculate the number of overlapping peaks both in the real dataset and with the randomly distributed scores
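The randomisation described on slide 17 can be sketched as a small permutation test. Everything below is an illustrative assumption rather than the workshop's actual script: peaks are `(start, end)` tuples on one chromosome, both peak sets are shuffled, and the default of 1000 randomisations is a guess.

```python
import random


def overlaps(a, b):
    """True if two (start, end) intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlaps(peaks_a, peaks_b):
    """Number of peaks in peaks_a that overlap any peak in peaks_b."""
    return sum(any(overlaps(a, b) for b in peaks_b) for a in peaks_a)


def top_fraction(peaks, scores, fraction):
    """Peaks belonging to the given top fraction when ranked by score."""
    n = max(1, round(len(peaks) * fraction))
    ranked = sorted(zip(scores, peaks), reverse=True)
    return [peak for _, peak in ranked[:n]]


def permutation_overlap(peaks_a, scores_a, peaks_b, scores_b,
                        fraction, n_perm=1000, seed=0):
    """Return (real overlap, fraction of randomisations with overlap >= real)."""
    rng = random.Random(seed)
    real = count_overlaps(top_fraction(peaks_a, scores_a, fraction),
                          top_fraction(peaks_b, scores_b, fraction))
    shuffled_a, shuffled_b = list(scores_a), list(scores_b)
    at_least = 0
    for _ in range(n_perm):
        # Redistribute the scores within each peak set, then re-select the top peaks.
        rng.shuffle(shuffled_a)
        rng.shuffle(shuffled_b)
        rand = count_overlaps(top_fraction(peaks_a, shuffled_a, fraction),
                              top_fraction(peaks_b, shuffled_b, fraction))
        if rand >= real:
            at_least += 1
    return real, at_least / n_perm


# Hypothetical peak sets for two factors.
peaks_a = [(0, 10), (100, 110), (200, 210), (300, 310)]
scores_a = [9, 8, 1, 2]
peaks_b = [(5, 15), (105, 115), (1000, 1010), (2000, 2010)]
scores_b = [7, 9, 1, 2]

real, frac = permutation_overlap(peaks_a, scores_a, peaks_b, scores_b, fraction=0.5)
print(real, frac)
```

A small returned fraction means the randomised scores rarely reproduce the real overlap, which is the evidence that the top peaks of the two factors coincide more often than chance would suggest.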
  • 18. Peak calling The workflow Further downstream processing
       - Identify sequence motifs (the region around the 'peak' is searched for motifs)
       - Identify differentially bound regions between conditions/factors/…
  • 19. Peak calling The workflow Further downstream processing Peak overlaps
       Top-peak fraction:  10%  15%  20%  30%  50%   75%
       Real:                 7   18   25   52  102   201
       Means:        0, 347 11 ,53 2, 699 9, 297 42, 377 1 888 40,
       Factor diff:  20,7291 1 066 1 61 4484 5, 1 9, 262689885 5, 593202108 2, 406966043 14266651 , 52
       FDR                 10%  15%  20%  30%  50%   75%
                             0    0    0    0    0     0
       10%                 282  333  506  907 1000  1000
       20%                  59   33  125  332 1000  1000
       30%                   4    2    9   27  981  1000
       50%                   2    0    0    0   95  1000
       75%                   0    0    0    0    0   148
  • 20. Sequencing data analysis Workshop – part 3 / peak calling and annotation Outline Previously in this workshop… Peak calling and annotation – the steps Peak calling and annotation – the workshop Maté Ongenaert
  • 21. Blok de Van… ETER