Luca Cozzuto
Sarah Bonnin
Bioinformatics Core Facility
Additional topics (parsing methods) for biologists with a focus on ChIP-seq data
ChIP-Seq experiment
By Jkwchui - Cell diagram adapted from LadyOfHats' Animal Cell diagram. Information based on Illumina data sheet, as well as ChIP and immunoprecipitation articles
& references., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17890854
Raw data, reads in FASTQ format
@HWI-ST227:389:C4WA2ACXX:7:1204:2272:59979
GGAGGAAGGTCCTCGCTCCTCTTTCATATAAGGGAAATGGCTGAAT
+
FFFFHHHHHHJIJJJJJJJJIJJJIGIGIGGIJJIJIJJJJJJIII
@HWI-ST227:389:C4WA2ACXX:7:1205:15214:42893
GAGGATCCCAGGGAGGAAGGTCCTCGCTCCTCTTTCATCTAAGGGA
+
12BAFB?A:3<AE1@<FF;1*@EG*)?0?DBD>9BF9B*?######
@HWI-ST227:389:C4WA2ACXX:8:2208:2467:44624
AAAGAGGAGAGAGGACCATCCTCCCTGGGATCCTCAGAAGTCTACT
+
BDDA:DB?2AA@FC>F?EEGC<FED>GFD;?GBB?<?F99*/9?9?
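Each read takes four lines: the header (starting with @), the sequence, a separator line (+) and the quality string (one character per base).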
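Because every record occupies exactly four lines, awk can parse a FASTQ file using the line number modulo 4. The minimal sketch below assumes uncompressed input without line-wrapped sequences (the usual Illumina layout); the function name `fastq_summary` is hypothetical. It prints each read name with its length and checks that the quality string is as long as the sequence.

```sh
#!/bin/sh
# fastq_summary (hypothetical helper): print "<read name> <length> <ok|qual_len_mismatch>"
# for each FASTQ record. Reads the files given as arguments, or standard input.
fastq_summary() {
  awk '
    NR % 4 == 1 { name = substr($1, 2) }   # header line: drop the leading "@"
    NR % 4 == 2 { seq = $0 }               # sequence line
    NR % 4 == 0 {                          # quality line closes the record
      status = (length(seq) == length($0)) ? "ok" : "qual_len_mismatch"
      print name, length(seq), status
    }' "$@"
}

# Usage on two toy records
printf '@r1 extra\nACGT\n+\nIIII\n@r2\nAC\n+\n##\n' | fastq_summary
```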
Raw data, reads in FASTQ format
Counting FASTQ reads (the slow way): each read takes 4 lines, so the number of lines is divided by 4
zcat B7_H3K4me1.fastq.gz | awk '{num++} END {print num/4}'
41103741
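The awk command increments a counter on every line. A usually faster variant, under the same assumption of four lines per record, lets `wc -l` do the counting. This is a sketch; the function name is hypothetical, and `gzip -dc` is used as an equivalent of `zcat`.

```sh
#!/bin/sh
# count_fastq_reads FILE.fastq.gz (hypothetical helper): print the number of reads.
# Assumes 4 lines per record (no line-wrapped sequences).
count_fastq_reads() {
  lines=$(gzip -dc < "$1" | wc -l | tr -d ' ')   # gzip -dc decompresses to stdout, like zcat
  echo $(( lines / 4 ))
}

# Usage on a small temporary file holding two reads
tmp_fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n##\n' | gzip -c > "$tmp_fq"
count_fastq_reads "$tmp_fq"
```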
Raw data, reads in FASTQ format
Phred quality score
- $Q = -10 \log_{10} p$
- p = probability that the corresponding base call is incorrect
- Example: p = 0.001 means a quality of 30
Quality characters (Phred+33 encoding: Q = ASCII code - 33):
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ
'!' = 0   ';' = 26   '@' = 31   'J' = 41
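The encoding above can be decoded with awk by building a character-to-ASCII lookup table, since awk has no built-in `ord` function. A minimal sketch with hypothetical function names: `phred_scores` turns a quality string into Q values (Q = ASCII code - 33), and `phred_to_prob` applies $p = 10^{-Q/10}$.

```sh
#!/bin/sh
# phred_scores (hypothetical helper): convert a Phred+33 quality string into space-separated Q values.
phred_scores() {
  printf '%s\n' "$1" | awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    {
      out = ""
      for (j = 1; j <= length($0); j++) {
        sep = (j > 1) ? " " : ""
        out = out sep (ord[substr($0, j, 1)] - 33)   # Q = ASCII code - 33
      }
      print out
    }'
}

# phred_to_prob (hypothetical helper): error probability p = 10^(-Q/10) for a given Q.
phred_to_prob() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

phred_scores '!5?I'   # one score per base
phred_to_prob 30      # matches the example p = 0.001 for Q = 30
```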
Raw data, reads in FASTQ format
Analyzing the quality (FastQC)
[Figure: FastQC reports for a good and a bad sample]
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
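One of the FastQC plots summarizes the base quality at each read position. A rough awk approximation computes only the mean per position, not the box plots FastQC draws. It is a sketch for understanding the plot, with a hypothetical function name, not a replacement for FastQC. It assumes uncompressed FASTQ with Phred+33 qualities.

```sh
#!/bin/sh
# per_base_mean_quality (hypothetical helper): print "<position> <mean Phred score>"
# for each read position. Input: uncompressed FASTQ, 4 lines per record, Phred+33.
per_base_mean_quality() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 0 {                                  # quality line of each record
      for (j = 1; j <= length($0); j++) {
        sum[j] += ord[substr($0, j, 1)] - 33
        n[j]++
      }
      if (length($0) > maxlen) maxlen = length($0)
    }
    END { for (j = 1; j <= maxlen; j++) print j, sum[j] / n[j] }' "$@"
}

# Usage on two toy reads; real data: gzip -dc B7_H3K4me1.fastq.gz | per_base_mean_quality
printf '@r1\nAC\n+\nII\n@r2\nGT\n+\n5?\n' | per_base_mean_quality
```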
Alignment
- Align 20-30 million reads per sample to the reference genome.
- The reference genome can be very long (the human genome is about 3 gigabases).
- We need ultra-fast mappers:
  - Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)
  - BWA (http://bio-bwa.sourceforge.net/)
  - GEM (https://github.com/smarco/gem3-mapper)
  - …
Reference genome (Fasta file)
>1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
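Header: the first line, starting with >, gives the sequence name (here 1) followed by a description; the sequence follows on the next lines (N = undetermined base).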
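Since a record runs from one > header to the next, sequence lengths can be computed by summing the lengths of the (possibly wrapped) sequence lines. A minimal sketch with a hypothetical function name; on the GRCm38 file, the length printed for 1 should match the 195471971 shown in the header.

```sh
#!/bin/sh
# fasta_lengths (hypothetical helper): print "<sequence name> <length>" for each FASTA record.
# The name is the first word of the header, without the leading ">".
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") print name, len   # flush the previous record
      name = substr($1, 2); len = 0
      next
    }
    { len += length($0) }               # sequence lines, possibly wrapped
    END { if (name != "") print name, len }' "$@"
}

# Usage on a toy FASTA; real data: gzip -dc genome.fa.gz | fasta_lengths
printf '>chr1 first\nACGTNN\nNN\n>chr2\nA\n' | fasta_lengths
```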
Reference genome (Fasta file)
zcat GRCm38.primary_assembly.genome.fa.gz | grep ">"
>chr1 1
>chr2 2
>chr3 3
>chr4 4
>chr5 5
>chr6 6
>chr7 7
>chr8 8
>chr9 9
>chr10 10
>chr11 11
>chr12 12
>chr13 13
>chr14 14
>chr15 15
>chr16 16
>chr17 17
>chr18 18
>chr19 19
>chrX X
>chrY Y
>chrM MT
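This GENCODE FASTA names chromosomes chr1 … chrX, chrY, chrM, while the Ensembl FASTA and GTF shown elsewhere use 1, X, MT. Files from the two conventions do not match by name in BEDTools unless the names are harmonized first. A minimal sketch, assuming tab-separated Ensembl-style input (BED, GTF, bedGraph) and a hypothetical function name. It only adds the chr prefix and maps MT to chrM. It does not handle unplaced scaffolds, whose names differ more substantially between the two sources.

```sh
#!/bin/sh
# add_chr_prefix (hypothetical helper): rename Ensembl-style chromosomes (1, X, MT)
# to GENCODE/UCSC style (chr1, chrX, chrM) in column 1 of tab-separated files.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^(#|track|browser)/ { print; next }        # leave comment/track lines untouched
    {
      if ($1 == "MT") $1 = "chrM"
      else if ($1 !~ /^chr/) $1 = "chr" $1
      print
    }' "$@"
}

printf '1\t3444977\t3445551\tpeak_1\nMT\t5\t10\tpeak_x\n' | add_chr_prefix
```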
Annotations (GTF format)
#!genome-build GRCm38.p5
#!genome-version GRCm38
#!genome-date 2012-01
#!genome-build-accession NCBI:GCA_000001635.7
#!genebuild-last-updated 2017-01
1 havana gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693"; gene_version "1"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1";
https://www.ensembl.org/info/website/upload/gff.html
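The lines starting with #! form the header. Each feature line then has 9 tab-separated columns:
Reference sequence // Source // Feature (gene, transcript, exon, etc.) //
Start // End (1-based, both included) // Score // Strand // Frame (0, 1, 2) //
Attributes (key "value" pairs separated by ";")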
https://www.ensembl.org/info/website/upload/gff.html
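The attribute column can be parsed with awk's `match`, which sets `RSTART` and `RLENGTH` for the matched text. A minimal sketch with a hypothetical function name that extracts the gene name of every gene feature. It assumes the attribute format shown above (`gene_name "..."`).

```sh
#!/bin/sh
# gtf_genes (hypothetical helper): print chrom, start, end, strand and gene_name
# of every "gene" feature of a tab-separated GTF file.
gtf_genes() {
  awk 'BEGIN { FS = "\t" }
    /^#/ { next }                               # skip the "#!" header lines
    $3 == "gene" {
      name = "."
      # keep the text between the quotes of gene_name "..." (prefix is 11 characters long)
      if (match($9, /gene_name "[^"]*"/)) name = substr($9, RSTART + 11, RLENGTH - 12)
      print $1, $4, $5, $7, name
    }' "$@"
}

printf '1\thavana\tgene\t3073253\t3074322\t.\t+\t.\tgene_id "ENSMUSG00000102693"; gene_name "4933401J01Rik";\n' | gtf_genes
```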
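Alignment
- Align 20-30 million reads per sample to the reference genome.
- The reference genome has to be indexed first.
- Problems with repetitive sequences: a read may align equally well to several loci (multi-mapping reads).
- Problems with PCR artifacts: identical copies of the same DNA fragment produced by amplification must be marked as duplicates.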
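Duplicate marking (for example Picard MarkDuplicates, the program named in the PG:Z:MarkDuplicates tag of the SAM record) flags reads whose 5' ends fall at the same position on the same strand. The toy sketch below, with a hypothetical function name, keeps only that core idea for single-end SAM records. The 5' end of a reverse-strand read is POS plus the reference span from the CIGAR minus 1. It ignores soft clipping and keeps the first read seen rather than the best-quality one, so it is an illustration rather than a substitute for a real tool.

```sh
#!/bin/sh
# mark_dups_se (hypothetical helper): toy duplicate marking for single-end SAM records.
# Key = reference name + 5-prime position + strand; later reads with a known key get FLAG bit 0x400.
mark_dups_se() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                          # header lines pass through
    {
      flag = $2
      if (int(flag / 4) % 2) { print; next }      # 0x4: unmapped, left unchanged
      rev = int(flag / 16) % 2                    # 0x10: reverse strand
      five = $4
      if (rev) {
        # reverse strand: 5-prime end = POS + reference span - 1 (span from CIGAR ops M, D, N, =, X)
        cig = $6; span = 0
        while (match(cig, /^[0-9]+[MIDNSHP=X]/)) {
          op = substr(cig, RLENGTH, 1)
          if (op ~ /[MDN=X]/) span += substr(cig, 1, RLENGTH - 1)
          cig = substr(cig, RLENGTH + 1)
        }
        five = $4 + span - 1
      }
      key = $3 SUBSEP five SUBSEP rev
      if ((key in seen) && int(flag / 1024) % 2 == 0) $2 = flag + 1024   # set 0x400
      seen[key] = 1
      print
    }' "$@"
}

# Usage: r2 repeats r1 on the forward strand, so only r2 gets FLAG 1024
printf 'r1\t0\t1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\nr2\t0\t1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n' | mark_dups_se
```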
Alignment (SAM / BAM format)
@HD VN:1.5 SO:coordinate
@SQ SN:1 LN:195471971
@SQ SN:2 LN:182113224
@SQ SN:3 LN:160039680
…
@PG ID:bowtie2 PN:bowtie2 VN:2.3.2
CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 --non-deterministic -x
bowtie2genome -p 8 -U B7_H3K4me1.fastq.gz"
NS500454:71:H3TV7BGXY:4:22608:3293:16569 16 1 3000101 7
75M * 0 0
TTTTTTTTTTTTTTTTTTTTTTTGGTTTTGAGACTATTGATGACTGCCTCTATTTCTTTAGGGGAAATGGGACTT
E/EEEAAEEEEEEE6E6EAEE/E6EEE//<6/E/EAEE/EE/E/EE66E6E6EEEEEEE/EAAA/E/EE/AAAAA
MD:Z:25G1G0G46 XG:i:0 NM:i:3 XM:i:3 XN:i:0 XO:i:0 AS:i:-11
XS:i:-20 YT:Z:UU PG:Z:MarkDuplicates
Header lines start with @:
@HD: header line // VN: format version // SO: sorting order of alignments
@SQ: reference sequence dictionary // SN: sequence name // LN: sequence length
@PG: program // ID: program record identifier // PN: program name // VN: version // CL: command line
https://samtools.github.io/hts-specs/SAMv1.pdf
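Alignment lines: 11 mandatory tab-separated fields
Query name // FLAG // Reference name // Leftmost mapping position (1-based) //
Mapping quality (here 7, i.e. $p = 10^{-7/10} \approx 0.2$ that the position is wrong) // CIGAR string (75M: 75 bases aligned without gaps) // Reference name of the mate read //
Position of the mate // Template length // Sequence // Quality
They are followed by optional TAG:TYPE:VALUE fields, e.g. NM:i:3 = edit distance of 3 from the reference.
The mate fields are * / 0 / 0 here because the reads are single-end (bowtie2 -U).
In this case FLAG 16 (bit 0x10) means "read is reverse complemented".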
https://samtools.github.io/hts-specs/SAMv1.pdf
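FLAG is a sum of powers of two, so each property can be read off with integer division and modulo. A minimal sketch with a hypothetical function name and informal short labels for the 12 bits of the SAM specification (samtools also offers `samtools flags` for this).

```sh
#!/bin/sh
# decode_flag (hypothetical helper): list the SAM FLAG bits that are set, comma-separated.
# Labels are informal names for bits 0x1 ... 0x800 of the SAM specification.
decode_flag() {
  awk -v flag="$1" 'BEGIN {
    n = split("paired proper_pair unmapped mate_unmapped reverse mate_reverse first_in_pair second_in_pair secondary qc_fail duplicate supplementary", label, " ")
    out = ""
    for (i = 1; i <= n; i++)
      if (int(flag / 2 ^ (i - 1)) % 2)          # is bit number i - 1 set?
        out = (out == "") ? label[i] : out "," label[i]
    if (out == "") out = "none"
    print out
  }'
}

decode_flag 16      # FLAG of the record shown above
decode_flag 1040    # 0x400 + 0x10
```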
Alignment (SAM / BAM format)
https://software.broadinstitute.org/software/igv/
Quality control of the enrichment
https://deeptools.readthedocs.io/en/develop/index.html
Distribution of the signal (wiggle format)
https://deeptools.readthedocs.io/en/develop/index.html
variableStep chrom=chr2
300701 12.5
300702 12.5
300703 12.5
300704 12.5
300705 12.5
...
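In a variableStep block, each line gives a 1-based position and a value, and the optional span (default 1) sets how many bases the value covers. The same signal can be written as bedGraph, whose intervals are 0-based with the end excluded. A minimal conversion sketch with a hypothetical function name. It handles only variableStep sections, not fixedStep.

```sh
#!/bin/sh
# wig_to_bedgraph (hypothetical helper): convert variableStep wiggle to bedGraph
# (chrom, start, end, value). Sketch only: fixedStep sections are not handled.
wig_to_bedgraph() {
  awk 'BEGIN { OFS = "\t" }
    /^(track|browser|#)/ { next }
    /^variableStep/ {
      chrom = ""; span = 1                      # span defaults to 1 base
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "chrom") chrom = kv[2]
        else if (kv[1] == "span") span = kv[2]
      }
      next
    }
    { print chrom, $1 - 1, $1 - 1 + span, $2 }' "$@"   # 1-based position -> 0-based start
}

printf 'variableStep chrom=chr2\n300701 12.5\n300702 12.5\n' | wig_to_bedgraph
```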
Peak calling
https://software.broadinstitute.org/software/igv/
Peak calling
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-
Seq (MACS). Genome Biol. 2008;9(9):R137.
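MACS infers the fragment size d from the distance between the read peaks on the + and - strands. It then shifts or extends each read by this amount toward its 3' end, so the signal is centered on the binding site, midway between the two strand-specific read clusters. This gives more reliable peaks (i.e. binding sites).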
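Read extension can be illustrated on reads stored as 6-column BED: each read is extended from its 5' end toward its 3' end to a fragment length d. A minimal sketch with a hypothetical function name. The value d = 200 in the usage line is an arbitrary example, since MACS estimates d itself (MACS2 also accepts a fixed value via `--nomodel --extsize`). Ends are not clipped at the chromosome length.

```sh
#!/bin/sh
# extend_reads D [FILE] (hypothetical helper): extend each BED6 read to a fragment of length D
# from its 5' end. + strand: [start, start + D); - strand: [end - D, end). Starts are clipped at 0.
extend_reads() {
  frag_len="$1"; shift
  awk -v d="$frag_len" 'BEGIN { FS = OFS = "\t" }
    {
      if ($6 == "-") { s = $3 - d; e = $3 } else { s = $2; e = $2 + d }
      if (s < 0) s = 0
      print $1, s, e, $4, $5, $6
    }' "$@"
}

printf 'chr1\t100\t150\tr1\t0\t+\nchr1\t400\t450\tr2\t0\t-\n' | extend_reads 200
```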
Peak coordinates (Bed format)
https://genome.ucsc.edu/FAQ/FAQformat.html#format1
Chromosome // Start // End (3-field BED; start is 0-based, end is not included)
+ Name // Score // Strand (6-field BED)
+ thickStart // thickEnd // itemRgb
+ blockCount // blockSizes // blockStarts (12-field BED)
track name=chipseq description="IP of Ring1B TF"
1 3444977 3445551 peak_1 31 .
1 4773116 4774454 peak_2 114 .
1 4774530 4777431 peak_3 108 .
1 4786374 4786850 peak_4 80 .
1 4806806 4807288 peak_5 66 .
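Because BED intervals are 0-based with the end excluded, a peak's width is simply End - Start. A minimal sketch with a hypothetical function name that prints each peak width and the mean. On the five peaks above it gives widths 574, 1338, 2901, 476 and 482.

```sh
#!/bin/sh
# peak_widths (hypothetical helper): print "<name> <width>" for each BED record, then the mean width.
peak_widths() {
  awk '/^(track|browser|#)/ { next }
    {
      w = $3 - $2                    # BED: 0-based start, exclusive end
      print $4, w
      sum += w; n++
    }
    END { if (n > 0) print "mean_width", sum / n }' "$@"
}

printf '1 3444977 3445551 peak_1 31 .\n1 4773116 4774454 peak_2 114 .\n' | peak_widths
```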
bigBed and bigWig format
https://genome.ucsc.edu/goldenpath/help/bigWig.html
https://genome.ucsc.edu/goldenpath/help/bigBed.html
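Indexed binary formats generated from BED (bigBed) and wiggle or bedGraph (bigWig) files: a genome browser can fetch only the region on display instead of loading the whole file.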
Annotating peaks
https://bedtools.readthedocs.io/en/latest/
Quinlan AR. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics. 2014 Sep 8;47:11.12.1-34
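Crossing information from GTF and BED files (BEDTools): report every gene that overlaps a peak
intersectBed -a Peaks/B7_H3K4me1_vs_B7_input-macs-narrow--q_0_peaks.bed \
  -b gencode.vM17.annotation.gtf \
  -wa -wb -nonamecheck | \
  awk '{if ($9 == "gene") print }'
With -wa -wb each output line is the peak followed by the overlapping GTF line, so with a 6-column BED the GTF feature type ends up in column 9.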
chr1 3444977 3445551 peak_15 31 .
chr1 HAVANA gene 3205901 3671498 . - . gene_id "ENSMUSG00000051951.5"; gene_type "protein_coding"; gene_name "Xkr4"; level 2; havana_gene "OTTMUSG00000026353.2";
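For small files, what intersectBed does can be approximated by a naive all-pairs overlap test in awk. The key point is the coordinate conversion: a GTF feature covering 1-based, inclusive [start, end] corresponds to the 0-based, end-exclusive BED interval [start - 1, end). Two such intervals overlap when each starts before the other ends. This sketch is quadratic in the number of features and the function name is hypothetical; BEDTools remains the tool to use on real data.

```sh
#!/bin/sh
# naive_intersect GTF BED (hypothetical helper): print each BED peak with the gene_name of
# every overlapping GTF gene. All-pairs comparison; the GTF file must not be empty.
naive_intersect() {
  awk 'BEGIN { OFS = "\t" }
    NR == FNR {                                   # first file: GTF
      if ($0 ~ /^#/ || $3 != "gene") next
      n++
      chr[n] = $1; gs[n] = $4 - 1; ge[n] = $5 + 0  # 1-based [start, end] -> 0-based [start - 1, end)
      gname[n] = "."
      if (match($0, /gene_name "[^"]*"/)) gname[n] = substr($0, RSTART + 11, RLENGTH - 12)
      next
    }
    /^(track|browser|#)/ { next }                 # second file: BED peaks
    {
      for (i = 1; i <= n; i++)                    # overlap: each interval starts before the other ends
        if (chr[i] == $1 && gs[i] < $3 && $2 < ge[i]) print $1, $2, $3, $4, gname[i]
    }' "$1" "$2"
}

# Usage with the Xkr4 gene and two peaks taken from the examples above
gtf=$(mktemp); bed=$(mktemp)
printf 'chr1\tHAVANA\tgene\t3205901\t3671498\t.\t-\t.\tgene_id "ENSMUSG00000051951.5"; gene_name "Xkr4";\n' > "$gtf"
printf 'chr1\t3444977\t3445551\tpeak_1\t31\t.\nchr1\t4773116\t4774454\tpeak_2\t114\t.\n' > "$bed"
naive_intersect "$gtf" "$bed"
```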
Annotating peaks
Finding the closest gene for each peak (BEDTools closestBed; -d adds the distance in bp as a last column, 0 for overlapping features; both inputs must be sorted by chromosome and start position)
awk '{if ($3 == "gene") print }' gencode.vM17.annotation.gtf | \
  closestBed -a Peaks/B7_H3K4me1_vs_B7_input-macs-narrow--q_0_peaks.bed \
  -d -b -
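With -d the distance is the last column of every closestBed output line, so awk can count how many peaks overlap a gene. A minimal sketch, assuming tab-separated output with the peak name in column 4 and a hypothetical function name. By default closestBed reports all ties, so each peak is counted only once.

```sh
#!/bin/sh
# summarize_closest (hypothetical helper): count peaks in "closestBed -d" output that overlap
# a gene (distance 0). Assumes tab-separated lines, peak name in column 4, distance in the last column.
summarize_closest() {
  awk 'BEGIN { FS = "\t" }
    !($4 in seen) {                               # count each peak once, even if ties give several lines
      seen[$4] = 1
      total++
      if ($NF == 0) overlap++
    }
    END { printf "%d peaks, %d overlap a gene, %d do not\n", total, overlap, total - overlap }' "$@"
}

# Usage on two toy output lines (gene coordinates and distances are made up)
printf 'chr1\t3444977\t3445551\tpeak_1\t31\t.\tchr1\tHAVANA\tgene\t3205901\t3671498\t.\t-\t.\tgene_id "A";\t0\nchr1\t4773116\t4774454\tpeak_2\t114\t.\tchr1\tHAVANA\tgene\t4780000\t4790000\t.\t+\t.\tgene_id "B";\t5546\n' | summarize_closest
```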

Course on parsing methods for biologists with a focus on ChIP-seq data

  • 1. Luca Cozzuto Sarah Bonnin Bioinformatics Core Facility Additional topics (parsing methods) for biologists with a focus on ChIP-seq data
  • 2. ChIP-Seq experiment By Jkwchui - Cell diagram adapted from LadyOfHats' Animal Cell diagram. Information based on Illumina data sheet, as well as ChIP and immunoprecipitation articles & references., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17890854
  • 3. ChIP-Seq experiment By Jkwchui - Cell diagram adapted from LadyOfHats' Animal Cell diagram. Information based on Illumina data sheet, as well as ChIP and immunoprecipitation articles & references., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17890854
  • 4. ChIP-Seq experiment By Jkwchui - Cell diagram adapted from LadyOfHats' Animal Cell diagram. Information based on Illumina data sheet, as well as ChIP and immunoprecipitation articles & references., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17890854
  • 5. ChIP-Seq experiment By Jkwchui - Cell diagram adapted from LadyOfHats' Animal Cell diagram. Information based on Illumina data sheet, as well as ChIP and immunoprecipitation articles & references., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17890854
  • 7. Raw data, reads in FASTQ format @HWI-ST227:389:C4WA2ACXX:7:1204:2272:59979 GGAGGAAGGTCCTCGCTCCTCTTTCATATAAGGGAAATGGCTGAAT + FFFFHHHHHHJIJJJJJJJJIJJJIGIGIGGIJJIJIJJJJJJIII @HWI-ST227:389:C4WA2ACXX:7:1205:15214:42893 GAGGATCCCAGGGAGGAAGGTCCTCGCTCCTCTTTCATCTAAGGGA + 12BAFB?A:3<AE1@<FF;1*@EG*)?0?DBD>9BF9B*?###### @HWI-ST227:389:C4WA2ACXX:8:2208:2467:44624 AAAGAGGAGAGAGGACCATCCTCCCTGGGATCCTCAGAAGTCTACT + BDDA:DB?2AA@FC>F?EEGC<FED>GFD;?GBB?<?F99*/9?9? Header Sequence Quality
  • 8. Raw data, reads in FASTQ format zcat B7_H3K4me1.fastq.gz | awk '{num++}END{print num/4}’ 41103741 Counting fastq reads (the slow way)
  • 9. Raw data, reads in FASTQ format Phred quality score. l Q=-10 log10p l p = probability that the corresponding base call is incorrect l Example: p = 0.001 means a quality of 30 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ 0........................26...31.........41
  • 10. Raw data, reads in FASTQ format Analyzing the quality (FASTQC) GOOD BAD https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  • 11. Alignment l Align 20-30 million reads per sample to the reference genome. l Reference genome can be very long (human is 3 Giga bases) l We need ultra-fast mappers: l Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) l Bwa (http://bio-bwa.sourceforge.net/) l GEM (https://github.com/smarco/gem3-mapper) l …
  • 12. Reference genome (Fasta file) >1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
  • 13. Reference genome (Fasta file) >1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN Header
  • 14. Reference genome (Fasta file) zcat GRCm38.primary_assembly.genome.fa.gz | grep ">" >chr1 1 >chr2 2 >chr3 3 >chr4 4 >chr5 5 >chr6 6 >chr7 7 >chr8 8 >chr9 9 >chr10 10 >chr11 11 >chr12 12 >chr13 13 >chr14 14 >chr15 15 >chr16 16 >chr17 17 >chr18 18 >chr19 19 >chrX X >chrY Y >chrM MT
  • 16. Annotations (GTF format): header lines start with "#!"; each following line describes one feature in 9 tab-separated columns.
    #!genome-build GRCm38.p5
    #!genome-version GRCm38
    #!genome-date 2012-01
    #!genome-build-accession NCBI:GCA_000001635.7
    #!genebuild-last-updated 2017-01
    1 havana gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693"; gene_version "1"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1";
    Columns: reference sequence // source // feature (gene, transcript, exon, etc.) // start // end (1-based, inclusive) // score // strand // frame (0, 1, 2) // attributes separated by ";"
    https://www.ensembl.org/info/website/upload/gff.html
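The attribute column is free text made of key "value"; pairs, so extracting one attribute needs a second level of splitting. The minimal awk sketch below prints chromosome, start, end, gene_id and gene_name for every gene record. It assumes tab-separated columns and attributes written exactly in the key "value"; form shown above. The function name gtf_gene_names is hypothetical.

```shell
# Minimal sketch: gene_id and gene_name of every "gene" record in a GTF file.
# Assumes tab-separated columns and attributes written as: key "value";
gtf_gene_names() {
  awk '
    BEGIN { FS = "\t"; OFS = "\t" }
    /^#/ { next }                           # skip "#!" header lines
    $3 == "gene" {
      id = "."; name = "."
      n = split($9, attr, ";")
      for (i = 1; i <= n; i++) {
        split(attr[i], kv, " ")             # kv[1] = key, kv[2] = "value"
        gsub(/"/, "", kv[2])
        if (kv[1] == "gene_id")   id = kv[2]
        if (kv[1] == "gene_name") name = kv[2]
      }
      print $1, $4, $5, id, name            # chromosome, start, end (1-based)
    }
  ' "$@"
}
```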
  • 18. Alignment: align 20-30 million reads per sample to the reference genome. The reference genome has to be indexed first. Typical problems: repetitive sequences (a read can map equally well to several places) and PCR artifacts (identical copies of the same fragment, handled by marking duplicates).
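Duplicates are normally marked with a dedicated tool; the PG:Z:MarkDuplicates tag in the SAM example of this section points to one, for example Picard MarkDuplicates. The underlying idea is simple. Single-end reads with the same chromosome, strand and 5' end are treated as copies of one fragment, and all but one get FLAG bit 1024 (0x400). This awk sketch simply keeps the first read it sees and uses the aligned 5' end. Picard instead keeps the read with the best base qualities and accounts for clipping. The function name mark_duplicates is hypothetical.

```shell
# Minimal sketch of duplicate marking for single-end reads in SAM text.
# Key = reference name + strand + 5-prime end; the first read seen with a key
# is kept, later ones get FLAG bit 1024 (0x400, PCR or optical duplicate).
# Simplifications: input order decides which read is kept; clipping is ignored.
mark_duplicates() {
  awk '
    BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                           # header lines unchanged
    {
      flag = $2 + 0
      if (int(flag / 4) % 2 == 1) { print; next }  # 0x4: unmapped, leave as is
      reverse = int(flag / 16) % 2                 # 0x10: reverse strand
      five = $4                                    # forward: 5-prime end = POS
      if (reverse) {
        # reverse: 5-prime end = POS + aligned reference length - 1 (from CIGAR)
        cigar = $6; reflen = 0
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
          num = substr(cigar, 1, RLENGTH - 1) + 0
          op = substr(cigar, RLENGTH, 1)
          if (op ~ /[MDN=X]/) reflen += num        # operations consuming the reference
          cigar = substr(cigar, RLENGTH + 1)
        }
        five = $4 + reflen - 1
      }
      key = $3 SUBSEP reverse SUBSEP five
      if ((key in seen) && int(flag / 1024) % 2 == 0) $2 = flag + 1024
      seen[key] = 1
      print
    }
  ' "$@"
}
```

Usage (sample.bam is a placeholder): samtools view -h sample.bam | mark_duplicates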
  • 20. Alignment (SAM / BAM format): the header.
    @HD VN:1.5 SO:coordinate
    @SQ SN:1 LN:195471971
    @SQ SN:2 LN:182113224
    @SQ SN:3 LN:160039680
    …
    @PG ID:bowtie2 PN:bowtie2 VN:2.3.2 CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 --non-deterministic -x bowtie2genome -p 8 -U B7_H3K4me1.fastq.gz"
    NS500454:71:H3TV7BGXY:4:22608:3293:16569 16 1 3000101 7 75M * 0 0 TTTTTTTTTTTTTTTTTTTTTTTGGTTTTGAGACTATTGATGACTGCCTCTATTTCTTTAGGGGAAATGGGACTT E/EEEAAEEEEEEE6E6EAEE/E6EEE//<6/E/EAEE/EE/E/EE66E6E6EEEEEEE/EAAA/E/EE/AAAAA MD:Z:25G1G0G46 XG:i:0 NM:i:3 XM:i:3 XN:i:0 XO:i:0 AS:i:-11 XS:i:-20 YT:Z:UU PG:Z:MarkDuplicates
    Header lines: @HD: header line (VN: format version; SO: sorting order of alignments). @SQ: reference sequence dictionary (SN: sequence name; LN: sequence length). @PG: program (ID: program record identifier; PN: program name; VN: version; CL: command line).
    https://samtools.github.io/hts-specs/SAMv1.pdf
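The @SQ lines are enough to build the two-column "chrom.sizes" table (name, length) that UCSC converters such as bedGraphToBigWig and bedToBigBed expect. This awk sketch pulls SN and LN from each @SQ line regardless of tag order. The function name sam_chrom_sizes is hypothetical.

```shell
# Minimal sketch: chromosome sizes (name<TAB>length) from the @SQ header lines.
sam_chrom_sizes() {
  awk '
    BEGIN { FS = "\t" }
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {           # tags may come in any order
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4)
      }
      if (name != "" && len != "") print name "\t" len
    }
  ' "$@"
}
```

Usage (sample.bam is a placeholder): samtools view -H sample.bam | sam_chrom_sizes > chrom.sizes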
  • 21. Alignment (SAM / BAM format): an alignment line.
    NS500454:71:H3TV7BGXY:4:22608:3293:16569 16 1 3000101 7 75M * 0 0 TTTTTTTTTTTTTTTTTTTTTTTGGTTTTGAGACTATTGATGACTGCCTCTATTTCTTTAGGGGAAATGGGACTT E/EEEAAEEEEEEE6E6EAEE/E6EEE//<6/E/EAEE/EE/E/EE66E6E6EEEEEEE/EAAA/E/EE/AAAAA MD:Z:25G1G0G46 XG:i:0 NM:i:3 XM:i:3 XN:i:0 XO:i:0 AS:i:-11 XS:i:-20 YT:Z:UU PG:Z:MarkDuplicates
    Mandatory fields: query name // FLAG // reference name // leftmost mapping position (1-based) // mapping quality (here 7, i.e. a probability of about 0.2 that the mapping position is wrong) // CIGAR string (75M: 75 aligned bases, matches or mismatches) // reference name of the mate read // position of the mate // template length // sequence // quality. The mate fields are *, 0 and 0 because the reads are single-end (bowtie2 -U). Optional tags follow, e.g. NM:i:3, an edit distance of 3 from the reference.
    In this case FLAG 16 means "read reverse complemented".
    https://samtools.github.io/hts-specs/SAMv1.pdf
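FLAG is a sum of bits and MAPQ is a Phred-scaled probability, so both can be decoded with a little arithmetic. In the sketch below, decode_flag names the bits that are set, in the order of the SAM specification. mapq_to_error_prob inverts MAPQ = -10 log10(p). Both function names and the short bit labels are hypothetical; samtools also offers a flags subcommand for the first task.

```shell
# Minimal sketch: name the bits set in a SAM FLAG (bit order as in the SAM spec).
decode_flag() {
  awk -v flag="$1" 'BEGIN {
    n = split("paired proper_pair unmapped mate_unmapped reverse mate_reverse read1 read2 secondary qc_fail duplicate supplementary", bit, " ")
    f = flag + 0; out = ""
    for (i = 1; i <= n; i++) {              # bit i has value 2^(i-1)
      if (f % 2 == 1) out = out (out == "" ? "" : ",") bit[i]
      f = int(f / 2)
    }
    print (out == "" ? "none" : out)
  }'
}

# MAPQ = -10 * log10(P[mapping position is wrong]), so P = 10^(-MAPQ/10)
mapq_to_error_prob() {
  awk -v q="$1" 'BEGIN { printf "%.3f\n", 10 ^ (-q / 10) }'
}
```

For the read above, decode_flag 16 prints reverse and mapq_to_error_prob 7 prints 0.200. A duplicate-marked reverse read would have FLAG 1040, i.e. reverse,duplicate.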
  • 22. Alignment (SAM / BAM format): visualizing the alignments in IGV (Integrative Genomics Viewer). https://software.broadinstitute.org/software/igv/
  • 23. Quality control of the enrichment https://deeptools.readthedocs.io/en/develop/index.html
  • 24. Distribution of the signal (wiggle format). https://deeptools.readthedocs.io/en/develop/index.html
    variableStep chrom=chr2
    300701 12.5
    300702 12.5
    300703 12.5
    300704 12.5
    300705 12.5
    ...
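In variableStep wiggle, each data line is a 1-based position and a value covering span bases (1 by default). bedGraph uses 0-based, end-exclusive intervals instead, so converting is mostly a coordinate shift. The sketch below handles variableStep blocks only (not fixedStep) and does not merge consecutive positions with equal values. The function name wig_to_bedgraph is hypothetical.

```shell
# Minimal sketch: variableStep wiggle -> bedGraph (chrom, start, end, value).
# fixedStep blocks are not handled.
wig_to_bedgraph() {
  awk '
    BEGIN { OFS = "\t" }
    /^(track|browser)/ { next }
    /^variableStep/ {
      chrom = ""; span = 1                  # span defaults to 1 base
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "chrom") chrom = kv[2]
        if (kv[1] == "span") span = kv[2] + 0
      }
      next
    }
    NF == 2 { print chrom, $1 - 1, $1 - 1 + span, $2 }  # 1-based -> 0-based start
  ' "$@"
}
```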
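  • 26. Peak calling. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. Reads only cover the ends of the immunoprecipitated fragments, so reads on the + strand pile up on one side of the binding site and reads on the - strand on the other. From this shift it is possible to infer the fragment size and use it to extend the reads, which gives more reliable peaks (i.e. binding sites): the peak is in the middle.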
  • 27. Peak coordinates (BED format). https://genome.ucsc.edu/FAQ/FAQformat.html#format1
    Fields: chromosome // start // end (3-field BED) + name // score // strand (6-field BED) + thickStart // thickEnd // itemRgb + blockCount // blockSizes // blockStarts (12-field BED). Start is 0-based and end is exclusive.
    track name=chipseq description="IP of Ring1B TF"
    1 3444977 3445551 peak_1 31 .
    1 4773116 4774454 peak_2 114 .
    1 4774530 4777431 peak_3 108 .
    1 4786374 4786850 peak_4 80 .
    1 4806806 4807288 peak_5 66 .
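Because BED starts are 0-based and ends exclusive, a peak's width is simply end - start, with no +1. The awk sketch below prints each peak's name, width and midpoint (as chromosome:position, 0-based), skipping track/browser/comment lines. The function name peak_summary is hypothetical.

```shell
# Minimal sketch: name, width and midpoint of each peak in a BED file.
peak_summary() {
  awk '
    BEGIN { FS = OFS = "\t" }
    /^(track|browser|#)/ { next }
    {
      width = $3 - $2                       # 0-based start, exclusive end
      mid = $2 + int(width / 2)             # 0-based position of the midpoint
      print $4, width, $1 ":" mid
    }
  ' "$@"
}
```

For peak_1 above (3444977 to 3445551) this gives a width of 574 and a midpoint at 1:3445264.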
  • 28. bigBed and bigWig format: indexed binary versions of BED and wiggle files, respectively, so that a genome browser only needs to read the region being displayed. https://genome.ucsc.edu/goldenpath/help/bigWig.html https://genome.ucsc.edu/goldenpath/help/bigBed.html
  • 30. Annotating peaks: crossing information from GTF files and BED files (BEDTools). https://bedtools.readthedocs.io/en/latest/ Quinlan AR. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics. 2014 Sep 8;47:11.12.1-34
    intersectBed -a Peaks/B7_H3K4me1_vs_B7_input-macs-narrow--q_0_peaks.bed -b gencode.vM17.annotation.gtf -wa -wb -nonamecheck | awk '{if ($9 == "gene") print }'
    chr1 3444977 3445551 peak_15 31 . chr1 HAVANA gene 3205901 3671498 . - . gene_id "ENSMUSG00000051951.5"; gene_type "protein_coding"; gene_name "Xkr4"; level 2; havana_gene "OTTMUSG00000026353.2";
    With -wa -wb each output line contains the peak followed by the overlapping GTF record. Since the peak file has 6 columns, the GTF feature type (its 3rd column) ends up in column 9, hence $9 == "gene".
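The core test behind the intersection is short. With half-open intervals, two features overlap when each starts before the other ends, after converting GTF's 1-based inclusive start to 0-based. The awk sketch below compares every peak with every gene, which is fine for illustration but far slower than bedtools. The function name overlap_genes is hypothetical. Chromosome names are compared literally, so mixing "1" and "chr1" silently gives no overlaps.

```shell
# Minimal sketch of interval intersection (naive: every peak vs every gene).
# Usage: overlap_genes PEAKS.bed ANNOTATION.gtf   (peak file must not be empty)
overlap_genes() {
  awk '
    BEGIN { FS = OFS = "\t" }
    FNR == NR {                                 # first file: BED peaks
      if ($0 ~ /^(track|browser|#)/) next
      n++; chr[n] = $1; ps[n] = $2 + 0; pe[n] = $3 + 0; pname[n] = $4
      next
    }
    /^#/ || $3 != "gene" { next }               # second file: gene records only
    {
      gs = $4 - 1; ge = $5 + 0                  # GTF 1-based inclusive -> half-open
      for (i = 1; i <= n; i++)                  # overlap: each starts before the other ends
        if (chr[i] == $1 && ps[i] < ge && gs < pe[i]) print pname[i], $0
    }
  ' "$1" "$2"
}
```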
  • 31. Annotating peaks: finding the closest gene to each peak (BEDTools). https://bedtools.readthedocs.io/en/latest/
    awk '{if ($3 == "gene") print }' gencode.vM17.annotation.gtf | closestBed -a Peaks/B7_H3K4me1_vs_B7_input-macs-narrow--q_0_peaks.bed -d -b -
    awk keeps only the gene records; "-b -" makes closestBed read them from standard input, and -d adds the distance to the closest gene (0 when they overlap). closestBed expects both inputs sorted by chromosome and start position.
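The distance logic can be sketched the same way. For each peak, the sketch takes the gene on the same chromosome with the smallest gap and reports that gap in bases (0 for overlapping or book-ended features). It reports -1 if no gene is found on that chromosome. This is a naive scan over all genes with its own distance convention; bedtools closest -d may differ for adjacent features and handles ties with options, so check its documentation. The function name closest_gene and the GTF-first argument order are hypothetical.

```shell
# Minimal sketch: closest gene to each peak and the gap between them in bases.
# Usage: closest_gene ANNOTATION.gtf PEAKS.bed   (GTF first; must not be empty)
closest_gene() {
  awk '
    BEGIN { FS = OFS = "\t" }
    FNR == NR {                                   # first file: GTF genes
      if ($0 ~ /^#/ || $3 != "gene") next
      g++; gchr[g] = $1; gs[g] = $4 - 1; ge[g] = $5 + 0
      gn[g] = "."
      if (match($9, /gene_name "[^"]*"/))         # gene_name "Xkr4" -> Xkr4
        gn[g] = substr($9, RSTART + 11, RLENGTH - 12)
      next
    }
    /^(track|browser|#)/ { next }                 # second file: BED peaks
    {
      ps = $2 + 0; pe = $3 + 0; best = -1; bname = "."
      for (i = 1; i <= g; i++) {
        if (gchr[i] != $1) continue
        if (pe <= gs[i])      d = gs[i] - pe      # gene lies after the peak
        else if (ge[i] <= ps) d = ps - ge[i]      # gene lies before the peak
        else                  d = 0               # overlap
        if (best < 0 || d < best) { best = d; bname = gn[i] }
      }
      print $4, bname, best                       # -1: no gene on that chromosome
    }
  ' "$1" "$2"
}
```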