ChIP-seq analysis




               Luca Cozzuto
            Bioinformatics Core
ChIP-seq analysis

• ChIP-seq combines chromatin immunoprecipitation (ChIP) with
high-throughput sequencing.

• It detects genomic regions bound by proteins such as:

   • Transcription factors
   • Histones
   • Polymerase II
   • …
ChIP-seq analysis
Typical workflow
ChIP-seq analysis
Starting the analysis.

• Typically you will receive 10 to 30 million raw reads per
sample, corresponding to a compressed file of 0.5-1.5 GB.

FASTQ format
@HWUSI-EAS621:69:64EKPAAXX:3:1:11477:1265 1:N:0:      @(HEADER)
GAAACTTGAGGACTGCCCAGCTCGACAGACACTGGA                  (SEQUENCE)
+                                                     +(HEADER)
GEGGDGG@GGDGGGGGGGBDGGDG8GG@3D6:3:67                  (QUALITY)

The quality of each base is encoded as a single ASCII character
that represents its Phred quality score:

$$Q = -10 \log_{10} p$$

p = probability that the base call is incorrect
Q = 20 means a base call accuracy of 99%
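
As a rough illustration, the quality line of the read shown above can be decoded in a few lines of Python. This is a minimal sketch using the standard Phred definition Q = -10 log10(p). The ASCII offset of 33 is an assumption (it matches the Sanger / Illumina 1.8+ encoding; older Illumina pipelines used 64), and the function names are made up for this example.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    offset=33 is assumed here (Sanger / Illumina 1.8+ encoding).
    """
    return [ord(char) - offset for char in quality]


def error_probability(q):
    """Invert Q = -10 * log10(p) to get the probability of a wrong base call."""
    return 10 ** (-q / 10)


# Quality line of the FASTQ record shown on the slide.
quality = "GEGGDGG@GGDGGGGGGGBDGGDG8GG@3D6:3:67"
scores = phred_scores(quality)

for char, q in list(zip(quality, scores))[:3]:
    print(char, q, error_probability(q))
```

With this offset, most bases of the example read score in the high 30s, and the lower-quality calls sit at the 3' end of the read.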
ChIP-seq analysis
Starting the analysis.

• It is strongly recommended to check the quality of the
sequences we received before doing the analysis!




                         FastQC analysis
ChIP-seq analysis
Starting the analysis.

Reads are mapped with ultra-fast mappers such as:

   • GEM
   • Bowtie
   • BWA
   • Stampy

The reference genome must be indexed before mapping.
ChIP-seq analysis
Peak calling – MACS

Model-based Analysis of ChIP-Seq data.




[Figure: a transcription factor (TF) bound to DNA, the sequences recovered from the IP, and the sequenced tags on the + and - strands]
ChIP-seq analysis
Peak calling – MACS
Given a sonication size (bandwidth) and a fold-enrichment range
(mfold), MACS slides windows of 2*bandwidth across the genome to find
regions whose tag enrichment, relative to a random genome-wide tag
distribution, falls within the mfold range (default: between 10 and 30).
ChIP-seq analysis
Peak calling – MACS
MACS selects at least 1,000 “model peaks” to calculate the distance
“d” between the paired + and - strand peaks.
ChIP-seq analysis
Peak calling – MACS
How to determine whether peaks are higher than expected by chance?

• x = observed read number
• λ = expected read number

$$P(X > x) = 1 - \sum_{k=0}^{x} \frac{e^{-\lambda} \lambda^{k}}{k!}$$

Probability of finding a peak higher than x.

The tag distribution along the genome can be modeled by a Poisson
distribution.
ChIP-seq analysis
Peak calling – MACS

Example:
Tag count = 2
Number of reads = 30,000,000
Read length = 36
Mappable human genome = 2,700,000,000
ChIP-seq analysis
Peak calling – MACS

Example:
Tag count = 10
Number of reads = 30,000,000
Read length = 36
Mappable human genome = 2,700,000,000
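
The two examples above can be worked through numerically. This is a minimal sketch, assuming that λ is taken as the expected per-base coverage (number of reads × read length / mappable genome size), which is the most natural way to combine the quantities listed; MACS itself may scale λ to a window size. The function name `poisson_upper_tail` is made up for this example.

```python
import math


def poisson_upper_tail(x, lam, n_terms=100):
    """Return P(X > x) for X ~ Poisson(lam) by summing the tail terms directly.

    Summing the tail avoids the cancellation of 1 - CDF for tiny p-values.
    n_terms=100 is adequate when lam is small, as in the examples here.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    k = x + 1
    # First tail term e^(-lam) * lam^k / k!, computed in log space.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for _ in range(n_terms):
        total += term
        k += 1
        term *= lam / k  # ratio between consecutive Poisson terms
    return total


# Values from the slides.
reads = 30_000_000
read_length = 36
mappable_genome = 2_700_000_000

# Assumption: lambda is the expected coverage per base.
lam = reads * read_length / mappable_genome

p_2 = poisson_upper_tail(2, lam)
p_10 = poisson_upper_tail(10, lam)
print(f"lambda = {lam:.2f}, P(X > 2) = {p_2:.3g}, P(X > 10) = {p_10:.3g}")
```

Under this assumption λ is 0.4. A pile of more than 2 tags has a probability of roughly 8e-3 and would not pass a 1e-5 cutoff, whereas more than 10 tags has a probability below 1e-12 and would.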
ChIP-seq analysis
Peak calling – MACS
• Each tag is shifted by d/2 toward its 3’ end.
• Windows of length 2*d slide across the genome to detect
enriched regions (Poisson distribution p-value <= 1e-5).
• Overlapping enriched regions are merged.
• The summit of each peak is taken as the putative binding site.
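
The tag-shifting step can be sketched as follows. This is an illustrative sketch only, not MACS code: it assumes each tag is represented by the genomic position of its 5' end, and the function name and the value of d are hypothetical.

```python
def shift_tag(five_prime_pos, strand, d):
    """Shift a tag's 5' position by d/2 toward its 3' end.

    On the + strand the 3' end lies at higher coordinates, on the - strand
    at lower coordinates, so the two strands move toward each other.
    """
    half = d // 2
    if strand == "+":
        return five_prime_pos + half
    if strand == "-":
        return five_prime_pos - half
    raise ValueError(f"unknown strand: {strand!r}")


d = 120  # hypothetical fragment size estimated from the model peaks
plus_tag = shift_tag(1000, "+", d)   # moves right
minus_tag = shift_tag(1120, "-", d)  # moves left
print(plus_tag, minus_tag)
```

After the shift, tags from both strands that flank the same binding site pile up at the same position, which is why the summit can be read as the putative binding site.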



ChIP-seq analysis
Peak calling – MACS
To account for local biases in the genome, such as local chromatin
structure, sequencing bias and genome copy number variation, MACS
evaluates candidate peaks by comparing them against a “local”
distribution.

Fold enrichment = enrichment over λlocal
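
A sketch of how a local λ and the resulting fold enrichment might be computed. It follows the rule described in the MACS publication, where λlocal is the maximum of the genome-wide background λ and the λ values estimated from control tags in windows of different sizes around the peak. All names, window sizes and counts below are hypothetical.

```python
def window_lambda(control_count, window_bp, peak_bp):
    """Scale a control tag count seen in window_bp down to the peak width."""
    return control_count * peak_bp / window_bp


def local_lambda(lambda_bg, control_counts, peak_bp):
    """Take the largest of the background lambda and the per-window lambdas.

    control_counts maps a window size in bp to the number of control tags
    found in that window around the candidate peak.
    """
    candidates = [lambda_bg]
    for window_bp, count in control_counts.items():
        candidates.append(window_lambda(count, window_bp, peak_bp))
    return max(candidates)


def fold_enrichment(observed_tags, lam):
    """Enrichment of the observed tag count over the expected count."""
    return observed_tags / lam


# Hypothetical candidate peak: 240 bp wide with 24 treatment tags.
lam_local = local_lambda(
    lambda_bg=0.5,
    control_counts={1000: 10, 10000: 40},
    peak_bp=240,
)
enrichment = fold_enrichment(24, lam_local)
print(lam_local, enrichment)
```

Taking the maximum makes the test conservative: a region where the control is locally tag-rich needs more treatment tags to be called a peak.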
ChIP-seq analysis
Peak calling – MACS
The False Discovery Rate (FDR) is calculated as the number of control
peaks called divided by the number of sample peaks. Control peaks are
obtained by swapping control and sample.




          FDR is calculated only when a control is provided!
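
The ratio can be sketched in a few lines. Applying it at a given score threshold, so that each peak gets its own value, is an assumption about how a per-peak FDR could be derived. The function name and the scores are hypothetical, not MACS output.

```python
def empirical_fdr(threshold, sample_scores, control_scores):
    """Control peaks / sample peaks among peaks scoring at least `threshold`.

    Scores are assumed to be "higher is better", e.g. -10*log10(p-value).
    """
    n_sample = sum(score >= threshold for score in sample_scores)
    n_control = sum(score >= threshold for score in control_scores)
    if n_sample == 0:
        return 0.0
    return min(1.0, n_control / n_sample)


# Hypothetical scores for peaks called in the sample and, after swapping
# sample and control, in the control.
sample_scores = [150, 90, 75, 60, 51]
control_scores = [62, 55]

for threshold in (90, 60, 51):
    fdr = empirical_fdr(threshold, sample_scores, control_scores)
    print(threshold, f"{100 * fdr:.0f}%")
```

Stronger peaks get a lower FDR because fewer control peaks reach the same score.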
ChIP-seq analysis
    Practical part


Connect to the Etna machine using ssh.

     • Mac or Linux users can connect with this command:
$ ssh -X course@xxx.crg.es
course@xxx.crg.es's password:


     Password: xxxxxxx

     • Windows users should first download the PuTTY and PSCP
     programs and then use them to access the
     machine. http://goo.gl/4BWud
ChIP-seq analysis
MACS accepts several input file formats:
BED, ELAND, SAM, BAM, BOWTIE and, for paired-end data,
ELANDMULTIPET
      $ head ../data/Input_tags.bed
      chr1 233604      233639     0   2   -
      chr1 559767      559802     0   3   +
      chr1 742600      742635     0   2   +
      chr1 742600      742635     0   0   +
      chr1 744231      744266     0   0   +
      chr1 744307      744342     0   2   -
      chr1 746885      746920     0   2   +
      chr1 746958      746993     0   1   +
      chr1 748226      748261     0   2   +
      chr1 748357      748392     0   0   -




BED fields: chromosome name, start, end, name, score, strand
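
A minimal sketch of reading one of these BED lines in Python. The `BedTag` type and `parse_bed_line` function are made up for this example. The only outside fact used is that BED coordinates are 0-based and half-open, so the tag length is simply end - start.

```python
from typing import NamedTuple


class BedTag(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str
    score: float
    strand: str


def parse_bed_line(line):
    """Split a whitespace-separated 6-column BED line into typed fields."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedTag(chrom, int(start), int(end), name, float(score), strand)


# First line of ../data/Input_tags.bed as shown above.
tag = parse_bed_line("chr1 233604      233639     0   2   -")
tag_length = tag.end - tag.start
print(tag, tag_length)
```

For the first line of the file this gives a tag length of 35 bp.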
ChIP-seq analysis
  Launch MACS, passing the sample, the control, the
  genome size (hs = Homo sapiens) and the experiment name:

$ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1
ChIP-seq analysis
  Check the output printed to the screen.


$ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1
INFO @ Thu, 29 Mar 2012 14:58:35:
# ARGUMENTS LIST:
# name = FoxA1
# format = AUTO
# ChIP-seq file = ./Treatment_tags.bed
# control file = ./Input_tags.bed
# effective genome size = 2.70e+09
# band width = 300
# model fold = 10,30
# pvalue cutoff = 1.00e-05
# Small dataset will be scaled towards larger dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps

INFO @ Thu, 29 Mar 2012 14:58:35: #1 read tag files...
INFO @ Thu, 29 Mar 2012 14:58:35: #1 read treatment tags...
INFO @ Thu, 29 Mar 2012 14:58:35: Detected format is: BED



In this version the regional lambda is computed over two window
sizes: a small one (1,000 bp) to capture bias around the summit and
a large one (10,000 bp) for the surrounding area.
ChIP-seq analysis
 Check the output printed to the screen.

INFO   @ Thu, 29 Mar 2012 14:59:41: #1 tag size is determined as 35 bps
INFO   @ Thu, 29 Mar 2012 14:59:41: #1 tag size = 35
INFO   @ Thu, 29 Mar 2012 14:59:41: #1 total tags in treatment: 3909805
..
INFO   @ Thu, 29 Mar 2012 14:59:46: #2 Build Peak Model...
INFO   @ Thu, 29 Mar 2012 15:00:00: #2 number of paired peaks: 11861
INFO   @ Thu, 29 Mar 2012 15:00:00: #2 finished!
INFO   @ Thu, 29 Mar 2012 15:00:00: #2 predicted fragment length is 119 bps
INFO   @ Thu, 29 Mar 2012 15:00:00: #2.2 Generate R script for model : FoxA1_model.r
INFO   @ Thu, 29 Mar 2012 15:00:00: #3 Call peaks...
INFO   @ Thu, 29 Mar 2012 15:00:00: #3 shift treatment data
INFO   @ Thu, 29 Mar 2012 15:00:01: #3 merge +/- strand of treatment data
INFO   @ Thu, 29 Mar 2012 15:00:01: #3 call peak candidates
INFO   @ Thu, 29 Mar 2012 15:00:13: #3 shift control data
INFO   @ Thu, 29 Mar 2012 15:00:13: #3 merge +/- strand of control data
INFO   @ Thu, 29 Mar 2012 15:00:15: #3 call negative peak candidates
INFO   @ Thu, 29 Mar 2012 15:00:25: #3 use control data to filter peak candidates...
INFO   @ Thu, 29 Mar 2012 15:00:31: #3 Finally, 13591 peaks are called!
INFO   @ Thu, 29 Mar 2012 15:00:31: #3 find negative peaks by swapping treat and control
INFO   @ Thu, 29 Mar 2012 15:00:36: #3 Finally, 594 peaks are called!
ChIP-seq analysis
Output files

•FoxA1_model.r

• FoxA1_negative_peaks.xls

• FoxA1_peaks.bed

• FoxA1_peaks.xls

• FoxA1_summits.bed
ChIP-seq analysis
  MACS peak model

$ R --vanilla < FoxA1_model.r
..
$ evince FoxA1_model.pdf
ChIP-seq analysis
 FoxA1_peaks.xls

chr   start    end      length  summit  tags  -10*LOG10(pvalue)  fold_enrichment  FDR(%)
chr1  858357   858641   285     128     6     51                 13.93            4.09
chr1  998955   999229   275     106     9     74.39              18.28            0.26
chr1  1050021  1050286  266     154     13    152                52.23            0
chr1  1684288  1684577  290     176     9     89.7               32.14            0.01
chr1  1775031  1775371  341     270     6     51.08              16.71            4.06
chr1  1780682  1780965  284     183     6     61.17              19.9             1.45


FoxA1_negative_peaks.xls

chr   start     end       length  summit  tags  -10*LOG10(pvalue)  fold_enrichment
chr1  7155010   7155530   521     311     9     61.64              44.47
chr1  11265816  11266025  210     106     6     59.86              38.12
chr1  18597004  18597307  304     188     8     66.25              31.77
chr1  33412779  33412964  186     94      6     58.68              22.92
chr1  33759125  33759514  390     234     9     62.88              19.77
chr1  37102727  37102952  226     114     6     55.14              31.51
ChIP-seq analysis
 FoxA1_peaks.bed
 chr, start, end, peak id and score = -10*LOG10(pvalue)

   chr1          858356     858641 MACS_peak_1       51
   chr1          998954     999229 MACS_peak_2    74.39
   chr1         1050020    1050286 MACS_peak_3      152
   chr1         1684287    1684577 MACS_peak_4     89.7
   chr1         1775030    1775371 MACS_peak_5    51.08
   chr1         1780681    1780965 MACS_peak_6    61.17
   chr1         1923146    1923449 MACS_peak_7   164.87
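
Since the score column is -10*LOG10(pvalue), it can be converted back to a p-value directly. A small sketch, with function names made up for this example:

```python
import math


def score_to_pvalue(score):
    """Invert score = -10 * log10(pvalue)."""
    return 10 ** (-score / 10)


def pvalue_to_score(pvalue):
    """Compute the score used in the peak files from a p-value."""
    return -10 * math.log10(pvalue)


# Scores of MACS_peak_1 and MACS_peak_3 from FoxA1_peaks.bed.
for score in (51, 152):
    print(score, score_to_pvalue(score))

# The default p-value cutoff of 1e-05 corresponds to a score of 50.
print(pvalue_to_score(1e-5))
```

This is why every peak in the file scores above 50: MACS_peak_1, with a score of 51, lies just under the 1e-05 cutoff.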




FoxA1_summits.bed
chr, start, end, peak id and score = height of the summit


   chr1          858483     858484 MACS_peak_1       4
   chr1          999059     999060 MACS_peak_2       7
   chr1         1050173    1050174 MACS_peak_3      12
   chr1         1684462    1684463 MACS_peak_4       8
   chr1         1775299    1775300 MACS_peak_5       4
   chr1         1780863    1780864 MACS_peak_6       4
   chr1         1923347    1923348 MACS_peak_7      14
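
The three output files can be cross-checked against each other. In the rows shown, the start in FoxA1_peaks.xls is one greater than the start in FoxA1_peaks.bed (1-based versus 0-based), and the `summit` column behaves as a 1-based offset from the peak start. This relation is inferred from the rows shown here, not taken from the MACS documentation, and the helper name is made up.

```python
# (start, summit) pairs copied from the first six rows of FoxA1_peaks.xls.
PEAKS_XLS = [
    (858357, 128),
    (998955, 106),
    (1050021, 154),
    (1684288, 176),
    (1775031, 270),
    (1780682, 183),
]

# Start column of the matching rows in FoxA1_summits.bed.
SUMMITS_BED = [858483, 999059, 1050173, 1684462, 1775299, 1780863]


def summit_bed_start(xls_start, summit_offset):
    """Convert an xls peak start plus summit offset to a 0-based BED start."""
    bed_start = xls_start - 1             # 1-based xls start to 0-based BED start
    return bed_start + summit_offset - 1  # summit offset treated as 1-based


computed = [summit_bed_start(start, summit) for start, summit in PEAKS_XLS]
print(computed == SUMMITS_BED)
```

The computed positions match the summit file for all six rows shown.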
ChIP-seq analysis

$ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1 -w

The -w option creates a “wiggle” file for each
chromosome analyzed.

The -B option creates “bedgraph” files.

The -S option, together with either -w or -B, creates a single
huge file for the whole genome.

--space=NUM can be used to change the resolution of the
wiggle file.
ChIP-seq analysis
Upload files in the UCSC genome browser
http://genome.ucsc.edu/index.html
Peak example: chr22:20141500..20141987
ChIP-seq analysis
 Analyze histone modifications

 • Broader peaks
 • No clear shape (more summits)
 • The peak model is often impossible to create.

$ macs14 -t ../data/ES.H3K27me3.bed -g mm --nomodel --nolambda -n H3K27me3

 • It is recommended to skip the model with the --nomodel
 option.
 • Since no control is available, the comparison is made
 against the sample background. It is recommended to skip
 the local background (--nolambda) when you have no control
 and very broad peaks.
ChIP-seq analysis
Upload files in the UCSC genome browser
Peak example: chrX:47,922,749-47,926,228
ChIP-seq analysis
Galaxy platform
Soon a local installation at CRG!!!




                                      https://main.g2.bx.psu.edu/
ChIP-seq analysis
Bibliography:

• http://en.wikipedia.org/wiki/File:ChIP-sequencing.svg
• http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
• http://liulab.dfci.harvard.edu/MACS/
• http://sourceforge.net/apps/mediawiki/gemlibrary/index.php?title=The_GEM_library
• http://bio-bwa.sourceforge.net/
• http://www.well.ox.ac.uk/project-stampy
• http://bowtie-bio.sourceforge.net/index.shtml
• http://genome.ucsc.edu/
• http://www.r-project.org/
• http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
• https://main.g2.bx.psu.edu/root

Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 

Dernier (20)

Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 

MACS course

  • 1. ChIP-seq analysis. Luca Cozzuto, Bioinformatics Core
  • 2. ChIP-seq analysis
      • ChIP-seq is the combination of chromatin immunoprecipitation with high-throughput sequencing.
      • It allows the detection of genomic regions bound by proteins such as:
          • Transcription factors
          • Histones
          • Polymerase II
          • …
  • 5. ChIP-seq analysis Starting the analysis.
      • Typically you will receive 10 to 30 million raw reads per sample, corresponding to a compressed file of 0.5-1.5 GB.
      FASTQ format:
      @HWUSI-EAS621:69:64EKPAAXX:3:1:11477:1265 1:N:0:    @(HEADER)
      GAAACTTGAGGACTGCCCAGCTCGACAGACACTGGA                (SEQUENCE)
      +                                                   +(HEADER)
      GEGGDGG@GGDGGGGGGGBDGGDG8GG@3D6:3:67                (QUALITY)
      Each quality character is an ASCII encoding of the Phred quality score Q = -10 * log10(p),
      where p is the probability that the base call is incorrect.
      Q = 20 means a base call accuracy of 99%.
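As a minimal sketch of how the quality line above can be decoded, the following Python snippet converts each character to a Phred score and then to an error probability. It assumes the Phred+33 (Sanger / Illumina 1.8+) ASCII offset, which the slides do not state; the function names are illustrative only.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p) to get the base-call error probability p."""
    return 10 ** (-q / 10)


# Quality line taken from the FASTQ record shown in the slide.
quality = "GEGGDGG@GGDGGGGGGGBDGGDG8GG@3D6:3:67"
scores = phred_scores(quality)

print(scores[:5])                     # scores of the first five bases
print(min(scores))                    # lowest score in the read
print(error_probability(20))          # Q = 20 corresponds to 99% accuracy
```

With a different encoding (for example the older Phred+64 Illumina format) the `offset` argument would have to be changed accordingly.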
  • 6. ChIP-seq analysis Starting the analysis.
      • It is strongly recommended to check the quality of the received sequences before starting the analysis!
      FastQC analysis
  • 7. ChIP-seq analysis Starting the analysis.
      Mapping with ultra-fast mappers:
      • GEM
      • Bowtie
      • BWA
      • Stampy
      The reference genome must be indexed before mapping.
  • 8. ChIP-seq analysis Peak calling – MACS Model-based Analysis of ChIP-Seq data. TF
  • 9. ChIP-seq analysis Peak calling – MACS Sequences from IP TF
  • 10. ChIP-seq analysis Peak calling – MACS Sequences from IP TF Sequenced tags on + strand - strand
  • 12. ChIP-seq analysis Peak calling – MACS Given a sonication size (bandwidth) and a fold-enrichment range (mfold), MACS slides windows of length 2*bandwidth across the genome to find regions whose tag enrichment relative to a random genome-wide tag distribution is >= mfold (default: between 10 and 30).
  • 13. ChIP-seq analysis Peak calling – MACS MACS selects at least 1,000 “model peaks” to calculate the distance “d” between paired peaks.
  • 14. ChIP-seq analysis Peak calling – MACS How to determine whether peaks are higher than expected by chance?
      The tag distribution along the genome can be modeled by a Poisson distribution.
      • x = observed read number
      • λ = expected read number
      Probability of finding a peak higher than x: P(X > x) = 1 - sum_{k=0..x} e^(-λ) * λ^k / k!
  • 15. ChIP-seq analysis Peak calling – MACS Example: Tag count = 2 Number of reads = 30,000,000 Read length = 36 Mappable human genome = 2,700,000,000
  • 16. ChIP-seq analysis Peak calling – MACS Example: Tag count = 10 Number of reads = 30,000,000 Read length = 36 Mappable human genome = 2,700,000,000
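The two examples above can be worked through with a short Python sketch. The formula shown on the original slides is not preserved in this transcript, so the sketch assumes that the expected count is λ = number of reads × read length / mappable genome size, and it evaluates the Poisson upper tail P(X > x). The function name is illustrative.

```python
import math


def poisson_tail(x: int, lam: float) -> float:
    """P(X > x) for X ~ Poisson(lam), summing the upper tail term by term."""
    k = x + 1
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total = 0.0
    for _ in range(1000):
        total += term
        k += 1
        term *= lam / k           # next Poisson term from the previous one
        if term <= total * 1e-17:
            break                 # remaining terms no longer change the sum
    return total


# Values from the slides; the way they are combined into lambda is an assumption.
n_reads = 30_000_000
read_length = 36
mappable_genome = 2_700_000_000
lam = n_reads * read_length / mappable_genome

p_tag2 = poisson_tail(2, lam)     # tag count = 2
p_tag10 = poisson_tail(10, lam)   # tag count = 10
print(lam, p_tag2, p_tag10)
```

Under this assumption λ is 0.4: a tag count of 2 gives a probability of roughly 0.008, well above a 1e-5 cutoff, while a tag count of 10 falls far below it.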
  • 17. ChIP-seq analysis Peak calling – MACS
      • Each tag is shifted by d/2 towards its 3’ end.
      • Windows of length 2*d slide across the genome to detect enriched regions (Poisson distribution p-value <= 1e-5).
      • Overlapping enriched regions are merged.
      • The summit of the peak is taken as the putative binding site.
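The shifting and merging steps can be sketched in Python as follows. This is only an illustration of the idea, not the MACS implementation: it assumes BED-style tag coordinates, integer division for d/2, and hypothetical function names.

```python
def shift_tag(start: int, end: int, strand: str, d: int) -> int:
    """Return the 5' position of a tag after shifting it d/2 towards its 3' end."""
    half = d // 2
    if strand == "+":
        return start + half       # + strand: 5' end is the start, 3' lies to the right
    if strand == "-":
        return end - half         # - strand: 5' end is the end, 3' lies to the left
    raise ValueError(f"unknown strand: {strand!r}")


def merge_regions(regions: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping (start, end) regions on one chromosome."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two tags from the example BED file, shifted with d = 119 (the fragment length
# reported in the example run).
print(shift_tag(233604, 233639, "-", 119))
print(shift_tag(559767, 559802, "+", 119))
# Hypothetical enriched windows; the first two overlap and are merged.
print(merge_regions([(100, 300), (250, 400), (500, 600)]))
```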
  • 18. ChIP-seq analysis Peak calling – MACS In order to address local biases in the genome such as local chromatin structure, sequencing bias, genome copy number variation… MACS evaluates candidate peaks by comparing them against a “local” distribution. Fold enrichment = enrichment over λ_local
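The comparison against a local distribution can be sketched as follows. The sketch assumes, following the published description of MACS, that λ_local is the largest of the genome-wide background λ and the regional λ values estimated around the candidate peak; the function names and all numbers are hypothetical.

```python
def local_lambda(lambda_bg: float, regional_lambdas: list[float]) -> float:
    """Take the most conservative (largest) expected count for a candidate peak."""
    return max([lambda_bg, *regional_lambdas])


def fold_enrichment(observed: float, lam_local: float) -> float:
    """Enrichment of the observed tag count over the local expectation."""
    return observed / lam_local


# Hypothetical values: genome-wide background and two regional estimates.
lam = local_lambda(0.4, [0.9, 0.6])
fold = fold_enrichment(12, lam)
print(lam, fold)
```

Using the maximum means that a candidate peak in a region with a locally high background must show more tags to reach the same significance.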
  • 19. ChIP-seq analysis Peak calling – MACS The False Discovery Rate (FDR) is calculated as the number of control peaks called divided by the number of sample peaks. Control peaks are called by swapping control and sample. The FDR is calculated only when a control is provided!
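A minimal Python sketch of this ratio is given below. It assumes that, at a given score threshold, the FDR is the number of control peaks at or above the threshold divided by the number of sample peaks at or above it; the function name is illustrative. The score lists are only the first six rows of the two example tables in this deck, so the per-threshold result is not the real FDR of the run.

```python
def empirical_fdr(sample_scores: list[float], control_scores: list[float],
                  threshold: float) -> float:
    """Control peaks / sample peaks among peaks scoring at least `threshold`."""
    n_sample = sum(score >= threshold for score in sample_scores)
    n_control = sum(score >= threshold for score in control_scores)
    if n_sample == 0:
        raise ValueError("no sample peaks at or above the threshold")
    return n_control / n_sample


# Overall ratio from the peak counts reported in the example run's log output.
overall_fdr_percent = 100 * 594 / 13591
print(round(overall_fdr_percent, 2))

# Illustrative subset: -10*LOG10(pvalue) scores of the first rows of each table.
sample = [51, 74.39, 152, 89.7, 51.08, 61.17]
control = [61.64, 59.86, 66.25, 58.68, 62.88, 55.14]
print(empirical_fdr(sample, control, threshold=60))
```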
  • 20. ChIP-seq analysis Practical part
  • 21. ChIP-seq analysis Practical part Connect to the Etna machine using ssh.
      • Mac or Linux users can use this command:
        $ ssh -X course@xxx.crg.es
        course@xxx.crg.es's password: xxxxxxx
      • Windows users should first download the PuTTY and PSCP programs and then use them to access the machine. http://goo.gl/4BWud
  • 22. ChIP-seq analysis course@xxx.crg.es Password:xxxxxx
  • 23. ChIP-seq analysis Different formats can be used as input files: BED, ELAND, SAM, BAM, BOWTIE and, for paired ends, ELANDMULTIPET.
      $ head ../data/Input_tags.bed
      chr1    233604  233639  0       2       -
      chr1    559767  559802  0       3       +
      chr1    742600  742635  0       2       +
      chr1    742600  742635  0       0       +
      chr1    744231  744266  0       0       +
      chr1    744307  744342  0       2       -
      chr1    746885  746920  0       2       +
      chr1    746958  746993  0       1       +
      chr1    748226  748261  0       2       +
      chr1    748357  748392  0       0       -
      BED fields: chromosome name, start, end, name, score, strand
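The six BED fields listed above can be read with a few lines of Python. This is a minimal sketch assuming whitespace-separated columns in the order given on the slide; the class and function names are illustrative.

```python
from typing import NamedTuple


class BedTag(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: float
    strand: str


def parse_bed_line(line: str) -> BedTag:
    """Parse the first six whitespace-separated columns of a BED line."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedTag(chrom, int(start), int(end), name, float(score), strand)


# First line of the example Input_tags.bed shown above.
tag = parse_bed_line("chr1 233604 233639 0 2 -")
print(tag)
print(tag.end - tag.start)  # tag length in bases
```

The computed tag length is 35 bases, which agrees with the tag size MACS reports for this dataset in its log output.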
  • 24. ChIP-seq analysis Launch MACS, passing the sample, the control, the genome size (hs = Homo sapiens) and the experiment name:
      $ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1
  • 25. ChIP-seq analysis Check the output printed to the screen.
      $ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1
      INFO @ Thu, 29 Mar 2012 14:58:35:
      # ARGUMENTS LIST:
      # name = FoxA1
      # format = AUTO
      # ChIP-seq file = ./Treatment_tags.bed
      # control file = ./Input_tags.bed
      # effective genome size = 2.70e+09
      # band width = 300
      # model fold = 10,30
      # pvalue cutoff = 1.00e-05
      # Small dataset will be scaled towards larger dataset.
      # Range for calculating regional lambda is: 1000 bps and 10000 bps
      INFO @ Thu, 29 Mar 2012 14:58:35: #1 read tag files...
      INFO @ Thu, 29 Mar 2012 14:58:35: #1 read treatment tags...
      INFO @ Thu, 29 Mar 2012 14:58:35: Detected format is: BED
      Regional lambda has two values in this version: a small one to account for bias around the summit and a large one for the surrounding area.
  • 26. ChIP-seq analysis Check the output printed to the screen.
      INFO @ Thu, 29 Mar 2012 14:59:41: #1 tag size is determined as 35 bps
      INFO @ Thu, 29 Mar 2012 14:59:41: #1 tag size = 35
      INFO @ Thu, 29 Mar 2012 14:59:41: #1 total tags in treatment: 3909805
      ..
      INFO @ Thu, 29 Mar 2012 14:59:46: #2 Build Peak Model...
      INFO @ Thu, 29 Mar 2012 15:00:00: #2 number of paired peaks: 11861
      INFO @ Thu, 29 Mar 2012 15:00:00: #2 finished!
      INFO @ Thu, 29 Mar 2012 15:00:00: #2 predicted fragment length is 119 bps
      INFO @ Thu, 29 Mar 2012 15:00:00: #2.2 Generate R script for model : FoxA1_model.r
      INFO @ Thu, 29 Mar 2012 15:00:00: #3 Call peaks...
      INFO @ Thu, 29 Mar 2012 15:00:00: #3 shift treatment data
      INFO @ Thu, 29 Mar 2012 15:00:01: #3 merge +/- strand of treatment data
      INFO @ Thu, 29 Mar 2012 15:00:01: #3 call peak candidates
      INFO @ Thu, 29 Mar 2012 15:00:13: #3 shift control data
      INFO @ Thu, 29 Mar 2012 15:00:13: #3 merge +/- strand of control data
      INFO @ Thu, 29 Mar 2012 15:00:15: #3 call negative peak candidates
      INFO @ Thu, 29 Mar 2012 15:00:25: #3 use control data to filter peak candidates...
      INFO @ Thu, 29 Mar 2012 15:00:31: #3 Finally, 13591 peaks are called!
      INFO @ Thu, 29 Mar 2012 15:00:31: #3 find negative peaks by swapping treat and control
      INFO @ Thu, 29 Mar 2012 15:00:36: #3 Finally, 594 peaks are called!
  • 27. ChIP-seq analysis Output files
      • FoxA1_model.r
      • FoxA1_negative_peaks.xls
      • FoxA1_peaks.bed
      • FoxA1_peaks.xls
      • FoxA1_summits.bed
  • 28. ChIP-seq analysis MACS peak model
      $ R --vanilla < FoxA1_model.r
      ..
      $ evince FoxA1_model.pdf
  • 29. ChIP-seq analysis
      FoxA1_peaks.xls
      chr   start    end      length  summit  tags  -10*LOG10(pvalue)  fold_enrichment  FDR(%)
      chr1  858357   858641   285     128     6     51                 13.93            4.09
      chr1  998955   999229   275     106     9     74.39              18.28            0.26
      chr1  1050021  1050286  266     154     13    152                52.23            0
      chr1  1684288  1684577  290     176     9     89.7               32.14            0.01
      chr1  1775031  1775371  341     270     6     51.08              16.71            4.06
      chr1  1780682  1780965  284     183     6     61.17              19.9             1.45
      FoxA1_negative_peaks.xls
      chr   start     end       length  summit  tags  -10*LOG10(pvalue)  fold_enrichment
      chr1  7155010   7155530   521     311     9     61.64              44.47
      chr1  11265816  11266025  210     106     6     59.86              38.12
      chr1  18597004  18597307  304     188     8     66.25              31.77
      chr1  33412779  33412964  186     94      6     58.68              22.92
      chr1  33759125  33759514  390     234     9     62.88              19.77
      chr1  37102727  37102952  226     114     6     55.14              31.51
  • 30. ChIP-seq analysis
      FoxA1_peaks.bed
      Fields: chr, start, end, peak id and score = -10*LOG10(pvalue)
      chr1  858356   858641   MACS_peak_1  51
      chr1  998954   999229   MACS_peak_2  74.39
      chr1  1050020  1050286  MACS_peak_3  152
      chr1  1684287  1684577  MACS_peak_4  89.7
      chr1  1775030  1775371  MACS_peak_5  51.08
      chr1  1780681  1780965  MACS_peak_6  61.17
      chr1  1923146  1923449  MACS_peak_7  164.87
      FoxA1_summits.bed
      Fields: chr, start, end, peak id and score = height of the summit
      chr1  858483   858484   MACS_peak_1  4
      chr1  999059   999060   MACS_peak_2  7
      chr1  1050173  1050174  MACS_peak_3  12
      chr1  1684462  1684463  MACS_peak_4  8
      chr1  1775299  1775300  MACS_peak_5  4
      chr1  1780863  1780864  MACS_peak_6  4
      chr1  1923347  1923348  MACS_peak_7  14
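The relationship between the .xls and .bed outputs can be sketched in Python. The coordinate conversions below are inferred from the example rows in this deck (1-based starts in the .xls file, 0-based starts in the .bed files), not taken from the MACS documentation, and the function names are illustrative.

```python
def score_to_pvalue(score: float) -> float:
    """Invert score = -10 * log10(pvalue)."""
    return 10 ** (-score / 10)


def xls_to_bed_interval(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based peak interval (.xls) to a 0-based, half-open BED interval."""
    return start - 1, end


def summit_to_bed(start: int, summit: int) -> tuple[int, int]:
    """Convert a 1-based peak start and summit offset to a 1-bp BED interval."""
    position = start + summit - 1   # 1-based genomic position of the summit
    return position - 1, position


# First peak of FoxA1_peaks.xls: start 858357, end 858641, summit 128, score 51.
print(xls_to_bed_interval(858357, 858641))
print(summit_to_bed(858357, 128))
print(score_to_pvalue(51))
```

A score of 50 corresponds to a p-value of 1e-5, which is the p-value cutoff reported in the run's argument list, so every reported peak scores at least 50.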
  • 31. ChIP-seq analysis
      $ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1 -w
      • The -w option creates “wiggle” files for each chromosome analyzed.
      • The -B option creates “bedgraph” files.
      • The -S option, together with either -w or -B, creates a single huge file for the whole genome.
      • --space=NUM can be used to change the resolution of the wiggle file.
  • 32. ChIP-seq analysis Upload files in the UCSC genome browser http://genome.ucsc.edu/index.html
  • 37. ChIP-seq analysis Upload files in the UCSC genome browser Peak example: chr22:20141500..20141987
  • 38. ChIP-seq analysis Analyze histone modifications
      • Broader peaks
      • No clear shape (more summits)
      • The peak model is often impossible to build.
      $ macs14 -t ../data/ES.H3K27me3.bed -g mm --nomodel --nolambda -n H3K27me3
      • It is recommended to skip the model with the --nomodel option.
      • Since no control is available, the comparison is done against the sample background. It is recommended to skip the local background (--nolambda) when you have no control and very broad peaks.
  • 39. ChIP-seq analysis Upload files in the UCSC genome browser Peak example: chrX:47,922,749-47,926,228
  • 40. ChIP-seq analysis Galaxy platform Soon a local installation at CRG!!! https://main.g2.bx.psu.edu/
  • 41. ChIP-seq analysis Bibliography:
      • http://en.wikipedia.org/wiki/File:ChIP-sequencing.svg
      • http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
      • http://liulab.dfci.harvard.edu/MACS/
      • http://sourceforge.net/apps/mediawiki/gemlibrary/index.php?title=The_GEM_library
      • http://bio-bwa.sourceforge.net/
      • http://www.well.ox.ac.uk/project-stampy
      • http://bowtie-bio.sourceforge.net/index.shtml
      • http://genome.ucsc.edu/
      • http://www.r-project.org/
      • http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
      • https://main.g2.bx.psu.edu/root