CDAC 2018 Boeva analysis chromatin
1. ANALYSIS OF EPIGENETICS AND CHROMATIN
STATES IN NORMAL AND CANCER CELLS
Valentina BOEVA
Institut Cochin, Inserm U1016
2. Epigenetic profiles = combination of CpG
methylation of DNA and histone modifications
M. S. Yan et al, J. Appl. Physiol., 2010
+ Information about the 3D structure of chromatin
3. Relation between CpG methylation and
gene expression
Kapourani and Sanguinetti, Bioinformatics 2016
Cluster 1: Uniformly unmethylated; generally repressed
Cluster 2: U-shape profile, hypo-methylation around the TSS surrounded by hyper-methylation; high expression
Cluster 3: S-shape profile, hypo-methylated before TSS; intermediate expression
Cluster 4: hyper-methylated; repressed
Cluster 5: Reverse S-shape profile, hyper-methylated before TSS; intermediate expression
4. Bisulfite sequencing is employed to detect the
methylation status of cytosine
• Bisulfite treatment converts unmethylated
cytosine to uracil
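A minimal sketch of how the conversion reveals methylation, assuming the standard read-out in which uracil is sequenced as T after amplification (this step is not stated on the slide). The function names and the toy sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment of one DNA strand.

    Unmethylated C becomes U, which is read as T after amplification
    (assumption); methylated C is protected and stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Return {position: is_methylated} for every C in the reference."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base == "C":
            # A C that survived conversion was methylated.
            calls[i] = read_base == "C"
    return calls


reference = "ACGTTCGACCGA"  # toy sequence, C at positions 1, 5, 8, 9
read = bisulfite_convert(reference, methylated_positions=[1])
print(read)
print(call_methylation(reference, read))
```

Comparing the converted read with the reference is what makes the methylation status readable by ordinary sequencing.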
5. RRBS (Reduced representation bisulfite sequencing)
– a cheap way to profile CpG methylation
• Uses a restriction enzyme targeting 5'-CCGG-3'
sequences
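The digestion step can be sketched in silico as below. This is only an illustration: the cut position inside the site (after the first C, as for MspI) and the size-selection window are assumptions, not taken from the slide, and the function names are hypothetical.

```python
import re


def digest_at_ccgg(seq, cut_offset=1):
    """Cut seq at every CCGG site.

    cut_offset=1 assumes cleavage after the first C (C^CGG).
    """
    cut_points = [m.start() + cut_offset for m in re.finditer("CCGG", seq)]
    bounds = [0] + cut_points + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len] (hypothetical window)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy_genome = "AAACCGGTTTTTCCGGAA"
fragments = digest_at_ccgg(toy_genome)
print(fragments)
print(size_select(fragments, min_len=5, max_len=9))
```

Every internal fragment starts at a CpG, which is why sequencing only these fragments enriches for CpG-rich regions at a reduced cost.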
6. DNA methylation arrays
• Illumina Infinium MethylationEPIC array (850K) or
450K BeadChip
• Agilent 244K array
7. Visualization of the array data in the UCSC
genome browser
orange = methylated (>= 60%)
purple = partially methylated (between 20% and 60%)
bright blue = unmethylated (<= 20%)
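The color coding above amounts to a simple threshold rule. A small sketch, with a hypothetical function name and methylation given as a percentage:

```python
def methylation_color(percent_methylated):
    """Map a CpG methylation percentage to the color used in the browser track."""
    if percent_methylated >= 60:
        return "orange"       # methylated
    if percent_methylated <= 20:
        return "bright blue"  # unmethylated
    return "purple"           # partially methylated


for value in (5, 20, 45, 60, 93):
    print(value, methylation_color(value))
```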
8. Epigenetic profiles = combination of CpG
methylation of DNA and histone modifications
M. S. Yan et al, J. Appl. Physiol., 2010
+ Information about the 3D structure of chromatin
9. Histone modifications correlate with gene
transcription levels
• Histone modifications
Bhaumik et al, Nat Str & Mol Biol, 2007
Li et al, Cell, 2007
10. Histone modifications correlate with gene
transcription levels
and have a specific distribution around gene transcription start sites (TSSs)
Figure: average density of histone modification signal from -30 kb to +30 kb around the TSS in the HeLa-S3 cell line, for H4K20me1, H3K9ac, H3K9me3, H3K27me3, H3K36me3 and H3K79me2
Correlation of different histone marks with gene expression (Haitham Ashoor)
11. With histone marks, one can predict gene
expression
ENCODE Project Consortium, Nature, 2012
R = 0.9
12. ChIP-seq technique can provide information
about modifications of histone tails
ChIP-seq = chromatin immunoprecipitation + sequencing
Main steps of the ChIP-seq technique:
13. ChIP-seq technique can provide information
about modifications of histone tails
Main steps of the ChIP-seq technique:
35-100 bp
Cluster of reads (peak) in the UCSC genome browser
Q?
14. Analysis of ChIP-seq data: density profile
calculation
Figure panels (top to bottom): chromosome, reads, putative fragments, density, binned density (.wig file)
We calculate the density for both the ChIP and the control sample
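A minimal sketch of the density calculation, assuming each read is given as a (5' position, strand) pair and is extended to a putative fragment of fixed length. The fragment length, bin size and function names are illustrative assumptions; the output uses the fixedStep flavor of the .wig format.

```python
def fragment_density(reads, chrom_length, fragment_length=200):
    """Per-base count of putative fragments covering each position (0-based)."""
    diff = [0] * (chrom_length + 1)
    for five_prime, strand in reads:
        if strand == "+":
            start, end = five_prime, five_prime + fragment_length
        else:
            # Reverse-strand reads extend towards lower coordinates.
            start, end = five_prime - fragment_length + 1, five_prime + 1
        start, end = max(start, 0), min(end, chrom_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    density, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        density.append(running)
    return density


def bin_density(density, bin_size):
    """Average the per-base density in consecutive bins."""
    binned = []
    for i in range(0, len(density), bin_size):
        chunk = density[i:i + bin_size]
        binned.append(sum(chunk) / len(chunk))
    return binned


def to_wig(chrom, binned, bin_size):
    """Format binned values as a fixedStep .wig track (1-based start)."""
    lines = [f"fixedStep chrom={chrom} start=1 step={bin_size} span={bin_size}"]
    lines += [f"{value:g}" for value in binned]
    return "\n".join(lines)


reads = [(2, "+"), (4, "+"), (12, "-")]  # toy data
density = fragment_density(reads, chrom_length=20, fragment_length=5)
binned = bin_density(density, bin_size=5)
print(to_wig("chr1", binned, bin_size=5))
```

The same functions would be run once on the ChIP reads and once on the control reads, giving two tracks that can be compared bin by bin.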
16. Peak calling: detection of coordinates of regions
enriched in a given histone mark
Figure: CLB-GA neuroblastoma cell line, ZMYZ1 locus (~70 kb)
Tracks: H3K27ac, H3K27ac peaks, H3K4me3, H3K4me3 peaks
Annotated elements: active promoter, active enhancer
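To make the idea of "coordinates of enriched regions" concrete, here is a deliberately naive sketch: bins where the ChIP density exceeds the control by a fold threshold are merged into peaks. Real peak callers rely on statistical models; the threshold, pseudocount and function name here are assumptions for illustration only.

```python
def call_peaks(chip, control, bin_size, min_fold=2.0, pseudocount=1.0):
    """Return (start, end) coordinates of runs of enriched bins.

    chip and control are binned densities of equal length.
    """
    peaks, run_start = [], None
    n_bins = min(len(chip), len(control))
    for i in range(n_bins):
        fold = (chip[i] + pseudocount) / (control[i] + pseudocount)
        enriched = fold >= min_fold
        if enriched and run_start is None:
            run_start = i                      # open a new peak
        elif not enriched and run_start is not None:
            peaks.append((run_start * bin_size, i * bin_size))
            run_start = None                   # close the current peak
    if run_start is not None:
        peaks.append((run_start * bin_size, n_bins * bin_size))
    return peaks


chip_bins = [1, 1, 8, 9, 7, 1, 1, 6, 1]     # toy ChIP density
control_bins = [1, 1, 2, 2, 2, 1, 1, 1, 1]  # toy control density
print(call_peaks(chip_bins, control_bins, bin_size=100))
```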
17. Histone modifications form groups and
indicate distinct chromatin states
• Histone modifications, histone variants, binding sites
(Pol II, CTCF, p300,…) => chromatin states
ENCODE Project Consortium, Nature, 2012
20. Histone modifications form groups and
indicate distinct chromatin states
• Histone modifications, histone variants, binding sites
(Pol II, CTCF, p300,…) => chromatin states
ENCODE Project Consortium, Nature, 2012
R Predicted repressed or low-activity region
T Predicted transcribed region
WE Predicted weak enhancer or open chromatin cis-regulatory element
E Predicted enhancer
CTCF CTCF-enriched element
PF Predicted promoter flanking region
TSS Predicted promoter region including TSS
7 states
21. Histone modifications form groups and
indicate distinct chromatin states
Ernst & Kellis, Nature Biotechnology, 2010
Input chromatin mark information and resulting chromatin state annotation for a 120-kb region of human chromosome 7 surrounding the CAPZA2 gene
51 states
22. How many states to select?
• Use your biological intuition
• Score each model by its log likelihood minus a penalty on
model complexity given by the Bayesian Information
Criterion (BIC): one-half the number of parameters times
the natural log of the number of intervals
"There are three kinds of lies: lies, damned lies, and statistics."
Benjamin Disraeli
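The scoring rule above can be written down directly. In this sketch the log likelihoods and parameter counts are made-up numbers for three hypothetical models; only the formula (log likelihood minus one-half the number of parameters times the natural log of the number of intervals) comes from the slide.

```python
import math


def penalized_score(log_likelihood, n_parameters, n_intervals):
    """Log likelihood minus the BIC complexity penalty."""
    return log_likelihood - 0.5 * n_parameters * math.log(n_intervals)


def select_model(candidates, n_intervals):
    """candidates maps number of states -> (log_likelihood, n_parameters)."""
    return max(
        candidates,
        key=lambda k: penalized_score(candidates[k][0], candidates[k][1], n_intervals),
    )


# Hypothetical fits: more states improve the likelihood but cost parameters.
candidates = {5: (-5000.0, 50), 10: (-3000.0, 150), 20: (-2900.0, 500)}
best = select_model(candidates, n_intervals=10000)
print(best)
```

With these toy numbers the intermediate model wins: the gain in likelihood from 10 to 20 states does not pay for the extra parameters.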
23. What is going on with chromatin states and
overall epigenetic profiles in cancer?
• Do epigenetic states change compared to
normal ancestral cells?
• Is there any global phenomenon related to
epigenetics present in cancer cells?
Lung cancer close-up. MOREDUN ANIMAL HEALTH LTD/SPL / Gettyimages
24. Histone and CpG-methyl modifying proteins
are often mutated or deleted in cancer
Timp & Feinberg, Nature Rev. Cancer, 2013
Epigenome-modifying gene mutations in human cancer
Q?
More than 50% of human cancers harbor mutations in enzymes that are
involved in chromatin organization
25. Changes in CpG methylation are common in
cancer
• Loss of imprinting (e.g. of IGF2)
• Hypermethylation of CpG islands of tumor
suppressor genes
• Genome-wide DNA hypomethylation
26. DNA methylation status can be associated
with tumor aggressiveness
ER-α methylation, skin cancer: Kaplan–Meier curves showing the correlation of pre-biochemotherapy serum ER-α methylation status with overall survival (OS) (p = 0.003)
RASSF1A methylation, skin cancer: Kaplan–Meier survival curves of biochemotherapy patients; correlation of pre-BC serum RASSF1A methylation BM with overall survival (p = 0.013)
Mori et al, 2006; from Mori et al, 2005
27. DNA methylation status can define cancer subtypes
(and be associated with tumor aggressiveness)
The degree of DNA methylation of 553 genes directly correlates with poor prognosis in ACCs
Figure groups: CpG island methylator phenotype vs non-CpG island methylator phenotype
CpG island methylator phenotype in adrenocortical carcinomas (Barreau et al., J Clin Endocrinol Metab., 2013)
28. CpG island methylator phenotype (CIMP) can be associated
with good or poor prognosis in different cancers
Hughes et al., Cancer Research 2013
29. Cancer treatment with inhibitors of DNMTs
A phase I/II trial of combined epigenetic therapy with azacitidine and entinostat, inhibitors of DNA methylation and histone deacetylation, respectively, in extensively pretreated patients with recurrent metastatic non–small cell lung cancer.
Survival stratified by target gene methylation status (promoter methylation of APC, CDH13, RASSF1A and CDKN2A)
Juergens et al., Cancer Discovery 2011
30. LRES & LOCKs: Global changes in epigenetic
patterns in cancer
• Histone modification patterns are altered in
human tumors
– Gain of Long Range Epigenetic Silencing (LRES)
– Loss of Large organized chromatin lysine (K9) modifications (LOCKs)
S.J. Clark, Hum. Mol. Genet., 2007
Hypothetical view of LRES in cancer
B. Wen et al., Nat. Genet., 2009
31. Example of epigenetic silencing of HOXD
gene cluster in bladder cancer
Cluster of HOXD genes repressed by epigenetic mechanisms (PRC2)
Enrichment in repressive histone mark H3K27me3
32. Cancer treatment with EZH2 inhibitors
EPZ-6438 in rhabdoid tumors with mutated SMARCB1
Knutson et al., PNAS 2013
Kim & Roberts, Nature Medicine, 2016
33. Creation of cancer specific super-enhancers
Super-enhancer has high H3K27ac
Whyte et al., Cell 2013
34. Example: Detection of super-enhancer
regions using HMCan and LILY
H3K27ac profiles in neuroblastoma (NB) and normal cells
Example: super-enhancer (SE) in PHOX2B in NB cell lines
Figure tracks: controls and four NB cell lines
Ashoor et al. 2013, Bioinformatics
Boeva et al. 2017, Nature Genetics
36. Creation of cancer specific super-enhancers
Hnisz et al., Cell 2013
Super-enhancer has high H3K27ac
Colorectal cancer
37. Analysis of histone modification profiles can
suggest “epigenetic” treatment for cancer patients
Chipumuro et al, Cell, 2014
Neuroblastomas with MYCN amplification have a
specific epigenetic profile (super-enhancers)
MYCN-amplified cells are sensitive to a
specific drug (CDK7 inhibitor)
Application of this drug reduces tumor volume
38. De novo enhancer creation or enhancer
hijacking
• T-cell acute lymphoblastic leukemia: somatic
mutations => binding motifs for MYB => a
super-enhancer upstream of the TAL1
oncogene
• Neuroblastoma: TERT activation via enhancer
hijacking
• Medulloblastoma: GFI1 family oncogenes
activation via enhancer hijacking
Mansour et al, Science, 2014
Northcott et al. Nature, 2014
Peifer et al., Nature, 2015
39. Rewiring of core regulatory circuitries (CRCs)
in cancer
In cancer (normal cell => cancer cell):
TFs gain/lose SEs (+ the number of gene copies changes and affects expression)
Change in transcriptional networks
Cell identity change
40. Rewiring of core regulatory circuitries (CRCs)
in cancer
• CRCs = set of TFs that autoregulate themselves
and define cell identity in normal cells
Saint-André et al, Genome Research, 2016
42. Summary
• DNA methylation and histone modifications/histone variants have
a direct effect on gene transcription
• Histone modifications form groups and indicate distinct
chromatin states
• Epigenetic profiles change in cancer compared to normal
ancestral cells (>30 epigenome-modifying proteins can be
mutated in different cancers; half of cancers have at least one
mutation in a chromatin gene)
• These changes can be used to stratify patients and/or define
efficient ‘epigenetic’ drugs
• Discovery of genetic events associated with oncogenic epigenetic
changes may provide clinical markers for patient stratification
44. Main steps of the ChIP-seq technique
+ Control (e.g., input DNA)
35-100 bp
>20M reads
Valouev et al., Nat Methods 2008
45. Framework for the analysis of histone
modification profiles & TFBSs
• Nebula: web-service for analysis of ChIP-seq data
V. Boeva, A. Lermine et al, Bioinformatics, 2012
nebula.curie.fr
46. Nebula: web-service for analysis of ChIP-seq
data
• Peak calling
• Calculation of the density and cumulative distribution of peak locations
relative to gene transcription start sites
• Annotation of peaks with genomic features and genes with peak
information
Some graphs produced by Nebula (panels A-E):
• Proportion of genes with a peak at a given distance from the TSS (cumulative; 0 to 2 Kb), for down-regulated, no-response and up-regulated genes
• Proportion of genes with a peak at a given distance from the TSS (density; -2000 to 2000 bp), ChIP vs control
• Proportion of genes with a peak per genomic feature (Enh., Prom., Imm.Down., Intrag., GeneDown., F.Intron, Exons, 2,3,etc. Introns, E.I.Junctions), for down-regulated, no-response and up-regulated genes and control
• Peak count by peak height, ChIP vs control
• Proportion of peaks per genomic feature (GeneDown., Enh., Imm.Down., Interg., Intrag., Prom.), ChIP vs control
V. Boeva, A. Lermine et al, Bioinformatics, 2012
47. There is another Nebula instance
(https://galaxy-public.curie.fr/)
48. There are other ChIP-Seq tool boxes
(http://cistrome.org/ap/)
49. Read alignment: .fastq format
• Illumina or SOLiD raw data format: .fastq
A quality value Q is an integer mapping of p
(i.e., the probability that the corresponding
base call is incorrect).
Phred quality score:
Q = −10 log10(p)
10 corresponds to probability of error = 0.1
20 corresponds to probability of error = 0.01
30 corresponds to probability of error = 0.001
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
Quality score encodings: Sanger, Phred+64, Phred+33
@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNN
+HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
efcfffffcfeefffcffffffddf`feed]`]_Ba_^__[YBBBBBBBBBB
@3_36_77_R17C1
T23031.313.20222.2.0220222.2.2.22002.2.2222222..222
+
'/%&/!&'#!#%##&!%!%$&#%##!#!#!$##$&!#!%*##'%,!!(#)
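The Phred formula and the ASCII encodings can be checked with a few lines of code. This is a small sketch with hypothetical function names; the offset is 33 for Phred+33 (Sanger) files and 64 for Phred+64 files.

```python
import math


def phred_to_error_probability(q):
    """Invert Q = -10 log10(p)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Q = -10 log10(p)."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Turn a FASTQ quality line into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


# Last six quality characters of the first record above, decoded as Phred+33.
scores = decode_quality("9IG9IC", offset=33)
print(scores)
print([round(phred_to_error_probability(q), 5) for q in scores])
```

Decoding a Phred+64 file with offset 33 (or the reverse) silently shifts every score by 31, so the encoding should be confirmed before filtering reads on quality.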
50. Read alignment to the reference genome
• Any tool will be OK:
– BWA
– Bowtie
– GEM
– Novoalign
ACTGATGCGATGCATGCGATGCTGCATTACGGCATGCTAGCTAGCTGCAGTAGATCGCA
ATGCTGCATTACGGA
Read (50-150bp)
Genome (30Mb-3Gb)
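To illustrate what "aligning a read" means, here is a brute-force sketch that slides the read along the genome and keeps the position with the fewest mismatches. This is not how BWA, Bowtie, GEM or Novoalign work internally (they use indexed search and handle gaps); it only shows the problem being solved, using the toy genome and read from the slide.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_alignment(genome, read):
    """Return (position, mismatches) of the best ungapped placement (0-based)."""
    candidates = (
        (hamming(genome[i:i + len(read)], read), i)
        for i in range(len(genome) - len(read) + 1)
    )
    mismatches, position = min(candidates)
    return position, mismatches


genome = "ACTGATGCGATGCATGCGATGCTGCATTACGGCATGCTAGCTAGCTGCAGTAGATCGCA"
read = "ATGCTGCATTACGGA"
print(best_alignment(genome, read))
```

The read matches the genome at 0-based position 18 with a single mismatch at its last base, the kind of difference an aligner must tolerate (sequencing error or variant).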
51. SAM and BAM (binary SAM)
• SAM = Sequence Alignment/Map
57. There are more than a dozen tools to detect read
clusters (or peaks)
HMCan, GLITR, F-Seq, SICER, FindPeaks, QuEST, PeakSeq, Spp, MACS, ERANGE, Useq, SiSSRs
Grouped on the slide as "TFs and narrow histone marks" and "Narrow and/or broad histone marks":
CCAT, FindPeaks, MACS2, ZINBA, HMCan, BayesPeak, SICER, MOSAiCS, CisGenome, MUSIC, MACS, BroadPeak
58. There exist two main methods to construct
peaks
Figure: read clusters => peaks (tag extension; fragment count)
+ different statistical methods to eliminate ‘false’ peaks (low or short peaks)
Adapted from S. Pepke et al., 2009 Nat Methods
59. Quality measures: ChIP-seq signal-to-noise
ratio
From the ENCODE consortium:
Fraction of reads in peaks (FRiP):
FRiP = Npeak/Nnonred
where Npeak is the number of reads falling within peak regions and
Nnonred is the total number of non-redundant mapped reads
Cross-correlation profiles (CCPs):
- Normalized strand coefficient: NSC = Cfrag/Cmin
- Relative strand correlation: RSC = (Cfrag − Cmin)/(Cread − Cmin)
where Cmin is the minimum cross-correlation (CC) observed, Cfrag is the CC
corresponding to the fragment length, and Cread is the CC
corresponding to the read length.
Thresholds: FRiP > 1%, NSC ≥ 1.05 and RSC ≥ 0.8
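The three metrics are simple ratios, sketched below with hypothetical function names and made-up input values. `n_nonredundant_reads` is assumed here to be the total number of non-redundant mapped reads; the cross-correlation values would come from a separate strand cross-correlation analysis that is not shown.

```python
def frip(n_reads_in_peaks, n_nonredundant_reads):
    """Fraction of reads in peaks: Npeak / Nnonred."""
    return n_reads_in_peaks / n_nonredundant_reads


def nsc(c_frag, c_min):
    """Normalized strand coefficient: Cfrag / Cmin."""
    return c_frag / c_min


def rsc(c_frag, c_read, c_min):
    """Relative strand correlation: (Cfrag - Cmin) / (Cread - Cmin)."""
    return (c_frag - c_min) / (c_read - c_min)


def passes_thresholds(frip_value, nsc_value, rsc_value):
    """Apply the thresholds quoted above: FRiP > 1%, NSC >= 1.05, RSC >= 0.8."""
    return frip_value > 0.01 and nsc_value >= 1.05 and rsc_value >= 0.8


# Made-up example values for one ChIP-seq sample.
frip_value = frip(400_000, 20_000_000)
nsc_value = nsc(c_frag=0.30, c_min=0.20)
rsc_value = rsc(c_frag=0.30, c_read=0.28, c_min=0.20)
print(frip_value, nsc_value, rsc_value)
print(passes_thresholds(frip_value, nsc_value, rsc_value))
```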
60. Quality measures: irreproducible discovery
rate
The irreproducible discovery rate (IDR) assesses the rank
consistency of common peaks between two replicates.
Based on a copula mixture model, IDR estimates the
reproducibility of each peak pair, and reports the expected rate
of irreproducible discoveries in the obtained peaks in a similar
way to the FDR.
R package 'idr' on CRAN
62. Primary analysis of ChIP-seq data
• Peak calling
• Differential peak calling (Condition 1 vs 2)
• Detection of chromatin states
• Super-enhancer calling
63. One should apply specific methods to detect
histone modifications in cancer
• Specific feature of cancer samples: large copy
number changes
Lung Adenocarcinoma 24 color karyotype
64. Standard methods for signal detection can
miss signal in regions of loss in cancer
Figure (position along chr8): copy number profile and H3K27me3 peaks predicted by MACS and SICER
MACS: Zhang, Y. et al. (2008) Genome Biol., 9, R137
SICER: Zang, C. et al. (2009) Bioinformatics, 25, 1952–1958.
65. Solution: explicit normalization for copy
number status
• Hidden Markov model after correction of ChIP-seq
signal for copy number and GC-content bias
H. Ashoor et al, Bioinformatics, 2013
Software: HMCan, www.cbrc.kaust.edu.sa/hmcan
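The idea of correcting the signal for copy number can be illustrated with a very simple rescaling. This is not HMCan's actual implementation (which also corrects for GC content and then applies a hidden Markov model); it is a sketch under the assumption that the expected signal scales linearly with the number of DNA copies. The function name and values are hypothetical.

```python
def normalize_for_copy_number(density, copy_number, reference_ploidy=2):
    """Rescale each bin's density to what it would be at the reference ploidy."""
    normalized = []
    for signal, copies in zip(density, copy_number):
        if copies <= 0:
            # Homozygous deletion: no DNA, so no signal can be recovered.
            normalized.append(0.0)
        else:
            normalized.append(signal * reference_ploidy / copies)
    return normalized


chip_density = [10, 5, 20]  # toy binned ChIP densities
copies_per_bin = [2, 1, 4]  # normal, one-copy loss, gain
print(normalize_for_copy_number(chip_density, copies_per_bin))
```

After rescaling, the bin in the region of loss no longer looks depleted, which is why a peak caller working on corrected signal is less likely to miss it.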
66. HMCan uses FREEC’s algorithm for
annotation of copy number alterations
Copy number profile for Hela-S3 cell line obtained using the
Input data (ENCODE dataset)
V. Boeva et al, Bioinformatics, 2011
67. Peaks predicted by HMCan do not show
copy number bias
Figure tracks: copy number, HMCan, MACS, SICER
H. Ashoor et al, Bioinformatics, 2013
69. HMCan-diff: a method to detect changes in histone
marks in cells with different genetic backgrounds
Figure: data simulated without copy number bias vs data simulated with copy number bias
H. Ashoor et al, Submitted
70. HMCan-diff: a method to detect changes in histone
marks in cells with different genetic backgrounds
H. Ashoor et al, Submitted
• Library size correction
• GC-content correction
• Copy number correction
• Variable signal-to-noise ratio
correction
• Iterative Hidden Markov Models
71. ChIP-seq post-processing methods: calling
chromatin states
• ChromHMM and Segway were developed to
systematically identify specific combinations
of histone modifications as chromatin states
R Predicted repressed or low-activity region
T Predicted transcribed region
WE Predicted weak enhancer or open chromatin cis-regulatory element
E Predicted enhancer
CTCF CTCF-enriched element
PF Predicted promoter flanking region
TSS Predicted promoter region including TSS
7 states
72. Definition of Super-enhancers using
H3K27ac read counts
ROSE
Figure: H3K27ac read count (ChIP − Input) plotted against enhancer rank, separating enhancers from super-enhancers
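A sketch of how a cutoff on the ranked curve might be chosen. It assumes the commonly described criterion of cutting where the curve's slope reaches 1 after both axes are rescaled to [0, 1]; this criterion is not stated on the slide, the code is not ROSE itself, and the signal values are made up.

```python
def split_super_enhancers(signals):
    """Return (cutoff, super_enhancer_signals) for a list of enhancer signals.

    Signals are H3K27ac read counts (ChIP minus input), one per enhancer.
    """
    ordered = sorted(signals)
    n, top = len(ordered), ordered[-1]
    if n < 2 or top <= 0:
        raise ValueError("need at least two enhancers and a positive maximum")
    # On rescaled axes, the point where a slope-1 line touches the curve
    # from below is the one minimizing (scaled signal - scaled rank).
    offsets = [s / top - i / (n - 1) for i, s in enumerate(ordered)]
    cutoff_index = min(range(n), key=offsets.__getitem__)
    cutoff = ordered[cutoff_index]
    return cutoff, [s for s in signals if s > cutoff]


toy_signals = [1, 2, 3, 4, 5, 6, 7, 8, 40, 100]
cutoff, supers = split_super_enhancers(toy_signals)
print(cutoff, supers)
```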
73. For super-enhancer calling in cancer data:
use LILY
Figure: ROSE and LILY, without and with copy number correction
V. Boeva et al, Nature Genetics, 2017
74. Summary of the Methods part
• Main steps of the primary ChIP-seq data
analysis:
– Alignment of reads
– Peak calling
– Quality controls
– Annotation of peaks according to genes
• Possible next steps:
– ROSE or LILY for H3K27ac
– ChromHMM for chromatin states
Editor's notes
Sequence patterns influence the epigenetic landscape, and through this influence determine the functional regions of the genome
Direct effect or just correlation?
Interestingly, these histone marks form groups. For example, in the active promoter region, we find
In cancer, we observed that these epigenetic profiles can be changed.
Benjamin Disraeli (21 December 1804 – 19 April 1881) was a British Conservative politician and writer who twice served as Prime Minister
A small star (but it is actually a big, quickly developing field)
The irreproducible discovery rate (IDR) framework for assessing reproducibility of ChIP-seq data sets. (A–C) Reproducibility analysis for a pair of high-quality RAD21 ChIP-seq replicates. (D,E) The same analysis for a pair of low quality SPT20 ChIP-seq replicates. (A,D) Scatter plots of signal scores of peaks that overlap in each pair of replicates. (B,E) Scatter plots of ranks of peaks that overlap in each pair of replicates. Note that low ranks correspond to high signal and vice versa. (C,F) The estimated IDR as a function of different rank thresholds. (A,B,D,E) Black data points represent pairs of peaks that pass an IDR threshold of 1%, whereas the red data points represent pairs of peaks that do not pass the IDR threshold of 1%. The RAD21 replicates show high reproducibility with ∼30,000 peaks passing an IDR threshold of 1%, whereas the SPT20 replicates show poor reproducibility with only six peaks passing the 1% IDR threshold.
Recalculated ENCODE cancer datasets to remove copy number bias. Continue working on replicate and differential binding