1. ENCODE
Encyclopedia of DNA Elements
Outline
What and who is ENCODE
Key ENCODE topics and most important papers for our research
ENCODE data – make use of the encyclopedia…
Maté Ongenaert
2. What and who is ENCODE
Main aims, funding and the institutions/labs behind the 200 M $
Who?
International consortium
Funded by NHGRI – National Human Genome Research Institute
200 million dollars
Main collaborators (for human data)
Broad Institute (ChIP-seq)
HudsonAlpha Institute for Biotechnology (methylation)
Sanger Institute (RNA-seq)
Duke University (DNase)
Yale University (Pol II)
EBI (data analysis)
Main aims
“Build a comprehensive parts list of functional elements in the
human genome, including elements that act at the protein and
RNA levels, and regulatory elements that control cells and
circumstances in which a gene is active”
3. What and who is ENCODE
Main aims, funding and the institutions/labs behind the 200 M $
What’s so hot… It has been running for years?
Started in 2003 – pilot project (results published in 2007)
1% of the genome
2007-2012: production phase
Since then, introduction of new technologies
Higher throughput
Genome-wide
Many more samples and different tissues (different ‘tiers’ – see later)
Better data analysis and integration
4. What and who is ENCODE
Main aims, funding and the institutions/labs behind the 200 M $
What’s so hot… It has been running for years?
World wide press attention
5. What and who is ENCODE
Main aims, funding and the institutions/labs behind the 200 M $
What’s so hot… It has been running for years?
World wide press attention… and criticisms
“Popular” media focus on the “junk DNA” aspect
The authors also claim in their press release that > 80% of the genome is
‘biologically active’ (which is not the same as ‘may be involved in regulation
in one way or another’, and not the same as ‘not junk DNA’)
ENCODE reveals for the first time many of the factors in the very complex
switchboard controlling expression / …
6. What and who is ENCODE
Main aims, funding and the institutions/labs behind the 200 M $
What’s so hot… It has been running for years?
30 (!) research papers published in three journals at the same time
7. ENCODE
Encyclopedia of DNA Elements
Outline
What and who is ENCODE
Key ENCODE topics and most important papers for our research
ENCODE data – make use of the encyclopedia…
8. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Key topics
Transcription factor binding motifs
Chromatin patterns at transcription factor binding sites
Characterization of intergenic regions and gene definitions
RNA and chromatin modification patterns around promoters
Epigenetic regulation of RNA processing
Non-coding RNA characterisation
DNA-methylation
Enhancer discovery and characterization
3D connections across the genome
Characterisation of network topology
Machine learning approaches to genomics
Impact of functional information on understanding variation
Impact of evolutionary selection on functional regions
9. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Main paper
95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction
Classifying the genome into seven chromatin states indicates an initial set of 399,124 regions
with enhancer-like features and 70,292 regions with promoter-like features
It is possible to correlate quantitatively RNA sequence production and processing with both
chromatin marks and transcription factor binding at promoters, indicating that promoter
functionality can explain most of the variation in RNA expression
Single nucleotide polymorphisms (SNPs) associated with disease by GWAS are enriched
within non-coding functional elements, with a majority residing in or near ENCODE-defined
regions that are outside of protein-coding genes. In many cases, the disease phenotypes
can be associated with a specific cell type or transcription factor
10. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Main paper
Techniques used:
RNA-seq
ChIP-seq
DNase-seq
DNA-methylation arrays and bisulfite seq
FAIRE-seq
Tier 1: three cell lines (K562 – GM12878 – H1 hESC)
Tier 2: cell line panel (HeLa-S3 – HepG2 – HUVECs)
Tier 3 (all other cell types)
Total: 1640 datasets / 147 different cell types
13. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Expression – chromatin state Expression – transcription factors
14. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Expression – transcription factors
15. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Chromatin state patterns at
transcription-factor binding
sites
16. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Co-association between transcription factors (K562)
17. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Insight into genomic variation – allele-specific variation
19. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Overlap SNPs with
regulatory elements
20. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Overlap SNPs with regulatory elements and ‘open’ chromatin
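To make the idea of overlapping SNPs with regulatory elements concrete, here is a minimal sketch in Python. It is not the method used in the ENCODE papers; it only shows the basic interval lookup. The function name, the toy coordinates and the assumption that regions do not overlap each other are all illustrative.

```python
from bisect import bisect_right


def count_snps_in_regions(snps, regions):
    """Count SNPs that fall inside any region.

    snps:    list of (chrom, pos), 0-based positions.
    regions: list of (chrom, start, end), half-open [start, end).
             Assumption: regions on one chromosome do not overlap.
    """
    # Group the regions per chromosome and sort them by start position.
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        ends = [e for _, e in intervals]
        index[chrom] = (starts, ends)

    hits = 0
    for chrom, pos in snps:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Last region that starts at or before the SNP position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ends[i]:
            hits += 1
    return hits


# Hypothetical toy data, not real ENCODE regions or GWAS SNPs.
toy_regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 60)]
toy_snps = [("chr1", 150), ("chr1", 200), ("chr1", 599), ("chr2", 10), ("chr3", 5)]
print(count_snps_in_regions(toy_snps, toy_regions))
```

An enrichment analysis would compare such a count against the count obtained for matched background SNPs; that step is left out here.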
21. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
22. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Accessible chromatin landscape
DNase I treatment
Combined analysis with TFs and H3K4me3
Identification of “accessible” chromatin regions
23. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Accessible chromatin landscape – location of accessible regions
24. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Accessible chromatin landscape – association with ChIP-seq and TFs
25. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Accessible chromatin landscape – novel transcripts
26. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
27. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Landscape of transcription
RNA-seq
Get a grip on what is transcribed, including novel transcripts and RNAs
28. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Landscape of transcription – nucleolar fraction vs. whole cell
29. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Landscape of transcription
30. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
31. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Long-range interaction of promoters
5C mapping (chromatin interaction mapping technology)
Long-range interactions of promoter regions
32. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Long-range interaction of promoters
33. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
34. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
35. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Transcriptional regulation
ChIP-seq versus expression detection
Predict transcriptional regulation
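As a rough illustration of what "predicting" expression from binding data means, the sketch below fits a straight line through toy values of a binding signal and an expression level. This is a deliberately simple stand-in (ordinary least squares with one predictor), not the model from the ENCODE papers, and all numbers and names are made up for the example.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical values: binding signal at four promoters and the
# expression level of the corresponding genes.
binding = [1.0, 2.0, 3.0, 4.0]
expression = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(binding, expression)
print(slope, intercept, r_squared(binding, expression, slope, intercept))
```

With real data one would use many predictors (several TFs and histone marks) and evaluate the fit on genes that were held out from the fitting step.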
36. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Transcriptional regulation – predict transcription
37. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Transcriptional regulation – expression prediction
38. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Transcriptional regulation – TFs predict location of histone modifications
39. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Transcriptional regulation – model
40. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
41. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Cell-type specific gene expression from open chromatin regions
42. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
43. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Cell-type specific TF binding
44. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
45. Key ENCODE topics
Main ENCODE topics and selection of most important papers
SNPs in regulatory regions
46. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
47. Key ENCODE topics
Main ENCODE topics and selection of most important papers
TF binding - interactions
48. Key ENCODE topics
Main ENCODE topics and selection of most important papers
TF binding – cell-type specificity
49. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Other important papers to us
50. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Classification of genomic regions
51. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Classification of genomic regions
52. Key ENCODE topics
Main ENCODE topics and selection of most important papers
Classification of genomic regions
53. ENCODE
Encyclopedia of DNA Elements
Outline
What and who is ENCODE
Key ENCODE topics and most important papers for our research
ENCODE data – make use of the encyclopedia…
54. ENCODE data
Data availability
Data availability
All data is available, from raw data to final processed data
For end users:
- Tracks in the UCSC browser with desired level of detail
Visualize tracks and explore genomic context
For end users and bio-IT:
- In UCSC “Table browser” and other UCSC tools
Export genomic information, including processed data
For advanced users and bio-IT:
- Raw data and semi-processed data in GEO and others
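Once regions are exported as text, they are easy to process with a few lines of code. The sketch below assumes the export is in BED format (chromosome, 0-based start, end, optional name); the example text and the function names are hypothetical and only illustrate the parsing step.

```python
from collections import namedtuple

BedRecord = namedtuple("BedRecord", "chrom start end name")


def parse_bed(text):
    """Parse BED-formatted text into a list of BedRecord tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines, comments and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        records.append(BedRecord(chrom, start, end, name))
    return records


def total_covered_bases(records):
    """Sum of region lengths (BED intervals are half-open)."""
    return sum(r.end - r.start for r in records)


# Hypothetical export, not real ENCODE data.
example = (
    "track name=toy_peaks\n"
    "chr1\t100\t200\tpeak1\n"
    "chr1\t500\t650\tpeak2\n"
)

peaks = parse_bed(example)
print(len(peaks), total_covered_bases(peaks))
```

For a real export, read the file contents into a string first and pass that to `parse_bed`.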
55. ENCODE data
Data availability
Tracks in the UCSC browser with desired level of detail
56. ENCODE data
Data availability
Tracks in the UCSC table browser