New insights into the human
   genome from ENCODE
What is a gene?




             • ENCODE definition: a gene is a union of genomic
               sequences encoding a coherent set of potentially
               overlapping functional products.


                                                 (Gerstein et al., 2007)
It has been ten years since scientists sequenced the human
genome.




But what do all these letters mean?
21,000 genes
ENCODE, the Encyclopedia of
DNA Elements, has answers



                              Aiming to
                              delineate all of
                              the functional
                              elements
                              encoded in the
                              human genome
                              sequence
ENCODE Consortium




         (The ENCODE Project Consortium, 2011)
Pilot phase                    • 2003-2007

Technology development phase

Production phase               • 2007-2012
                               • 30 papers
ENCODE
         • Major methods
         • Data production and initial analysis
         • Accessing ENCODE data
         • Working with ENCODE data
         • Data analysis
         • Limitations
         • Threads – Nature explorer
Major Methods




      (The ENCODE Project Consortium, 2004)
Overall data flow




          (The ENCODE Project Consortium, 2011)
(The ENCODE Project Consortium, 2011)
RNA-seq – Isolation of RNA sequences followed by high-throughput
sequencing


CAGE – Capture of the methylated cap at the 5′ end of RNA, followed
by high-throughput sequencing


RNA-PET – Simultaneous capture of RNAs with both a 5′ methyl cap
and a poly(A) tail


ChIP-seq – Chromatin immunoprecipitation followed by sequencing


FAIRE-seq – Formaldehyde-assisted isolation of regulatory
elements: crosslinking, phenol extraction, and sequencing of the DNA
fragments in the aqueous phase
(The ENCODE Project Consortium, 2011)
ENCODE cell types




          (The ENCODE Project Consortium, 2011)
ENCODE data production and initial analyses

•   Since 2007, ENCODE has developed methods and performed a large
    number of sequence-based studies to map functional elements across
    the human genome.

•   The elements mapped (and approaches used) include:

      - RNA transcribed regions (RNA-seq, CAGE, RNA-PET and manual
        annotation),
      - Protein-coding regions (mass spectrometry),
      - Transcription-factor-binding sites (ChIP-seq and DNase-seq),
      - Chromatin structure (DNase-seq, FAIRE-seq, histone ChIP-seq), and
      - DNA methylation sites (RRBS assay).



                                       (The ENCODE Project Consortium, 2012)
Transcribed and protein-coding regions



•   In total, GENCODE-annotated exons of protein-coding genes cover 2.94% of the
    genome; protein-coding exons alone cover 1.22%.

•   Protein-coding genes span 33.45% of the genome measured from the outermost
    start to stop codons, or 39.54% measured from promoter to poly(A) site.

•   Additional protein-coding genes remain to be found.

•   In addition, GENCODE annotated 8,801 automatically derived small RNAs and
    9,640 manually curated long non-coding RNA (lncRNA) loci.

•   GENCODE also annotated 11,224 pseudogenes.



                                             (The ENCODE Project Consortium, 2012)
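Coverage percentages like those above are obtained by taking the union of annotated intervals, so that overlapping exons or transcripts are counted only once, and dividing by the genome length. A minimal sketch, assuming half-open `(start, end)` intervals on a single chromosome; the function names and toy coordinates are hypothetical and are not GENCODE data.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one chromosome


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Number of bases covered by the union of the intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it, start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_of_genome(intervals: Iterable[Interval], genome_length: int) -> float:
    """Percentage of a genome of the given length covered by the intervals."""
    return 100.0 * covered_bases(intervals) / genome_length


# Toy example: two overlapping exons and one separate exon on a 1,000 bp "genome".
exons = [(100, 200), (150, 250), (400, 450)]
print(covered_bases(exons))            # 200
print(percent_of_genome(exons, 1000))  # 20.0
```

For a whole genome, the same merge would be run per chromosome and the covered bases summed before dividing by the total genome length.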
Process flow of experimental evaluation of
                               pseudogene transcription




Experimental validation
results showing the
transcription of pseudogenes
in different tissues




                                                          (Pei et al., 2012)
ENCODE gene and transcript annotations.




                     (The ENCODE Project Consortium, 2011)
RNA


•   They sequenced RNA from different cell lines and multiple
    subcellular fractions to develop an extensive RNA expression
    catalogue.



•   They used CAGE-seq (5′ cap-targeted RNA isolation and
    sequencing) to identify 62,403 transcription start sites (TSSs) in tier 1 and tier 2 cell types.




                                      (The ENCODE Project Consortium, 2012)
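Because CAGE reads start at the capped 5′ end of a transcript, candidate TSSs are commonly called by grouping the 5′ positions of nearby reads. The sketch below uses simple distance-based clustering on one strand of one chromosome; it is an illustration, not the ENCODE TSS-calling pipeline, and the `max_gap` of 20 bp and `min_tags` of 3 are hypothetical parameters.

```python
from typing import List, Tuple


def cluster_cage_tags(positions: List[int], max_gap: int = 20,
                      min_tags: int = 3) -> List[Tuple[int, int, int]]:
    """Group CAGE tag 5' positions (one strand, one chromosome) into clusters.

    A tag within max_gap bp of the previous tag joins the same cluster.
    Returns (start, end, tag_count) for clusters with at least min_tags tags;
    start and end are the outermost tag positions in the cluster.
    """
    clusters: List[Tuple[int, int, int]] = []
    if not positions:
        return clusters
    ordered = sorted(positions)
    start = prev = ordered[0]
    count = 1
    for pos in ordered[1:]:
        if pos - prev <= max_gap:
            count += 1
        else:
            # Gap too large: emit the finished cluster if it is well supported.
            if count >= min_tags:
                clusters.append((start, prev, count))
            start, count = pos, 1
        prev = pos
    if count >= min_tags:
        clusters.append((start, prev, count))
    return clusters


tags = [1000, 1002, 1002, 1010, 5000, 5001, 9000]
print(cluster_cage_tags(tags))  # [(1000, 1010, 4)]
```

The two-tag cluster at 5000 and the single tag at 9000 fall below the hypothetical support threshold and are discarded.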
A large majority of GENCODE elements are detected by
                     RNA-seq data




                                       (Djebali et al., 2012)
Protein bound regions


•   They assayed 119 different DNA-binding proteins and a number of RNA
    polymerase components in 72 cell types using ChIP-seq.



•   Overall, 636,336 binding regions covering 231 megabases (Mb; 8.1% of
    the genome) are enriched for regions bound by DNA-binding proteins
    across all cell types.




                                       (The ENCODE Project Consortium, 2012)
Occupancy of transcription factors and RNA
polymerase 2 on human chromosome 6p as
        determined by ChIP-seq
(The ENCODE Project Consortium, 2011)
DNase I hypersensitive sites and footprinting
•   Chromatin accessibility characterized by DNase I hypersensitivity
    is the hallmark of regulatory DNA regions.

•   They mapped 2.89 million unique, non-overlapping DNase I hypersensitive
    sites (DHSs) by DNase-seq in 125 cell types; most lie distal to TSSs.

•   In tier 1 and tier 2 cell types, they detected on average 205,109 DHSs
    per cell type, encompassing an average of 1.0% of the genomic sequence
    in each cell type, and 3.9% in aggregate.




                                      (The ENCODE Project Consortium, 2012)
Density of DNase I cleavage sites for selected cell types




                                             (Thurman et al., 2012)
•   On average, 98.5% of the occupancy sites of transcription factors
    mapped by ENCODE ChIP-seq lie within DNase I-accessible chromatin.



•   Using genomic DNase I footprinting on 41 cell types, they
    identified 8.4 million distinct DNase I footprints.




                                        (The ENCODE Project Consortium, 2012)
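A DNase I footprint is a short stretch inside accessible chromatin where a bound protein shields DNA from cleavage, so it shows fewer cleavage events than its flanks. The sketch scores a candidate site by comparing the mean per-base cleavage count in a central core (C) with the means in left (L) and right (R) flanking windows, as (C + 1)/L + (C + 1)/R, where lower values mean stronger protection. This is an illustrative score with assumed window sizes, not a reproduction of the ENCODE footprinting pipeline.

```python
from statistics import mean
from typing import Sequence


def footprint_score(cuts: Sequence[int], center_start: int, center_end: int,
                    flank: int) -> float:
    """Depletion score for a candidate footprint (lower = more protected).

    cuts: per-base DNase I cleavage counts along a region.
    [center_start, center_end): candidate protected core (must be non-empty).
    flank: width of the left and right flanking windows.
    """
    if center_start - flank < 0 or center_end + flank > len(cuts):
        raise ValueError("flanking windows fall outside the region")
    c = mean(cuts[center_start:center_end])
    left = mean(cuts[center_start - flank:center_start])
    right = mean(cuts[center_end:center_end + flank])
    # A flank with no cleavage at all gives no evidence of a footprint.
    if left == 0 or right == 0:
        return float("inf")
    # Pseudocount of 1 on the core keeps fully protected cores comparable.
    return (c + 1) / left + (c + 1) / right


cuts = [8, 9, 10, 9, 1, 0, 0, 1, 9, 10, 8, 9]
print(footprint_score(cuts, 4, 8, 4))       # about 0.333 (depleted core)
print(footprint_score([5] * 12, 4, 8, 4))   # 2.4 (no depletion)
```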
Regions of histone modification


 •   They assayed chromosomal locations for up to 12 histone
     modifications and variants in 46 cell types, including the tier 1 and tier 2 cell types.




(http://www.factorbook.org)             (The ENCODE Project Consortium, 2012)
DNA methylation


•   They used reduced representation bisulphite sequencing (RRBS)
    to profile DNA methylation quantitatively for an average of 1.2
    million CpGs in each of 82 cell lines and tissues (8.6% of non-
    repetitive genomic CpGs), including CpGs in intergenic regions,
    proximal promoters and intragenic regions.




                                     (The ENCODE Project Consortium, 2012)
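After bisulphite treatment, unmethylated cytosines are read as T while methylated cytosines remain C, so the methylation level of a CpG can be estimated as C / (C + T) over the reads covering it. A minimal sketch, assuming per-CpG read counts have already been extracted from aligned RRBS reads; the `min_coverage` of 10 reads is a hypothetical filter, not a documented ENCODE threshold.

```python
from typing import Dict, Optional, Tuple

CpG = Tuple[str, int]  # (chromosome, position)


def methylation_levels(counts: Dict[CpG, Tuple[int, int]],
                       min_coverage: int = 10) -> Dict[CpG, Optional[float]]:
    """Estimate per-CpG methylation as C / (C + T).

    counts maps each CpG to (methylated_reads, unmethylated_reads), i.e. reads
    showing C versus T at that cytosine after bisulphite conversion.
    CpGs with no reads or fewer than min_coverage reads are reported as None.
    """
    levels: Dict[CpG, Optional[float]] = {}
    for cpg, (meth, unmeth) in counts.items():
        depth = meth + unmeth
        if depth > 0 and depth >= min_coverage:
            levels[cpg] = meth / depth
        else:
            levels[cpg] = None
    return levels


example = {("chr1", 10468): (18, 2), ("chr1", 10470): (3, 9), ("chr2", 500): (1, 1)}
print(methylation_levels(example))
# {('chr1', 10468): 0.9, ('chr1', 10470): 0.25, ('chr2', 500): None}
```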
Proteomics


•  To assess putative protein products generated from novel RNA
   transcripts and isoforms, proteins are sequenced and quantified
   by mass spectrometry and mapped back to their encoding
   transcripts.



•  Protein studies have begun in the K562 and GM12878 cell lines.




                                    (The ENCODE Project Consortium, 2011)
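Mapping peptides back to their encoding transcripts amounts to asking which predicted protein sequences contain each identified peptide; peptides found in only one isoform are the ones that can support that isoform specifically. A minimal sketch using exact substring matching, treating leucine and isoleucine as equivalent because they have identical masses; the isoform names and sequences are invented for illustration, and real proteomics pipelines involve search-engine scoring and further details omitted here.

```python
from typing import Dict, List


def _normalize(seq: str) -> str:
    # Leucine (L) and isoleucine (I) have identical masses, so MS cannot tell them apart.
    return seq.upper().replace("I", "L")


def map_peptides(peptides: List[str], isoforms: Dict[str, str]) -> Dict[str, List[str]]:
    """For each peptide, the sorted names of isoforms whose protein sequence contains it."""
    normalized = {name: _normalize(seq) for name, seq in isoforms.items()}
    return {
        pep: sorted(name for name, seq in normalized.items() if _normalize(pep) in seq)
        for pep in peptides
    }


def isoform_specific(mapping: Dict[str, List[str]]) -> Dict[str, str]:
    """Peptides that map to exactly one isoform, with that isoform's name."""
    return {pep: names[0] for pep, names in mapping.items() if len(names) == 1}


# Hypothetical isoforms of a hypothetical gene.
isoforms = {"GENE1-201": "MKTAYIAKQR", "GENE1-202": "MKTAYLGGSR"}
hits = map_peptides(["MKTAY", "YIAK", "LGGS"], isoforms)
print(hits)                    # MKTAY maps to both isoforms
print(isoform_specific(hits))  # {'YIAK': 'GENE1-201', 'LGGS': 'GENE1-202'}
```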
ENCODE chromatin annotations in the HLA
                locus




                     (The ENCODE Project Consortium, 2011)
Accessing ENCODE Data

ENCODE Data Release and Use Policy
•   The ENCODE Data Release and Use Policy is described at
    http://www.encodeproject.org/ENCODE/terms.html.



•   ENCODE data are released for viewing in a publicly accessible
    browser (initially at http://genome-preview.ucsc.edu/ENCODE
    and, after additional quality checks, at http://encodeproject.org).



Public Repositories

•   UCSC Genome Browser database (http://genome.ucsc.edu).




                                       (The ENCODE Project Consortium, 2011)
UCSC Portal
Working with ENCODE Data


Using ENCODE Data in the UCSC Browser

•   Many users will want to view and interpret the ENCODE data for
    particular genes of interest. At the online ENCODE portal
    (http://encodeproject.org), users should follow a “Genome
    Browser” link to visualize the data in the context of other genome
    annotations.




                                       (The ENCODE Project Consortium, 2011)
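A region of interest can also be opened directly by linking to the UCSC Genome Browser's `hgTracks` page with the `db` (assembly) and `position` query parameters. A minimal sketch; the default assembly `hg19` and the example coordinates are assumptions, and which ENCODE tracks appear still depends on the browser's track settings.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_browser_url(chrom: str, start: int, end: int, db: str = "hg19") -> str:
    """Build a UCSC Genome Browser link showing chrom:start-end on assembly db."""
    if start > end:
        raise ValueError("start must not exceed end")
    params = {"db": db, "position": f"{chrom}:{start}-{end}"}
    # urlencode percent-encodes the ':' in the position string as %3A.
    return f"{UCSC_HGTRACKS}?{urlencode(params)}"


# Arbitrary example region on chromosome 6.
print(ucsc_browser_url("chr6", 29_690_000, 33_410_000))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr6%3A29690000-33410000
```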
ENCODE Data Analysis


•   The development and implementation of algorithms and pipelines for
    processing and analyzing data is a major activity of the ENCODE
    Project.




        • 1st phase: short sequence reads are aligned to the reference genome.
        • 2nd phase: regions of enriched signal are identified.
        • 3rd phase: the identified regions of enriched signal are integrated
          with each other and with other data types.

                                            (The ENCODE Project Consortium, 2011)
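The second phase, identifying enriched regions, can be illustrated with a deliberately simplified window-based caller: count aligned read start positions per fixed window and flag windows whose count is improbably high under a Poisson background estimated from the whole genome. The window size, p-value cutoff and toy reads are hypothetical; production peak callers are considerably more elaborate (input controls, local background, replicate consistency).

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for the small per-window rates used here; exp(-lam) underflows
    for very large lam (above roughly 745).
    """
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def enriched_windows(read_starts: Iterable[int], genome_length: int,
                     window: int = 200,
                     p_cutoff: float = 1e-5) -> List[Tuple[int, int, int]]:
    """Return (window_start, window_end, read_count) for enriched windows."""
    starts = list(read_starts)
    counts = Counter(pos // window for pos in starts)
    n_windows = math.ceil(genome_length / window)
    lam = len(starts) / n_windows  # genome-wide background reads per window
    hits = []
    for idx in sorted(counts):
        if poisson_sf(counts[idx], lam) < p_cutoff:
            hits.append((idx * window, min((idx + 1) * window, genome_length), counts[idx]))
    return hits


# One background read per window on a 10 kb toy genome, plus a pile-up of 30 reads.
reads = [i * 200 + 100 for i in range(50)] + [1050] * 30
print(enriched_windows(reads, 10_000))  # [(1000, 1200, 31)]
```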
Analysis tools applied by the ENCODE
             consortium




                   (The ENCODE Project Consortium, 2011)
Integrating ENCODE with other projects and the
               Scientific Community

1. defining promoter and enhancer regions by combining transcript
   mapping and biochemical marks,



2. delineating distinct classes of regions within the genomic
   landscape by their specific combinations of biochemical and
   functional characteristics, and



3. defining transcription factor co-associations and regulatory
   networks.



                                      (The ENCODE Project Consortium, 2011)
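One simple way to quantify transcription factor co-association is to ask what fraction of one factor's binding regions overlap a region bound by another. A minimal sketch, assuming each factor's peaks are half-open `(chrom, start, end)` tuples; the factor peak sets are invented, and this overlap fraction is only one possible measure and carries no significance test by itself.

```python
from typing import Sequence, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open peaks on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def co_association(peaks_a: Sequence[Peak], peaks_b: Sequence[Peak]) -> float:
    """Fraction of factor A's peaks that overlap at least one peak of factor B.

    Simple quadratic scan, adequate for small illustrative inputs.
    """
    if not peaks_a:
        return 0.0
    shared = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return shared / len(peaks_a)


# Hypothetical peak sets for two factors.
tf_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr2", 900, 950)]
tf_b = [("chr1", 180, 260), ("chr2", 100, 120)]
print(co_association(tf_a, tf_b))  # 0.5
print(co_association(tf_b, tf_a))  # 1.0
```

The measure is asymmetric: every peak of the second factor lies within a peak of the first, but only half of the first factor's peaks are shared.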
•   ENCODE data support the interpretation of human genome variation
    that is associated with disease or quantitative phenotypes.



•   Integration with the 1000 Genomes Project can show how SNPs and
    structural variation may affect transcript, regulatory and DNA
    methylation data.



•   ENCODE data inform GWAS and other sequence-variation-driven studies
    of human phenotypes.



    ENCODE is a major contributor not only of data but also of novel
    technologies for deciphering the human genome.


                                       (The ENCODE Project Consortium, 2011)
Limitations of ENCODE Annotations


•   The cell types studied can be physiologically and genetically
    inhomogeneous.

•   Local micro-environments in culture may also vary.

•   The use of DNA sequencing to annotate functional genomic features is
    also constrained, for example by the need to map short reads uniquely
    to the genome.

•   There is considerable quantitative variation in signal strength along
    the genome.




                                       (The ENCODE Project Consortium, 2011)
Challenges

•   The adult human body contains several hundred distinct cell types.

•   Each of these expresses a unique subset of the 1,800 TFs
    encoded in the human genome.

•   The brain alone contains thousands of types of neurons that are likely
    to express not only different sets of TFs but also a larger variety
    of non-coding RNAs.

•   A truly comprehensive atlas of human functional elements is not
    practical with current technologies.




                                          (The ENCODE Project Consortium, 2011)
Outcome

•   Improved understanding of the human genome

•   The broad coverage of ENCODE annotations enhances our
    understanding of common diseases with a genetic component and of
    rare genetic diseases.

•   The data cover 119 of the 1,800 known transcription factors and 13 of
    more than 60 currently known histone or DNA modifications across 147
    cell types.

•   Overall, these data reflect a minor fraction of the potential
    functional information encoded in the human genome.




                                    (The ENCODE Project Consortium, 2012)
http://www.nature.com/encode/#/threads
13 Threads
1.   Transcription factor motifs
2.   Chromatin patterns at transcription factor binding sites
3.   Characterization of intergenic regions and gene definition
4.   RNA and chromatin modification patterns around promoters
5.   Epigenetic regulation of RNA processing
6.   Non-coding RNA characterization
7.   DNA methylation
8.   Enhancer discovery and characterization
9.   Three-dimensional connections across the genome
10. Characterization of network topology
11. Machine learning approaches to genomics
12. Impact of functional information on understanding variation
13. Impact of evolutionary selection on functional regions
Schematic overview of the functional SNP
               approach




                                (Schaub et al., 2012)
Comparison of GWAS identified loci with
            ENCODE data
(Boyle et al., 2012)
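Comparing GWAS loci with ENCODE data starts from a simple question: which trait-associated SNPs fall inside an annotated functional element? A minimal sketch using binary search over sorted, non-overlapping elements per chromosome (0-based positions, half-open intervals); the rsIDs, positions and element labels are invented placeholders, and the functional-SNP approaches cited above layer further evidence on top of such overlaps.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Element = Tuple[int, int, str]  # (start, end, label), half-open, 0-based
Index = Dict[str, Tuple[List[int], List[Element]]]


def build_index(elements: Dict[str, List[Element]]) -> Index:
    """Sort each chromosome's elements and keep their starts for bisection.

    Assumes elements on the same chromosome do not overlap (e.g. merged DHSs).
    """
    index: Index = {}
    for chrom, elems in elements.items():
        ordered = sorted(elems)
        index[chrom] = ([e[0] for e in ordered], ordered)
    return index


def annotate_snp(index: Index, chrom: str, pos: int) -> Optional[str]:
    """Label of the element containing 0-based position pos, or None."""
    if chrom not in index:
        return None
    starts, ordered = index[chrom]
    i = bisect_right(starts, pos) - 1  # last element starting at or before pos
    if i >= 0 and pos < ordered[i][1]:
        return ordered[i][2]
    return None


# Hypothetical elements and placeholder SNP identifiers.
elements = {"chr6": [(1000, 1300, "DHS"), (5000, 5400, "TF_ChIP_peak")]}
gwas_snps = [("rs0001", "chr6", 1200), ("rs0002", "chr6", 3000), ("rs0003", "chr6", 5000)]
index = build_index(elements)
for rsid, chrom, pos in gwas_snps:
    print(rsid, annotate_snp(index, chrom, pos))
# rs0001 DHS
# rs0002 None
# rs0003 TF_ChIP_peak
```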
Future goal


•   Understand the mechanistic processes that generate these elements, and
    how and where they function.

•   Enlarge the data set to additional factors, modifications and cell
    types, complementing the other related projects.

•   Constitute foundational resources for human genomics, allowing a
    deeper interpretation of the organization of gene and regulatory
    information and the mechanisms of regulation, and thereby
    providing important insights into human health and disease.




                                      (The ENCODE Project Consortium, 2012)
The project is still far from complete.
Conclusion




For updates: https://www.facebook.com/ENCODEProject
ENCODE: assigning words to letters
Thank you:)

4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 

Dernier (20)

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

New insights into the human genome from ENCODE

  • 1. New insights into the human genome by ENCODE
  • 2. What is a gene? • Union of genomic sequences encoding a coherent set of potentially overlapping functional products. ENCODE (Gerstein et al., 2007)
  • 3. It's been ten years since scientists sequenced the human genome. But what do all these letters mean?
  • 5. ENCODE, the Encyclopedia of DNA Elements, has ANSWERS: it aims to delineate all of the functional elements encoded in the human genome sequence.
  • 6. ENCODE Consortium (The ENCODE Project Consortium, 2011)
  • 7. Pilot phase: 2003-2007 • Technology development phase • Production phase: 2007-2012, 30 papers
  • 8.
  • 9. ENCODE: Major methods • Data production and initial analysis • Accessing ENCODE data • Working with ENCODE data • Data analysis • Limitations • Threads (Nature explorer)
  • 10. Major Methods (The ENCODE Project Consortium, 2004)
  • 11. Overall data flow (The ENCODE Project Consortium, 2011)
  • 12. (The ENCODE Project Consortium, 2011)
  • 13. RNA-seq: isolation of RNA sequences followed by high-throughput sequencing • CAGE: capture of the methylated cap at the 5' end of RNA, followed by high-throughput sequencing • RNA-PET: simultaneous capture of RNAs with both a 5' methyl cap and a poly(A) tail • ChIP-seq: chromatin immunoprecipitation followed by sequencing • FAIRE-seq: formaldehyde-assisted isolation of regulatory elements (crosslinking, phenol extraction, and sequencing of the DNA fragments in the aqueous phase)
  • 14. (The ENCODE Project Consortium, 2011)
  • 15. ENCODE cell types (The ENCODE Project Consortium, 2011)
  • 16. ENCODE data production and initial analyses • Since 2007, ENCODE has developed methods and performed a large number of sequence-based studies to map functional elements across the human genome. • The elements mapped (and approaches used) include: RNA transcribed regions (RNA-seq, CAGE, RNA-PET and manual annotation); protein-coding regions (mass spectrometry); transcription-factor-binding sites (ChIP-seq and DNase-seq); chromatin structure (DNase-seq, FAIRE-seq, histone ChIP-seq); and DNA methylation sites (RRBS assay). (The ENCODE Project Consortium, 2012)
  • 17. Transcribed and protein-coding regions • In total, GENCODE-annotated exons of protein-coding genes cover 2.94% of the genome, or 1.22% for protein-coding exons alone. • Protein-coding genes span 33.45% of the genome measured from the outermost start to stop codons, or 39.54% measured from promoter to poly(A) site. • Additional protein-coding genes remain to be found. • In addition, GENCODE annotated 8,801 automatically derived small RNAs and 9,640 manually curated long non-coding RNA (lncRNA) loci. • GENCODE also annotated 11,224 pseudogenes. (The ENCODE Project Consortium, 2012)
  • 18. Process flow of experimental evaluation of pseudogene transcription. Experimental validation results showing the transcription of pseudogenes in different tissues. (Pei et al., 2012)
  • 19. ENCODE gene and transcript annotations. (The ENCODE Project Consortium, 2011)
  • 20. RNA • They sequenced RNA from different cell lines and multiple subcellular fractions to develop an extensive RNA expression catalogue. • They used CAGE-seq (5' cap-targeted RNA isolation and sequencing) to identify 62,403 transcription start sites (TSSs) in tier 1 and 2 cell types. (The ENCODE Project Consortium, 2012)
  • 21. A large majority of GENCODE elements are detected by RNA-seq data (Djebali et al., 2012)
  • 22. Protein-bound regions • ChIP-seq was used to map 119 different DNA-binding proteins and a number of RNA polymerase components in 72 cell types. • Overall, 636,336 binding regions covering 231 megabases (8.1%) of the genome are enriched for regions bound by DNA-binding proteins across all cell types. (The ENCODE Project Consortium, 2012)
  • 23. Occupancy of transcription factors and RNA polymerase 2 on human chromosome 6p as determined by ChIP-seq
  • 24. (The ENCODE Project Consortium, 2011)
  • 25. DNase I hypersensitive sites and footprinting • Chromatin accessibility, characterized by DNase I hypersensitivity, is the hallmark of regulatory DNA regions. • DNase-seq in 125 cell types identified 2.89 million unique, non-overlapping DNase I hypersensitive sites (DHSs), the large majority of which lie distal to TSSs. • Tier 1 and tier 2 cell types showed an average of 205,109 DHSs per cell type, encompassing an average of 1.0% of the genomic sequence in each cell type and 3.9% in aggregate. (The ENCODE Project Consortium, 2012)
  • 26. Density of DNase I cleavage sites for selected cell types (Thurman et al., 2012)
  • 27. On average, 98.5% of the occupancy sites of transcription factors mapped by ENCODE ChIP-seq lie within accessible chromatin defined by DNase I hotspots. • Using genomic DNase I footprinting on 41 cell types, they identified 8.4 million distinct DNase I footprints. (The ENCODE Project Consortium, 2012)
  • 28. Regions of histone modification • They assayed chromosomal locations for up to 12 histone modifications and variants in 46 cell types, including the tier 1 and 2 cell types. (http://www.factorbook.org) (The ENCODE Project Consortium, 2012)
  • 29. DNA methylation • They used reduced representation bisulphite sequencing (RRBS) to profile DNA methylation quantitatively for an average of 1.2 million CpGs in each of 82 cell lines and tissues (8.6% of non-repetitive genomic CpGs), including CpGs in intergenic regions, proximal promoters and intragenic regions. (The ENCODE Project Consortium, 2012)
  • 30. Proteomics • To assess putative protein products generated from novel RNA transcripts and isoforms, proteins are sequenced and quantified by mass spectrometry and mapped back to their encoding transcripts. • Protein studies have begun in K562 and GM12878. (The ENCODE Project Consortium, 2011)
  • 31. ENCODE chromatin annotations in the HLA locus (The ENCODE Project Consortium, 2011)
  • 32. Accessing ENCODE data. ENCODE Data Release and Use Policy: • The ENCODE Data Release and Use Policy is described at http://www.encodeproject.org/ENCODE/terms.html. • ENCODE data are released for viewing in a publicly accessible browser (initially at http://genome-preview.ucsc.edu/ENCODE and, after additional quality checks, at http://encodeproject.org). Public repositories: • UCSC Genome Browser database (http://genome.ucsc.edu). (The ENCODE Project Consortium, 2011)
  • 34. Working with ENCODE data. Using ENCODE data in the UCSC Browser: • Many users will want to view and interpret the ENCODE data for particular genes of interest. At the online ENCODE portal (http://encodeproject.org), users should follow a "Genome Browser" link to visualize the data in the context of other genome annotations. (The ENCODE Project Consortium, 2011)
  • 35. ENCODE data analysis • The development and implementation of algorithms and pipelines for processing and analyzing data is a major activity of the ENCODE Project. • 1st phase: short sequences are aligned to the reference genome. • 2nd phase: regions of enriched signal are identified. • 3rd phase: the identified enriched regions are integrated with each other and with other data types. (The ENCODE Project Consortium, 2011)
  • 36. Analysis tools applied by the ENCODE consortium (The ENCODE Project Consortium, 2011)
  • 37. Integrating ENCODE with other projects and the Scientific Community 1. defining promoter and enhancer regions by combining transcript mapping and biochemical marks, 2. delineating distinct classes of regions within the genomic landscape by their specific combinations of biochemical and functional characteristics, and 3. defining transcription factor co-associations and regulatory networks. (The ENCODE Project Consortium, 2011)
  • 38. The ENCODE Project supports the interpretation of human genome variation that is associated with disease or quantitative phenotypes. • Integration with the 1000 Genomes Project to assess how SNPs and structural variation may affect transcript, regulatory and DNA methylation data. • ENCODE data inform GWAS and other sequence-variation-driven studies of human phenotypes. • ENCODE is a major contributor not only of data but also of novel technologies for deciphering the human genome. (The ENCODE Project Consortium, 2011)
  • 39. Limitations of ENCODE annotations • Cell types are physiologically and genetically inhomogeneous. • Local micro-environments in culture may also vary. • The use of DNA sequencing to annotate functional genomic features is also constrained. • Signal strength shows considerable quantitative variation along the genome. (The ENCODE Project Consortium, 2011)
  • 40. Challenges • The adult human body contains several hundred distinct cell types, each of which expresses a unique subset of the 1,800 TFs encoded in the human genome. • The brain alone contains thousands of types of neurons that are likely to express not only different sets of TFs but also a larger variety of non-coding RNAs. • A truly comprehensive atlas of human functional elements is not practical with current technologies. (The ENCODE Project Consortium, 2011)
  • 41. Outcome • Understanding of the human genome • The broad coverage of ENCODE annotations enhances our understanding of common diseases with a genetic component and of rare genetic diseases. • The data cover 119 of 1,800 known transcription factors and 13 of more than 60 currently known histone or DNA modifications across 147 cell types. • Overall, these data reflect a minor fraction of the potential functional information encoded in the human genome. (The ENCODE Project Consortium, 2012)
  • 43. 13 Threads 1. Transcription factor motifs 2. Chromatin patterns at transcription factor binding sites 3. Characterization of intergenic regions and gene definition 4. RNA and chromatin modification patterns around promoters 5. Epigenetic regulation of RNA processing 6. Non-coding RNA characterization 7. DNA methylation 8. Enhancer discovery and characterization 9. Three-dimensional connections across the genome 10. Characterization of network topology 11. Machine learning approaches to genomics 12. Impact of functional information on understanding variation 13. Impact of evolutionary selection on functional regions
  • 44. Schematic overview of the functional SNP approach (Schaub et al., 2012)
  • 45. Comparison of GWAS identified loci with ENCODE data
  • 46.
  • 47. (Boyle et al., 2012)
  • 48. Future goals • Determine the mechanistic processes that generate these elements, and how and where they function. • Enlarge the data set to additional factors, modifications and cell types, complementing other related projects. • Constitute foundational resources for human genomics, allowing a deeper interpretation of the organization of gene and regulatory information and of the mechanisms of regulation, and thereby providing important insights into human health and disease. (The ENCODE Project Consortium, 2012)
  • 49. Conclusion: the project is still far from complete. For updates: https://www.facebook.com/ENCODEProject
  • 50. Encode – assign word to letter
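
Slide 17's percentages (for example, exons covering 2.94% of the genome) are base-pair coverage figures. Because alternative transcripts share exons, overlapping intervals have to be merged before bases are counted, or shared sequence is counted twice. The Python sketch below shows that step, assuming half-open `(start, end)` intervals on a single chromosome; the exon coordinates and the 10 kb genome length are hypothetical toy values, and this is not GENCODE's or ENCODE's actual code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals: List[Interval], genome_length: int) -> float:
    """Fraction of the genome covered by at least one interval."""
    covered_bp = sum(end - start for start, end in merge_intervals(intervals))
    return covered_bp / genome_length


# Hypothetical exons from two overlapping transcripts of one gene (toy coordinates).
exons = [(100, 200), (150, 250), (400, 500), (400, 450)]
print(merge_intervals(exons))           # [(100, 250), (400, 500)]
print(covered_fraction(exons, 10_000))  # 0.025
```

Summing the raw exon lengths here would give 350 bp instead of 250 bp, which is why the merge comes first.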
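
Slide 20 notes that CAGE-seq was used to identify TSSs. Because CAGE reads begin at the capped 5' end of a transcript, the first base of each read marks a candidate start site, and nearby 5' ends on the same strand can be grouped into one TSS cluster. The sketch below is a simplified distance-based clustering under assumed parameters (`max_gap`, `min_tags` are hypothetical); it does not reproduce the replicate-based, high-confidence calling behind ENCODE's 62,403 TSSs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tag = Tuple[str, str, int]                # (chromosome, strand, 5' end position)
Cluster = Tuple[str, str, int, int, int]  # (chromosome, strand, start, end, tag count)


def cluster_tss(tags: List[Tag], max_gap: int = 20, min_tags: int = 3) -> List[Cluster]:
    """Group CAGE 5' ends lying within `max_gap` bp of each other on the same
    chromosome and strand; keep clusters supported by at least `min_tags` tags."""
    by_strand: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for chrom, strand, pos in tags:
        by_strand[(chrom, strand)].append(pos)

    clusters: List[Cluster] = []

    def emit(chrom: str, strand: str, run: List[int]) -> None:
        # Report a run of nearby 5' ends only if enough tags support it.
        if len(run) >= min_tags:
            clusters.append((chrom, strand, run[0], run[-1], len(run)))

    for (chrom, strand), positions in sorted(by_strand.items()):
        positions.sort()
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] <= max_gap:
                run.append(pos)
            else:
                emit(chrom, strand, run)
                run = [pos]
        emit(chrom, strand, run)
    return clusters


# Hypothetical tags: three close minus-strand 5' ends form one supported cluster.
tags = [("chr17", "-", 1000), ("chr17", "-", 1005), ("chr17", "-", 1012),
        ("chr17", "-", 5000), ("chr17", "+", 1003)]
print(cluster_tss(tags))  # [('chr17', '-', 1000, 1012, 3)]
```

Keeping strands separate matters for loci like TP53/WRAP53, where adjacent genes are transcribed from opposite strands.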
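
Slide 25 contrasts DHS coverage per cell type (1.0% on average) with coverage in aggregate (3.9%). The aggregate is the union of DHSs across all cell types, which exceeds any single cell type because many DHSs are cell-type-specific; counting "unique, non-overlapping" DHSs relies on the same merging step. A minimal sketch, assuming half-open intervals on one chromosome; the three cell types, their DHS calls and the 10 kb genome are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping intervals into a sorted, non-overlapping list."""
    out: List[Interval] = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def bp(intervals: List[Interval]) -> int:
    """Base pairs covered by the union of the intervals."""
    return sum(e - s for s, e in merge(intervals))


def dhs_coverage(dhs_by_cell: Dict[str, List[Interval]], genome_length: int) -> Tuple[float, float]:
    """Return (mean per-cell-type coverage fraction, aggregate union coverage fraction)."""
    per_cell = [bp(sites) / genome_length for sites in dhs_by_cell.values()]
    all_sites = [iv for sites in dhs_by_cell.values() for iv in sites]
    return mean(per_cell), bp(all_sites) / genome_length


# Hypothetical DHS calls in three cell types on a 10 kb toy "genome".
dhs = {
    "cellA": [(0, 100), (500, 600)],
    "cellB": [(50, 150), (2000, 2100)],
    "cellC": [(500, 600), (8000, 8100)],
}
per_cell_mean, aggregate = dhs_coverage(dhs, 10_000)
print(per_cell_mean, aggregate)  # each cell type covers 2%, the union covers 4.5%
```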
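
Slide 27 mentions genomic DNase I footprinting: within accessible chromatin, a bound protein protects a short stretch of DNA from cleavage, leaving a dip in the per-base cut profile flanked by higher cutting. The sketch below scores a candidate site with a simple, hypothetical ratio of mean central cuts to mean flanking cuts (plus a pseudocount); it illustrates the idea only and is not the footprint-calling method ENCODE used.

```python
from typing import Sequence


def footprint_ratio(cleavage: Sequence[int], start: int, end: int, flank: int) -> float:
    """Hypothetical footprint score for the core [start, end) of a per-base
    DNase I cut-count profile: (mean core cuts + 1) / (mean flank cuts + 1),
    using `flank` bp on each side. Values well below 1 suggest protection."""
    if start - flank < 0 or end + flank > len(cleavage) or end <= start or flank <= 0:
        raise ValueError("core or flanks fall outside the cleavage profile")
    core = cleavage[start:end]
    flanks = list(cleavage[start - flank:start]) + list(cleavage[end:end + flank])
    core_mean = sum(core) / len(core)
    flank_mean = sum(flanks) / len(flanks)
    # The pseudocount of 1 keeps the ratio defined when flanks have no cuts.
    return (core_mean + 1) / (flank_mean + 1)


# Hypothetical profile: a 6 bp protected core between well-cleaved flanks.
profile = [9, 10, 8, 11, 1, 0, 1, 0, 1, 0, 10, 9, 12, 9]
print(round(footprint_ratio(profile, 4, 10, 4), 3))  # 0.14
```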
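
Slide 29 says RRBS profiles DNA methylation quantitatively. In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines remain C, so the methylation level at a CpG can be estimated as C reads / (C + T reads). A minimal sketch, assuming strand-collapsed counts per CpG and a hypothetical minimum read depth of 10; the coordinates and counts are invented for illustration.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, CpG position)


def methylation_level(c_reads: int, t_reads: int, min_depth: int = 10) -> Optional[float]:
    """Fraction methylated at one CpG: C / (C + T), or None if depth < min_depth."""
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth


def profile(counts: Dict[Site, Tuple[int, int]], min_depth: int = 10) -> Dict[Site, float]:
    """Per-CpG methylation levels for every site with sufficient coverage."""
    levels: Dict[Site, float] = {}
    for site, (c, t) in counts.items():
        level = methylation_level(c, t, min_depth)
        if level is not None:
            levels[site] = level
    return levels


# Hypothetical (C reads, T reads) at three CpGs; the last has too little coverage.
counts = {("chr1", 10_468): (18, 2), ("chr1", 10_470): (1, 19), ("chr1", 10_483): (3, 2)}
print(profile(counts))  # {('chr1', 10468): 0.9, ('chr1', 10470): 0.05}
```

The depth filter is a design choice: a level computed from a handful of reads is too noisy to compare across cell lines.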
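
Slide 35 splits analysis into three phases: alignment, identification of enriched regions, and integration. The toy sketch below covers phase 2 only, assuming phase 1 has already produced aligned read start positions on one chromosome. It counts ChIP and control reads in fixed windows, tests each window against a Poisson background (the scaled control count, floored at the genome-wide average), and merges adjacent enriched windows. The window size, `alpha` and the reads are hypothetical, and this is a simplification, not ENCODE's production peak callers; phase 3 would then intersect the resulting regions with other tracks.

```python
import math
from typing import List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def call_enriched(read_starts: List[int], control_starts: List[int], genome_length: int,
                  window: int = 100, alpha: float = 1e-3) -> List[Tuple[int, int]]:
    """Return half-open regions of windows whose ChIP count is improbably high."""
    n_windows = math.ceil(genome_length / window)
    chip = [0] * n_windows
    ctrl = [0] * n_windows
    for pos in read_starts:
        chip[pos // window] += 1
    for pos in control_starts:
        ctrl[pos // window] += 1
    scale = len(read_starts) / max(1, len(control_starts))  # library-size scaling
    genome_bg = len(read_starts) / n_windows                 # genome-wide floor
    regions: List[Tuple[int, int]] = []
    for i in range(n_windows):
        lam = max(genome_bg, ctrl[i] * scale, 1e-9)
        if poisson_sf(chip[i], lam) < alpha:
            start, end = i * window, min((i + 1) * window, genome_length)
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)  # merge adjacent windows
            else:
                regions.append((start, end))
    return regions


# Hypothetical reads: a pile-up over 300-430 on a flat 1 kb background.
chip_reads = [300 + i for i in range(30)] + [400 + i for i in range(30)] \
    + [50, 150, 250, 550, 650, 750, 850, 950]
control_reads = [5 + 100 * i for i in range(10)]
print(call_enriched(chip_reads, control_reads, 1000))  # [(300, 500)]
```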
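
Slide 37 lists "defining transcription factor co-associations" as an integration goal. One simple way to quantify co-association, shown here as an assumption rather than ENCODE's method, is the Jaccard index of the genomic bins that two factors bind: shared bins divided by the union of bins. The factor names, peak coordinates and 200 bp bin size are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) binding region on one chromosome


def to_bins(sites: List[Interval], bin_size: int = 200) -> Set[int]:
    """Map binding intervals to the set of fixed-size bins they touch."""
    bins: Set[int] = set()
    for start, end in sites:
        bins.update(range(start // bin_size, (end - 1) // bin_size + 1))
    return bins


def co_association(peaks: Dict[str, List[Interval]], bin_size: int = 200) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every pair of factors (0 = disjoint, 1 = identical)."""
    binned = {tf: to_bins(sites, bin_size) for tf, sites in peaks.items()}
    scores: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(binned), 2):
        union = binned[a] | binned[b]
        scores[(a, b)] = len(binned[a] & binned[b]) / len(union) if union else 0.0
    return scores


# Hypothetical peaks: TF_A and TF_B co-occupy two bins; TF_C binds elsewhere.
peaks = {
    "TF_A": [(0, 150), (1000, 1150)],
    "TF_B": [(50, 180), (1010, 1100), (5000, 5100)],
    "TF_C": [(9000, 9200)],
}
print(co_association(peaks))
```

Binning trades resolution for simplicity; a base-pair overlap measure would avoid calling two nearby but non-overlapping peaks "shared".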

Speaker notes

  1. These analyses reveal that the human genome encodes a diverse array of transcripts. For example, in the proto-oncogene TP53 locus, RNA-seq data indicate that, while TP53 transcripts are accurately assigned to the minus strand, those for the oppositely transcribed, adjacent gene WRAP53 emanate from the plus strand (Figure 3). An independent transcript within the first intron of TP53 is also observed in both GM12878 and K562 cells (Figure 3).
  2. The upper portion shows the ChIP-seq signal of five sequence-specific transcription factors and RNA Pol2 throughout the 58.5 Mb of the short arm of human chromosome 6 in the human lymphoblastoid cell line GM12878. Input control signal is shown below the RNA Pol2 data. At this level of resolution, the sites of strongest signal appear as vertical spikes in blue next to the name of each experiment ("BATF," "EBF," etc.).
  3. A 116 kb segment of the HLA region is expanded; here, individual sites of occupancy can be seen mapping to specific regions of the three HLA genes shown at the bottom, with asterisks indicating binding sites called by peak-calling software. Finally, the lower left region shows a 3,500 bp region around two tandem histone genes, with RNA Pol2 occupancy at both promoters and two of the five transcription factors, BATF and cFos, occupying sites nearby.
  4. They organized all the information associated with each transcription factor including the ChIP-seq peaks, discovered motifs and associated histone modification patterns in FactorBook (http://www.factorbook.org), a public resource that will be updated as the project proceeds.
  5. After curation and review at the Data Coordination Center, all processed ENCODE data are publicly released to the UCSC Genome Browser database (http://genome.ucsc.edu).
  6. Three different types of regulatory data are represented for an area of the genome: motif-based predictions, DNase I hypersensitivity peaks, and ChIP-seq peaks. This region contains six SNPs. SNP1 is associated with a phenotype in a genome-wide association study. SNP3 is an eQTL associated with changes in gene expression in a different study. SNP6 overlaps a predicted motif, a DNase I hypersensitivity peak, and a ChIP-seq peak. There are, therefore, multiple sources of evidence that SNP6 is in a regulatory region. Furthermore, SNP6 is in perfect linkage disequilibrium (r² = 1.0) with SNP1 and SNP3, meaning that there is transitive evidence due to the LD that SNP6 is also associated with the phenotype and is also an eQTL. SNP6 is therefore the most likely functional SNP in this associated region.
  7. Aggregate overlap of phenotypes to selected transcription-factor-binding sites (left matrix) or DHSs in selected cell lines (right matrix), with a count of overlaps between the phenotype and the cell line/factor. Values in blue squares pass an empirical P-value threshold ≤ 0.01 (based on the same analysis of overlaps between randomly chosen, GWAS-matched SNPs and these epigenetic features) and have a count of at least three overlaps. The P value for the total number of phenotype–transcription factor associations is < 0.001.
  8. Several SNPs associated with Crohn's disease and other inflammatory diseases reside in a large gene desert on chromosome 5, along with some epigenetic features indicative of function. The SNP (rs11742570) strongly associated with Crohn's disease overlaps a GATA2 transcription-factor-binding signal determined in HUVECs. This region is also DNase I hypersensitive in HUVECs and in T-helper TH1 and TH2 cells. An interactive version of this figure is available in the online version of the paper.
  9. Users are able to interface with our database by entering lists of SNVs or regions to identify common SNVs at http://www.RegulomeDB.org/ (a). They are then presented with a sorted list of the most important SNVs (b). These SNVs can be examined for the evidence used to rank them as well as a citation for the evidence.
  10. Scientists in the Encyclopedia of DNA Elements Consortium have applied 24 experiment types (across) to more than 150 cell lines (down) to assign functions to as many DNA regions as possible — but the project is still far from complete
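
Note 6 relies on SNP6 being in perfect linkage disequilibrium (r² = 1.0) with SNP1 and SNP3. For two biallelic SNPs, D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)); r² = 1 means the alleles always travel together on haplotypes, so an association observed at one SNP transfers to the other. A minimal sketch, assuming phased haplotypes with alleles coded 0/1; the haplotype lists are hypothetical.

```python
from typing import List, Tuple


def ld_r2(haplotypes: List[Tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele_at_snp1, allele_at_snp2), coded 0/1.
    D = p_AB - p_A * p_B; r^2 = D^2 / (p_A(1 - p_A) * p_B(1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# Hypothetical haplotypes: alleles always co-occur (perfect LD) ...
perfect = [(1, 1), (1, 1), (0, 0), (0, 0), (0, 0)]
# ... versus all four combinations equally often (no LD).
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(round(ld_r2(perfect), 6), ld_r2(independent))  # 1.0 0.0
```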
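
Note 7 describes an empirical P-value: the observed overlap between GWAS SNPs and a feature is compared with overlaps for randomly chosen, GWAS-matched SNP sets, and a cell passes if P ≤ 0.01 with at least three overlaps. The sketch below implements that comparison with the common (count + 1) / (draws + 1) estimator; the matched pool, feature set and draw count are hypothetical, and matching itself (e.g. on allele frequency) is assumed to have been done already.

```python
import random
from typing import List, Sequence, Set


def null_overlap_counts(matched_pool: Sequence[int], feature_sites: Set[int],
                        set_size: int, n_draws: int, seed: int = 0) -> List[int]:
    """Overlap counts for random SNP sets drawn from a pool of matched control SNPs."""
    rng = random.Random(seed)
    pool = list(matched_pool)
    return [sum(1 for snp in rng.sample(pool, set_size) if snp in feature_sites)
            for _ in range(n_draws)]


def empirical_p(observed: int, null_counts: Sequence[int]) -> float:
    """One-sided empirical P-value; the +1 terms count the observed set itself,
    so the estimate is never exactly zero."""
    as_extreme = sum(1 for c in null_counts if c >= observed)
    return (as_extreme + 1) / (len(null_counts) + 1)


def passes(observed: int, p_value: float, alpha: float = 0.01, min_overlaps: int = 3) -> bool:
    """Thresholds quoted in the figure legend: P <= 0.01 and at least three overlaps."""
    return p_value <= alpha and observed >= min_overlaps


pool = list(range(1000))            # hypothetical IDs of GWAS-matched control SNPs
features = set(range(0, 1000, 50))  # hypothetical SNPs overlapping one TF-binding track
null = null_overlap_counts(pool, features, set_size=10, n_draws=1000, seed=1)
p = empirical_p(observed=5, null_counts=null)
print(p, passes(5, p))
```

With 1,000 draws the smallest reachable P is 1/1001, so the number of draws limits how strong a signal can be reported.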
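
Notes 6 and 9 describe prioritizing SNVs by the regulatory evidence they overlap, as with SNP6 overlapping a motif, a DHS and a ChIP-seq peak. The sketch below ranks SNVs by the number of distinct evidence tracks covering them. It is a deliberately simplified, hypothetical scoring; RegulomeDB's actual ranking scheme uses its own categories and is not reproduced here. Track names and coordinates are invented.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open
SNV = Tuple[str, int]            # (chromosome, position)


def evidence_for(snv: SNV, tracks: Dict[str, List[Interval]]) -> Set[str]:
    """Names of the evidence tracks with at least one interval covering the SNV."""
    chrom, pos = snv
    return {name for name, intervals in tracks.items()
            if any(c == chrom and s <= pos < e for c, s, e in intervals)}


def rank_snvs(snvs: List[SNV], tracks: Dict[str, List[Interval]]) -> List[Tuple[SNV, List[str]]]:
    """Sort SNVs by how many evidence types they overlap (most first; ties keep input order)."""
    scored = [(snv, sorted(evidence_for(snv, tracks))) for snv in snvs]
    return sorted(scored, key=lambda item: len(item[1]), reverse=True)


# Hypothetical tracks mirroring note 6: a motif, a DHS and a ChIP-seq peak.
tracks = {
    "motif": [("chr1", 590, 610)],
    "dnase": [("chr1", 500, 700)],
    "chip": [("chr1", 580, 640)],
}
snvs = [("chr1", 100), ("chr1", 520), ("chr1", 600)]
for snv, evidence in rank_snvs(snvs, tracks):
    print(snv, evidence)
```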