What’s Beyond the Finished
Human Genome Sequence?
Abstract
The Human Genome Project has changed the course of biomedicine and biotechnology since its conception in 1990 and
its ‘completion’ in 2003. Driven by an energetic scientific community, its reach has spread far and wide, with a
number of projects spawning from the initial results. Over the past decade research has built upon the foundations
of this project, and genomic analysis is now approaching completion. Building on the data collected, research is
now moving into a less-studied area of science with a great deal of untapped potential: the proteome.
Baby Steps for Big Science
The year is 1990. The U.S. Department of Energy (DOE)
and the National Institutes of Health (NIH) have
invested $3 billion in a project that was to completely
alter the course of modern science. It is at this point
that the Human Genome Project (HGP) sprang into life.
‘The Five-Year Plan’ was presented to members of
congressional appropriations committees, detailing the
project aims for 1991-1995 (Human Genome Program,
1990). Those five years encompassed the establishment
of the building blocks that would form the project
core: the human gene mapping repository (Pearson,
1991) and the International IMAGE Consortium (Lennon,
et al., 1996).
This was followed by the completion of the first round
of genetic mapping, and the publication of a number of
high- and moderate-resolution physical maps of
chromosomes 16, 19, 3, 11, 12, and 22 (Ashworth, et
al., 1995; Bell, et al., 1995; Doggett, et al., 1995;
Gemmill, et al., 1995; Krauter, et al., 1995;
Quackenbush, et al., 1995).
From 1996-2003 the sequencing of the genome
progressed in leaps and bounds, with the high-
resolution mapping of chromosomes 7 and X in early
1997 (Bouffard, et al., 1997; Nagaraja, et al., 1997).
These successes were quickly followed up in 1998 by
the DOE and NIH revealing another five-year plan,
aiming to complete the HGP by 2003 (Collins, et al.,
1998), as well as the publication of GeneMap’98,
doubling the number of known genes (Deloukas, et al.,
1998). 1999 saw the complete sequencing and analysis
of chromosome 22 (Dunham, et al., 1999), the creation
of the first public SNP consortium by major drug firms
(Thorisson & Stein, 2003), and the milestone of the
first billion base pairs sequenced.
On 26 June 2000, President Clinton and HGP leaders
announced the completion of a working draft sequence
of the human genome at “a historic White House event”
(The White House, 2000). The draft was later published
by the International Human Genome Sequencing
Consortium and Celera Genomics in February 2001
(Consortium, 2001; Venter, et al., 2001). The period
between 2001 and early 2003 saw the publication of the
SNP map of the human genome (International SNP Map
Working Group, 2001), as well as the complete sequence
analysis of chromosome 14 (Heilig, et al., 2003).
The completion of the 13-year project was announced
in April 2003, with the genome being sequenced to
99.99% accuracy (National Human Genome Research
Institute, 2003). The results obtained from the HGP
have acted as the basis of all human genome studies of
the past 11 years.
Chromosomes and Chromatin: Unzipping Your Genes
The first four years after completion of the HGP (2003-
2006) saw a heavy focus on the complete sequencing and
analysis of the remaining 19 unsequenced chromosomes
(Table 1). The research painted an interesting picture
of genetic disease. Of the 23 pairs of human
chromosomes, 16 possess links to a number of diseases,
including cancers and hereditary syndromes
(Mungall, et al., 2003; Deloukas, et al., 2004; Dunham,
et al., 2004; Humphray, et al., 2004; Ross, et al., 2005;
Gregory, et al., 2006; Zody, et al., 2006). It has also
given insight into gene and chromatin distribution
across the 22 autosomes, with chromosome 19
possessing the highest gene density in the genome
(Grimwood, et al., 2004) and chromosome 18 the
lowest (Nusbaum, et al., 2005).
However, there were still holes in the map of the
human genome, in the form of gaps in the euchromatic
sequence, with 341 gaps unaccounted for in 2004 (The
International Genome Sequencing Consortium, 2004).
In recent years this number has been reduced to 160
(Genovese, et al., 2013), but structural variation is
still poorly understood. Single-molecule sequencing
has indicated a 3:1 insertional bias, corresponding to
complex insertions and long short tandem repeats
(Chaisson, et al., 2014), suggesting greater complexity
of the human genome than first thought. There is,
however, potential for the remaining gaps to be
resolved by use of longer-read sequencing technology.
Decoding ENCODE
Sequencing of the human chromosomes initiated the
cataloguing of most human protein-coding genes and
other important elements (e.g. non-coding regulatory
RNAs) (Venter, et al., 2001). This list has become a
keystone in systems biology, giving insight into how
different structures and systems are connected, their
dynamics, and how function relates to these (Hood,
2008). This catalogue manifested itself in the form of
the ENCODE (Encyclopedia Of DNA Elements) Project,
launched in 2003, which looks to identify all functional
elements in the human genome sequence (The ENCODE
Project Consortium, 2004). The pilot project focused on
a 30 megabase region of the human genome sequence,
and its results were published in mid-2007, with a
number of revelations as to microRNA transcripts
within the genome (Saini, et al., 2007). Primarily, the
results implied that the human genome is pervasively
transcribed, with many novel non-protein-coding
regions (The ENCODE Project Consortium, 2007;
Alexander, et al., 2010). Moreover, a number of
non-coding regions produce thousands of specifically
regulated long intergenic non-coding RNAs (lincRNAs)
enriched for trait-associated single nucleotide
polymorphisms (SNPs) (Hangauer, et al., 2013), as well
as offering landing spots for proteins that influence
gene activity (Pennisi, 2012). These discoveries allow
for interrogation of newly discovered functional
elements, such as loss-of-function alleles and variants
in non-essential genes (MacArthur, et al., 2012).
SNP-ing Hap Hazardly
A map published in 2001 indicated that the human
genome possesses roughly 1.42 million SNPs. Roughly
60,000 of these fall within exon regions (coding and
untranslated regions, UTRs), and 85% of exons are
within 5 kb of the nearest SNP (Sachidanandam, et al.,
2001). Since the initial identification of these
commonly occurring DNA sequence variations, a number
of associations have been made with them, including
population diversity, individuality, disease
susceptibility, and individual response to medicine
(Shastry, 2002). The International HapMap Project was
founded in 2003 as a means to determine sequence
variation within the human genome, using populations
from Africa, Asia, and Europe (The International
HapMap Consortium, 2003). By October 2005, the
HapMap project had produced a haplotype map of the
entire genome, identifying complete genotypes for over
a million of the identified SNPs, including DNA
variations across 4 populations (The International
HapMap Consortium, 2005). The results have helped
significantly in moving forward research into
genetically inherited diseases (Manolio, et al., 2008),
cancers (Hung, et al., 2008; Cao, et al., 2013), and
syndromes (Nezos, et al., 2014; Vattikuti, et al., 2012),
allowing for identification of loci that heavily
influence their manifestation. A Phase II HapMap was
generated in 2007, doubling the number of known SNPs.
The results also showed novel aspects of linkage
disequilibrium, including that 10-30% of pairs of
individuals within a population share at least one
region of extended genetic identity arising from recent
ancestry, and that 1% of common variants are
untaggable (The International HapMap Consortium,
2007). Evidence has also shown that certain
populations carry susceptibility loci for particular
diseases, dictated by haplotype and copy-number
variations (Jakobsson, et al., 2008). For example, in
East Asian populations the SNP rs9485372, near the
TAB2 gene, is associated with breast cancer
susceptibility (Long, et al., 2012). In addition to SNPs,
deletion polymorphisms in the human genome have
been characterised using this database. By analysing
SNP genotype data from parent-offspring trios,
high-resolution population surveys of deletion
polymorphisms were produced. The results identified
586 distinct regions harbouring deletion
polymorphisms, 278 of which were observed in
unrelated individuals (Conrad, et al., 2006;
McCarroll, et al., 2006).
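As a rough illustration of the linkage disequilibrium (LD) statistics that haplotype maps of this kind are built around, the following Python sketch computes the standard D and r² measures for a pair of biallelic SNPs. The function name, argument names, and haplotype counts are hypothetical and chosen purely for illustration; they are not HapMap data, and this is not a description of the consortium's own analysis pipeline.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci.

    Arguments are counts of the four possible haplotypes, where A/a are
    the alleles at the first SNP and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    # D: deviation of the observed haplotype frequency from the
    # frequency expected if the two loci were independent.
    D = p_AB - p_A * p_B

    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")

    # r^2: squared correlation between the alleles at the two loci.
    return D, D * D / denominator


# Hypothetical counts: the two SNPs are always inherited together.
d_linked, r2_linked = linkage_disequilibrium(n_AB=50, n_Ab=0, n_aB=0, n_ab=50)

# Hypothetical counts: the two SNPs are inherited independently.
d_free, r2_free = linkage_disequilibrium(n_AB=25, n_Ab=25, n_aB=25, n_ab=25)
```

An r² close to 1 means that one SNP can act as a proxy (a "tag") for the other, whereas an r² close to 0 means that genotyping one SNP says little about its neighbour.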
| Chromosome | Initial Sequence & Analysis | Primary Findings | Additional Details |
|---|---|---|---|
| 1 | 2006 | 3,141 protein-encoding genes and 991 pseudogenes | Mutations and rearrangements prevalent in cancer and many other diseases |
| 2 | 2005 | 1,346 protein-coding genes and 1,239 pseudogenes | Unique to the human lineage; product of head-to-head fusion of two intermediate-sized ancestral chromosomes |
| 3 | 2006 | 1,425 protein-encoding genes, 8 novel genes, 27 novel transcripts, 3 putative genes, and 122 pseudogenes | Comprises just four contigs; lowest rate of segmental duplication in the genome; chemokine receptor gene cluster; numerous loci involved in human cancers, e.g. the gene encoding FHIT |
| 4 | 2005 | 796 protein-coding genes and 778 pseudogenes | Genes associated with Huntington’s disease, Wolf-Hirschhorn syndrome, polycystic kidney disease, and muscular dystrophy |
| 5 | 2004 | 923 manually curated protein-coding genes | One of the largest human chromosomes, yet has one of the lowest gene densities; encodes protocadherin and interleukin gene families |
| 6 | 2003 | 1,557 genes identified and 633 pseudogenes (6% of genome) | Genes directly implicated in cancer, schizophrenia, autoimmunity, and many more |
| 7 | 2003 | 1,150 protein-coding genes and an additional 941 pseudogenes | Unusual amount of segmentally duplicated sequence |
| 8 | 2006 | 793 protein-encoding genes and 301 pseudogenes | 15 Mb region on distal 8p with a high mutation rate, possessing genes related to innate immunity, the nervous system, and MCPH1 gene cluster |
| 9 | 2004 | 1,149 genes and 426 pseudogenes identified | Largest autosomal block of heterochromatin; genes implicated in male-to-female sex reversal, cancer, and neurodegenerative disease |
| 10 | 2004 | 816 protein-coding genes and 430 pseudogenes identified | 67 antisense transcripts identified; PTEN tumour suppressor and RET proto-oncogene identified |
| 11 | 2006 | 1,524 protein-coding genes and 765 pseudogenes | 40% of olfactory receptor genes in the human genome located in 28 gene clusters along the chromosome; 85 genes linked to disorders |
| 12 | 2006 | 1,400 coding genes, and 487 loci that have direct implications for human disease | The q arm contains one of the largest blocks of linkage disequilibrium in the entire human genome |
| 13 | 2004 | 633 genes identified, 296 pseudogenes, and 105 putative non-coding RNA genes | Genes of interest: BRCA2, RB1; DAOA locus (bipolar disorder and schizophrenia) |
| 14 | 2003 | 1,050 genes identified and 393 pseudogenes | Two loci of crucial importance for the immune system; >60 disease genes localised |
| 15 | 2006 | 695 protein-encoding genes and 250 pseudogenes | High rate of segmental duplication, which can result in Prader-Willi and Angelman syndromes |
| 16 | 2004 | 880 protein-coding genes, 19 transfer RNA genes, 341 pseudogenes, and 3 RNA pseudogenes | Genes identified: metallothionein, cadherin, and iroquois families; disease genes: polycystic kidney disease, acute myelomonocytic leukaemia |
| 17 | 2006 | 1,226 protein-encoding genes and 274 pseudogenes | Second highest gene density in the genome; implicated in a wide range of genetic diseases, with associated genes, disorders, and mechanisms including BRCA1, NF1, TP53, NAHR, HNPP, SMS, and CMT1A |
| 18 | 2005 | 377 protein-encoding genes and 171 pseudogenes | Lowest gene density of any human chromosome; a number of genetic disorders arise from trisomy and other aneuploidy of the chromosome |
| 19 | 2004 | 1,461 protein-coding genes and 321 pseudogenes | Highest gene density of all human chromosomes; genes implicated in Mendelian disorders (hypercholesterolaemia, insulin-resistant diabetes) |
| 20 | 2001 | 727 protein-encoding genes and 168 pseudogenes | Genes encoding protease inhibitors with antibacterial and antiviral activities, and reproductive proteins SEMG1 and SEMG2 |
| 21 | 2000 | 127 protein-encoding genes, 98 predicted genes, and 59 pseudogenes | Several anonymous loci for monogenic disorders and predispositions for common complex disorders mapped; loss of heterozygosity observed in regions associated with solid tumours |
| X | 2005 | 99 protein-encoding genes, 113 X-linked genes | A number of genes expressed in various tumour types; 10% associated with Mendelian diseases |
| Y | 2003 | MSY region is 95% of the chromosome’s length; 78 protein-coding genes | Mosaic of heterochromatic sequences and 3 classes of euchromatic sequence (X-transposed, X-degenerate, and ampliconic) |
Table 1: The results obtained from sequencing chromosomes 1-21, X, and Y, both before and after completion of the HGP (2000-2006).
The number of functional genes associated with each chromosome was shown to be highly variable, and several chromosomes were shown to possess links to a
number of genetic diseases (Hattori, et al., 2000; Deloukas, et al., 2001; Heilig, et al., 2003; Hillier, et al., 2003; Mungall, et al., 2003;
Skaletsky, et al., 2003; Deloukas, et al., 2004; Dunham, et al., 2004; Grimwood, et al., 2004; Humphray, et al., 2004; Martin, et al., 2004;
Schmutz, et al., 2004; Hillier, et al., 2005; Nusbaum, et al., 2005; Nusbaum, et al., 2005; Ross, et al., 2005; Scherer, et al., 2005; Gregory, et
al., 2006; Muzny, et al., 2006; Taylor, et al., 2006; Zody, et al., 2006; Zody, et al., 2006).
Methylation: How Well Does Our DNA Age?
DNA cytosine methylation is a stable epigenetic
modification integral to genome regulation,
development, and disease, via modulation of the
transcriptional plasticity of the genome (Eckhardt, et
al., 2006). It interferes with the transcription of genes
either by directly impeding the binding of
transcription factors to their binding motifs (Choy, et
al., 2010), or by recruitment of histone deacetylases via
methyl-CpG-binding domain proteins (Esteller, 2006).
Single-base-resolution methylation maps have been
generated for human embryonic stem cells (ESCs) and
fetal fibroblasts (Lister, et al., 2009). These showed
that the pattern of methylation is variable and
dependent on cell type. In ESCs, methylation in non-CG
contexts is enriched in gene bodies and depleted in
protein binding sites and enhancers. There is also
evidence that de novo methyltransferase activity is
used to maintain cellular pluripotency (Lister, et al.,
2009). DNA methylation also varies between species,
and is a factor in differentiating humans from other
mammalian species (Pai, et al., 2011). Upon comparison
of methylation patterns with chimpanzees, the
patterns of tissue-specific differentially methylated
regions (T-DMRs) were found to be conserved between
humans and chimpanzees. Differences in methylation
levels nevertheless help distinguish the two species,
and may underlie 12-18% of differences in gene
expression levels between them (Pai, et al., 2011). As it
stands, DNA methylation has taken us closer to
understanding the variation that affects gene
expression between primate species. It has also been
shown that DNA methylation age is a potential measure
of the cumulative effect of an epigenetic maintenance
system, and could be used to address gaps in
knowledge of developmental biology, cancer, and
aging (Horvath, 2013).
Genomes Through The Ages
2007 saw the sequencing of the first full diploid
genome of an individual human, assembled from ~32
million random DNA fragments sequenced by Sanger
dideoxy technology (Levy, et al., 2007). Comparison
with the National Center for Biotechnology Information
(NCBI) reference revealed over 4.1 million DNA
variants, encompassing 12.3 Mb. Of these events, 22%
were non-SNP DNA variations, an indication of how
important non-SNP genetic alterations are in defining
the diploid genome structure. The research provided a
base for future genome comparisons, and helped usher
in the era of individualised genomic information. This
study prompted an explosion of research into the
genomes of past cultures and species. The most
prominent of these was the Neanderthal genome. In
2010, a draft sequence of our 30,000-year-old relatives
was produced (Green, et al., 2010), followed in 2013 by
a complete sequence (Prufer, et al., 2013). This
high-quality sequence gave insights into the gene flow
events that occurred between Neanderthals,
Denisovans and early humans, yielding interbreeding
models that provide an insight into the ancestry of
loci and the genome haplotypes that have given rise to
modern humans (Sankararaman, et al., 2014).
Historical, religious, and cultural traditions also
influence gene flow and distribution, with
ethno-religious communities sharing common traits
(SNPs) and phenotypic characteristics, which we may
trace back to Old World populations (Behar, et al.,
2010; Hellenthal, et al., 2014). We can also extrapolate
to produce global migration patterns and monitor the
rise of new, unique populations, such as Native
Americans and Inuit (Rasmussen, et al., 2010;
Rasmussen, et al., 2014). Evidence has also shown that
Khoisan and Bantu genomes are significantly different
from those previously mentioned at the nuclear marker
and mitochondrial levels (Schuster, et al., 2010).
How VARY Interesting…..
The human genome is an extremely complex system,
built upon the principle of variation, giving rise to
individuality. With most SNPs having been assessed,
research has begun to focus on the heritable
components of complex traits, and the variation that
leads to their manifestation (Frazer, et al., 2009).
Genetic variation can be attributed to a number of
sources, primarily meiosis, where recombination rates
have been seen to vary tremendously across the
genome, with recombination occurring in narrow
‘hotspots’, as shown through linkage disequilibrium
(LD) and sperm-typing studies (Coop, et al., 2008).
These studies have also revealed variation between
the sexes. In addition, a number of structural
variations arise in DNA segments greater than 1
kilobase in length, accounting for a number of
insertion and deletion differences between individuals
(Conrad, et al., 2010; Kidd, et al., 2008). The variations
observed within the genome range from common and
inconsequential to rare and deleterious (Pelak, et al.,
2010; Robinson, et al., 2014). Although there has been a
great deal of progress in identifying disease variants,
a large number remain unexplained, and work is under
way to develop a high-resolution map of functional
human genetic variation by studying numerous,
geographically distinct populations (International
HapMap 3 Consortium, 2010; The 1000 Genomes Project
Consortium, 2012; Lappalainen, et al., 2013).
Proteins and You, the Future
With the human genome nearing complete sequencing,
the next logical step was to attempt to map the human
proteome. The Human Proteome Project (HPP) was
established in 2011, with the intention of mapping the
entire human proteome (Legrain, et al., 2011). It aims
to observe all of the proteins produced from sequences
encoded in the human genome; in 2011, about 30% of
the estimated 20,300 protein-coding genes lacked
sufficient protein-level evidence (Legrain, et al., 2011).
Since then, however, a draft map of the human
proteome has been generated by use of high-resolution
Fourier-transform mass spectrometry (Kim, et al.,
2014). The map covers over 84% of the total 20,300
protein-coding genes. For the remaining 16% of the
proteome, a number of further proteomic
methodologies should be employed, including multiple
protease analysis, N-termini capture, enrichment of
post-translationally modified peptides, and
fractionation, along with technologies such as
top-down mass spectrometry and electron transfer
dissociation. Broadening the range of tissue types
tested should also help.
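To put the percentages quoted above into approximate gene counts, the arithmetic can be sketched in a few lines of Python. The variable and function names are illustrative only, and the figures are simply the rounded values quoted in the text (20,300 genes, 30%, and 84%), so the results should be read as rough estimates rather than counts reported by the cited studies.

```python
# Estimated number of human protein-coding genes, as quoted in the text.
TOTAL_PROTEIN_CODING_GENES = 20_300


def genes_from_percentage(percent, total=TOTAL_PROTEIN_CODING_GENES):
    """Convert a percentage of the gene catalogue into a rounded gene count."""
    return round(total * percent / 100)


# About 30% of genes lacked sufficient protein-level evidence in 2011.
lacking_evidence_2011 = genes_from_percentage(30)

# The 2014 draft map is quoted as covering 84% of protein-coding genes;
# treating 84% as the coverage gives an approximate count.
covered_by_draft_map = genes_from_percentage(84)

# Genes still without protein-level coverage after the draft map.
remaining_after_draft_map = TOTAL_PROTEIN_CODING_GENES - covered_by_draft_map

print(f"Lacking protein evidence in 2011: ~{lacking_evidence_2011}")
print(f"Covered by the 2014 draft map:    ~{covered_by_draft_map}")
print(f"Still to be covered:              ~{remaining_after_draft_map}")
```

On these rounded inputs, the gap in protein-level evidence roughly halves between 2011 and 2014, which is the progress the paragraph above describes.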
References
Alexander, R. P. et al., 2010. Annotating non-coding regions of the
genome. Nature Reviews. Genetics, 11(8), pp. 559-571.
Ashworth, L. K. et al., 1995. An integrated metric physical map of
human chromosome 19. Nature Genetics, 11(4), pp. 422-427.
Behar, D. M. et al., 2010. The genome-wide structure of the
Jewish people. Nature, 466(7303), pp. 238-242.
Bell, C. J. et al., 1995. Integration of physical, breakpoint and
genetic maps of chromosome 22. Localization of 587 yeast
artificial chromosomes with 238 mapped markers. Human
Molecular Genetics, 4(1), pp. 59-69.
Bouffard, G. G. et al., 1997. A physical map of human chromosome
7: an integrated YAC contig map with average STS spacing of 79kb.
Genome Research, 7(7), pp. 673-92.
Cao, X. et al., 2013. RRM1 and RRM2 pharmacogenetics:
association with phenotypes in HapMap cell lines and acute
myeloid leukaemia patients. Pharmacogenomics, 14(12), pp. 1449-
1466.
Chaisson, M. J. P. et al., 2014. Resolving the complexity of the
human genome using single-molecule sequencing. Nature, 000(0),
pp. 1-11.
Choy, M.-K. et al., 2010. Genome-wide conserved consensus
transcription factor binding motifs are hyper-methylated. BMC
Genomics, 11(519), pp. 1-10.
Collins, F. S. et al., 1998. New goals for the U.S. Human Genome
Project 1998-2003. Science, 282(5389), pp. 682-689.
Conrad, D. F. et al., 2006. A high-resolution survey of deletion
polymorphism in the human genome. Nature Genetics, Volume 38,
pp. 75-81.
Conrad, D. F. et al., 2010. Origins and functional impact of copy
number variation in the human genome. Nature, Volume 464, pp.
704-712.
Consortium, 2001. Initial sequencing and analysis of the human
genome. Nature, 409(6822), pp. 860-921.
Coop, G. et al., 2008. High-Resolution Mapping of Crossovers
Reveals Extensive Variation in Fine-Scale Recombination Patterns
Among Humans. Science, 319(5868), pp. 1395-1398.
Deloukas, P. et al., 1998. A physical map of 30,000 human
genes. Science, 282(5389), pp. 744-746.
Deloukas, P. et al., 2001. The DNA sequence and comparative
analysis of human chromosome 20. Nature, Volume 414, pp. 865-
871.
Deloukas, P. et al., 2004. The DNA sequence and comparative
analysis of human chromosome 10. Nature, Volume 429, pp. 375-
381.
Doggett, N. A. et al., 1995. An integrated physical map of human
chromosome 16. Nature, 377(4), pp. 335-65.
Dunham, A. et al., 2004. The DNA sequence and analysis of human
chromosome 13. Nature, 428(6982), pp. 522-528.
Dunham, I. et al., 1999. The DNA sequence of human
chromosome 22. Nature, Volume 402, pp. 489-495.
Eckhardt, F. et al., 2006. DNA methylation profiling of human
chromosomes 6, 20, and 22. Nature Genetics, 38(12), pp. 1378-
1385.
Esteller, M., 2006. CpG island methylation and histone
modifications: biology and clinical significance. Ernst Schering
Research Foundation Workshop, Volume 57, pp. 115-126.
Frazer, K. A., Murray, S. S., Schork, N. J. & Topol, E. J., 2009.
Human genetic variation and its contribution to complex traits.
Nature Reviews. Genetics, 10(4), pp. 241-251.
Gemmill, R. M. et al., 1995. A second-generation YAC contig map
of human chromosome 3. Nature, 377(4), pp. 299-319.
Genovese, G. et al., 2013. Using population admixture to help
complete maps of the human genome. Nature Genetics, Volume
45, pp. 406-414.
Green, R. E. et al., 2010. A Draft Sequence of the Neanderthal
Genome. Science, 328(5979), pp. 710-722.
Gregory, S. G. et al., 2006. The DNA sequence and biological
annotation of human chromosome 1. Nature, Volume 441, pp.
315-321.
Grimwood, J. et al., 2004. The DNA sequence and biology of
human chromosome 19. Nature, Volume 428, pp. 529-535.
Hangauer, M. J., Vaughn, I. W. & McManus, M. T., 2013. Pervasive
Transcription of the Human Genome Produces Thousands of
Previously Unidentified Long Intergenic Noncoding RNAs. PLOS
Genetics, 9(6), pp. 1-13.
Hattori, M. et al., 2000. The DNA sequence of human chromosome
21. Nature, Volume 405, pp. 311-319.
Heilig, R. et al., 2003. The DNA sequence and analysis of human
chromosome 14. Nature, 421(6923), pp. 601-607.
Hellenthal, G. et al., 2014. A Genetic Atlas of Human Admixture
History. Science, 343(6172), pp. 747-751.
Hillier, L. W. et al., 2005. Generation and annotation of the DNA
sequences of human chromosomes 2 and 4. Nature, Volume 434,
pp. 724-731.
Hillier, L. W. et al., 2003. The DNA sequence of human
chromosome 7. Nature, 424(6945), pp. 157-164.
Hood, L., 2008. A personal journey of discovery: developing
technology and changing biology. Annual Review of Analytical
Chemistry, Volume 1, pp. 1-43.
Horvath, S., 2013. DNA methylation age of human tissues and cell
types. Genome Biology, 14(10), pp. 2-19.
Human Genome Program, 1990. Five-Year Plan Goes to Capitol
Hill. Human Genome News, 2(1).
Humphray, S. J. et al., 2004. DNA sequence and analysis of human
chromosome 9. Nature, 429(6990), pp. 369-374.
Hung, R. J. et al., 2008. A susceptibility locus for lung cancer maps
to nicotinic acetylcholine receptor subunit genes on 15q25.
Nature, Volume 452, pp. 633-637.
International HapMap 3 Consortium, 2010. Integrating common
and rare genetic variation in diverse human populations. Nature,
467(7311), pp. 52-58.
International SNP Map Working Group, 2001. A map of human
genome sequence variation containing 1.42 million single
nucleotide polymorphisms. Nature, 409(6822), pp. 928-933.
Jakobsson, M. et al., 2008. Genotype, haplotype and copy-number
variation in worldwide human populations. Nature, 451(7181),
pp. 998-1003.
Kidd, J. M. et al., 2008. Mapping and sequencing of structural
variation from eight human genomes. Nature, Volume 453, pp.
56-64.
Kim, M. S. et al., 2014. A draft map of the human proteome.
Nature, 509(7502), pp. 575-81.
Krauter, K. et al., 1995. A second-generation YAC contig map of
human chromosome 12. Nature, 377(4), pp. 321-333.
Lappalainen, T. et al., 2013. Transcriptome and genome
sequencing uncovers functional variation in humans. Nature,
501(7468), pp. 506-511.
Legrain, P. et al., 2011. The human proteome project: current
state and future direction. Molecular & Cellular Proteomics, 10(7).
Lennon, G., Auffray, C., Polymeropoulos, M. & Soares, M. B.,
1996. The I.M.A.G.E. Consortium: an integrated molecular analysis
of genomes and their expression. Genomics, 33(1), pp. 151-152.
Levy, S. et al., 2007. The Diploid Genome Sequence of an
Individual Human. PLOS Biology, 5(10), pp. 2113-2144.
Lister, R. et al., 2009. Human DNA methylomes at base resolution
show widespread epigenomic differences. Nature, Volume 462,
pp. 315-322.
Long, J. et al., 2012. Genome-Wide Association Study in East
Asians Identifies Novel Susceptibility Loci for Breast Cancer. PLOS
Genetics, 8(2), pp. 1-10.
MacArthur, D. G. et al., 2012. A systematic survey of loss-of-
function variants in human protein-coding genes. Science,
335(6070), pp. 823-828.
Manolio, T. A., Brooks, L. D. & Collins, F. S., 2008. A HapMap
harvest of insights into the genetics of common disease. The
Journal of Clinical Investigation, 118(5), pp. 1590-1605.
Martin, J. et al., 2004. The sequence and analysis of duplication-
rich human chromosome 16. Nature, 432(7020), pp. 988-994.
McCarroll, S. A. et al., 2006. Common deletion polymorphisms in
the human genome. Nature Genetics, 38(1), pp. 86-92.
Mungall, A. J. et al., 2003. The DNA sequence and analysis of
human chromosome 6. Nature, Volume 425, pp. 805-811.
Muzny, D. M. et al., 2006. The DNA sequence, annotation and
analysis of human chromosome 3. Nature, Volume 440, pp. 1194-
1198.
Nagaraja, R. et al., 1997. X chromosome map at 75-kb STS
resolution, revealing extremes of recombination and GC content.
Genome Research, 7(3), pp. 210-222.
National Human Genome Research Institute, 2003. All Goals
Achieved; New Vision for Genome Research Unveiled. [Online]
Available at: www.genome.gov/11006929
[Accessed 26 October 2014].
Nezos, A. et al., 2014. B-cell activating factor genetic variants in
lymphomagenesis associated with primary Sjögren's Syndrome.
Journal of Autoimmunity, Volume 51, pp. 89-98.
Nusbaum, C. et al., 2005. DNA sequence and analysis of human
chromosome 8. Nature, Volume 439, pp. 331-335.
Nusbaum, C. et al., 2005. DNA sequence and analysis of human
chromosome 18. Nature, Volume 437, pp. 551-555.
Pai, A. A. et al., 2011. A Genome-wide Study of DNA Methylation
Patterns and Gene Expression Levels in Multiple Human and
Chimpanzee Tissues. PLOS Genetics, 7(2), pp. 1-11.
Pearson, P. L., 1991. The genome data base (GDB)--a human gene
mapping repository. Nucleic Acids Research, Volume 19, pp. 2237-
2239.
Pelak, K. et al., 2010. The Characterization of Twenty Sequenced
Human Genomes. PLOS Genetics, 6(9), pp. 1-10.
Pennisi, E., 2012. ENCODE Project Writes Eulogy for Junk DNA.
Science, 337(6099), pp. 1159-1161.
Prufer, K. et al., 2013. The complete genome sequence of a
Neanderthal from the Altai Mountains. Nature, 505(7481), pp. 43-
49.
Quackenbush, J. et al., 1995. An STS content map of human
chromosome 11: localization of 910 YAC clones and 109 islands.
Genomics, 29(2), pp. 512-25.
Rasmussen, M. et al., 2010. Ancient human genome sequence of
an extinct Palaeo-Eskimo. Nature, Volume 463, pp. 757-762.
Rasmussen, M. et al., 2014. The genome of a Late Pleistocene
human from a Clovis burial site in western Montana. Nature,
Volume 506, pp. 225-229.
Robinson, M. R., Wray, N. R. & Visscher, P. M., 2014. Explaining
additional genetic variation in complex traits. Trends in Genetics,
30(4), pp. 124-132.
Ross, M. T. et al., 2005. The DNA sequence of the human X
chromosome. Nature, 434(7031), pp. 325-337.
Sachidanandam, R. et al., 2001. A map of human genome
sequence variation containing 1.42 million single nucleotide
polymorphisms. Nature, 409(6822), pp. 928-933.
Saini, H. K., Griffiths-Jones, S. & Enright, A. J., 2007. Genomic
analysis of human microRNA transcripts. Proceedings of the
National Academy of Sciences of the United States of America,
104(45), pp. 17719-17724.
Sankararaman, S. et al., 2014. The genomic landscape of
Neanderthal ancestry in present-day humans. Nature, 507(7492),
pp. 354-357.
Scherer, S. E. et al., 2005. The finished DNA sequence of human
chromosome 12. Nature, Volume 440, pp. 346-351.
Schmutz, J. et al., 2004. The DNA sequence and comparative
analysis of human chromosome 5. Nature, Volume 431, pp. 268-
274.
Schuster, S. C. et al., 2010. Complete Khoisan and Bantu genomes
from southern Africa. Nature, Volume 463, pp. 943-947.
Shastry, B. S., 2002. SNP alleles in human disease and evolution.
Journal of Human Genetics, 47(11), pp. 561-566.
Skaletsky, H. et al., 2003. The male-specific region of the human Y
chromosome is a mosaic of discrete sequence classes. Nature,
Volume 423, pp. 825-837.
Taylor, T. D. et al., 2006. Human chromosome 11 DNA sequence
and analysis including novel gene identification. Nature,
440(7083), pp. 497-500.
The 1000 Genomes Project Consortium, 2012. An integrated map
of genetic variation from 1,092 human genomes. Nature,
491(7422), pp. 56-65.
The ENCODE Project Consortium, 2004. The ENCODE
(ENCyclopedia Of DNA Elements) Project. Science, Volume 306,
pp. 636-640.
The ENCODE Project Consortium, 2007. Identification and analysis
of functional elements in 1% of the human genome by the
ENCODE pilot project. Nature, Volume 447, pp. 799-816.
The International Genome Sequencing Consortium, 2004.
Finishing the euchromatic sequence of the human genome.
Nature, Volume 431, pp. 931-945.
The International HapMap Consortium, 2003. The International
HapMap Project. Nature, Volume 426, pp. 789-796.
The International HapMap Consortium, 2005. A haplotype map of
the human genome. Nature, Volume 437, pp. 1299-1320.
The International HapMap Consortium, 2007. A second
generation human haplotype map of over 3.1 million SNPs. Nature,
Volume 449, pp. 851-861.
The White House, 2000. PRESIDENT CLINTON ANNOUNCES THE
COMPLETION OF THE FIRST SURVEY OF THE ENTIRE HUMAN
GENOME Hails Public and Private Efforts Leading to This Historic
Achievement, Washington DC: The White House Briefing Room.
Thorisson, G. A. & Stein, L. D., 2003. The SNP Consortium website:
past, present and future. Nucleic Acids Research, 31(1), pp. 124-
127.
Vattikuti, S., Guo, J. & Chow, C. C., 2012. Heritability and Genetic
Correlations Explained by Common SNPs for Metabolic Syndrome
Traits. PLOS Genetics, 8(3), pp. 1-8.
Venter, J. C. & et-al., 2001. The Sequence of the Human Genome.
Science, 291(5507), pp. 1304-1351.
Zody, M. C. et al., 2006. DNA sequence of human chromosome 17
and analysis of rearrangement in the human lineage. Nature,
Volume 440, pp. 1045-1049.
Zody, M. C. et al., 2006. Analysis of the DNA sequence and
duplication history of human chromosome 15. Nature, Volume
440, pp. 671-675.

Legrain, P. et al., 2011. The human proteome project: current state and future direction. Molecular & Cellular Proteomics, 10(7).
Lennon, G., Auffray, C., Polymeropoulos, M. & Soares, M. B., 1996. The I.M.A.G.E. Consortium: an integrated molecular analysis of genomes and their expression. Genomics, 33(1), pp. 151-152.
Levy, S. et al., 2007. The Diploid Genome Sequence of an Individual Human. PLOS Biology, 5(10), pp. 2113-2144.
Lister, R. et al., 2009. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, Volume 462, pp. 315-322.
Long, J. et al., 2012. Genome-Wide Association Study in East Asians Identifies Novel Susceptibility Loci for Breast Cancer. PLOS Genetics, 8(2), pp. 1-10.
MacArthur, D. G. et al., 2012. A systematic survey of loss-of-function variants in human protein-coding genes. Science, 335(6670), pp. 823-828.
Manolio, T. A., Brooks, L. D. & Collins, F. S., 2008. A HapMap harvest of insights into the genetics of common disease. The Journal of Clinical Investigation, 118(5), pp. 1590-1605.
Martin, J. et al., 2004. The sequence and analysis of duplication-rich human chromosome 16. Nature, 432(7020), pp. 988-994.
McCarroll, S. A. et al., 2006. Common deletion polymorphisms in the human genome. Nature Genetics, 38(1), pp. 86-92.
Mungall, A. J. et al., 2003. The DNA sequence and analysis of human chromosome 6. Nature, Volume 425, pp. 805-811.
Muzny, D. M. et al., 2006. The DNA sequence, annotation and analysis of human chromosome 3. Nature, Volume 440, pp. 1194-1198.
Nagaraja, R. et al., 1997. X chromosome map at 75-kb STS resolution, revealing extremes of recombination and GC content. Genome Research, 7(3), pp. 210-222.
National Human Genome Research Institute, 2003. All Goals Achieved; New Vision for Genome Research Unveiled. [Online] Available at: www.genome.gov/11006929 [Accessed 26 October 2014].
Nezos, A. et al., 2014. B-cell activating factor genetic variants in lymphomagenesis associated with primary Sjögren's Syndrome. Journal of Autoimmunity, Volume 51, pp. 89-98.
Nusbaum, C. et al., 2005. DNA sequence and analysis of human chromosome 8. Nature, Volume 439, pp. 331-335.
Nusbaum, C. et al., 2005. DNA sequence and analysis of human chromosome 18. Nature, Volume 437, pp. 551-555.
Pai, A. A. et al., 2011. A Genome-wide Study of DNA Methylation Patterns and Gene Expression Levels in Multiple Human and Chimpanzee Tissues. PLOS Genetics, 7(2), pp. 1-11.
Pearson, P. L., 1991. The genome data base (GDB)--a human gene mapping repository. Nucleic Acids Research, Volume 19, pp. 2237-2239.
Pelak, K. et al., 2010. The Characterization of Twenty Sequenced Human Genomes. PLOS Genetics, 6(9), pp. 1-10.
Pennisi, E., 2012. ENCODE Project Writes Eulogy for Junk DNA. Science, 337(6099), pp. 1159-1161.
Prufer, K. et al., 2013. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505(7481), pp. 43-49.
Quackenbush, J. et al., 1995. An STS content map of human chromosome 11: localization of 910 YAC clones and 109 islands. Genomics, 29(2), pp. 512-525.
Rasmussen, M. et al., 2014. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature, Volume 506, pp. 225-229.
Rasmussen, M. et al., 2010. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature, Volume 463, pp. 757-762.
Robinson, M. R., Wray, N. R. & Visscher, P. M., 2014. Explaining additional genetic variation in complex traits. Trends in Genetics, 30(4), pp. 124-132.
Ross, M. T. et al., 2005. The DNA sequence of the human X chromosome. Nature, 434(7031), pp. 325-337.
Sachidanandam, R. et al., 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409(6822), pp. 928-933.
Saini, H. K., Griffiths-Jones, S. & Enright, A. J., 2007. Genomic analysis of human microRNA transcripts. Proceedings of the National Academy of Sciences of the United States of America, 104(45), pp. 17719-17724.
Sankararaman, S. et al., 2014. The genomic landscape of Neanderthal ancestry in present-day humans. Nature, 507(7492), pp. 354-357.
Scherer, S. E. et al., 2005. The finished DNA sequence of human chromosome 12. Nature, Volume 440, pp. 346-351.
Schmutz, J. et al., 2004. The DNA sequence and comparative analysis of human chromosome 5. Nature, Volume 431, pp. 268-274.
Schuster, S. C. et al., 2010. Complete Khoisan and Bantu genomes from southern Africa. Nature, Volume 463, pp. 943-947.
Shastry, B. S., 2002. SNP alleles in human disease and evolution. Journal of Human Genetics, 47(11), pp. 561-566.
Skaletsky, H. et al., 2003. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, Volume 423, pp. 825-837.
Taylor, T. D. et al., 2006. Human chromosome 11 DNA sequence and analysis including novel gene identification. Nature, 440(7083), pp. 497-500.
The 1000 Genomes Project Consortium, 2012. An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), pp. 56-65.
The ENCODE Project Consortium, 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, Volume 306, pp. 636-640.
The ENCODE Project Consortium, 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, Volume 447, pp. 799-816.
The International Genome Sequencing Consortium, 2004. Finishing the euchromatic sequence of the human genome. Nature, Volume 431, pp. 931-945.
The International HapMap Consortium, 2003. The International HapMap Project. Nature, Volume 426, pp. 789-796.
The International HapMap Consortium, 2005. A haplotype map of the human genome. Nature, Volume 437, pp. 1299-1320.
The International HapMap Consortium, 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature, Volume 449, pp. 851-861.
The White House, 2000. PRESIDENT CLINTON ANNOUNCES THE COMPLETION OF THE FIRST SURVEY OF THE ENTIRE HUMAN GENOME Hails Public and Private Efforts Leading to This Historic Achievement, Washington DC: The White House Briefing Room.
Thorisson, G. A. & Stein, L. D., 2003. The SNP Consortium website: past, present and future. Nucleic Acids Research, 31(1), pp. 124-127.
Vattikuti, S., Guo, J. & Chow, C. C., 2012. Heritability and Genetic Correlations Explained by Common SNPs for Metabolic Syndrome Traits. PLOS Genetics, 8(3), pp. 1-8.
Venter, J. C. et al., 2001. The Sequence of the Human Genome. Science, 291(5507), pp. 1304-1351.
Zody, M. C. et al., 2006. DNA sequence of human chromosome 17 and analysis of rearrangement in the human lineage. Nature, Volume 440, pp. 1045-1049.
Zody, M. C. et al., 2006. Analysis of the DNA sequence and duplication history of human chromosome 15. Nature, Volume 440, pp. 671-675.