Genomics: Organization of Genome, Strategies of Genome Sequencing, Model Plant Genome Project, Functional Analysis of Genes
1. Genomics: Organization of Genome, Strategies of
Genome Sequencing, Model Plant Genome Project,
Functional Analysis of Genes
Promila Sheoran
PhD Biotechnology
GJU S&T Hisar
2. Genome Organization
•The word “genome,” coined by German botanist Hans Winkler
in 1920, was derived simply by combining gene and the final
syllable of chromosome.
• If not specified, “genome” usually refers to the nuclear
genome!
•An organism’s genome is defined as the complete haploid
genetic complement of a typical cell.
• The genetic content of the organelles in the cell is not
considered part of the nuclear genome.
• In diploid organisms, sequence variations exist between the
two copies of each chromosome present in a cell.
•The genome is the ultimate source of information about an
organism.
3. Continue…
•The number of genomes sequenced in their entirety is
now in the thousands and includes organisms ranging from
bacteria to mammals.
•The first complete genome of a free-living organism to be
sequenced was that of the bacterium Haemophilus influenzae, in 1995.
•The first eukaryotic genome sequence, that of the yeast
Saccharomyces cerevisiae, followed in 1996.
• The genome sequence for the bacterium Escherichia coli
became available in 1997 .
4. Hierarchy of gene organization
Gene – single unit of genetic function
Operon – genes transcribed in single transcript
Regulon – genes controlled by same
regulator
Modulon – genes modulated by
same stimulus
Element – plasmid, chromosome, phage
Genome
** order of ascending
complexity
5. Prokaryotes and Eukaryotes genome
Prokaryotes                                 Eukaryotes
Single cell                                 Single or multi cell
No nucleus                                  Nucleus
One piece of circular DNA                   Chromosomes
No post-transcriptional mRNA modification   Exon/intron splicing
6. Prokaryotic Genome Organization
Prokaryotes
The genome of E. coli contains about 4 × 10^6
base pairs
> 90% of the DNA encodes protein
Lacks a membrane-bound nucleus.
Circular DNA and supercoiled
domain
Histones not present
7. o Prokaryotic genomes generally contain
one large circular piece of DNA
referred to as a "chromosome" (not a
true chromosome in the eukaryotic
sense).
o Some bacteria have linear
"chromosomes".
o Many bacteria have small circular DNA
structures called plasmids which can
be swapped between neighbors and
across bacterial species.
Continue…
8. o The term plasmid was first introduced
by the American molecular biologist
Joshua Lederberg in 1952.
o A plasmid is separate from, and can
replicate independently of, the
chromosomal DNA.
o Plasmid size varies from 1 to over 1,000
kbp.
Plasmid
9. Eukaryotic genome organization
More about the nuclear genome:
• Multiple linear chromosomes, 5000 to 50000 genes
• Mono-cistronic transcription units
• Discontinuous coding regions (introns and exons)
• Large amounts of non-coding DNA
• Transcription and translation take place in different compartments
• Variety of RNA genes: rRNA, tRNA, snRNA (small nuclear), snoRNA (small
nucleolar), microRNAs, etc.
• Often diploid genomes and obligatory sexual reproduction
• Standard mechanism of recombination: meiosis
• Multiple genomes: nuclear, mitochondrial and plastid (chloroplast)
• Plastid genomes resemble prokaryotic genomes
10. EUKARYOTIC GENOME
‘The nucleus is heart of the cell, which serves as the main distinguishing
feature of the eukaryotic cells. It is an organelle submerged in its sea of
turbulent cytoplasm which has the genetic information encoding the past history
and future prospects of the cell. Nucleus contains many thread like coiled
structures which remain suspended in the nucleoplasm which are known as
chromatin substance’
Chromatin is the complex combination of DNA and proteins that makes up
chromosomes.
The major proteins involved in chromatin are histone proteins; although many
other chromosomal proteins have prominent roles too.
The functions of chromatin are to package DNA into a smaller volume to fit in the
cell, to strengthen the DNA to allow mitosis and meiosis, and to serve as a
mechanism to control gene expression and DNA replication.
11. ORGANIZATION OF CHROMATIN
In resting non-dividing eukaryotic cells, the genome is in the form
of nucleoprotein-complex- the chromatin.
(randomly dispersed in the nuclear matrix as interwoven network of fine chromatin threads)
The information stored in DNA is organized, replicated and read with the help
of a variety of DNA-binding proteins:
Structural Proteins- Histones(Packing proteins):
Main structural proteins found in eukaryotic cells
Low molecular weight basic proteins with high proportion of
positively charged amino acids,
Bound to DNA along most of its length,
The positive charge helps histones to bind to DNA and play a
crucial role in packing of long DNA molecules.
Functional Proteins- Non- Histones:
Associated with gene regulation and other functions of chromatin.
12. Hierarchy of Chromatin Organization in the Cell Nucleus:
Nuclear Matrix Associated Chromatin Loops
13. Next Generation Sequencing
• DNA sequencing is the process of determining the precise
order of nucleotides within a DNA molecule
• Refers to non-Sanger-based high-throughput DNA sequencing
technologies
17. 1. Solid-phase amplification can produce 100-200 million spatially separated
clusters, providing free ends to which a universal sequencing primer can be
hybridized to initiate the NGS reaction
21. 454 Sequencing
• Emulsion-based sample preparation (emPCR)
• Pyrosequencing: non-electrophoretic, bioluminescence
method that measures the release of inorganic
pyrophosphate by proportionally converting it into visible light
using a series of enzymatic reactions
27. Sequence Assembly
• Sequence assembly refers to aligning and merging fragments
of a much longer DNA sequence in order to reconstruct the
original sequence.
• First sequence assemblers began to appear in the late 1980s
and early 1990s
28. Why We Need Genome Assemblers
• Terabytes of sequencing data need processing
on computing clusters
• Identical and nearly identical sequences (repeats) can, in the worst
case, increase the time and space complexity of algorithms exponentially
• The fragments from the sequencing instruments contain errors
29. Basic Principles Of Assembly
• Sequence and quality data are read and the reads are
cleaned.
• Overlaps are detected between reads. False overlaps,
duplicate reads, chimeric reads and reads with self-matches
are also identified
• The reads are grouped to form a contig layout of the finished
sequence.
• A multiple sequence alignment of the reads is performed,
and a consensus sequence is constructed for each contig
layout
• Possible sites of mis-assembly are identified by combining
manual inspection with quality value validation.
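The overlap and layout steps above can be illustrated with a toy greedy assembler. This is only a minimal sketch under strong assumptions: reads are error-free, carry no quality values, and no read is contained inside another. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical and do not come from any real assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the overlap is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no overlaps left: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

Real assemblers add the remaining steps from the list (read cleaning, chimera detection, quality-aware consensus), and a greedy merge like this one can mis-assemble repeats.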
31. Mapping Assembly
• Assembles reads against an existing backbone sequence,
building a sequence that is similar but not necessarily
identical to the backbone sequence
• Compared to de novo assembly, the mapping of resequenced
reads to a template genome is a computationally easier
problem
• Use seeding techniques
• Seeds of fixed length allow for not more than one or two
mismatches. In addition, the capability to detect insertions
and deletions is very limited and most programs can only
detect indels in subsequent alignment runs
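The seeding idea above can be sketched as follows: index every k-mer of the backbone sequence, use exact k-mer matches (seeds) from the read to propose positions, then check each position by counting mismatches. This is an illustrative sketch, not the algorithm of any specific tool; `build_index`, `map_read`, `k` and `max_mismatches` are hypothetical names. Like the seed-based programs described above, it does not handle insertions or deletions.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 2):
    """Return (start, mismatches) of the best ungapped placement, or None."""
    best = None
    tried = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, []):
            start = hit - offset  # where the whole read would begin
            if start < 0 or start + len(read) > len(reference) or start in tried:
                continue
            tried.add(start)
            window = reference[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


reference = "ACGTTGCATGCCGATAGCTTAGGC"
index = build_index(reference, k=4)
print(map_read("CATGCCGACA", reference, index, k=4))  # read with one mismatch
```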
32. Tools for Mapping Assembly
• MAQ - Particularly designed for Illumina
• SOAP - Program for efficient gapped and ungapped alignment
of short oligonucleotides onto reference sequences
• SHRiMP - Developed with Applied Biosystems
• SOCS - Aligns SOLiD data
• Eland - Efficient Large-Scale Alignment of Nucleotide
Databases
• GMAP - Genomic Mapping and Alignment Program for mRNA
and EST Sequences
34. De-novo assembly
• Assembles short reads to create full-length sequences.
• De novo assembly software must deal with sequencing errors,
repeat structures, and the computational complexity of
processing large volumes of data.
35. De-novo assembly tools
• ABySS - Assembly By Short Sequences, designed for very short
reads
• ALLPATHS - De novo assembly of whole-genome shotgun
microreads
• Velvet - Designed for short read sequencing technologies
• Edena - Exact DE Novo Assembler
• MIRA2 - Mimicking Intelligent Read Assembly, able to
perform true hybrid de novo assembly
40. Arabidopsis thaliana genome project
Arabidopsis: The Model Plant
• Relative genetic simplicity
• Fast life cycle
• Susceptibility to manipulation through genetic engineering
• Convenience and abundance
• Basic similarities to other crops
41. Arabidopsis genome
• Contains about 125 Mb of sequence
• Contains 25,500 genes
• 5 chromosomes
• Has 35% unique genes
42. Arabidopsis Genome Initiative (AGI)
• Collaboration of the U.S. Department of Energy and the U.S.
Department of Agriculture, The European Union, the
Government of France, and the Chiba Prefectural Government
in Japan
• Organized in August 1996 at a meeting at the National Science
Foundation (NSF) in Arlington, VA
43. Major Highlights of Genome Project
•1990- Arabidopsis genome project initiated
•1995- Standard BAC and P1 libraries constructed
•1996- Arabidopsis Genome Initiative organized
•1997- Physical maps of all chromosomes completed
•1999- Chromosomes 2 and 4 sequenced
•2000- Completion of genome sequence
44. Applications
• Understanding Photosensitivity
• Creating Healthier Edible
• Manufacturing Biodegradable Plastics.
• Making Vegetables and Fruits Cheaper and Hardier
• Improving Erosion Resistance
• Understanding How Plants Flower
45. Rice Genome Project
Rice genome
• Smallest among grass genomes (wheat, oat, rye, barley, corn)
• Size: 430 Mbp (about 3.4 × Arabidopsis)
• 12 chromosomes
• Approximately 62,435 genes
• Repetitive elements: Most in intergenic regions versus in
introns in humans
46. IRGSP (International Rice Genome Sequencing
Project)
• Established in 1997
• Comprised of ten members: Japan, the United States of
America, China, Taiwan, Korea, India, Thailand, France, Brazil,
and the United Kingdom
• IRGSP adopts the clone-by-clone shotgun sequencing strategy
47. Milestones
• 1997- Sequencing of the rice genome was initiated as an international
collaboration among 10 countries
• 1998- IRGSP (International Rice Genome Sequencing Project) was
launched under the coordination of the Rice Genome Project (RGP) of
Japan
• 2000- Monsanto Co. produced a draft sequence of BAC contigs covering
260 Mb of the rice genome; 95% of rice genes were identified
• 2001- Syngenta produced a draft sequence and identified 32,000 to 50,000
genes with 99.8% accuracy, covering 99% of rice genes
• 2002- IRGSP finished a high-quality draft sequence (clone-by-clone
approach) with a sequence length, excluding overlaps, of 366 Mb,
corresponding to ~92% of the rice genome
• 2004- IRGSP produced the high-quality sequence of the entire rice genome with
99.99% accuracy and without any sequence gaps
48. Applications
• First crop plant to be sequenced, and therefore has a great
impact on agriculture
• Useful in understanding the genome of other crops in the
grass family including corn, wheat, barley, rye and sorghum
• Identification of agronomically important traits - genes that
affect growth habit to promote yield and photoperiod genes
to extend the range of elite cultivars.
49. Tomato Genome Project
• Tomato (Solanum lycopersicum)
– economically important crop worldwide,
– intensively investigated and
– model system for genetic studies in plants.
• Characteristics:
– Simple diploid genetics: 12 chromosome pairs and
950 Mb genome size.
– Short generation time
– Routine transformation technology
– Rich genetic and genomic resources.
50. International Tomato Genome Sequencing
Project
• Started in 2004
• Participants were Korea, China, the United Kingdom, India,
the Netherlands, France, Japan, Spain, Italy and the United
States
• The initial approach was to sequence only the euchromatic
sequence using a BAC-by-BAC approach
• In 2009, a complementary whole-genome shotgun approach
was initiated, and the genome sequence was completed in 2012.
51. Applications
• Tomato as a reference genome sequence
• Understanding Diversification & Adaptation
• Exploring the Role of Natural Diversity in the Genetic
Improvement of Crops
52. Chickpea Genome Project
• Second most widely grown legume crop after soybean
• Approximately 28,269 genes of chickpea were identified
• Approximately 738 Mb genomic sequence
• Nearly half (49.41%) of the chickpea genome is composed of
transposable elements and unclassified repeats
53. International Chickpea Genome Sequencing
Consortium
• Role of ICGSC:
1. To ensure data and information on the chickpea is
readily available to all researchers,
2. To help avoid duplication of research efforts,
3. To provide a framework for accessing national and
international collaboration,
4. To help keep chickpea research at the cutting edge of
genetic research.
54. Applications
• The sequence would help reduce the time needed to breed new
chickpea varieties, as plant breeders would now have access
to genes with the required traits.
• The availability of these genome sequences facilitates de novo
assembly of the genomes of other important but less-studied
legume crops.
55. Poplar genome project
• First tree DNA to be sequenced because of relatively compact
genetic complement
• Genome sequence was published in 2006.
• Third plant genome to be published
• Contains a whole genome duplication
• Includes ~370 megabases of sequence
• 19 chromosomes
• 41,377 protein coding genes
56. International Populus Genome Consortium
Goals
• Examine the suite of genetic resources in Populus that are
currently available to the scientific community,
• Integrate genomics with physiology and ecology in an effort to
understand and manipulate tree growth, development and
function
• Develop the ability to attain predictive understanding of tree
growth, development, and complex function.
57. Applications
• Offers the opportunity to study and modify genes related to
commercially important traits
• Opportunity to better understand the distribution of genes
across the landscape
• The poplar genome project holds the promise of
uncovering and understanding mechanisms uniquely
associated with perennial woody plant growth, development
and ecology.
• Makes it possible to address questions about the annual cycling of
nutrients, water movement up dozens of meters in height,
perennial crown development and wood formation.
58. Functional analysis of genes
Different tools
1. Virus-induced gene silencing (VIGS)
2. CRES-T
3. RNA Interference
59. Virus-induced gene silencing (VIGS)
• Effective strategy for rapid functional analysis of genes in
plant tissues
• Elegant tool for functional characterization of genes
associated with abiotic stress response
• VIGS is rapid (3–4 weeks from infection to silencing)
• Does not require development of stable transformants
• Allows characterization of phenotypes that might be lethal in
stable lines
• Offers the potential to silence either individual or multiple
members of a gene family
Example
• Knockdown of TaNAC1 with barley stripe mosaic virus-induced
gene silencing (BSMV-VIGS) enhanced stripe rust resistance
60. CRES-T
• Chimeric REpressor Gene-Silencing Technology (CRES-T)
• Chimeric repressor produced by fusion of a transcription
factor to the plant-specific repression domain (SRDX)
suppresses the target genes of a transcription factor
• Useful tool for functional analysis of redundant plant
transcription factors and the manipulation of plant traits
61. About RNAi
• RNA interference (RNAi) is a system within living cells that
takes part in controlling which genes are active and how
active they are. Two types of small RNA molecules –
microRNA (miRNA) and small interfering RNA (siRNA) – are
central to RNA interference.
• RNAs are the direct products of genes, and these small RNAs
can bind to other specific RNAs (mRNA) and either increase or
decrease their activity, for example by preventing a
messenger RNA from producing a protein.
• RNA interference has an important role in defending cells
against parasitic genes – viruses and transposons – but also in
directing development as well as gene expression in general.
62. The Mechanism of RNA Interference
The long dsRNAs enter a cellular pathway that is
commonly referred to as the RNA interference
(RNAi) pathway.
First, the dsRNAs get processed into 20-25
nucleotide (nt) small interfering RNAs (siRNAs) by
an RNase III-like enzyme called Dicer.
Then, the siRNAs assemble into endoribonuclease-
containing complexes known as RNA-induced
silencing complexes (RISCs), unwinding in the
process.
The siRNA strands subsequently guide the RISCs to
complementary RNA molecules, where they cleave
and destroy the cognate RNA
Cleavage of cognate RNA takes place near the
middle of the region bound by the siRNA strand.
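The pathway described above (dicing, guide-strand targeting, cleavage near the middle of the paired region) can be mimicked on plain sequence strings. This is a deliberately simplified sketch: it assumes 21-nt siRNAs, perfect complementarity, and a cleavage point at the exact midpoint of the paired region, and it ignores the 3' overhangs of real Dicer products. All function names are hypothetical.

```python
PAIRS = str.maketrans("AUGC", "UACG")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return seq.translate(PAIRS)[::-1]


def dice(long_rna: str, size: int = 21) -> list[str]:
    """Cut one strand of a long dsRNA into consecutive siRNA-sized pieces."""
    return [long_rna[i:i + size]
            for i in range(0, len(long_rna) - size + 1, size)]


def cleavage_sites(guide: str, target_mrna: str) -> list[tuple[int, int]]:
    """Find where the guide strand pairs perfectly with the target.

    Returns (site_start, cleavage_position) pairs, 0-based, with the cut
    placed at the midpoint of the paired region (an approximation).
    """
    site = reverse_complement_rna(guide)  # the sequence the guide pairs with
    results = []
    start = target_mrna.find(site)
    while start != -1:
        results.append((start, start + len(site) // 2))
        start = target_mrna.find(site, start + 1)
    return results


sense = "AUGAAACCCGUGCAUUAGCUA"          # hypothetical 21-nt target region
guide = reverse_complement_rna(sense)     # antisense (guide) strand
print(cleavage_sites(guide, "GG" + sense + "CC"))
```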
63. Approaches for candidate gene discovery
• Traditional candidate gene approach
– Position dependent strategy
– Comparative genomics strategy
– Function dependent strategy
– Combined strategy
• Digital candidate gene approach
64. Traditional candidate gene approach
Position dependent strategy
• Identification of candidate gene is based on the physical
linkage information in a QTL-identified chromosomal segment
• Example – position of QTLs controlling field blast resistance in
rice
• Example – isolation of the Arabidopsis ABI3 gene
65. Comparative genomics strategy
• Includes comparative functional genomics strategy and
comparative structural genomics strategy
• Candidate genes may be functionally conserved or structurally
homologous genes
66. Function dependent strategy
• Results in the functional candidate gene approach, in which a
putative candidate gene is the one that could be statistically
detected from the genes controlling large components of
inheritable gene expression variation.
• Example- identification of new disease resistance genes in
Tobacco
67. Combined strategy
• Combines at least two strategies
• Genetical genomic approach originating from function-
dependent strategy provides powerful means to identify
candidate genes.
• Example- selection of candidate genes for grape
proanthocyanidin pathway
68. Digital candidate gene approach
(DigiCGA)
• Novel web resource-based candidate gene identification approach.
• DigiCGA can be defined as an approach that objectively extracts,
filters, (re)assembles, or (re)analyzes all available resources
derived from public web databases, mainly in accordance with
the principles of biological ontology and complex statistical
methods, to computationally identify potential
candidate genes of specific interest.
• A combination of RNA-seq and DGE analysis based on the next
generation sequencing technology was shown to be a powerful
method for identifying candidate genes encoding enzymes
responsible for the biosynthesis of novel secondary metabolites in a
non-model plant. Seven CYP450s and five UDPGs were selected as
potential candidates involved in mogrosides biosynthesis. The
transcriptome data from this study provides an important resource
for understanding the formation of major bioactive constituents in
the fruit extract from S. grosvenorii.
69. Deciphering the function of genes in plant
secondary metabolism
• To complete the metabolic map for an entire class of
compounds, it is essential to identify gene-metabolite
correlations of a metabolic pathway
• An effective approach to predict genes involved in the same
metabolic pathway is co-expression analysis.
• Co-expression analysis can be conducted using datasets from
RNA-seq or microarray obtained in expressly designed
experiments or also by comparing already existing data
publicly available
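A minimal sketch of co-expression analysis: compute the Pearson correlation between the expression profile of a known pathway gene (the "bait") and every other gene across the same samples, and keep the genes above a threshold. The gene names, expression values and the 0.9 threshold below are made up for illustration; real analyses use normalized RNA-seq or microarray data and more careful statistics.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpressed_with(bait: str, expression: dict[str, list[float]],
                     threshold: float = 0.9) -> list[tuple[str, float]]:
    """Genes whose profile correlates with the bait at or above threshold."""
    reference = expression[bait]
    hits = [(gene, pearson(reference, profile))
            for gene, profile in expression.items() if gene != bait]
    return sorted([h for h in hits if h[1] >= threshold],
                  key=lambda h: -h[1])


# Hypothetical expression values across five samples
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],   # bait: known pathway gene
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],  # rises with the bait
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the bait rises
    "geneD": [3.0, 3.0, 3.0, 3.0, 3.0],   # constant
}
print(coexpressed_with("geneA", expression))
```

Genes that pass the threshold are only candidates; as the next example shows, they still need functional analysis such as silencing.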
71. Example
• Comparative coexpression analysis between tomato and
potato coupled with chemical profiling revealed an array of 10
genes that partake in steroidal glycoalkaloid (SGA) biosynthesis.
Following systematic functional analysis, a revised SGA biosynthetic
pathway was proposed, starting from cholesterol up to the
tetrasaccharide moiety linked to the tomato SGA aglycone.
Silencing GLYCOALKALOID METABOLISM 4 prevented accumulation of
SGAs in potato tubers and tomato fruit. This may provide a means for
removal of unsafe, antinutritional substances present in these
widely used food crops.
72. Gene Inactivation
•The ability to manipulate gene expression levels has been
essential to the study of gene function and biological
processes.
• Classically, whole body deletions of genes were generated
via homologous recombination.
• The last few years have seen a revolution in the
approaches scientists use to inactivate gene expression,
such as the development of highly efficient ribonucleic acid
interference (RNAi) delivery systems, gene knockout and
antisense technology.
73. Gene Knockout
•A gene knockout (abbreviation: KO) is a genetic technique in
which one of an organism's genes is made inoperative
("knocked out" of the organism).
•The resulting organisms, known as knockout organisms or simply
knockouts, are used to learn about a gene that has been sequenced
but whose function is unknown or incompletely known.
•Researchers draw inferences from the difference between
the knockout organism and normal individuals.
74. KNOCK OUT MICE
• A mouse in which a gene has been deleted/mutated
(gene is inactivated)
• Specific gene is targeted
• The loss of gene activity often causes changes in a
mouse's phenotype and thus provides valuable
information on the function of the gene.
75. Researchers who developed the technology for the
creation of knockout mice won Nobel Prize in the
year 2007
• The Nobel Prize in Physiology or Medicine 2007 was awarded
jointly to Mario R. Capecchi, Sir Martin J. Evans and Oliver
Smithies "for their discoveries of principles for introducing
specific gene modifications in mice by the use of embryonic
stem cells".
76. GENERATION OF KNOCKOUT MICE BY
HOMOLOGOUS RECOMBINATION
• Creating a knockout construct
• Introduce the knockout construct into mouse embryonic stem
cells (ES) in culture
• Screen ES cells and select those whose DNA includes the new
genes
• Implant selected cells into normal mouse embryos , making
“chimeras”
• Implant chimeric embryos in pseudopregnant females
• Females give birth to chimeric offspring, which are
subsequently bred to verify transmission of the new gene,
producing a mutant mouse line
77. Knockout construct:
• The gene to be knocked out is isolated from a mouse gene
library. Then a new DNA sequence is engineered which is very
similar to the original gene and its immediate neighbour
sequence, except that it is changed sufficiently to make the
gene inoperable. Usually, the new sequence is also given
a marker gene, a gene that normal mice don't have and that
confers resistance to a certain toxic agent or that produces an
observable change (e.g. colour or fluorescence).
80. Knockout Mice to study genetic diseases
• Knockout mice make good model systems for investigating the
nature of genetic diseases and the efficacy of different types
of treatment and for developing effective gene therapies to
cure these often devastating diseases
• For instance, the knockout mice for CFTR gene show
symptoms similar to those of humans with cystic fibrosis
81. Drawbacks of knockout mice
• About 15% of gene knockouts are developmentally lethal and
therefore cannot grow into adult mice. Thus it becomes
difficult to determine the gene function in adults.
• Many genes that participate in interesting gene pathways are
essential for mouse development, viability or fertility.
Therefore, a traditional knockout of such a gene can never lead
to the establishment of a knockout mouse strain for analysis
82. Antisense RNA-Technology
•Antisense RNA is a single-stranded RNA that is
complementary to a messenger RNA (mRNA) strand
transcribed within a cell.
•Antisense RNAs are introduced into a cell to inhibit
translation by base pairing with the sense RNA and
activating RNase H; the approach is used to develop
novel transgenic organisms.
mRNA sequence (sense):  5' AUGAAACCCGUG 3'
Antisense RNA:          3' UACUUUGGGCAC 5'
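The base pairing in this example can be reproduced with a few lines of code: A pairs with U and G pairs with C, and the antisense strand runs antiparallel to the mRNA. This is a small sketch; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("AUGC", "UACG")


def antisense_3to5(mrna: str) -> str:
    """Complement of the mRNA, written 3'->5' so it lines up under the mRNA."""
    return mrna.upper().translate(COMPLEMENT)


def antisense_5to3(mrna: str) -> str:
    """The same antisense strand written in the conventional 5'->3' direction."""
    return antisense_3to5(mrna)[::-1]


mrna = "AUGAAACCCGUG"
print(antisense_3to5(mrna))  # UACUUUGGGCAC
print(antisense_5to3(mrna))  # CACGGGUUUCAU
```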
83. How It Differs from RNAi
•The intended effect of both techniques is the same, but the
processing differs.
•Antisense technology degrades the mRNA through RNase H, while
RNAi relies on the enzyme Dicer to generate siRNAs that guide RISC
to cleave the mRNA.
•RNAi molecules are about twice the size of antisense oligonucleotides.
84. Nature’s Antisense System
•The R1 plasmid in E. coli uses the hok (host killing)/sok (suppress
killing) system for postsegregational killing.
•When an E. coli cell divides, the daughter cells inherit hok mRNA
and sok antisense RNA from the parent, but sok RNA has a short
half-life and is degraded quickly.
•In a daughter cell that has lost the plasmid, sok RNA is not
replenished, so the Hok protein is expressed and the cell dies.
If the daughter cell inherits the R1 plasmid, the sok gene is
transcribed from its own promoter, sok RNA outnumbers hok mRNA
and, by base pairing with it, inhibits translation of the Hok protein.
86. Flavr-Savr
•Flavr-Savr, developed by Calgene in 1992, was the first
FDA-approved GM food.
•Licensed on May 17, 1994.
•As a tomato ripens, it produces gradually increasing levels of the
enzyme polygalacturonase (PG), which softens the fruit and
eventually causes it to rot.
•As a result, a ripe tomato does not last more than a few days
without rotting.
•Calgene introduced a gene into the plant that produces an RNA
complementary to the PG mRNA, inhibiting synthesis
of the PG enzyme.
87. INDIAN CONTRIBUTION
•In February 2010, NIPGR (National Institute of Plant Genome Research)
developed a tomato by antisense technology that can last
up to 45 days. There is therefore no need to pick green
tomatoes and ripen them artificially with ethylene, to worry about
whether they will reach the market shelves, or to hurry to use
them in the kitchen before they go mushy.
•NIPGR scientists silenced the expression of two important
genes that are responsible for the loss of firmness and texture
during ripening.
88. The two genes silenced are alpha-man (α-mannosidase) and
beta-hex (β-D-N-acetylhexosaminidase). Both are glycosyl
hydrolases, enzymes that break the chemical bond holding a sugar
to either another sugar or some other molecule, like a protein.
89. Challenges to antisense technology…
1. One major challenge to antisense technology (and RNAi) is
the difficulty of delivering it into the body. Delivery of the
treatment to the brain, for use in diseases like Huntington's
disease (HD), is especially challenging because it must cross
the blood-brain barrier.
2. The second major challenge to antisense technology is its
potential toxicity. Although antisense technology is
engineered to be very specific, it can still cause unintended
damage because it would regulate both the mutant and the
normal huntingtin alleles.