This document discusses genetic engineering tools and techniques. It explains that genetic engineers use restriction enzymes, plasmids, and promoters to insert genes from one organism into another. The inserted genes are expressed to produce proteins like insulin. Gel electrophoresis and fluorescent markers are described as tools to analyze DNA and identify transformed bacteria. The overall aim is to manipulate genes between species to study and modify organisms.
2. Learning Outcomes
Here’s what we should be able to do:
Explain the principles of Genetic Technology.
Describe the tools and techniques available to the genetic engineer
Describe the use of the polymerase chain reaction (PCR) and gel electrophoresis.
Explain the use of microarrays.
Outline the use of bioinformatics in sequencing genomes.
3. • Table of contents
Introduction to Genetic Engineering
Tools for the gene technologist
Restriction Enzymes
Vectors
• Inserting a gene into a plasmid vector
• Getting the plasmids into bacteria
• Identifying bacteria with recombinant DNA
• Insulin Production
• Other Genetic Markers
Promoters
Gel Electrophoresis
Electrophoresis of proteins
Electrophoresis of DNA
Polymerase Chain Reaction
Microarrays
Bioinformatics
4. Genetic engineering involves the manipulation of
naturally occurring processes and enzymes. The
aim of genetic engineering is to remove a gene (or
genes) from one organism and transfer it into
another so that the gene is expressed in its new
host. The DNA that has been altered by this
process and which now contains lengths of
nucleotides from two different organisms is called
recombinant DNA (rDNA). The organism which
now expresses the new gene or genes is known
as a transgenic organism or a genetically modified
organism (GMO).
Introduction to Genetic Engineering
5. • Examples of Genetic Modifications:
1. The gene for human insulin has been inserted into bacteria which then produce human insulin
which can be collected and purified for medical use for diabetics.
2. Crop plants, such as wheat and maize, have been genetically modified to contain a gene from a
bacterium that produces a poison that kills insects, making them resistant to insect pests such as
caterpillars.
3. Crop plants have also been genetically modified to make them resistant to certain herbicides
(chemicals that kill plants), meaning that when the herbicide is sprayed on the crop it only kills
weeds and does not affect the crop plant.
4. Some crops have been genetically modified to produce additional vitamins, e.g. ‘golden rice’
contains genes from another plant and a bacterium which make the rice grains produce a chemical
that is turned into vitamin A in the human body, which could help prevent deficiency diseases in
certain areas of the world.
6. Genetic engineering provides a way of
overcoming barriers to gene transfer between
species. Indeed the genes are often taken from
an organism in a different kingdom, such as a
bacterial gene inserted into a plant or a human
gene inserted into a bacterium. Unlike selective
breeding, where whole sets of genes are
involved, genetic engineering often results in the
transfer of a single gene.
Let’s look at this in more detail.
7. • An overview of gene transfer
• The gene that is required is identified. It may be cut from a chromosome, made from mRNA
by reverse transcription or synthesized from nucleotides.
• Multiple copies of the gene are made using the technique known as the polymerase chain
reaction (PCR).
• The gene is inserted into a vector which delivers the gene to the cells of the organism.
Examples of vectors are plasmids, viruses and liposomes.
• The vector takes the gene into the cells.
• The cells that have the new gene are identified and cloned.
13. To perform these steps, the genetic engineer
needs a ‘toolkit’ consisting of:
■ enzymes, such as restriction endonucleases, ligase
and reverse transcriptase
■ vectors, including plasmids and viruses
■ genes coding for easily identifiable substances that can
be used as markers.
14. Restriction enzymes
Restriction endonucleases are a class of enzymes
from bacteria which recognize and break down the
DNA of invading viruses known as bacteriophages
(phages for short). Bacteria make enzymes that cut
phage DNA into smaller pieces. These enzymes cut
the sugar–phosphate backbone of DNA at specific
places within the molecule. This is why they are known
as endonucleases (‘endo’ means within). Their role in
bacteria is to restrict a viral infection, hence the name
restriction endonuclease or restriction enzyme.
• Tools for the gene technologist
15. Each restriction enzyme binds to a specific target site on DNA and cuts at that site. Bacterial DNA is protected
from such an attack either by chemical markers or by not having the target sites. These target sites, or restriction
sites, are specific sequences of bases. For example, the restriction enzyme called BamHI always cuts DNA where
there is a GGATCC sequence on one strand and its complementary sequence, CCTAGG, on the other. You will
notice that this sequence reads the same in both directions: it is a palindrome. Many, but not all, restriction sites
are palindromic. Restriction enzymes either cut straight across the sugar-phosphate backbone to give blunt ends
or they cut in a staggered fashion to give sticky ends. Sticky ends are short lengths of unpaired bases. They are
known as sticky ends because they can easily form hydrogen bonds with complementary sequences of bases on
other pieces of DNA cut with the same restriction enzyme. When long pieces of DNA are cut with a restriction
enzyme, there will be a mixture of different lengths. Finding the specific piece of DNA required involves separating
the lengths of DNA using gel electrophoresis and using gene probes, which are described later.
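The BamHI example can be sketched in a few lines of Python. This is only an illustration: the function names and the test sequence are made up, and the cut position (after the first G of GGATCC, which leaves sticky ends) is the known behavior of BamHI rather than something stated on the slide.

```python
# Illustrative sketch; helper names and the example DNA are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site):
    """A site is palindromic if both strands read the same 5' to 3'."""
    return site == reverse_complement(site)

def digest(dna, site="GGATCC", cut_offset=1):
    """Cut one strand of linear DNA at every occurrence of the site.

    cut_offset=1 is an assumption that matches BamHI's staggered cut
    (G^GATCC); other enzymes cut at other positions.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments

dna = "AAGGATCCTTTTGGATCCAA"  # made-up sequence with two BamHI sites
print(is_palindromic("GGATCC"))
print(digest(dna))
```

Two restriction sites give three fragments of different lengths, which is the mixture that then has to be separated by gel electrophoresis.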
17. Restriction enzymes are named by an abbreviation which
indicates their origin. Roman numerals are added to distinguish
different enzymes from the same source. For example, EcoRI comes
from Escherichia coli (strain RY13), and was the first to be identified
from this source.
Now that many proteins have been sequenced, it is
possible to use the genetic code to synthesize
DNA artificially from nucleotides rather than cutting
it out of chromosomal DNA or making it by reverse
transcription. Genes, and even complete genomes, can be
made directly from DNA nucleotides without the need
for template DNA. Scientists can do this by choosing
codons for the amino acid sequence that they need. The
sequence of nucleotides is held in a computer that directs
the synthesis of short fragments of DNA. These fragments
are then joined together to make a longer sequence of
nucleotides that can be inserted into plasmids for use
in genetic engineering. This method is used to generate
novel genes that are used, for example, in the synthesis
of vaccines and they have even been used to
produce the genomes of bacteria consisting of a million
base pairs.
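Choosing codons for a known amino acid sequence can be sketched as a simple lookup. The table below picks one valid codon per amino acid from the standard genetic code; which codon to prefer is an arbitrary assumption here (in practice the choice depends on the host organism), and the three-residue peptide is made up.

```python
# One valid DNA codon per amino acid (standard genetic code).
# The particular choice of codon is an assumption for illustration.
PREFERRED_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"

def back_translate(protein):
    """Build a DNA coding sequence for a protein in one-letter code."""
    codons = [PREFERRED_CODON[amino_acid] for amino_acid in protein]
    codons.append(STOP_CODON)  # finish the coding sequence with a stop codon
    return "".join(codons)

print(back_translate("MKV"))  # hypothetical three-residue peptide
```

A sequence produced this way is what the computer would hold before directing the synthesis of short DNA fragments.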
19. • Vectors
A vector is a DNA molecule (often a plasmid or virus) that is used as a vehicle to carry a particular
DNA segment into a host cell as part of a cloning or recombinant DNA technique. Plasmids are the
most commonly used vectors, but viruses and liposomes (small vesicles with a phospholipid layer)
can also be used to transfer genes.
Bacteria are widely used as host cells for several reasons:
They use the same genetic code as the organisms we are taking the genes from, meaning they can
easily ‘read’ it and produce the same proteins.
There are no ethical concerns over their manipulation and growth (unlike if animals were used, as they can
feel pain and distress).
The presence of plasmids in bacteria, separate from the main bacterial chromosome, makes them easy to
remove and manipulate to insert genes into them and then place back inside the bacterial cells.
20. • Inserting a gene into a plasmid vector
In order to get a new gene into a recipient cell, a go-between called a vector often has
to be used. One type of vector is a plasmid. Plasmids are small, circular pieces of
double-stranded DNA. Plasmids occur naturally in bacteria and often contain genes
for antibiotic resistance. They can be exchanged between bacteria – even between
different species of bacteria. If a genetic engineer inserts a piece of DNA into a
plasmid, then the plasmid can be used to take the DNA into a bacterial cell.
To get the plasmids, the bacteria containing them are treated with enzymes to break down their cell walls. The
‘naked’ bacteria are then spun at high speed in a centrifuge, so that the relatively large bacterial chromosomes
are separated from the much smaller plasmids.
The circular DNA of the plasmid is cut open using a restriction enzyme. The same enzyme as the one
used to cut out the gene should be used, so that the sticky ends are complementary. If a restriction
enzyme is used that gives blunt ends, then sticky ends need to be attached to both the gene and the
plasmid DNA.
22. The opened plasmids and the lengths of DNA are mixed together. Some of the plasmid sticky ends pair up with
the sticky ends on the new gene. The enzyme DNA ligase is used to link together the sugar–phosphate
backbones of the DNA molecule and the plasmid, producing a closed circle of double-stranded DNA containing
the new gene. This is now recombinant DNA.
Bacterial plasmids can be modified to produce good vectors. Plasmids can also be made artificially. For example,
the pUC group of plasmids have:
• a low molecular mass, so they are readily taken up by bacteria
• an origin of replication so they can be copied
• several single target sites for different restriction enzymes in a short length of DNA called a polylinker
• one or more marker genes, allowing identification of cells that have taken up the plasmid.
23. Plasmids are not the only type of vector that can be used.
Viruses can also be used as vectors. A third group of vectors are
liposomes, which are tiny spheres of lipid containing the DNA.
27. • Getting the plasmids into bacteria
The next step in the process is to get bacteria to take up the
plasmids. The bacteria are treated by putting them into a
solution with a high concentration of calcium ions, then cooled
and given a heat shock to increase the chances of plasmids
passing through the cell surface membrane. A small proportion
of the bacteria, perhaps 1%, take up plasmids with the gene,
and are said to be transformed. The rest either take up
plasmids that have closed without incorporating a gene or do
not take up any plasmids at all.
28. • Identifying Bacteria with recombinant DNA
It is important to identify which bacteria have been successfully
transformed so that they can be used to make the gene product. This
used to be done by spreading the bacteria on agar plates each
containing an antibiotic. So if, for example, the insulin gene had been
inserted into the plasmid at a point in the gene for tetracycline resistance
in pBR322, then any bacteria which had taken up plasmids with the
recombinant DNA would not be able to grow on agar containing
tetracycline. However, this technique has fallen out of favor, and has
largely been replaced by simpler methods of identifying transformed
bacteria.
29. DNA polymerase in bacteria copies the plasmids; the
bacteria then divide by binary fission so that each
daughter cell has several copies of the plasmid. The
bacteria transcribe the new gene and may translate it to
give the required gene product, such as insulin.
31. There were problems in locating and isolating the gene coding for human
insulin from all of the rest of the DNA in a human cell. Instead of cutting
out the gene from the DNA in the relevant chromosome, researchers
extracted mRNA for insulin from pancreatic β cells, which are the only
cells to express the insulin gene. These cells contain large quantities of
mRNA for insulin as they are its only source in the body. The mRNA was
then incubated with the enzyme reverse transcriptase which comes from
the group of viruses called retroviruses. As its name suggests, this
enzyme reverses transcription, using mRNA as a template to make single
stranded DNA. These single-stranded DNA molecules were then
converted to double-stranded DNA molecules using DNA polymerase to
assemble nucleotides to make the complementary strand. The genetic
engineers now had insulin genes that they could insert into plasmids to
transform the bacterium Escherichia coli.
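The two copying steps (mRNA to single-stranded DNA, then to double-stranded DNA) come down to complementary base pairing. A minimal sketch, with made-up function names and a made-up nine-base mRNA rather than the real insulin sequence:

```python
# Hypothetical helper names; the mRNA below is not the insulin mRNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna):
    """Single-stranded DNA complementary to the mRNA, written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

def second_strand(dna):
    """Complementary DNA strand, as DNA polymerase would build it."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))

mrna = "AUGGCCUAA"
first = reverse_transcribe(mrna)
print(first)
print(second_strand(first))
```

The second strand has the same base sequence as the mRNA, with T in place of U, so the double-stranded product carries the coding information of the original gene without its introns.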
32. The main advantage of this form of
insulin is that there is now a reliable
supply available to meet the increasing
demand. Supplies are not dependent on
factors such as availability through the
meat trade.
35. Genetic engineers have changed the nucleotide
sequence of the insulin gene to give molecules with
different amino acid sequences. These insulin
analogues have different properties; for example,
they can either act faster than animal insulin (useful
for taking immediately after a meal) or more slowly
over a period of between 8 and 24 hours to give a
background blood concentration of insulin. Many
diabetics take both these forms of recombinant
insulin at the same time.
36. • Other Genetic Markers
There is some concern about using antibiotic resistance
genes as markers. Could the antibiotic resistance genes
spread to other bacteria, producing strains of pathogenic
(disease-causing) bacteria that we could not kill with
antibiotics? In insulin production, the risk is probably very
small, because the genetically modified bacteria are only
grown in fermenters and not released into the wild. But
now there are many different kinds of genetically
modified bacteria around, some of which are used in
situations in which their genes might be passed on to
other bacteria. If these bacteria were pathogens, then we
might end up with diseases that are untreatable.
37. Because of the risk of creating pathogenic antibiotic resistant bacteria,
there is now much less use of antibiotic resistance genes in this way, and
other ways have been developed in which the successfully transformed
bacteria can be identified. One method uses genes that code for
fluorescent proteins. For example, a gene obtained from jellyfish
codes for a protein called GFP (green fluorescent protein) that fluoresces
bright green in ultraviolet light. The gene for GFP is inserted into
the plasmids. So all that needs to be done to identify the bacteria that
have taken up the plasmid is to shine ultraviolet light onto them. The ones
that glow green are the genetically modified ones. The same marker gene
can be used in a range of organisms.
39. Another marker is the enzyme β-glucuronidase (known as GUS for short), which
originates from E. coli. Any transformed cell that contains this enzyme, when
incubated with some specific colorless or non-fluorescent substrates, can transform them into
colored or fluorescent products. This is especially useful in detecting
the activity of inserted genes in plants, such as the sundew.
43. Bacteria contain many different genes, which make many different
proteins. But not all these genes are switched on at once. The
bacteria make only the proteins that are required in the conditions in
which they are growing. For example, E. coli bacteria make the
enzyme β-galactosidase only when they are growing in a medium
containing lactose and there is no glucose available.
• PROMOTERS
44. The expression of genes, such as those in the lac
operon, is controlled by a promoter – the region of DNA
to which RNA polymerase binds as it starts transcription.
If we want the gene that we are going to insert into a
bacterium to be expressed, then we also have to insert
an appropriate promoter. When bacteria were first
transformed to produce insulin, the insulin gene was
inserted next to the β-galactosidase gene so they shared
a promoter. The promoter switched on the gene when
the bacterium needed to metabolize lactose. So, if the
bacteria were grown in a medium containing lactose but
no glucose, they synthesized both β-galactosidase and
human insulin.
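The shared-promoter idea can be written as a simple rule. This is a deliberately simplified sketch (function names are made up, and real gene regulation is not a strict on/off switch): the promoter is treated as active only when lactose is present and glucose is absent, and both genes next to it are expressed together.

```python
def lac_promoter_active(lactose_present, glucose_present):
    """Simplified rule from the text: on only with lactose and no glucose."""
    return lactose_present and not glucose_present

def proteins_made(lactose_present, glucose_present):
    """Both genes share the promoter, so they are switched on together."""
    if lac_promoter_active(lactose_present, glucose_present):
        return ["beta-galactosidase", "human insulin"]
    return []

print(proteins_made(lactose_present=True, glucose_present=False))
print(proteins_made(lactose_present=True, glucose_present=True))
```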
45. The promoter not only allows RNA polymerase to
bind to DNA but also ensures that it recognizes which
of the two DNA strands is the template strand. Within
the sequence of nucleotides in the promoter region is the
transcription start point – the first nucleotide of the gene
to be transcribed. In this way, the promoter can be said
to control the expression of a gene and can ensure a high
level of gene expression. In eukaryotes, various proteins
known as transcription factors are also required to bind
to the promoter region or to RNA polymerase before
transcription can begin.
49. Gel electrophoresis is a technique used widely to separate different molecules
and in the analysis of DNA, RNA and proteins. This technique involves placing a
mixture of molecules into wells cut into agarose gel and applying an electric field.
During electrophoresis the molecules are separated within the gel according to their size/mass
and their net (overall) charge in response to the electric field.
The separation occurs because of:
net (overall) charge – negatively charged molecules move towards the anode (+) and
positively charged molecules move towards the cathode (–); highly charged molecules
move faster than those with less overall charge. For example, DNA is negatively charged due to
the phosphate groups and thus when placed in an electric field the molecules move
towards the anode.
50. size – different sized molecules move through the gel at different rates. The tiny pores in the gel
result in smaller molecules moving quickly, whereas larger molecules move
slowly.
composition of the gel – common gels are polyacrylamide (PAG) for proteins and
agarose for DNA; the size of the ‘pores’ within the gel determines the speed with
which proteins and fragments of DNA move.
51. Electrophoresis of Proteins
The charge on proteins is dependent on the ionization of the R groups on the
amino acid residues. Some amino acids have R groups that can be positively
charged (−NH3+) and some have R groups that can be negatively charged
(−COO−). Whether these R groups are charged or not depends on the pH.
When proteins are separated by electrophoresis, the procedure is carried out
at a constant pH using a buffer solution. Usually proteins have a net negative
charge.
52. Applying an electrical
field across
the buffer chambers
forces the
migration of protein
into and
through the gel.
53. Gel electrophoresis has been used
to separate the polypeptides
produced by different alleles of
many genes. For example,
allozymes are variant forms of
enzymes produced by different
alleles of the same gene.
54. There are also many variants of haemoglobin.
Adult haemoglobin is composed of four polypeptides:
2 α-globins and 2 β-globins. In sickle cell anaemia, a variant
of β-globin has an amino acid with a non-polar R
group instead of one with an R group that is charged.
These two variants of the β-globin can be separated by
electrophoresis because they have different net charges.
This means that haemoglobin molecules in people who
have sickle cell anaemia have a slightly lower negative
charge than normal haemoglobin and so the molecules
do not move as far through the gel as molecules of normal
haemoglobin. The test to find out whether someone carries
the sickle cell allele makes use of this difference.
55. Separation of haemoglobin by gel
electrophoresis. This analysis
was carried out on a family in
which one child has sickle cell
anaemia. Lane 1 contains
haemoglobin standards, A =
normal haemoglobin, S = sickle
cell haemoglobin; lanes 2 and 3
are the haemoglobin samples
from parents; lanes 4, 5 and 6 are
haemoglobin samples from their
children.
57. Electrophoresis of DNA
DNA fragments carry a small charge thanks to
the negatively charged phosphate groups. In
DNA electrophoresis, these fragments move
through the gel towards the anode. The
smaller the fragments, the faster they move.
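Because smaller fragments move faster, the order of bands on a gel can be predicted from fragment length alone. A minimal sketch, assuming made-up fragment lengths and a hypothetical function name; it gives only the order of the bands, not the actual distances travelled.

```python
def bands_from_wells(fragment_lengths):
    """Return fragment lengths ordered from the wells towards the anode.

    The longest fragment stays nearest the wells and the shortest
    travels furthest, so the list is sorted from longest to shortest.
    """
    return sorted(fragment_lengths, reverse=True)

fragments_bp = [850, 120, 2300, 400]  # hypothetical lengths in base pairs
for position, length in enumerate(bands_from_wells(fragments_bp), start=1):
    print(f"band {position} (counting from the wells): {length} bp")
```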
59. A region of DNA that is known to vary between different people is chosen. These
regions often contain variable numbers of repeated DNA sequences and are
known as variable number tandem repeats (VNTRs). Only identical twins share
all their VNTR sequences.
DNA can be extracted from almost anything that has come from a person’s body – the
root of a hair, a tiny spot of blood or semen at a crime scene, or saliva where someone
has drunk from a cup. Usually the quantity of DNA is increased by using the
polymerase chain reaction (PCR), which makes many copies of the DNA that has been
found.
The DNA is then chopped into pieces using restriction enzymes known to cleave it
close to the VNTR regions. Now the DNA is ready for electrophoresis. When the
current is turned off, the gel contains DNA fragments that have ended up in different
places. These fragments are not visible straight away.
60. To make the fragments visible, they are carefully transferred
onto absorbent paper, which is placed on top of the gel. The
paper is then heated just enough to make the two strands in
each DNA molecule separate from one another. Short
sequences of single-stranded DNA called probes are added;
they have base sequences complementary to the VNTR regions.
The probes also contain a radioactive phosphorus isotope so
when the paper is placed on an X-ray film, the radiation emitted
by the probes (which are stuck to the DNA fragments) makes the
film go dark. So, we end up with a pattern of dark stripes on the
film matching the positions that the DNA fragments reached on
the agarose gel. Alternatively, the probes may be labelled with a
fluorescent stain that shows up when ultraviolet light is shone
onto them.
66. Polymerase Chain Reaction
The polymerase chain reaction, generally known as PCR, is used
in almost every application of gene technology. It is a method for
rapid production of a very large number of copies of a particular
fragment of DNA. Virtually unlimited quantities of a length of DNA
can be produced from the smallest quantity of DNA (even one
molecule).
67. First, the DNA is denatured, usually by heating it. This separates the DNA
molecule into its two strands, leaving bases exposed.
The enzyme DNA polymerase is then used to build new strands of DNA against the
exposed ones. However, DNA polymerase cannot just begin doing this with no
‘guidance’. A primer is used to begin the process. This is a short length of DNA, often
about 20 base pairs long, that has a base sequence complementary to the start of the
part of the DNA strand that is to be copied. The primer attaches to the start of the DNA
strand, and then the DNA polymerase continues to add nucleotides all along the rest of
the DNA strand.
Once the DNA has been copied, the mixture is heated again, which once more
separates the two strands in each DNA molecule, leaving them available for
copying again. Once more, the primers fix themselves to the start of each strand
of unpaired nucleotides, and DNA polymerase makes complementary copies of
them.
68. The three stages in each round of copying need different temperatures:
Denaturation – the double-stranded DNA is heated to 95°C, which breaks the
hydrogen bonds that hold the two DNA strands together.
Annealing – the temperature is decreased to between 50 and 60°C so that primers
(forward and reverse ones) can anneal to the ends of the single strands of DNA.
Elongation / Extension – the temperature is increased to 72°C for at least a
minute, as this is the optimum temperature for Taq polymerase to build the
complementary strands of DNA to produce the new identical double-stranded
DNA molecules.
70. We can see that theoretically this could go on
forever, making more and more copies of what
might originally have been just a tiny number of
DNA molecules. A single DNA molecule can be
used to produce literally billions of copies of itself in
just a few hours. PCR has made it possible to get
enough DNA from a tiny sample – for example, a
microscopic portion of a drop of blood left at a
crime scene.
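The claim of billions of copies follows from repeated doubling. A short sketch, assuming every cycle doubles the DNA perfectly (an idealization; real reactions are less efficient and eventually level off) and using made-up function names:

```python
def copies_after(cycles, starting_molecules=1):
    """Number of DNA molecules if every cycle doubles the total."""
    return starting_molecules * 2 ** cycles

def cycles_needed(target_copies, starting_molecules=1):
    """Smallest number of perfect doubling cycles that reaches the target."""
    cycles = 0
    copies = starting_molecules
    while copies < target_copies:
        copies *= 2
        cycles += 1
    return cycles

print(copies_after(30))        # copies from one molecule after 30 cycles
print(cycles_needed(10 ** 9))  # cycles needed to pass one billion copies
```

Under this assumption, 30 cycles are enough to take a single molecule past one billion copies.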
71. Taq polymerase was the first heat-stable DNA
polymerase to be used in PCR. It was isolated from the
thermophilic bacterium, Thermus aquaticus, which is
found in hot springs in Yellowstone Park in the US. It is valuable for PCR
for two reasons. First, it is not destroyed by the denaturation step, so
it does not have to be replaced during each cycle. Second,
its high optimum temperature means that the temperature
for the elongation step does not have to be dropped below
that of the annealing process, so efficiency is maximized.
PCR is now routinely used in forensic science to
amplify DNA from the smallest tissue samples left at the
scene of a crime. Many crimes have been solved with
the help of PCR together with analysis of DNA using gel
electrophoresis.
73. Microarrays are laboratory tools used to detect the expression of thousands of
genes at the same time and to identify the genes present in an organism’s
genome.
Microarrays are used in medical diagnosis and treatment (e.g. comparison between
healthy cells and diseased cells to find the characteristics of the disease),
biotechnology (e.g. in agriculture to identify insect pests), as well as crime (forensic
analysis).
As large numbers of genes can be studied in a short period of time, microarrays have
been very valuable to scientists.
The microarray consists of a small (usually 2 cm²) piece of glass, plastic or silicon (also
known as a chip) that has probes attached at spots (called gene spots) arranged in a grid
pattern. There can be 10 000 or more spots per cm².
Probes are short lengths of single-stranded DNA (oligonucleotides) or RNA which
are synthesized to be complementary to a specific base sequence (this sequence
depends on the purpose of the microarray).
74. When a microarray is used to analyze genomes:
DNA is collected from the species to be compared. Restriction enzymes are used to cut
the DNA into fragments.
These fragments are denatured to create single-stranded DNA molecules and these DNA
fragments are labelled using fluorescent tags (the fragments from the different sources are
tagged different colors, usually red and green).
Once these fragments are mixed together they are then allowed to hybridize with the probes
on the microarray. After a set period of time any DNA that did not hybridize with the probes is
washed off.
The microarray is then examined using ultraviolet light (which causes the tags to fluoresce) or
scanned (colors are detected by the computer and the information is analyzed and stored).
The presence of color indicates where hybridization has occurred, as the DNA
fragment is complementary to the probe. If a spot fluoresces red or green,
then DNA from only one species has hybridized with that probe; if the spot is yellow, then DNA from both
species has hybridized with it, which suggests that both species
have that gene in common.
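Reading the colors of the spots can be sketched as a small classification rule. The signal values, the threshold and the probe names below are all made up for illustration; a real scanner reports intensities on its own scale.

```python
def spot_color(red_signal, green_signal, threshold=0.5):
    """Classify one spot from its red and green fluorescence signals.

    The threshold of 0.5 is an arbitrary assumption for this sketch.
    """
    red = red_signal >= threshold
    green = green_signal >= threshold
    if red and green:
        return "yellow"  # DNA from both species hybridized
    if red:
        return "red"     # only the red-tagged species hybridized
    if green:
        return "green"   # only the green-tagged species hybridized
    return "none"        # nothing hybridized with this probe

# Hypothetical readings: probe name -> (red signal, green signal)
spots = {
    "probe_1": (0.9, 0.1),
    "probe_2": (0.8, 0.7),
    "probe_3": (0.0, 0.6),
    "probe_4": (0.1, 0.2),
}
shared = [name for name, (r, g) in spots.items() if spot_color(r, g) == "yellow"]
print(shared)
```

The yellow spots are the ones that suggest a gene the two species have in common.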
75. A DNA microarray
as viewed with a
laser scanner. The
colors are analyzed
to show which
genes or alleles are
present.
76. When genes are being expressed or are in their active state, many copies of
mRNA are produced by transcription. The corresponding proteins are then
produced from these mRNAs during translation. Thus scientists can
indirectly, by assessing the quantity of mRNAs, determine which genes are
being expressed in the cells.
Microarrays can be used to detect whether a gene is being expressed (a
method used to research cancerous vs non-cancerous cells) by detecting the
quantity of mRNA present.
77. To compare which genes are being expressed using microarrays the following steps
occur:
o mRNA is collected from both types of cells and reverse transcriptase is used to convert mRNA
into cDNA.
o PCR may be used to increase the quantity of cDNA (this occurs for all samples to remain
proportional so a comparison can be made when analysis occurs). Fluorescent tags are added
to the cDNA. Spots on the microarray that fluoresce indicate the genes that were being
transcribed in the cell.
o The cDNA is then denatured to produce single-stranded DNA. The single-stranded DNA
molecules are allowed to hybridize with the probes on the microarray.
o When the ultraviolet light is shone on the microarray the spots that fluoresce indicate that
gene was transcribed (expressed) and the intensity of the light emitting from the spots
indicates the quantity of mRNA produced (i.e. how active the gene is). If the light being
emitted is of high intensity then many mRNA molecules were present, while a low intensity emission
indicates that few mRNA molecules were present.
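Comparing the two sets of spot intensities is a simple filtering step. In this sketch the gene names, the intensity values and the cut-off that counts as "expressed" are all hypothetical.

```python
def newly_expressed_genes(cancer, normal, threshold=100):
    """Genes whose spots are bright in cancerous cells but not in normal cells.

    `cancer` and `normal` map gene names to spot intensities; the
    threshold is an assumed cut-off for calling a gene expressed.
    """
    return sorted(
        gene
        for gene, intensity in cancer.items()
        if intensity >= threshold and normal.get(gene, 0) < threshold
    )

# Hypothetical intensities read from the two samples
cancer_cells = {"gene_a": 950, "gene_b": 40, "gene_c": 500}
normal_cells = {"gene_a": 900, "gene_b": 30, "gene_c": 20}
print(newly_expressed_genes(cancer_cells, normal_cells))
```

Here gene_a is active in both cell types, gene_b in neither, and only gene_c is transcribed in the cancerous cells alone.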
78. How to use a microarray
to compare the mRNA
molecules present in
cancerous and non-
cancerous cells. The
results identify which
genes in the cancerous
cells that are not
normally expressed are
being transcribed.
80. Bioinformatics
Research into the genes that are present in different organisms and the genes that
are expressed at any one time in an organism’s life generates huge quantities of
data. As we have seen, one DNA chip alone may give 10 000 pieces of information
about the presence and absence of genes in genomes or the activity of genes
within cells.
Gene sequencing has also generated huge quantities of data. This technique
establishes the sequence of base pairs in sections of DNA. Sequencing DNA is
now a fully automated process and the genomes of many species have been
published. There is also a vast quantity of data about the primary structures,
shapes and functions of proteins.
81. Bioinformatics combines biological
data with computer technology and
statistics. It builds up databases
and allows links to be made
between them. The databases
hold gene sequences, sequences
of complete genomes, amino acid
sequences of proteins and protein
structures. Computer technology
facilitates the collection and
analysis of this mass of information
and allows access to it via the
internet.
82. There are databases that specialize in holding different
types of information; for example, on DNA sequences
and the primary structures of proteins. The information
needs to be in a form that can be searched, so software
developers play an important role in developing
systems that allow this. In 2014, these databases held
over 6 × 10¹¹ base pairs or 600 Gbp (gigabase pairs),
equivalent to 200 human genome equivalents or huges.
A huge is 3 × 10⁹ base pairs. Databases that hold the
coordinates required to show 3D models hold details of
over 100 000 different proteins and nucleic acids. The
quantity of data is vast and growing at an exponential
rate.
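The unit conversions in the figures above can be checked with a few lines of arithmetic. The function names are made up; the numbers are the ones quoted on the slide.

```python
HUGE_BP = 3 * 10 ** 9  # one human genome equivalent, in base pairs
GBP = 10 ** 9          # one gigabase pair

def to_gbp(base_pairs):
    """Convert base pairs to gigabase pairs."""
    return base_pairs / GBP

def to_huges(base_pairs):
    """Convert base pairs to human genome equivalents."""
    return base_pairs / HUGE_BP

database_bp = 6 * 10 ** 11  # 2014 figure quoted in the text
print(to_gbp(database_bp), "Gbp")
print(to_huges(database_bp), "huges")
```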
83. The database Ensembl holds data on the genomes of
eukaryotic organisms. Among others, it holds the human
genome and the genomes of zebra fish and mice that are
used a great deal in research. UniProt (universal protein
resource) holds information on the primary sequences
of proteins and the functions of many proteins, such as
enzymes. The search tool BLAST (basic local alignment
search tool) is an algorithm for comparing primary
biological sequence information, such as the primary
sequences of different proteins or the nucleotide sequences
of genes. Researchers use BLAST to find similarities
between sequences that they are studying and those
already saved in databases.
84. When a genome has been sequenced, comparisons can
be made with other known genomes. For example, the
human genome can be compared to the genomes of the
fruit fly, Drosophila, the nematode worm, Caenorhabditis,
or the malarial parasite, Plasmodium. Sequences can be
matched and degrees of similarity calculated. Similarly,
comparisons can be made between amino acid sequences
of proteins or structures of proteins. Close similarities
indicate recent common ancestry.
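A degree of similarity can be as simple as the percentage of matching positions between two sequences. The sketch below assumes the two sequences are already aligned and of equal length; it is not how BLAST works (BLAST searches for local alignments), and the sequences are made up.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical aligned sequences differing at one position out of seven
print(round(percent_identity("GATTACA", "GACTACA"), 1))
```

The same function works for amino acid sequences written in one-letter code, since it only compares characters position by position.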
85. Human genes, such as those that are concerned with
development, may be found in other organisms such as
Drosophila. This makes Drosophila a useful model for
investigating the way in which such genes have their effect.
Microarrays can be used to find out when and where
genes are expressed during the development of a fruit fly.
Researchers can then access information about these genes
and the proteins that they code for. For example, they can
search databases for identical or similar base sequences in
other organisms, compare primary structures of proteins
and visualize the 3D structure of the proteins.
86. Caenorhabditis elegans was the first multicellular
organism to have its genome fully sequenced. It has
fewer than 1000 cells in its body, of which about 300
are nerve cells. It is conveniently transparent, allowing
the developmental fate of each of its cells to be mapped.
Because of its simplicity, it is used as a model organism
for studying the genetics of organ development, the
development of neurones into a nervous system and
many other areas of biology such as cell death, ageing
and behavior.
87. All the information we have about the genome
of Plasmodium is now available in databases. This
information is being used to find new methods to
control the parasite. For example, being able to read
gene sequences is providing valuable information in the
development of vaccines for malaria.
Bioinformatics is the collection, processing and
analysis of biological information and data using
computer software.