HBC1019 Biochemistry 1 Trimester 1, 2010/2011
Page 1 of 8
Faculty of Information Science and Technology
LAB REPORT
HBC 1019 - Biochemistry I
Practical 7
DNA, RNA and the Flow of Genetic Information
Name : Osama Barayan
ID : 1091105869
Introduction
Biological databases, often called sequence or structure libraries, hold large
amounts of information about the sequences and structures of nucleic acids (DNA and
RNA) and proteins. This practical introduces some of the relevant databases. They are
useful, and increasingly important, resources for the study of biochemistry and
bioinformatics at all levels.
Finding databases
a. What are the major online databases that contain DNA and protein
sequences?
1. http://www.ncbi.nlm.nih.gov/
2. http://www.cellbiol.com/
3. http://www.biochemweb.org/
4. http://nar.oxfordjournals.org/
b. Which databases contain entire genomes?
Many sites on the internet do; for example,
http://www.ncbi.nlm.nih.gov/
c. Define and understand the meaning of the following terms; once you
have defined them, please provide the link(s) as well.
i. BLAST
Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences.
ii. Taxonomy
The science of classifying living things into groups by similarity: species are
grouped into genera, genera into families, families into orders, orders into classes,
classes into phyla, and phyla into kingdoms at the top level of the
classification.
iii. Gene Ontology
The Gene Ontology (GO) is a major bioinformatics initiative to unify the
representation of gene and gene product attributes across all species.
iv. Phylogenetic tree
A phylogenetic tree or evolutionary tree is a branching diagram or "tree" showing
the inferred evolutionary relationships among various biological species or other
entities based upon similarities and differences in their physical and/or genetic
characteristics.
v. Multiple sequence alignment
A multiple sequence alignment (MSA) is a sequence alignment of three or more
biological sequences, generally protein, DNA, or RNA.
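BLAST, defined above, avoids a full comparison of every pair of positions by first finding short "words" shared by the query and a database sequence and then extending those seeds into longer alignments. The sketch below shows only that seeding idea using exact word matches. This is a simplification and not the real BLAST code: actual protein BLAST also accepts "neighbourhood" words that score above a threshold in the substitution matrix, and then extends and scores each hit. The function name `word_hits` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int, str]]:
    """Return (query_pos, subject_pos, word) for every length-k word that occurs
    exactly in both sequences (0-based positions). Exact matching only: a
    simplified stand-in for BLAST's neighbourhood-word seeding step."""
    # Index every k-letter word of the subject by its start positions.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    # Scan the query and report each shared word as a seed hit.
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


# Toy example: two short peptides differing at one position.
for hit in word_hits("MRIILLG", "MKIILLG"):
    print(hit)
```

A run of hits on the same diagonal (equal `query_pos - subject_pos`), as in this toy example, is the kind of seed that BLAST would go on to extend into a scored alignment.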
5. Analyzing DNA sequence
You will learn how to analyze a given DNA sequence by identifying an open reading
frame, determining the protein that it encodes, and finding the bacterial source of
that protein.
This is the DNA sequence:
TACGCAATGCGTATCATTCTGCTGGGCGCTCCGGGCGCAGGTAAAGGTACTCAGGCTCAATTCATC
ATGGAGAAATACGGCATTCCGCAAATCTCTACTGGTGACATGTTGCGCGCCGCTGTAAAAGCAGGT
TCTGAGTTAGGTCTGAAAGCAAAAGAAATTATGGATGCGGGCAAGTTGGTGACTGATGAGTTAGTT
ATCGCATTACTCAAAGAACGTATCACACAGGAAGATTGCCGCGATGGTTTTCTGTTAGACGGGTTC
CCGCGTACCATTCCTCAGGCAGATGCCATGAAAAAGAAGCCGGTATCAGTTGATTATGTGCTGGAG
TTTGATGTTCCAGACGAGCTGATTGTTGAGCGCATTGTCGGCCGTCGGGTACATGCTGCTTCAGGC
CGTGTTTATCACGTTAAATTCAACCCACCTAAAGTTGAAGATAAAGATGATGTTACCGGTGAAGAG
CTGACTATTCGTAAAGATGATCAGGAAGCGACTGTCCGTAAGCGTCTTATCGAATATCATCAACAA
ACTGCACCATTGGTTTCTTACTATCATAAAGAAGCGGATGCAGGTAATACGCAATATTTTAAACTG
GACGGAACCCGTAATGTAGCAGAAGTCAGTGCTGAACTGGCGACTATTCTCGGTTAATTCTGGATG
GCCTTATAGCTAAGGCGGTTTAAGGCCGCCTTAGCTATTTCAAGTAAGAAGGGCGTAGTACCTACA
AAAGGAGATTTGGCATGATGCAAAGCAAACCCGGCGTATTAATGGTTAATTTGGGGACACCAGATG
CTCCAACGTCGAAAGCTATCAAGCGTTATTTAGCTGAGTTTTTGAGTGACCGCCGGGTAGTTGATA
CTTCCCCATTGCTATGGTGGCCATTGCTGCATGGTGTTATTTTACCGCTTCGGTCACCACGTGTAG
CAAAACTTTATCAATCCGTTTGGATGGAAGAGGGCTCTCCTTTATTGGTTTATAGCCGCCGCCAGC
AGAAAGCACTGGCAGCAAGAATGCCTGATATTCCTGTAGAATTAGGCATGAGCTATGGTTCAC
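a. What is an Open Reading Frame (ORF) and reading frame?
A reading frame is one way of dividing a nucleotide sequence into consecutive,
non-overlapping triplets (codons). Each strand can be read in three frames, so
double-stranded DNA has six possible reading frames. An ORF is a region of DNA or
RNA that could encode a protein: a stretch of codons in one reading frame that
begins with a start codon and runs without a stop codon until an in-frame stop codon.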
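To make the idea of six reading frames concrete, the sketch below splits a DNA string into codons in frames +1 to +3 (offsets 0, 1 and 2 on the given strand) and -1 to -3 (the same offsets on the reverse complement). The frame labels and function names are illustrative choices, not taken from any particular tool. Incomplete codons at the end of a frame are dropped.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(dna))


def reading_frames(dna: str) -> dict[str, list[str]]:
    """Split a DNA string into codons in all six reading frames.

    Frames +1..+3 start at offsets 0, 1, 2 of the given strand;
    frames -1..-3 do the same on the reverse complement.
    A trailing partial codon is discarded.
    """
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


# Example: the first nine bases of the sequence above.
for name, codons in reading_frames("TACGCAATG").items():
    print(name, " ".join(codons))
```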
b. Try to find an ORF from the segment of DNA above by finding the first
start codon and the first in frame stop codon.
In bacteria, an open reading frame on a piece of mRNA almost
always begins with AUG, which corresponds to ATG in the DNA segment
that codes for the mRNA. According to the standard genetic code, there are
three stop codons on mRNA: UAA, UAG, and UGA, which correspond to
TAA, TAG, and TGA in the parent DNA segment. Here are the rules for
finding an open reading frame in this piece of bacterial DNA:
i. It must start with ATG. In this exercise, the first ATG is the start codon.
ii. It must end with TAA, TAG, or TGA.
iii. It must be at least 300 nucleotides long (coding for 100 amino acids).
iv. The ATG start codon and the stop codon must be in frame. This means that the total
number of bases in the sequence from the start to the stop codon must be evenly
divisible by 3.
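Rules i to iv can be applied mechanically. The sketch below is one possible way to do that: it takes the first ATG as the start, walks forward three bases at a time so that every codon stays in frame, and stops at the first TAA, TAG or TGA. The length check includes the stop codon. The constant `SEQUENCE` is simply the segment above joined into one string, and `find_first_orf` is a hypothetical helper name.

```python
# The DNA segment from this practical, joined into one string.
SEQUENCE = (
    "TACGCAATGCGTATCATTCTGCTGGGCGCTCCGGGCGCAGGTAAAGGTACTCAGGCTCAATTCATC"
    "ATGGAGAAATACGGCATTCCGCAAATCTCTACTGGTGACATGTTGCGCGCCGCTGTAAAAGCAGGT"
    "TCTGAGTTAGGTCTGAAAGCAAAAGAAATTATGGATGCGGGCAAGTTGGTGACTGATGAGTTAGTT"
    "ATCGCATTACTCAAAGAACGTATCACACAGGAAGATTGCCGCGATGGTTTTCTGTTAGACGGGTTC"
    "CCGCGTACCATTCCTCAGGCAGATGCCATGAAAAAGAAGCCGGTATCAGTTGATTATGTGCTGGAG"
    "TTTGATGTTCCAGACGAGCTGATTGTTGAGCGCATTGTCGGCCGTCGGGTACATGCTGCTTCAGGC"
    "CGTGTTTATCACGTTAAATTCAACCCACCTAAAGTTGAAGATAAAGATGATGTTACCGGTGAAGAG"
    "CTGACTATTCGTAAAGATGATCAGGAAGCGACTGTCCGTAAGCGTCTTATCGAATATCATCAACAA"
    "ACTGCACCATTGGTTTCTTACTATCATAAAGAAGCGGATGCAGGTAATACGCAATATTTTAAACTG"
    "GACGGAACCCGTAATGTAGCAGAAGTCAGTGCTGAACTGGCGACTATTCTCGGTTAATTCTGGATG"
    "GCCTTATAGCTAAGGCGGTTTAAGGCCGCCTTAGCTATTTCAAGTAAGAAGGGCGTAGTACCTACA"
    "AAAGGAGATTTGGCATGATGCAAAGCAAACCCGGCGTATTAATGGTTAATTTGGGGACACCAGATG"
    "CTCCAACGTCGAAAGCTATCAAGCGTTATTTAGCTGAGTTTTTGAGTGACCGCCGGGTAGTTGATA"
    "CTTCCCCATTGCTATGGTGGCCATTGCTGCATGGTGTTATTTTACCGCTTCGGTCACCACGTGTAG"
    "CAAAACTTTATCAATCCGTTTGGATGGAAGAGGGCTCTCCTTTATTGGTTTATAGCCGCCGCCAGC"
    "AGAAAGCACTGGCAGCAAGAATGCCTGATATTCCTGTAGAATTAGGCATGAGCTATGGTTCAC"
)

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(dna: str, min_length: int = 300):
    """Return (start, end) of the ORF beginning at the first ATG, as 0-based
    indices with `end` exclusive (stop codon included), or None if there is no
    ATG, no in-frame stop codon, or the ORF is shorter than `min_length`."""
    start = dna.find("ATG")                      # rule i: first ATG is the start
    if start == -1:
        return None
    for i in range(start, len(dna) - 2, 3):      # steps of 3 stay in frame (rule iv)
        if dna[i:i + 3] in STOP_CODONS:          # rule ii: first in-frame stop
            end = i + 3
            return (start, end) if end - start >= min_length else None  # rule iii
    return None


orf = find_first_orf(SEQUENCE)
if orf is not None:
    start, end = orf
    print(f"ORF: bases {start + 1}-{end}, {end - start} nt, "
          f"{(end - start) // 3 - 1} codons before the stop")
```

Because the loop only ever advances by three from the start codon, the span from start to stop is divisible by 3 by construction, which is exactly rule iv.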
c. Copy the entire sequence again and go to the Translate tool on the ExPASy
server (http://www.expasy.org/tools/dnal.htm). Paste the sequence in the
box and select “Verbose (“Met”, “Stop”, spaces between residues)” as the
Output format and click on “Translate Sequence”.
What are the results of translation? Identify the reading frame that
contains a protein (more than 100 continuous amino acids with no
interruptions by a stop codon) and its name.
Y A Met R I I L L G A P G A G K G T Q A Q F I Met E K Y G I P Q I S T G D Met L R A A V
K A G S E L G L K A K E I Met D A G K L V T D E L V I A L L K E R I T Q E D C R D G F L
L D G F P R T I P Q A D A Met K K K P V S V D Y V L E F D V P D E L I V E R I V G R R
V H A A S G R V Y H V K F N P P K V E D K D D V T G E E L T I R K D D Q E A T V R K
R L I E Y H Q Q T A P L V S Y Y H K E A D A G N T Q Y F K L D G T R N V A E V S A E L
A T I L G Stop F W Met A L Stop L R R F K A A L A I S S K K G V V P T K G D L A
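The ExPASy output above comes from reading codons through the standard genetic code. The sketch below reproduces that idea offline for the three forward frames of the first line of the sequence. It builds the standard code table from the usual TCAG-ordered 64-letter string and formats the result in two layouts that imitate the "Verbose" and "Compact" options described here. This is only an illustration: it is not ExPASy's code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, frame: int = 1) -> str:
    """Translate forward frame 1, 2 or 3 into one-letter codes ('*' = stop)."""
    offset = frame - 1
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


def verbose(protein: str) -> str:
    """Imitate the 'Verbose' layout: Met and Stop spelled out, spaces between residues."""
    names = {"M": "Met", "*": "Stop"}
    return " ".join(names.get(aa, aa) for aa in protein)


def compact(protein: str) -> str:
    """Imitate the 'Compact' layout: one-letter codes, '-' for a stop, no spaces."""
    return protein.replace("*", "-")


first_line = "TACGCAATGCGTATCATTCTGCTGGGCGCTCCGGGCGCAGGTAAAGGTACTCAGGCTCAATTCATC"
for frame in (1, 2, 3):
    protein = translate(first_line, frame)
    print(f"Frame {frame} verbose: {verbose(protein)}")
    print(f"Frame {frame} compact: {compact(protein)}")
```

Frame 1 begins with "Y A Met R I I", matching the start of the verbose output shown above.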
Now change the Output format on the earlier page to “Compact (“M”, “-”,
no spaces)”. Go to the same reading frame as before and copy the protein
sequence (as one-letter abbreviations), starting with “M” for the start codon.
Paste the sequence in your answer.
MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKAGSELGLKAKEIMDAGKLVT
DELVIALLKERITQEDCRDGFLLDGFPRTIPQADAMKKKPVSVDYVLEFDVPDELIVERI
VGRRVHAASGRVYHVKFNPPKVEDKDDVTGEELTIRKDDQEATVRKRLIEYHQQTAPLVS
YYHKEADAGNTQYFKLDGTRNVAEVSAELATILG
d. Now you will identify the protein and the bacterial source. Go to the NCBI
BLAST page (http://www.ncbi.nlm.nih.gov/BLAST/).
What are the different types of BLAST program and what are their
functions?
blastn (nucleotide BLAST): searches a nucleotide database using a nucleotide query
blastp (protein BLAST): searches a protein database using a protein query
blastx: searches a protein database using a translated nucleotide query
tblastn: searches a translated nucleotide database using a protein query
tblastx: searches a translated nucleotide database using a translated nucleotide
query
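The five programs listed above differ in two ways: whether the query and the database are nucleotide or protein, and which side (if any) is translated. The small lookup below is an illustrative sketch of that choice, based only on the list above. The function name `choose_program` is hypothetical and is not part of any NCBI tool.

```python
# (query type, database type) -> BLAST program, following the list above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # the query is translated in all six frames
    ("protein", "nucleotide"): "tblastn",  # the database is translated in all six frames
}


def choose_program(query: str, database: str, translate_both: bool = False) -> str:
    """Pick a BLAST program; translate_both=True requests a translated-vs-translated
    comparison, which only makes sense for a nucleotide query and database."""
    if translate_both:
        if (query, database) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return BLAST_PROGRAMS[(query, database)]


# The search in this practical: a protein query against a protein database.
print(choose_program("protein", "protein"))
```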
You will do a simple BLAST search using your protein sequence, but you
can do much more with BLAST. You are encouraged to try the BLAST tutorials
(http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html).
On the BLAST page, select “Protein-protein BLAST.” Enter your protein
sequence in the “Search” box. Use the default values for the rest of the page
and click on the “BLAST!” button. You will be taken to the “formatting
BLAST” page. Click on the “Format!” button. You may have to wait for
the results. Your protein should be the first one listed in the BLAST output.
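The same protein-protein search can also be submitted as a script instead of through the web form, using NCBI's URL interface to BLAST. The sketch below only *builds* a submission URL and sends nothing over the network. It assumes the `https://blast.ncbi.nlm.nih.gov/Blast.cgi` endpoint and the `CMD=Put`, `PROGRAM`, `DATABASE` and `QUERY` parameters of the NCBI BLAST URL API. Check these names, and NCBI's usage limits, against the current NCBI documentation before relying on them. `blast_put_url` is a hypothetical helper, and `nr` is assumed as the default protein database.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API (verify against current NCBI docs).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(protein: str, database: str = "nr") -> str:
    """Build (but do not send) a blastp submission URL for a one-letter protein sequence."""
    params = {"CMD": "Put", "PROGRAM": "blastp", "DATABASE": database, "QUERY": protein}
    return f"{BLAST_URL}?{urlencode(params)}"


print(blast_put_url("MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKAGSELGLKAKEIMDAGKLVT"))
```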
6. Sequence homology
You will use BLAST to look for sequences that are homologous to the protein that you
identified in problem 2.
a. Define homolog, ortholog and paralog.
A homolog is a gene (or protein) that shares a common evolutionary ancestor with
another; homologous sequences descend from the same ancestral sequence.
An ortholog is one of two or more homologous genes found in different species
that diverged through speciation.
A paralog is one of a pair of homologous genes that derive from the same ancestral
gene through gene duplication, usually within the same genome.
b. Go to the NCBI BLAST page (http://www.ncbi.nlm.nih.gov/BLAST/) and
choose “Protein-protein BLAST.” Paste your protein sequence into the
“Search” box. Before clicking on the “BLAST!” button, narrow the search
by kingdom. As you look down the BLAST page, you'll see an Options
section; under “choose search set” (followed by an empty box) or “select
from:”, type “Eukaryota.” Now click on the “BLAST!” button. Click on
the “Format!” button on the next page. Can you find a homologous
sequence from yeast? YES (Hint: Use your browser's Find tool to search
for the term “Saccharomyces.”) Note the Score and E value given at the
right of the entry. Can you find a homologous sequence from humans?
(Hint: Search for the term “Homo.”) Note its Score and E value.
Yes.
Cytidylate kinase: total score 90.5, query coverage 98%, E value 4e-18.
Cytidine monophosphate: score 90.1, query coverage 98%, E value 5e-18.
UMP-CMP kinase isoform a: score 89.7, query coverage 98%, E value 6e-18.
Most biochemists consider 25% identity the cutoff for sequence homology,
meaning that if two proteins are less than 25% identical in sequence, more
evidence is needed to determine whether they are homologs. Click on the
Score values for the yeast and human proteins to see each sequence aligned
with your query sequence and to see the percent sequence identity. Are the
yeast and human sequences homologous to your query sequence? yes
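The 25% rule of thumb above depends on how percent identity is calculated. The sketch below follows the common convention that BLAST also uses for its "Identities" figure: the number of identical aligned columns divided by the total alignment length, with gap columns included. The aligned fragments and the helper names `percent_identity` and `homology_call` are hypothetical. Values near the cutoff still call for more evidence, as the text says.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.
    '-' marks a gap; gap columns count toward the alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * identical / len(aligned_a)


def homology_call(identity: float, cutoff: float = 25.0) -> str:
    """Apply the 25% rule of thumb from the text."""
    return "likely homologous" if identity >= cutoff else "needs more evidence"


# Hypothetical aligned fragments (one gap in the first sequence).
pid = percent_identity("MRII-LG", "MKIIALG")
print(f"{pid:.1f}% identity: {homology_call(pid)}")
```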
c. What do Score and E-value stand for? Use the BLAST online tutorial
(http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html) to
discover the meaning. What is the difference between an identity and a
conservative substitution? From the result of BLAST you gained, provide an
example from the comparison of your sequence and a homologous sequence.
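Score: a measure of the similarity of the query to the sequence shown, obtained by
summing the substitution-matrix scores of the aligned positions and subtracting gap
penalties; a higher score means a better alignment.
E-value: the number of alignments with at least this score that would be expected
by chance alone in a search of a database of this size; the lower the E-value, the
more significant the match.
An identity is an aligned position where both sequences have the same amino acid.
A conservative substitution is a position where the amino acids differ but have
similar properties, so the substitution matrix gives the pair a positive score (BLAST
counts these as "Positives"), for example isoleucine replaced by leucine or valine.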
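For readers who want the quantitative link between Score and E-value, BLAST's statistics are based on Karlin-Altschul theory. Here $m$ is the query length, $n$ is the total length of the database, $S$ is the raw score, and $K$ and $\lambda$ are constants set by the scoring matrix and gap penalties (for gapped alignments these constants are estimated rather than derived exactly). The bit score $S'$ normalizes $S$ so that different scoring systems can be compared:

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\qquad
E = m\,n\,2^{-S'}
```

Substituting $S'$ into the last expression gives back the first one. This also shows that each extra bit of score halves the E-value, and that the same score is less significant in a larger database.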
BLAST uses a substitution matrix to assign values in the alignment process,
based on the analysis of amino acid substitutions in a wide variety of
protein sequences. Make sure you understand the meaning of the term
“substitution matrix.” What is the default substitution matrix on the
BLAST page? BLOSUM62.
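To see how a substitution matrix separates identities from conservative substitutions, the sketch below scores a short ungapped alignment and prints a BLAST-style middle line. In that line an identical residue is shown as its letter, a positive-scoring (conservative) pair as "+", and anything else as a blank. Only a handful of BLOSUM62 entries are included as an excerpt; the real matrix covers all 20 amino acids plus ambiguity codes. The peptide pair is made up, and the helper names are hypothetical.

```python
# Small excerpt of BLOSUM62 (symmetric; each pair is stored in one orientation only).
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("D", "D"): 6, ("E", "E"): 5, ("G", "G"): 6,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2, ("D", "E"): 2,
    ("G", "K"): -2, ("D", "L"): -4, ("D", "I"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a residue pair in either orientation (KeyError if not in the excerpt)."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def midline(query: str, subject: str) -> str:
    """BLAST-style middle line for an ungapped alignment: the residue letter for an
    identity, '+' for a positive-scoring substitution, a space otherwise."""
    chars = []
    for a, b in zip(query, subject):
        if a == b:
            chars.append(a)
        elif pair_score(a, b) > 0:
            chars.append("+")
        else:
            chars.append(" ")
    return "".join(chars)


def alignment_score(query: str, subject: str) -> int:
    """Raw score of an ungapped alignment: the sum of pair scores (no gap penalties)."""
    return sum(pair_score(a, b) for a, b in zip(query, subject))


q, s = "KILDG", "RLLDK"   # hypothetical aligned fragments
print(q)
print(midline(q, s))
print(s)
print("raw score:", alignment_score(q, s))
```

Here K/R and I/L are conservative substitutions (positive scores), L and D are identities, and G/K scores negatively, so it appears as a blank in the middle line.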
What other matrices are available?
PAM1, PAM250, PAM30, PAM70, BLOSUM45, BLOSUM80
What is the source of the names for these substitution matrices?
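PAM = Point Accepted Mutation. PAM matrices are derived from the differences
observed between closely related proteins; the number gives the evolutionary
distance, e.g. PAM250 corresponds to 250 accepted point mutations per 100 residues,
so higher numbers suit more distantly related sequences.
BLOSUM = BLOcks SUbstitution Matrix. BLOSUM matrices are built from conserved,
ungapped blocks in multiple alignments of evolutionarily divergent proteins; the
number is the identity level at which sequences were clustered, e.g. BLOSUM62 uses
sequences clustered at 62% identity, so lower numbers suit more distantly related
sequences.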
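Both families of matrices store log-odds scores. A pair's score compares how often two amino acids are actually seen aligned, $q_{ab}$, with how often they would pair by chance, $p_a p_b$. BLOSUM62 reports these scores in half-bit units, that is $s = \mathrm{round}(2\log_2(q_{ab} / (p_a p_b)))$. The sketch below shows only this simplified formula: it leaves out the symmetric-pair bookkeeping of the original BLOSUM construction and PAM's extrapolation from PAM1, and the frequencies used are made up.

```python
import math


def log_odds_score(q_ab: float, p_a: float, p_b: float, scale: float = 2.0) -> int:
    """Rounded log-odds score: scale * log2(observed / expected).
    scale=2 gives half-bit units, the units BLOSUM62 is expressed in."""
    return round(scale * math.log2(q_ab / (p_a * p_b)))


# Hypothetical frequencies: a pair seen 4x more often than chance scores +4;
# a pair seen 0.4x as often as chance scores -3.
print(log_odds_score(0.01, 0.05, 0.05))
print(log_odds_score(0.001, 0.05, 0.05))
```

Positive scores mark substitutions that are accepted more often than chance during evolution (the conservative ones), and negative scores mark those that are selected against.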
Repeat the BLAST search in Problem 3(b) using a different substitution matrix.
(Look for algorithm parameters.) Do you find different answers? Yes.