CODON TABLE:
Codon Full_Name 3_Letter 1_Letter
TTT Phenylalanine Phe F
TTC Phenylalanine Phe F
TTA Leucine Leu L
TTG Leucine Leu L
TCT Serine Ser S
TCC Serine Ser S
TCA Serine Ser S
TCG Serine Ser S
TAT Tyrosine Tyr Y
TAC Tyrosine Tyr Y
TAA Termination (ochre) Ter *
TAG Termination (amber) Ter *
TGT Cysteine Cys C
TGC Cysteine Cys C
TGA Termination (opal or umber) Ter *
TGG Tryptophan Trp W
CTT Leucine Leu L
CTC Leucine Leu L
CTA Leucine Leu L
CTG Leucine Leu L
CCT Proline Pro P
CCC Proline Pro P
CCA Proline Pro P
CCG Proline Pro P
CAT Histidine His H
CAC Histidine His H
CAA Glutamine Gln Q
CAG Glutamine Gln Q
CGT Arginine Arg R
CGC Arginine Arg R
CGA Arginine Arg R
CGG Arginine Arg R
ATT Isoleucine Ile I
ATC Isoleucine Ile I
ATA Isoleucine Ile I
ATG Methionine Met M
ACT Threonine Thr T
ACC Threonine Thr T
ACA Threonine Thr T
ACG Threonine Thr T
AAT Asparagine Asn N
AAC Asparagine Asn N
AAA Lysine Lys K
AAG Lysine Lys K
AGT Serine Ser S
AGC Serine Ser S
AGA Arginine Arg R
AGG Arginine Arg R
GTT Valine Val V
GTC Valine Val V
GTA Valine Val V
GTG Valine Val V
GCT Alanine Ala A
GCC Alanine Ala A
GCA Alanine Ala A
GCG Alanine Ala A
GAT Aspartate Asp D
GAC Aspartate Asp D
GAA Glutamate Glu E
GAG Glutamate Glu E
GGT Glycine Gly G
GGC Glycine Gly G
GGA Glycine Gly G
GGG Glycine Gly G
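
As a minimal sketch of how the table above could be loaded into a dictionary, the function below maps each codon to its one-letter code. It assumes the file on disk is tab-delimited with the four columns shown in the header row; the function name `parse_codon_table` and the sample lines are illustrative, not part of the assignment.

```python
def parse_codon_table(lines):
    """Return {codon: one_letter_code} from tab-delimited codon table lines."""
    codon_to_aa = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, title lines, and the header row ("Codon", ...).
        if len(fields) < 4 or fields[0] == "Codon":
            continue
        codon, _full_name, _three_letter, one_letter = fields[:4]
        codon_to_aa[codon.upper()] = one_letter
    return codon_to_aa


# Hypothetical sample lines; a real run would pass an open file object instead.
sample = [
    "Codon\tFull_Name\t3_Letter\t1_Letter\n",
    "TTT\tPhenylalanine\tPhe\tF\n",
    "TGA\tTermination (opal or umber)\tTer\t*\n",
]
table = parse_codon_table(sample)
```

Splitting on tabs rather than on any whitespace matters here because some full names, such as "Termination (opal or umber)", contain spaces.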
sfa.gff:
sfa.fasta:

FASTA format
The most commonly used biological sequence format is known as FASTA. It can be used for both
nucleotide and amino acid sequences. From the NCBI website: A sequence in FASTA format begins
with a single-line description, followed by lines of sequence data. The description line (defline)
is distinguished from the sequence data by a greater-than (">") symbol at the beginning. It is
recommended that all lines of text be shorter than 80 characters in length.

Examples of FASTA format (can be protein or nucleotide):
>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFH
>gi|142864|gb|M10040.1|BACDNAE B. subtilis dnaE gene encoding DNA primase
GTACGACGGAGTGTTATAAGATGGGAAATCGGATACCAGATGAAATTGTGGATCAGGTGCAAAAGTCGGC
AGATATCGTTGAAGTCATAGGTGATTATGTTCAATTAAAGAAGCAAGGCCGAAACTACTTTGGACTCTGT
CCTTTTCATGGAGAAAGCACACCTTCGTTTTCCGTATCGCCCGACAAACAGATTTTTCATTGCTTTGGCT
GCGGAGCGGGCGGCAATGTTTTCTCTTTTTTAAGGCAGATGGAA
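
The FASTA layout described above can be read with a short loop. The following is a sketch only: the function name `parse_fasta`, the choice of the first defline token as the record ID, and the sample records are assumptions made for illustration.

```python
def parse_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    parts_by_id = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first whitespace-delimited token is the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            parts_by_id[current_id] = []
        elif current_id is not None:
            # Remove stray internal whitespace and normalise to upper case.
            parts_by_id[current_id].append("".join(line.split()).upper())
    return {seq_id: "".join(parts) for seq_id, parts in parts_by_id.items()}


# Hypothetical sample records.
sample = [
    ">seq1 example nucleotide record",
    "GTACGACGG",
    "AGTGTT ATA",
    ">seq2",
    "QIKDLL",
]
records = parse_fasta(sample)
```

Collecting the sequence lines in a list and joining once at the end avoids repeated string concatenation on long genomes.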

GFF format
Sequences are only useful if we annotate them, i.e. indicate what features they encode, where
those features are, etc. A relatively new but widely used annotation format is GFF - the General
Feature Format. It is a terse format used primarily by automated parsing systems. Each field in
the file is tab-separated and represents a specific aspect of the sequence feature being annotated.
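
Because each field is tab-separated, a GFF-style file with a header line can be turned into one dictionary per feature. This is a sketch under stated assumptions: the first non-blank line is assumed to be the header, and the column names in the sample (`name`, `start`, `end`, `strand`, `annotation`) are hypothetical, since the actual header of the provided file is not shown here.

```python
def parse_gff(lines):
    """Return a list of dicts, one per feature, keyed by header column names."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    return [dict(zip(header, fields)) for fields in records]


def find_features_mentioning(features, keyword):
    """Return features where any field contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        feature for feature in features
        if any(keyword in value.lower() for value in feature.values())
    ]


# Hypothetical sample: the column names below are assumptions.
sample = [
    "name\tstart\tend\tstrand\tannotation\n",
    "geneA\t1\t9\t+\thypothetical protein\n",
    "geneB\t10\t21\t+\themagglutination activity domain protein\n",
]
features = parse_gff(sample)
hits = find_features_mentioning(features, "hemagglutination")
```

Keying each record by the header names keeps the later code independent of column order; coordinates arrive as strings and need `int()` before use.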

Codon usage
Most amino acids are encoded by more than one codon. Codon usage bias can reflect evolutionary
forces as well as the overall GC content of a genome. It has many consequences for protein
expression and is very important in modern synthetic biology.

The task
Your task is to write a Python program that reads a GFF file and its associated FASTA sequence
file, extracts the relevant information, calculates the codon usage for all the annotated genes,
and answers the other questions below. You are also provided with a codon table file. Your answers
should be written to an output file. As ever, your solution should be generally applicable to any
input files in this format.

Your program will need to take several command line arguments: the GFF filename, the FASTA
filename, the codon table filename and an output filename. You must use this order of arguments
in your code. You are provided with a sample GFF file and associated FASTA DNA sequence file,
along with a codon table file.

NB: The format of the GFF file provided has been modified for simplified parsing. We have removed
some fields, modified some and added a header line to indicate the content of each field.

NB: For our purposes, CDS is a gene and all features in the file are annotated as CDS (i.e. each
line in the file represents a gene), so your code does not need to check whether a line represents
a gene.

Questions
1. If there is a hemagglutinin encoded on this sequence (check for 'hemagglutination' in the
   annotations), what is the name of the gene?
2. How many genes are annotated in the GFF file?
3. What is the length (number of nucleotides in the annotated gene) and translated sequence of
   each gene?
4. Calculate the codon usage for all the genes and report the following:
   - A single codon usage table for the entire set of genes - not a codon usage table for each gene.
See the specification of output formatting below.

Output
Your output file should be in the following format:

##1 Your answer (if you don't find any haemagglutinin report 'None')
##Q2: Your answer
##Q3
Gene_name_1: length nt
Translated sequence
Gene_name_2: length nt
Translated sequence
Gene_name_3: length nt
etc...
Codon Usage Table

1. Your program must take command line arguments - all input filenames and an output filename.
   Your program should check that the correct number of arguments has been provided and exit
   gracefully (use quit() or sys.exit() with a usage message) if the script was called incorrectly.
2. Use a main() function and conditional execution (if __name__ == "__main__") as we've discussed
   in the section on modules.
3. Print your answers to no more than 2 places of decimals.
4. Functions must return the result of the computation and not write the result. Writing must be
   implemented in main() - not in your calculation function(s).
5. Use meaningful function names and variable names.
6. Put your student number on a comment line at the top of your code, under the shebang line, and
   also as the first line of the output file.
7. Your solution will need to import the sys module and can import textwrap for limiting line
   length in output. The latter is not required (but if you want to do this, it's much nicer
   output!). YOU SHOULD NOT IMPORT ANY OTHER MODULES.
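
The requirements above can be sketched as follows, using only `sys` and `textwrap`. The helper names (`usage_error`, `format_gene_entry`), the usage wording, and the 60-character wrap width are illustrative assumptions; the calculation steps are left out.

```python
import sys
import textwrap

EXPECTED_ARGS = 4  # gff, fasta, codon table, output (in that order)


def usage_error(argv):
    """Return a usage message if argv has the wrong length, else None."""
    if len(argv) != EXPECTED_ARGS + 1:
        return "Usage: {} <gff> <fasta> <codon_table> <output>".format(argv[0])
    return None


def format_gene_entry(name, length, protein, width=60):
    """Return the report text for one gene; the caller decides where to write."""
    header = "{}: {} nt".format(name, length)
    return header + "\n" + textwrap.fill(protein, width=width)


def main():
    message = usage_error(sys.argv)
    if message is not None:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(message)
    # Reading the inputs, calling the calculation functions, and writing
    # the results to sys.argv[4] would follow here.
```

In a complete script, `main()` would be called under the usual `if __name__ == "__main__":` guard, with the shebang and student-number comment lines at the very top. The helpers return strings rather than printing, so all writing stays inside `main()`.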

1. Coordinates are according to biological counting, i.e. 1-based counting. The start and end
   coordinates should be included in the gene.
2. For this assignment, dictionaries are your best friend - don't be afraid to make liberal use
   of them.
3. Translation:
   a. Use the standard genetic code. Use the standard single-letter amino acid code in your
      translations (e.g. Phenylalanine is represented by F). For stop codons use the asterisk
      symbol (*). You should use the tab-delimited file codon_table.txt, which is available on
      Brightspace in this exercise.
4. Your codon usage table output should have one line per codon consisting of 3 tab-separated
   fields as shown below. The first is the codon; the second column should be an integer since it
   is a count; the third column should be a float, less than or equal to 1, since it is a
   proportion. Order of codons is unimportant. For example, if there are 5 occurrences of
   Phenylalanine (F), 4 of which are TTT and 1 being TTC, the relevant lines would look like this:

Codon Usage Table
TTT	4	0.8
TTC	1	0.2

   i.e. there are 2 Phenylalanine (F) codons - TTT and TTC. The proportion field shows the number
   of TTT codons as a proportion of the total number of F codons, and similarly for TTC. There
   should be 1 line like this for each possible codon. As a sanity check, the values for the
   codons which code for a given amino acid should sum to 1 (as shown above for Phe, which has
   2 codons: 0.8 + 0.2 == 1).
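
The coordinate, translation, and proportion rules above can be sketched with plain dictionaries. The function names, the four-codon table, and the toy genome are hypothetical. The sketch also assumes every gene lies on the forward strand and contains only codons present in the table, which the provided files may not guarantee.

```python
def extract_gene(genome, start, end):
    """Return the gene at 1-based, inclusive coordinates start..end."""
    return genome[start - 1:end]


def split_codons(gene):
    """Return the complete codons of gene; a trailing partial codon is dropped."""
    usable = len(gene) - len(gene) % 3
    return [gene[i:i + 3] for i in range(0, usable, 3)]


def translate(gene, codon_to_aa):
    """Return the one-letter translation of gene, stop codons included."""
    return "".join(codon_to_aa[codon] for codon in split_codons(gene))


def codon_usage(genes, codon_to_aa):
    """Return {codon: (count, proportion)} pooled over all genes."""
    counts = {codon: 0 for codon in codon_to_aa}
    for gene in genes:
        for codon in split_codons(gene):
            counts[codon] += 1
    # Total occurrences of each amino acid across its synonymous codons.
    aa_totals = {}
    for codon, count in counts.items():
        aa = codon_to_aa[codon]
        aa_totals[aa] = aa_totals.get(aa, 0) + count
    usage = {}
    for codon, count in counts.items():
        total = aa_totals[codon_to_aa[codon]]
        proportion = count / total if total else 0.0
        usage[codon] = (count, round(proportion, 2))
    return usage


# Hypothetical toy data reproducing the Phenylalanine example (4 TTT, 1 TTC).
toy_table = {"TTT": "F", "TTC": "F", "ATG": "M", "TAA": "*"}
genome = "CC" + "ATG" + "TTT" * 4 + "TTC" + "TAA" + "GG"
gene = extract_gene(genome, 3, 23)
protein = translate(gene, toy_table)
usage = codon_usage([gene], toy_table)
```

Subtracting 1 from the start but not from the end converts 1-based inclusive coordinates into a Python slice. Each `usage` entry holds the count and the rounded proportion, ready to be joined with tabs by the code that writes the output.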

Contenu connexe

Similaire à CODON TABLE Codon Full_Name 3_Letter 1_Letter TTT P.pdf

RSEM and DE packages
RSEM and DE packagesRSEM and DE packages
RSEM and DE packagesRavi Gandham
 
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...IOSR Journals
 
Hw1 Gen320fall07revised
Hw1 Gen320fall07revisedHw1 Gen320fall07revised
Hw1 Gen320fall07revisedariddlegirl
 
data.txtInternational Business Management l2 Cons.docx
data.txtInternational Business Management       l2        Cons.docxdata.txtInternational Business Management       l2        Cons.docx
data.txtInternational Business Management l2 Cons.docxtheodorelove43763
 
Theory of automata and formal language lab manual
Theory of automata and formal language lab manualTheory of automata and formal language lab manual
Theory of automata and formal language lab manualNitesh Dubey
 
Basic architecture of expression vectors
Basic architecture of expression vectorsBasic architecture of expression vectors
Basic architecture of expression vectorsRashmi Rawat
 
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...IJAAS Team
 
SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2solgenomics
 
1588147798Begining_ABUAD1.pdf
1588147798Begining_ABUAD1.pdf1588147798Begining_ABUAD1.pdf
1588147798Begining_ABUAD1.pdfSemsemSameer1
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012Dan Gaston
 
Best C++ Programming Homework Help
Best C++ Programming Homework HelpBest C++ Programming Homework Help
Best C++ Programming Homework HelpC++ Homework Help
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2BITS
 
BLAST_CSS2.ppt
BLAST_CSS2.pptBLAST_CSS2.ppt
BLAST_CSS2.pptSilpa87
 
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...Eli Kaminuma
 

Similaire à CODON TABLE Codon Full_Name 3_Letter 1_Letter TTT P.pdf (16)

DSP_Assign_1
DSP_Assign_1DSP_Assign_1
DSP_Assign_1
 
RSEM and DE packages
RSEM and DE packagesRSEM and DE packages
RSEM and DE packages
 
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
 
Hw1 Gen320fall07revised
Hw1 Gen320fall07revisedHw1 Gen320fall07revised
Hw1 Gen320fall07revised
 
data.txtInternational Business Management l2 Cons.docx
data.txtInternational Business Management       l2        Cons.docxdata.txtInternational Business Management       l2        Cons.docx
data.txtInternational Business Management l2 Cons.docx
 
Theory of automata and formal language lab manual
Theory of automata and formal language lab manualTheory of automata and formal language lab manual
Theory of automata and formal language lab manual
 
Basic architecture of expression vectors
Basic architecture of expression vectorsBasic architecture of expression vectors
Basic architecture of expression vectors
 
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
 
SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2SGN Introduction to UNIX Command-line 2015 part 2
SGN Introduction to UNIX Command-line 2015 part 2
 
1588147798Begining_ABUAD1.pdf
1588147798Begining_ABUAD1.pdf1588147798Begining_ABUAD1.pdf
1588147798Begining_ABUAD1.pdf
 
Input-output
Input-outputInput-output
Input-output
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 
Best C++ Programming Homework Help
Best C++ Programming Homework HelpBest C++ Programming Homework Help
Best C++ Programming Homework Help
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
BLAST_CSS2.ppt
BLAST_CSS2.pptBLAST_CSS2.ppt
BLAST_CSS2.ppt
 
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
[2020-09-01] IIBMP2020 Generating annotation texts of HLA sequences with anti...
 

Plus de secunderbadtirumalgi

Code using Java Programming and use JavaFX. Show your code and outpu.pdf
Code using Java Programming and use JavaFX. Show your code and outpu.pdfCode using Java Programming and use JavaFX. Show your code and outpu.pdf
Code using Java Programming and use JavaFX. Show your code and outpu.pdfsecunderbadtirumalgi
 
Como parte de su programa Global Health Partnerships (2008-2011), Pf.pdf
Como parte de su programa Global Health Partnerships (2008-2011), Pf.pdfComo parte de su programa Global Health Partnerships (2008-2011), Pf.pdf
Como parte de su programa Global Health Partnerships (2008-2011), Pf.pdfsecunderbadtirumalgi
 
Como parte de la investigaci�n de su tesis, genera una biblioteca de.pdf
Como parte de la investigaci�n de su tesis, genera una biblioteca de.pdfComo parte de la investigaci�n de su tesis, genera una biblioteca de.pdf
Como parte de la investigaci�n de su tesis, genera una biblioteca de.pdfsecunderbadtirumalgi
 
Como administrador de la colecci�n de espec�menes en un museo de his.pdf
Como administrador de la colecci�n de espec�menes en un museo de his.pdfComo administrador de la colecci�n de espec�menes en un museo de his.pdf
Como administrador de la colecci�n de espec�menes en un museo de his.pdfsecunderbadtirumalgi
 
Como alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdf
Como alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdfComo alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdf
Como alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdfsecunderbadtirumalgi
 
Commentators on the US economy feel the US economy fell into a reces.pdf
Commentators on the US economy feel the US economy fell into a reces.pdfCommentators on the US economy feel the US economy fell into a reces.pdf
Commentators on the US economy feel the US economy fell into a reces.pdfsecunderbadtirumalgi
 
Combe Corporation has two divisions Alpha and Beta. Data from the m.pdf
Combe Corporation has two divisions Alpha and Beta. Data from the m.pdfCombe Corporation has two divisions Alpha and Beta. Data from the m.pdf
Combe Corporation has two divisions Alpha and Beta. Data from the m.pdfsecunderbadtirumalgi
 
Coloque los eventos para explicar c�mo un bien p�blico llega a ser d.pdf
Coloque los eventos para explicar c�mo un bien p�blico llega a ser d.pdfColoque los eventos para explicar c�mo un bien p�blico llega a ser d.pdf
Coloque los eventos para explicar c�mo un bien p�blico llega a ser d.pdfsecunderbadtirumalgi
 
College students often make up a substantial portion of the populati.pdf
College students often make up a substantial portion of the populati.pdfCollege students often make up a substantial portion of the populati.pdf
College students often make up a substantial portion of the populati.pdfsecunderbadtirumalgi
 
Code using Java Programming. Show your code and output.Create a pe.pdf
Code using Java Programming. Show your code and output.Create a pe.pdfCode using Java Programming. Show your code and output.Create a pe.pdf
Code using Java Programming. Show your code and output.Create a pe.pdfsecunderbadtirumalgi
 
code in html with div styles 9. Town of 0z Info - Microsoft Interne.pdf
code in html with div styles  9. Town of 0z Info - Microsoft Interne.pdfcode in html with div styles  9. Town of 0z Info - Microsoft Interne.pdf
code in html with div styles 9. Town of 0z Info - Microsoft Interne.pdfsecunderbadtirumalgi
 
CODE FOR echo_client.c A simple echo client using TCP #inc.pdf
CODE FOR echo_client.c A simple echo client using TCP  #inc.pdfCODE FOR echo_client.c A simple echo client using TCP  #inc.pdf
CODE FOR echo_client.c A simple echo client using TCP #inc.pdfsecunderbadtirumalgi
 
Coastal Louisiana has been experiencing habitat fragmentation and ha.pdf
Coastal Louisiana has been experiencing habitat fragmentation and ha.pdfCoastal Louisiana has been experiencing habitat fragmentation and ha.pdf
Coastal Louisiana has been experiencing habitat fragmentation and ha.pdfsecunderbadtirumalgi
 
Cloud Solutions had the following accounts and balances as of Decemb.pdf
Cloud Solutions had the following accounts and balances as of Decemb.pdfCloud Solutions had the following accounts and balances as of Decemb.pdf
Cloud Solutions had the following accounts and balances as of Decemb.pdfsecunderbadtirumalgi
 
Climate scientists claim that CO2 has risen recently to levels that .pdf
Climate scientists claim that CO2 has risen recently to levels that .pdfClimate scientists claim that CO2 has risen recently to levels that .pdf
Climate scientists claim that CO2 has risen recently to levels that .pdfsecunderbadtirumalgi
 
Climate change is one of the defining challenges of our time, but to.pdf
Climate change is one of the defining challenges of our time, but to.pdfClimate change is one of the defining challenges of our time, but to.pdf
Climate change is one of the defining challenges of our time, but to.pdfsecunderbadtirumalgi
 
Classify each of the following items as excludable, nonexcludable, r.pdf
Classify each of the following items as excludable, nonexcludable, r.pdfClassify each of the following items as excludable, nonexcludable, r.pdf
Classify each of the following items as excludable, nonexcludable, r.pdfsecunderbadtirumalgi
 
Chris is a young moonshine producer in the Tennessee region of Appal.pdf
Chris is a young moonshine producer in the Tennessee region of Appal.pdfChris is a young moonshine producer in the Tennessee region of Appal.pdf
Chris is a young moonshine producer in the Tennessee region of Appal.pdfsecunderbadtirumalgi
 
choose the right answer onlyQuestion 9 What are the four facto.pdf
choose the right answer onlyQuestion 9 What are the four facto.pdfchoose the right answer onlyQuestion 9 What are the four facto.pdf
choose the right answer onlyQuestion 9 What are the four facto.pdfsecunderbadtirumalgi
 
Choose ONE of the scenarios below and write a problem-solving report.pdf
Choose ONE of the scenarios below and write a problem-solving report.pdfChoose ONE of the scenarios below and write a problem-solving report.pdf
Choose ONE of the scenarios below and write a problem-solving report.pdfsecunderbadtirumalgi
 

Plus de secunderbadtirumalgi (20)

Code using Java Programming and use JavaFX. Show your code and outpu.pdf
Code using Java Programming and use JavaFX. Show your code and outpu.pdfCode using Java Programming and use JavaFX. Show your code and outpu.pdf
Code using Java Programming and use JavaFX. Show your code and outpu.pdf
 
Como parte de su programa Global Health Partnerships (2008-2011), Pf.pdf
Como parte de su programa Global Health Partnerships (2008-2011), Pf.pdfComo parte de su programa Global Health Partnerships (2008-2011), Pf.pdf
Como parte de su programa Global Health Partnerships (2008-2011), Pf.pdf
 
Como parte de la investigaci�n de su tesis, genera una biblioteca de.pdf
Como parte de la investigaci�n de su tesis, genera una biblioteca de.pdfComo parte de la investigaci�n de su tesis, genera una biblioteca de.pdf
Como parte de la investigaci�n de su tesis, genera una biblioteca de.pdf
 
Como administrador de la colecci�n de espec�menes en un museo de his.pdf
Como administrador de la colecci�n de espec�menes en un museo de his.pdfComo administrador de la colecci�n de espec�menes en un museo de his.pdf
Como administrador de la colecci�n de espec�menes en un museo de his.pdf
 
Como alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdf
Como alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdfComo alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdf
Como alcalde de Tropical Island, se enfrenta al doble mandato de pre.pdf
 
Commentators on the US economy feel the US economy fell into a reces.pdf
Commentators on the US economy feel the US economy fell into a reces.pdfCommentators on the US economy feel the US economy fell into a reces.pdf
Commentators on the US economy feel the US economy fell into a reces.pdf
 
Combe Corporation has two divisions Alpha and Beta. Data from the m.pdf
Combe Corporation has two divisions Alpha and Beta. Data from the m.pdfCombe Corporation has two divisions Alpha and Beta. Data from the m.pdf
Combe Corporation has two divisions Alpha and Beta. Data from the m.pdf
 
Coloque los eventos para explicar c�mo un bien p�blico llega a ser d.pdf
Coloque los eventos para explicar c�mo un bien p�blico llega a ser d.pdfColoque los eventos para explicar c�mo un bien p�blico llega a ser d.pdf
Coloque los eventos para explicar c�mo un bien p�blico llega a ser d.pdf
 
College students often make up a substantial portion of the populati.pdf
College students often make up a substantial portion of the populati.pdfCollege students often make up a substantial portion of the populati.pdf
College students often make up a substantial portion of the populati.pdf
 
Code using Java Programming. Show your code and output.Create a pe.pdf
Code using Java Programming. Show your code and output.Create a pe.pdfCode using Java Programming. Show your code and output.Create a pe.pdf
Code using Java Programming. Show your code and output.Create a pe.pdf
 
code in html with div styles 9. Town of 0z Info - Microsoft Interne.pdf
code in html with div styles  9. Town of 0z Info - Microsoft Interne.pdfcode in html with div styles  9. Town of 0z Info - Microsoft Interne.pdf
code in html with div styles 9. Town of 0z Info - Microsoft Interne.pdf
 
CODE FOR echo_client.c A simple echo client using TCP #inc.pdf
CODE FOR echo_client.c A simple echo client using TCP  #inc.pdfCODE FOR echo_client.c A simple echo client using TCP  #inc.pdf
CODE FOR echo_client.c A simple echo client using TCP #inc.pdf
 
Coastal Louisiana has been experiencing habitat fragmentation and ha.pdf
Coastal Louisiana has been experiencing habitat fragmentation and ha.pdfCoastal Louisiana has been experiencing habitat fragmentation and ha.pdf
Coastal Louisiana has been experiencing habitat fragmentation and ha.pdf
 
Cloud Solutions had the following accounts and balances as of Decemb.pdf
Cloud Solutions had the following accounts and balances as of Decemb.pdfCloud Solutions had the following accounts and balances as of Decemb.pdf
Cloud Solutions had the following accounts and balances as of Decemb.pdf
 
Climate scientists claim that CO2 has risen recently to levels that .pdf
Climate scientists claim that CO2 has risen recently to levels that .pdfClimate scientists claim that CO2 has risen recently to levels that .pdf
Climate scientists claim that CO2 has risen recently to levels that .pdf
 
Climate change is one of the defining challenges of our time, but to.pdf
Climate change is one of the defining challenges of our time, but to.pdfClimate change is one of the defining challenges of our time, but to.pdf
Climate change is one of the defining challenges of our time, but to.pdf
 
Classify each of the following items as excludable, nonexcludable, r.pdf
Classify each of the following items as excludable, nonexcludable, r.pdfClassify each of the following items as excludable, nonexcludable, r.pdf
Classify each of the following items as excludable, nonexcludable, r.pdf
 
Chris is a young moonshine producer in the Tennessee region of Appal.pdf
Chris is a young moonshine producer in the Tennessee region of Appal.pdfChris is a young moonshine producer in the Tennessee region of Appal.pdf
Chris is a young moonshine producer in the Tennessee region of Appal.pdf
 
choose the right answer onlyQuestion 9 What are the four facto.pdf
choose the right answer onlyQuestion 9 What are the four facto.pdfchoose the right answer onlyQuestion 9 What are the four facto.pdf
choose the right answer onlyQuestion 9 What are the four facto.pdf
 
Choose ONE of the scenarios below and write a problem-solving report.pdf
Choose ONE of the scenarios below and write a problem-solving report.pdfChoose ONE of the scenarios below and write a problem-solving report.pdf
Choose ONE of the scenarios below and write a problem-solving report.pdf
 

Dernier

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 

Dernier (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

CODON TABLE Codon Full_Name 3_Letter 1_Letter TTT P.pdf

CODON TABLE:
Codon Full_Name 3_Letter 1_Letter
TTT Phenylalanine Phe F
TTC Phenylalanine Phe F
TTA Leucine Leu L
TTG Leucine Leu L
TCT Serine Ser S
TCC Serine Ser S
TCA Serine Ser S
TCG Serine Ser S
TAT Tyrosine Tyr Y
TAC Tyrosine Tyr Y
TAA Termination (ochre) Ter *
TAG Termination (amber) Ter *
TGT Cysteine Cys C
TGC Cysteine Cys C
TGA Termination (opal or umber) Ter *
TGG Tryptophan Trp W
CTT Leucine Leu L
CTC Leucine Leu L
CTA Leucine Leu L
CTG Leucine Leu L
CCT Proline Pro P
CCC Proline Pro P
CCA Proline Pro P
CCG Proline Pro P
CAT Histidine His H
CAC Histidine His H
CAA Glutamine Gln Q
CAG Glutamine Gln Q
CGT Arginine Arg R
CGC Arginine Arg R
CGA Arginine Arg R
CGG Arginine Arg R
ATT Isoleucine Ile I
ATC Isoleucine Ile I
ATA Isoleucine Ile I
ATG Methionine Met M
ACT Threonine Thr T
ACC Threonine Thr T
ACA Threonine Thr T
ACG Threonine Thr T
AAT Asparagine Asn N
AAC Asparagine Asn N
AAA Lysine Lys K
AAG Lysine Lys K
AGT Serine Ser S
AGC Serine Ser S
AGA Arginine Arg R
AGG Arginine Arg R
GTT Valine Val V
GTC Valine Val V
GTA Valine Val V
GTG Valine Val V
GCT Alanine Ala A
GCC Alanine Ala A
GCA Alanine Ala A
GCG Alanine Ala A
GAT Aspartate Asp D
GAC Aspartate Asp D
GAA Glutamate Glu E
GAG Glutamate Glu E
GGT Glycine Gly G
GGC Glycine Gly G
GGA Glycine Gly G
GGG Glycine Gly G

sfa.gff:
sfa.fasta:

Fasta format
The most commonly used biological sequence format is known as fasta. It can be used for both nucleotide and amino acid sequences. From the NCBI website: A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line (defline) is distinguished from the sequence data by a greater-than (">") symbol at the beginning. It is recommended that all lines of text be shorter than 80 characters in length.

Examples of FASTA (can be protein or nucleotide) format:
>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFH
>gi|142864|gb|M10040.1|BACDNAE B. subtilis dnaE gene encoding DNA primase
GTACGACGGAGTGTTATAAGATGGGAAATCGGATACCAGATGAAATTGTGGATCAGGTGCAAAAGTCGGC
AGATATCGTTGAAGTCATAGGTGATTATGTTCAATTAAAGAAGCAAGGCCGAAACTACTTTGGACTCTGT
CCTTTTCATGGAGAAAGCACACCTTCGTTTTCCGTATCGCCCGACAAACAGATTTTTCATTGCTTTGGCT
GCGGAGCGGGCGGCAATGTTTTCTCTTTTTTAAGGCAGATGGAA

GFF format
Sequences are only useful if we annotate them, i.e. indicate what features they encode, where those features are, etc. A relatively new but widely used annotation format is GFF - the General Feature Format. It is quite a terse, concise format used primarily by automated parsing systems. Each field in the file is tab separated and represents a specific aspect of the sequence feature being annotated.

Codon usage
Most amino acids are encoded by more than a single codon. Codon usage bias can reflect evolutionary forces as well as the overall GC content of a genome. It has many consequences in terms of protein expression and is very important in modern synthetic biology.

The task
Your task is to write a Python program to read a GFF file and associated fasta sequence file, extract the relevant information and calculate the codon usage for all the annotated genes along with some other questions below. You are also provided with a codon table file. Your answers should be written to an output file. As ever, your solution should be generally applicable to any input files in this format.
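The FASTA layout described above (a ">" defline followed by one or more sequence lines) can be read with a few lines of plain Python. The sketch below is only an illustration under stated assumptions: the function name `read_fasta` is hypothetical, the whole defline (minus the ">") is used as the dictionary key, and no imports are needed, which keeps it within the "sys and textwrap only" rule of the brief.

```python
def read_fasta(lines):
    """Return a dict mapping each defline (without '>') to its joined sequence.

    `lines` can be any iterable of strings, for example an open file handle.
    Hypothetical helper for illustration; not part of the assignment materials.
    """
    parts_by_header = {}
    header = None
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>' of the defline
            parts_by_header[header] = []
        elif header is not None:
            parts_by_header[header].append(line)  # sequence data line
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in parts_by_header.items()}


# The protein example from the text, as it would appear in a file.
example_lines = [
    ">gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)",
    "QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFH",
]
records = read_fasta(example_lines)
for name, sequence in records.items():
    print(name, len(sequence))
```

Because the function accepts any iterable of lines, it could be called as `read_fasta(handle)` inside a `with open(fasta_filename) as handle:` block, and it returns its result rather than writing it.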
Your program will need to take several command line arguments - the gff filename, the fasta filename, the codon table filename and an output filename. You must use this order of arguments in your code. You are provided with a sample GFF file and associated fasta DNA sequence file, along with a codon table file.

NB: The format of the GFF file provided has been modified for simplified parsing. We have removed some fields, modified some and added a header line to indicate the content of each field.
NB: For our purposes, CDS is a gene and all features in the file are annotated as CDS (i.e. each line in the file represents a gene), so your code does not need to check whether a line represents a gene.

Questions
1. If there is a hemagglutinin (check for 'hemagglutination' in the annotations) encoded on this sequence, what is the name of the gene?
2. How many genes are annotated in the gff file?
3. What is the length (number of nucleotides in the annotated gene) and translated sequence of each gene?
4. Calculate the codon usage for all the genes and report the following:
- A single codon usage table for the entire set of genes - not a codon usage table for each gene
See the specification of output formatting below.

Output
Your output file should be in the following format:
##Q1 Your answer (if you don't find any hemagglutinin report 'None')
##Q2: Your answer
##Q3
Gene_name_1: length nt
Translated sequence
Gene_name_2: length nt
Translated sequence
Gene_name_3: length nt
etc...
Codon Usage Table

1. Your program must take command line arguments - all input filenames and an output filename. Your program should check that the correct number of arguments has been provided and exit gracefully (use quit() or sys.exit()) with a usage message if the script was called incorrectly.
2. Use a main() function and conditional execution (if __name__ == "__main__") as we've discussed in the section on modules.
3. Print your answers to no more than 2 decimal places.
4. Functions must return the result of the computation and not write the result. Writing must be implemented in main() - not in your calculation function(s).
5. Use meaningful function names and variable names.
6. Put your student number on a comment line at the top of your code, under the shebang line, and also as the first line of the output file.
7. Your solution will need to import the sys module and can import textwrap for limiting line length in output. The latter is not required (if you want to do this it's much nicer output!). YOU SHOULD NOT IMPORT ANY OTHER MODULES.

1. Coordinates are according to biological counting - i.e. 1-based counting. The start and end coordinates should be included in the gene.
2. For this assignment, dictionaries are your best friend - don't be afraid to make liberal use of them.
3. Translation:
a. Use the standard genetic code. Use the standard single letter amino acid code in your translations (e.g. Phenylalanine is represented by F). For stop codons use the asterisk symbol (*). You should use the tab-delimited file codon_table.txt which is available on Brightspace in this exercise.
4. Your codon usage table output should have one line per codon consisting of 3 tab separated fields as shown below. The first is the codon; the second column should be an integer since it is a count; the third column should be a float, less than or equal to 1, since it is a proportion. Order of codons is unimportant. For example, if there are 5 occurrences of Phenylalanine (F), 4 of which are TTT and 1 being TTC, the relevant lines would look like this:
Codon Usage Table
TTT	4	0.8
TTC	1	0.2
i.e. there are 2 Phenylalanine (F) codons - TTT and TTC. The proportion field shows the number of TTT codons as a proportion of the total number of F codons and similarly for TTC. There should be 1 line like this for each possible codon. As a sanity check, adding up the values for the codons which code for a given amino acid should sum to 1 (as shown above for Phe which has 2 codons: 0.8+0.2==1).
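The coordinate, translation and codon usage rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the required solution: the function names (`parse_codon_table`, `extract_gene`, `codon_usage`) are hypothetical, the inline table holds only the two Phe rows so the arithmetic matches the example in the text, genes are assumed to lie on the forward strand, and codons missing from the table are simply skipped.

```python
def parse_codon_table(lines):
    """Map codon -> one-letter amino acid code from tab-delimited lines."""
    codon_to_aa = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] == "Codon":
            continue  # skip the header row and malformed lines
        codon_to_aa[fields[0]] = fields[-1]
    return codon_to_aa


def extract_gene(sequence, start, end):
    """Slice a gene using 1-based coordinates with both ends included."""
    return sequence[start - 1:end]


def codon_usage(gene_sequences, codon_to_aa):
    """Return {codon: (count, proportion)} pooled over all genes.

    The proportion is the codon count divided by the total count of all
    codons that encode the same amino acid (0.0 if that total is zero).
    """
    counts = {codon: 0 for codon in codon_to_aa}
    for sequence in gene_sequences:
        sequence = sequence.upper()
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(0, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if codon in counts:  # assumption: unknown codons are skipped
                counts[codon] += 1
    totals_by_aa = {}
    for codon, count in counts.items():
        amino_acid = codon_to_aa[codon]
        totals_by_aa[amino_acid] = totals_by_aa.get(amino_acid, 0) + count
    usage = {}
    for codon, count in counts.items():
        total = totals_by_aa[codon_to_aa[codon]]
        usage[codon] = (count, count / total if total else 0.0)
    return usage


# Cut-down codon table (Phe rows only) and two toy genes with 4 TTT and 1 TTC.
table_lines = [
    "Codon\tFull_Name\t3_Letter\t1_Letter",
    "TTT\tPhenylalanine\tPhe\tF",
    "TTC\tPhenylalanine\tPhe\tF",
]
codon_to_aa = parse_codon_table(table_lines)
usage = codon_usage(["TTTTTTTTC", "TTTTTT"], codon_to_aa)
lines = [f"{codon}\t{count}\t{prop:.2f}" for codon, (count, prop) in usage.items()]
print("\n".join(lines))
```

Both calculation functions return their results and leave the writing to the caller, in line with the rule that output belongs in main(). Initialising every codon in the table to zero means a codon that never occurs still gets its own line in the table.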