Phylogenetic Analysis and Identification of 1,4-Dioxane-Degrading Genes
Keith Sanders May 9th 2016 Dr. Iyer and Brian Iken
Abstract
1,4-Dioxane is a substance that was used as a solvent for other organic compounds.
Exposure to this compound can have numerous deleterious effects on living organisms, and it is
suspected to be a carcinogen. Originally, this substance was regarded only as an occupational
hazard. Unfortunately, 1,4-dioxane has also been found to contaminate groundwater. After a brief
analysis of the compound, bioremediation emerged as a key possibility for degrading the
substance in contaminated areas. A literature search identified the bacterium Pseudonocardia
dioxanivorans as a candidate for this task. Pseudonocardia dioxanivorans can degrade
1,4-dioxane thanks to its multicomponent monooxygenase, which is encoded by a set of genes that
work together. It is my hypothesis that organisms with a multicomponent
monooxygenase system phylogenetically similar to that of Pseudonocardia dioxanivorans will also be
effective 1,4-dioxane degraders. After the genes of interest were identified,
programs such as BLAST were used to find similar sequences in several biotechnology
databases. Next, Clustal Omega was used to create multiple sequence alignments, as well as
output files that provided further phylogenetic data from the sequences gathered. The
results showed that while many organisms contained components of the
monooxygenase or notable biomarkers, three organisms provided sufficient evidence
of being true 1,4-dioxane degraders. These organisms are Rhodococcus sp. YYL, Pseudonocardia
tetrahydrofuranoxydans, and Pseudonocardia sp. ENV478.
Introduction
1,4-Dioxane was used as a solvent for numerous organic and inorganic compounds. This
compound is a clear, colorless liquid with an odor similar to ether. It is soluble in
water and highly flammable in both its liquid and vapor states.
1,4-Dioxane is hazardous to humans. Short-term exposure, such as inhalation, can
cause minor ailments such as dizziness and headaches, or more serious ones such as irritation of
the throat, lungs, and eyes. 1,4-Dioxane can also be absorbed through the skin, causing mild to
severe skin irritation. Chronic exposure can be extremely detrimental, and even
lethal. Studies have shown that long-term exposure to 1,4-dioxane can damage the kidneys and
the liver. In multiple studies, rats exposed to 1,4-dioxane in both their drinking water and as
vapor suffered damage to the organs of their endocrine system
(Kasai, T., Kano, H., Umeda, Y., Sasaki, T., Ikawa, N., Nishizawa, T., . . . Fukushima, S., 2009). These
rats also developed cancerous cells. These studies led 1,4-dioxane to be classified as a
probable human carcinogen. Typically, humans come into contact with this substance only
through occupational exposure. However, 1,4-dioxane has been detected as a contaminant in
both surface water and groundwater. 1,4-Dioxane is a very dangerous chemical and, unfortunately, it
is also difficult to get rid of. The purpose of this study is to find biodegraders, organisms
that can perform bioremediation by degrading one substance and converting it into a
different product. Biodegraders are often favored for remediation problems because
they are easy to maintain and generally less harmful to the environment.
After performing a literature review on 1,4-dioxane, I started to search for literature
about organisms. More specifically, I was looking for genes that can degrade this
substance and the organisms they belong to. After reviewing articles, I found that a key
gene of interest was a monooxygenase component, MmoB/DmpM. This gene was reported in
Pseudonocardia dioxanivorans strain CB1190. This monooxygenase was particularly
interesting because it did not require other organic substrates to degrade 1,4-dioxane
(Gedalanga, P. B., Pornwongthong, P., Mora, R., Chiang, S. D., Baldwin, B., Ogles, D., & Mahendra, S.,
2014). Moreover, the monooxygenase found in Pseudonocardia dioxanivorans
is encoded by a multi-component gene cluster that aids its ability to degrade 1,4-dioxane.
These components include alpha and beta subunits and a reductase. The other
genes in the complex were evaluated and used to confirm or scrutinize the results;
however, the monooxygenase MmoB/DmpM was the target gene. Another article led me to
examine biomarkers that showed promise as indicators of 1,4-dioxane degraders. This article
gave me the means to search for genes such as phenol 2-monooxygenase and propane
monooxygenase, which need specific substrates to operate. It also guided me toward
alcohol dehydrogenase genes. Since there was a lot of information on
Pseudonocardia dioxanivorans, I used the genes from this organism as a basis for comparison
with new information (Gedalanga, P. B., Pornwongthong, P., Mora, R., Chiang, S. D., Baldwin, B.,
Ogles, D., & Mahendra, S., 2014).
Going into the research project, I wanted to make sure I had enough information to
evaluate the results of my search. I gathered information about the degradation pathway
in order to get a better idea of which target genes and organisms to look into further
(Stevenson, E., & Turnbull, M., 2013, April 17). This article also pointed me to different
avenues that could be revisited for additional experimentation.
In this project I use core bioinformatics techniques to approach and analyze
genes capable of degrading 1,4-dioxane. Starting the project, I already knew of one
organism that can degrade 1,4-dioxane, so there are only a few possible
outcomes. One is that Pseudonocardia dioxanivorans is alone in its degradation ability; the
other is that a large number of organisms can perform this task. A compromise
between the two outcomes is that while the genes themselves are not exclusive to
Pseudonocardia dioxanivorans, there is a system at work in this organism that makes it more
effective than most organisms. It is my hypothesis that organisms with
monooxygenase systems phylogenetically similar to that of Pseudonocardia dioxanivorans will also be
effective 1,4-dioxane degraders.
Materials
The materials used in this project consisted entirely of bioinformatics
tools, namely computational applications and databases. As a result, no
chemical reagents were used. Although the specifics of the hardware are
not important, it is worth noting that most of the work on this project was
done at the College of Technology computer lab and the library computer lab at the University
of Houston.
The key materials of this project are the sequences. These
sequences come in FASTA format and were obtained from the National Center for Biotechnology
Information (NCBI). FASTA is a text-based format used to represent
DNA, RNA, and protein sequences. The sequences are kept in FASTA format because it
is nearly universal among bioinformatics applications. Most of the
work in this project was conducted on, translated from, or produced in FASTA format.
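As an illustration of the FASTA format described above, the following is a minimal Python sketch of how such a file can be read into memory. The function name `parse_fasta`, the record names, and the residues are hypothetical and made up for illustration; they are not sequences from this project.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted text lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # a header line starts with '>'
            records[header] = ""
        elif header is not None:
            # a sequence may be wrapped over several lines
            records[header] += line.upper()
    return records


# Hypothetical two-record example (made-up names and residues).
example = """>seq_A hypothetical monooxygenase component
MSTLADQ
ALHNNE
>seq_B hypothetical homolog
MSTLAEQALHDNE
""".splitlines()

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

Each record is a header line followed by one or more sequence lines, which is why the same text file can be passed between BLAST, Clustal Omega, and T-Coffee without conversion.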
NCBI also plays a critical role in this process. NCBI is the central hub for many of the
databases that supplied the information this project builds on. Other databases, such as PDB,
play a role in the analysis of the proteins encoded by the genes. NCBI hosts the PubMed
database, which was used for most of the literature review. It also provides accession
numbers, which allow sequences to be retrieved and referenced across other databases.
ExPASy and EBI were also used as sources of applications. ExPASy is a
bioinformatics resource portal. It was used as a source for other bioinformatics applications,
including GENIO/LOGO, T-Coffee, and the PHYLIP tools. GENIO/LOGO was used to create the
consensus sequence logos. T-Coffee was the secondary tool used to create multiple sequence
alignments, and the PHYLIP tools are a set of programs for analyzing DNA and protein
sequences and for building phylogeny trees. EBI stands for the European
Bioinformatics Institute. It hosts many of the programs used to produce
multiple sequence alignments (MSAs). The primary EBI program used in the
project was Clustal Omega. This program can create MSAs and produce output in both
visual and FASTA formats. Clustal Omega can also create phylogeny trees and tree
file output, which can be used in other programs.
The final program used was TreeView. This program can read tree files or
PHYLIP tree file outputs and convert them into visual images of the phylogeny tree. It
also offers different styles of phylogeny tree. Ultimately, this tool was used to create the
phylogeny trees seen in the Results section.
Methods
Literature Review
This project has three main goals. The first is to identify genes that
could degrade 1,4-dioxane. The second is to produce an MSA of the sequences of the
genes in question. The last is to construct a phylogenetic tree of the genes and the
organisms they belong to.
To accomplish my first task, I conducted a literature review. This simply means that I
searched my resources to find publications pertaining to the scope of my study. In this case, the
target was the identity of a gene that could degrade 1,4-dioxane. The
gene and organism were identified using articles found on PubMed. PubMed is one of the
many databases hosted on the NCBI website. Likewise, other NCBI databases such as Gene,
Protein, Nucleotide, and GenBank were used to find and record new sequences.
During my initial searches the organism.
Articles eventually led me to the organism Pseudonocardia
dioxanivorans. More importantly, this led me to my first gene of interest, monooxygenase
component MmoB/DmpM. I found information on the gene using the Gene database
on NCBI. With this information I was able to gather key features of the gene. The most
important of these features were its family identifiers and its FASTA sequence. Still using the
Gene database, I found more genes belonging to the MmoB/DmpM gene family. The
search for these monooxygenase genes also led to the discovery of many other
monooxygenases which, unlike the monooxygenase component MmoB/DmpM, require additional
organic substrates in order to function. The results of these searches
included monooxygenases that use propane, phenol, and toluene as substrates. My
literature review led me to believe that some substrate-dependent monooxygenases were viable
options while others were not.
BLAST
After I had found all the genes I could using the NCBI search, I used the BLAST algorithm
to analyze each FASTA sequence and compare it with sequences in other databases. BLAST
stands for Basic Local Alignment Search Tool. This tool led me to a few genes that I had missed
in my previous search of the NCBI databases.
This tool also has parameters that allowed me to control my searches. When I
searched using the nucleotide sequence of a gene, I used the Megablast
option. I also excluded models and uncultured/environmental samples from my search
because I felt it was important to the project to obtain non-hypothetical results.
Furthermore, this search was conducted against the nucleotide collection (nr/nt) database
because it contained the largest source of DNA sequence information. Whenever I used a
protein sequence in BLAST, I used the protein BLAST program with the DELTA-BLAST
algorithm. These searches were run against the UniProt/SwissProt
database, mostly because I was more familiar with it. I chose DELTA-BLAST
with the aim of validating my results by excluding low-similarity matches.
As before, I also excluded models and uncultured/environmental samples from this search.
Multiple Sequence Alignments
After more genes were found and compiled, it was time to perform MSAs. To do
this I used EBI's Clustal Omega program. I kept the parameters at their defaults for all my
alignments. I broke my sequence analysis down so that I could look at each
type of gene separately before compiling everything together. The first group examined
was the monooxygenase component MmoB/DmpM genes. Next came
the individual components of the gene cluster associated with the monooxygenase
component MmoB/DmpM. These genes include the alpha subunit, the beta subunit, and the
reductase. The next genes evaluated were the propane monooxygenase, the phenol
2-monooxygenase, and alcohol dehydrogenase. Once the MSAs were completed, the output
was saved as identity matrix scores, FASTA MSAs, visual MSAs, and phylogeny tree files.
All four of these outputs were produced by Clustal Omega. If a particular gene showed a
score of 60% or below in the MSA identity matrix, it was excluded from further processing.
Exceptions were made when a sequence scored above the threshold on DNA but failed the protein
threshold, or vice versa. Another program, T-Coffee, was also used to produce MSAs,
again with its default parameters. I thought it was important to
obtain a second opinion on the MSAs. Although the algorithms used by T-Coffee and Clustal
Omega differ somewhat, I was mostly looking for large changes in MSA scores rather
than small ones. In the end I decided to stay with the Clustal Omega MSAs.
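The 60% exclusion rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: percent identity is computed here as the share of identical residues over columns where neither sequence has a gap, which may differ from how Clustal Omega fills its identity matrix, and the sequence names and the helper names (`percent_identity`, `identity_to_reference`, `passes_threshold`) are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent of gap-free aligned columns in which a and b are identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_to_reference(alignment: Dict[str, str], reference: str) -> Dict[str, float]:
    """Score every other aligned sequence against the reference sequence."""
    ref = alignment[reference]
    return {
        name: percent_identity(ref, seq)
        for name, seq in alignment.items()
        if name != reference
    }


def passes_threshold(scores: Dict[str, float], cutoff: float = 60.0) -> List[str]:
    """Keep names scoring strictly above the cutoff (60% or below is excluded)."""
    return sorted(name for name, score in scores.items() if score > cutoff)


# Made-up aligned DNA fragments, not data from this project.
alignment = {
    "reference": "ATGCATGCAT",
    "cand1": "ATGCATGCAA",
    "cand2": "ATG-TTACGT",
    "cand3": "TTGAATCCGT",
}
scores = identity_to_reference(alignment, "reference")
kept = passes_threshold(scores)
print(scores, kept)
```

In this made-up example `cand3` scores exactly 60% and is therefore dropped, matching the "60% or below" wording of the rule. The DNA-versus-protein exceptions mentioned above would need the same check run on both alignments.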
Phylogenetic Tree
After the MSAs were completed, it was time to move into the final phase of the analysis.
A phylogenetic tree was constructed from the sequences gathered. Even
though both Clustal Omega and T-Coffee produce a phylogenetic tree from their results, I decided
to use a different program for this. Using the PHYLIP tree file from the Clustal Omega
output, I drew phylogenetic trees with the TreeView program. Trees were drawn using
TreeView's default parameters. The style of phylogenetic tree produced
is known as a phylogram. As before, I started by making trees for each gene group individually.
This means the first group of trees contained just monooxygenase component
MmoB/DmpM genes, and the next contained the substrate-dependent monooxygenase genes. Finally, a
phylogenetic tree of all the sequences I compiled was produced. The tree images produced by
TreeView do not show distance values directly on the branches. To compensate,
a distance scale bar appears in the bottom left-hand corner.
Results
The project yielded results for the genes alcohol dehydrogenase, phenol
2-monooxygenase, propane monooxygenase, and multi-component monooxygenase MmoB/DmpM,
as well as the other components of the multi-component monooxygenase complex. These
additional components include the alpha and beta subunits and a reductase. I thought it was
also important to evaluate the multi-component unit as a whole.
When applicable, results for both DNA and protein sequences are presented. However,
there were cases where either the DNA or the protein sequences could not be obtained.
Alcohol Dehydrogenase
Protein MSA Score:
This contains the score of each protein sequence used in the multiple sequence alignment.
Figure 1
Protein Multiple Sequence Alignment (See Attachment):
Figure 2
Protein sequence consensus Logo (See Attachment):
Using multiple amino acid residue sequences, a consensus sequence is created. A logo is used
to visually represent the sequence, where the height of the residue represents its frequency at
the given position. The taller the residue, the more often it appears in that position.
Figure 3
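The idea behind the consensus logo can be sketched in a few lines of Python. This sketch uses plain per-column frequencies, as the description above does; actual logo tools may scale letter heights differently (for example by information content), so it should be read as an illustration rather than a reproduction of GENIO/LOGO. The function names and the four short sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def column_frequencies(aligned: List[str]) -> List[Dict[str, float]]:
    """For each alignment column, return {residue: fraction of sequences}."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must have equal length")
    frequencies = []
    for column in zip(*aligned):
        counts = Counter(column)
        frequencies.append({res: n / len(column) for res, n in counts.items()})
    return frequencies


def consensus(aligned: List[str]) -> str:
    """Most frequent residue per column; ties go to the alphabetically first."""
    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        result.append(max(sorted(counts), key=counts.get))
    return "".join(result)


# Made-up aligned fragments, not data from this project.
aligned = ["ATGC", "ATGA", "ATCA", "AAGA"]
freqs = column_frequencies(aligned)
cons = consensus(aligned)
print(cons)
for position, column in enumerate(freqs, start=1):
    print(position, column)
```

A column where one residue has a frequency of 1.0 corresponds to a full-height letter in the logo, while a column split 0.5/0.5 corresponds to two letters of equal height.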
Protein Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees
were created using the Average Distance % Identity.
Figure 4
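As a rough illustration of how a tree can be built from percent identity scores, the Python sketch below applies average-linkage clustering (UPGMA) to distances defined as 100 minus percent identity. Reading "Average Distance % Identity" as UPGMA on identity-derived distances is an assumption made for this sketch; the exact algorithm used by the tree-building tool is not stated here. Branch lengths are omitted, and the sequence names and identity values are made up.

```python
from typing import Dict, Tuple


def upgma(distances: Dict[Tuple[str, str], float]) -> str:
    """Cluster by average distance and return the topology as a Newick string."""
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {name: 1 for pair in distances for name in pair}
    while len(sizes) > 1:
        # pick the closest pair of clusters (ties broken by label order)
        a, b = min(
            ((x, y) for x in sizes for y in sizes if x < y),
            key=lambda pair: (d[frozenset(pair)], pair),
        )
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # distance to the merged cluster is the size-weighted average
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree + ";"


# Hypothetical percent identities between four sequences.
identity = {
    ("seq_A", "seq_B"): 95.0,
    ("seq_A", "seq_C"): 80.0,
    ("seq_B", "seq_C"): 82.0,
    ("seq_A", "seq_D"): 60.0,
    ("seq_B", "seq_D"): 62.0,
    ("seq_C", "seq_D"): 61.0,
}
distances = {pair: 100.0 - pid for pair, pid in identity.items()}
tree = upgma(distances)
print(tree)
```

The two sequences with the highest identity are joined first, and more distant sequences attach progressively closer to the root, which is the pattern read off the phylograms in this section.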
Phenol 2-monooxygenase
DNA MSA Score:
This contains the score of each DNA nucleotide sequence used in the multiple sequence alignment.
Figure 5
DNA Multiple Sequence Alignment (See Attachment):
Figure 6
DNA nucleotide sequence consensus logo (See Attachment):
Using multiple DNA sequences, a consensus sequence is created. A logo is used to visually represent the
sequence, where the height of the residue represents its frequency at the given position. The taller the
residue, the more often it appears in that position.
Figure 8
Protein MSA Score:
This contains the score of each protein sequence used in the multiple sequence alignment.
Figure 9
Protein sequence consensus logo (See Attachment):
Using multiple amino acid residue sequences, a consensus sequence is created. A logo is used to visually
represent the sequence, where the height of the residue represents its frequency at the given position.
The taller the residue, the more often it appears in that position.
Figure 10
Protein Multiple Sequence Alignment (See Attachment):
Figure 12
Propane Monooxygenase
DNA MSA Score:
This contains the score of each DNA nucleotide sequence used in the multiple sequence alignment.
Figure 13
DNA nucleotide sequence consensus logo (See Attachment):
Using multiple DNA sequences, a consensus sequence is created. A logo is used to visually represent the
sequence, where the height of the residue represents its frequency at the given position. The taller the
residue, the more often it appears in that position.
Figure 15
Protein MSA Score:
This contains the score of each protein sequence used in the multiple sequence alignment.
Figure 17
Protein Multiple Sequence Alignment (See Attachment):
Figure 18
Protein sequence consensus logo (See Attachment):
Using multiple amino acid residue sequences, a consensus sequence is created. A logo is used to visually
represent the sequence, where the height of the residue represents its frequency at the given position.
The taller the residue, the more often it appears in that position.
Figure 19
Protein Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were
created using the Average Distance % Identity.
Figure 20
Multi-component Monooxygenase
DNA MSA Score:
This contains the score of each DNA nucleotide sequence used in the multiple sequence alignment.
Figure 21
DNA Multiple Sequence Alignment (See Attachment):
Figure 22
DNA nucleotide sequence consensus logo (See Attachment):
Using multiple DNA sequences, a consensus sequence is created. A logo is used to visually represent the
sequence, where the height of the residue represents its frequency at the given position. The taller the
residue, the more often it appears in that position.
Figure 23
DNA Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created
using the Average Distance % Identity.
Figure 24
Protein MSA Score:
This contains the score of each protein sequence used in the multiple sequence alignment.
Figure 25
Protein Multiple Sequence Alignment (See Attachment):
Figure 26
Protein sequence consensus logo (See Attachment):
Using multiple amino acid residue sequences, a consensus sequence is created. A logo is used to visually
represent the sequence, where the height of the residue represents its frequency at the given position.
The taller the residue, the more often it appears in that position.
Figure 27
Protein Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were
created using the Average Distance % Identity.
Figure 28
Alpha Subunit
DNA MSA Score:
This contains the score of each DNA nucleotide sequence used in the multiple sequence alignment.
Figure 29
DNA Multiple Sequence Alignment (See Attachment):
Figure 30
DNA nucleotide sequence consensus logo (See Attachment):
Using multiple DNA sequences, a consensus sequence is created. A logo is used to visually represent the
sequence, where the height of the residue represents its frequency at the given position. The taller the
residue, the more often it appears in that position.
Figure 31
DNA Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created
using the Average Distance % Identity.
Figure 32
Protein MSA Score:
This contains the score of each protein sequence used in the multiple sequence alignment.
Figure 33
Protein Multiple Sequence Alignment (See Attachment):
Figure 34
Protein sequence consensus logo (See Attachment):
Using multiple amino acid residue sequences, a consensus sequence is created. A logo is used to visually
represent the sequence, where the height of the residue represents its frequency at the given position.
The taller the residue, the more often it appears in that position.
Figure 35
Protein Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were
created using the Average Distance % Identity.
Figure 36
Beta Subunit
DNA MSA Score:
This contains the score of each DNA nucleotide sequence used in the multiple sequence alignment.
Figure 37
DNA Multiple Sequence Alignment (See Attachment):
Figure 38
DNA nucleotide sequence consensus logo (See Attachment):
Using multiple DNA sequences, a consensus sequence is created. A logo is used to visually represent the
sequence, where the height of the residue represents its frequency at the given position. The taller the
residue, the more often it appears in that position.
Figure 39
DNA Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created
using the Average Distance % Identity.
Figure 40
Protein MSA Score:
This contains the score of each protein sequence used in the multiple sequence alignment.
Figure 41
Protein Multiple Sequence Alignment (See Attachment):
Figure 42
Protein sequence consensus logo (See Attachment):
Using multiple amino acid residue sequences, a consensus sequence is created. A logo is used to visually
represent the sequence, where the height of the residue represents its frequency at the given position.
The taller the residue, the more often it appears in that position.
Figure 43
Protein Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were
created using the Average Distance % Identity.
Figure 44
Reductase
DNA MSA Score:
This contains the score of each DNA nucleotide sequence used in the multiple sequence alignment.
Figure 45
DNA Multiple Sequence Alignment (See Attachment):
Figure 46
DNA nucleotide sequence consensus logo (See Attachment):
Using multiple DNA sequences, a consensus sequence is created. A logo is used to visually represent the
sequence, where the height of the residue represents its frequency at the given position. The taller the
residue, the more often it appears in that position.
Figure 47
DNA Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created
using the Average Distance % Identity.
Figure 48
Protein MSA Score:
This contains the score of each protein sequence used in the multiple sequence alignment.
Figure 49
Protein Multiple Sequence Alignment (See Attachment):
Figure 50
Protein sequence consensus logo (See Attachment):
Using multiple amino acid residue sequences, a consensus sequence is created. A logo is used to visually
represent the sequence, where the height of the residue represents its frequency at the given position.
The taller the residue, the more often it appears in that position.
Figure 51
Protein Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the amino acid residue sequences. The phylogenetic trees were
created using the Average Distance % Identity.
Figure 52
Multi-component Gene Complex
DNA MSA Score:
This contains the score of each DNA nucleotide sequence used in the multiple sequence alignment.
Figure 53
DNA nucleotide sequence consensus logo (See Attachment):
Using multiple DNA sequences, a consensus sequence is created. A logo is used to visually represent the
sequence, where the height of the residue represents its frequency at the given position. The taller the
residue, the more often it appears in that position.
Figure 54
DNA Phylogeny Tree (See Attachment):
This is a phylogeny tree created from the DNA nucleotide sequences. The phylogenetic trees were created
using the Average Distance % Identity.
Figure 55
Master Phylogeny Tree (See Attachment):
This is the phylogeny tree created from all the sequences collected during the project. It is annotated
with color for easier navigation.
The first tree contains the propane monooxygenase and the primary monooxygenase gene clusters
along with the phenol 2-monooxygenase. This tree does not include the alcohol dehydrogenase.
Figure 56
The last figure shows the complete phylogeny tree with all the sequences used for the project.
Figure 57
Conclusion
At the start of my project, I found literature that led me to believe Pseudonocardia
dioxanivorans was an organism able to degrade 1,4-dioxane. Furthermore, the
gene that made this possible was identified. Monooxygenase MmoB/DmpM
was the target gene that started much of the research. Further research showed
that the monooxygenase MmoB/DmpM works within a gene complex. This
complex contains a reductase, an alpha subunit, and a beta subunit. The complex was then analyzed
both as individual components and as a whole. Part of the reason for analyzing the parts
individually was to find sequences that might be lost in the overall gene cluster. When
this was done, only the monooxygenase alpha subunit yielded new results.
The initial analysis provided me with the organisms Pseudonocardia sp. K1,
Pseudonocardia sp. ENV478, and Rhodococcus sp. YYL. It is important to note that when
DNA sequences were used, the organism was listed as Pseudonocardia sp. K1, whereas when
protein sequences were used, it was listed as Pseudonocardia tetrahydrofuranoxydans.
Pseudonocardia tetrahydrofuranoxydans and Pseudonocardia sp. K1 are the
same organism. The percent identity scores show that these organisms have
the strongest identity with our target organism, Pseudonocardia dioxanivorans.
The gene of interest was analyzed along with the individual components of the gene complex
and the complex as a whole. For the monooxygenase complex, both as
individual components and as a whole, the percent identity score never drops below 90%. This is
a very strong indicator of functional similarity. The percent identity score of the propane
monooxygenase stayed mostly in the 70% range, which I considered strong enough
to be relevant. Alcohol dehydrogenase percent identity scores ranged from the high
60s to the mid-80s. This range gave me results significant enough to continue the project. The
phenol 2-monooxygenase scores never rose above 70% but never fell below 60%. This, coupled
with the suggestion from my literature review to investigate them, is what kept these genes in
for further evaluation.
The alcohol dehydrogenase, phenol 2-monooxygenase, and propane monooxygenase
were all evaluated using the same analytical techniques. The propane
monooxygenase was the only gene found to have a gene cluster similar to the primary
monooxygenase gene cluster. This prompted me to exclude propane monooxygenases that were
not part of such a cluster, because they either had a low overall percent identity or were
related to only one part of the cluster and not to the gene cluster as a whole. The only
reason the monooxygenase alpha subunits were allowed to keep their single-component
matches was that their percent identity scores were still much too high to exclude. The overall
identity scores of the propane monooxygenase, phenol 2-monooxygenase, and alcohol
dehydrogenase were high enough to be significant, but not as high as those of the monooxygenase
within the gene cluster discussed above.
The consensus logo is a way to visualize the results of the MSA and the percent identity
score. In DNA, the individual components of the monooxygenase gene cluster showed the strongest
consistency, with many positions matching at 100% frequency.
Oddly enough, while the gene cluster as a whole still has a percent identity score above 90%, its
consensus sequence frequently varies between two different nucleotides. In proteins, the
consensus sequence also varied, with at least two amino acids sharing a
50% frequency each.
The phenol 2-monooxygenase showed a mix of strong and mixed consensus in its DNA logo.
The protein logo showed mixed consensus, with three amino acids usually competing for
the consensus position. For the propane monooxygenase, a logo of
the entire cluster was created for the nucleotide sequences; however, since creating a logo of the
same cluster was problematic for the protein sequences, only the propane monooxygenase itself was
given a protein logo. The DNA logo of the cluster shows plenty of conflicting consensus and very
little 100% frequency. The protein logo showed much stronger consensus, with most
positions at a 50% frequency. The alcohol dehydrogenase has only a protein logo
because not all of the nucleotide sequences could be found. Its protein sequences show strong
consensus, with many positions at 100% frequency.
Multiple sequence alignments were also produced. By convention, a “*” represents a
completely conserved residue, a “:” indicates a conserved residue, and a “.” represents a
semi-conserved residue. A blank represents a position with no conserved match. MSAs were
produced for all sequences; however, it was impractical to exhibit the MSAs for the complete
gene clusters of the monooxygenase and the propane monooxygenase due to the enormous size of
that data. Individual DNA and protein MSAs for each of the components of the monooxygenase
have been provided. Both the individual components and the gene cluster show very strong fully
conserved regions. This agrees with their percent identity scores. The propane monooxygenase
portion of its gene cluster has been provided. It shows a mixture of fully conserved and,
to a lesser extent, conserved regions. The phenol 2-monooxygenase and the alcohol
dehydrogenase show similar results, with a mix of fully conserved, conserved, and semi-conserved
regions, most of them fully conserved.
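The conservation symbols described above can be reproduced for a protein alignment with a short Python sketch. The "strong" and "weak" residue groups below are the ones I recall from the Clustal documentation and are included as an assumption; they should be checked against the current documentation before being relied on. For DNA alignments only the “*” symbol applies. The function name and the three short sequences are hypothetical.

```python
from typing import List

# Assumed Clustal residue groups (verify against the Clustal documentation).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(aligned: List[str]) -> str:
    """Return one symbol per alignment column: '*', ':', '.', or blank."""
    symbols = []
    for column in zip(*aligned):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")  # a gap means no conservation symbol
        elif len(residues) == 1:
            symbols.append("*")  # fully conserved
        elif any(residues <= set(group) for group in STRONG_GROUPS):
            symbols.append(":")  # conserved (strongly similar residues)
        elif any(residues <= set(group) for group in WEAK_GROUPS):
            symbols.append(".")  # semi-conserved (weakly similar residues)
        else:
            symbols.append(" ")
    return "".join(symbols)


# Made-up aligned protein fragments, not data from this project.
aligned_proteins = ["MKCA", "MRSW", "MKAC"]
line = conservation_line(aligned_proteins)
for seq in aligned_proteins:
    print(seq)
print(line)
```

Counting the share of “*” columns in such a line gives a quick numeric check on the statement that most regions are fully conserved.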
Phylogenetic trees were constructed for every type of gene; however, only results for
genes with more than three entries are provided, because a tree with three or fewer entries
gives little to no practical information, especially within the scope of this project.
For the monooxygenase, it is important to note that the genes of our target organism,
Pseudonocardia dioxanivorans, are usually most closely related to those of
Rhodococcus sp. YYL. This can be observed in both the individual genes and the gene cluster.
The alcohol dehydrogenase genes show considerable diversity among themselves. An
important point is that a Pseudonocardia dioxanivorans entry was located among them. In the
phylogenetic tree, this entry is more closely related to other Pseudonocardia than to
Rhodococcus. Comparing this with the monooxygenase tree suggests that while these two
organisms have strong similarities in the monooxygenase genes, there are still many respects
in which they differ. The final tree includes all the genes examined in this project, based on
their protein sequences. As before, Pseudonocardia dioxanivorans and Rhodococcus sp. YYL
are each other's closest relatives, with the exception of the alpha subunit. The alcohol
dehydrogenase genes are isolated furthest from the rest of the genes. This could suggest that
their role in 1, 4-dioxane degradation is entirely different from that of the other genes.
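As an illustration of how "closest related" pairs emerge from aligned sequences, here is a small distance-based clustering sketch in Python. It uses UPGMA on simple p-distances, which is a stand-in chosen for brevity and is not necessarily the method behind the trees in this project. The sequence names (seqA to seqD) and residues are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA and return the topology as Newick."""
    # Each cluster is keyed by its Newick label and stores its member count.
    sizes = {name: 1 for name in aligned}
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Merge the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        new_dist = {}
        for other in sizes:
            if other in pair:
                continue
            # Size-weighted average distance from the merged cluster to `other`.
            total = (sizes[a] * dist[frozenset((a, other))]
                     + sizes[b] * dist[frozenset((b, other))])
            new_dist[frozenset((merged, other))] = total / (sizes[a] + sizes[b])
        # Drop distances involving the two merged clusters, then add the new ones.
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        dist.update(new_dist)
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Hypothetical toy alignment, for illustration only; not project data.
toy = {
    "seqA": "MKTAYQLG",
    "seqB": "MKTAYQLA",
    "seqC": "MRSAYHLG",
    "seqD": "MRSAFHLG",
}
print(upgma(toy))
```

Sequences that differ at the fewest aligned positions are joined first, so a pair that repeatedly groups together across gene trees is the kind of pattern described above for Pseudonocardia dioxanivorans and Rhodococcus sp. YYL.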
The results of this tree prompted me to make another tree without the alcohol
dehydrogenase. This tree uses the entire gene cluster of the primary and propane
monooxygenases. I did this because the results for the separated components of the gene
complex were mostly redundant. This tree showed that Pseudonocardia dioxanivorans and
Rhodococcus sp. YYL are still the most closely related.
Several errors and limitations occurred during the project. A major limitation was the
number of sequences available. I could only work with sequences that were available in the
databases and that I was able to find. This means that I could have missed a sequence, or that
there could be more organisms whose genetic sequences are not yet available but that are also
capable of degrading 1, 4-dioxane. Another source of error is human error, specifically my
judgment of what counts as valuable information. I wanted to include sequences that I thought
were relevant, but I may have excluded some useful sequences based on my own exclusion
criteria.
The information gained in this project has many useful applications. First, it increases
the number of organisms that are suspected to be 1, 4-dioxane degraders. While more
experiments are needed to evaluate their effectiveness, phylogenetic analysis does provide
evidence to support further study. Having multiple organisms that can perform this task
makes using them for bioremediation more feasible. Furthermore, while researching
1, 4-dioxane degradation, I found an article about fungi that can perform this task (Kinne et
al., 2009). This means that more information about other organisms with this ability may
still be out there. That information can also be compared with the results of this project to
examine how the processes used by bacteria and fungi to degrade 1, 4-dioxane differ or
resemble each other.
In summation, propane monooxygenase and phen-2 monooxygenase have the ability to
degrade 1, 4-dioxane, but only when their particular substrates are available. This makes them
less optimal than the other monooxygenase examined in this project. Alcohol dehydrogenase is
the least related to any of the genes, which suggests that its role in dioxane degradation is
not as direct as that of the other genes. The key finding is that three organisms,
Pseudonocardia sp. K1, Pseudonocardia sp. ENV478, and Rhodococcus sp. YYL, have genes most
closely related to the genes of interest. The monooxygenase gene complex, including the alpha
subunit, beta subunit, and reductase, is what gives these organisms a greater affinity for the
degradation task. This supports the idea that these organisms are true 1, 4-dioxane degraders.
Furthermore, of the three, the genes of Rhodococcus sp. YYL are the most closely related to
those of Pseudonocardia dioxanivorans.
References
1. Sales, C. M., Mahendra, S., Grostern, A., Parales, R. E., Goodwin, L. A., Woyke, T., . . . Alvarez-Cohen,
L. (2011). Genome Sequence of the 1,4-Dioxane-Degrading Pseudonocardia dioxanivorans Strain
CB1190. Journal of Bacteriology, 193(17), 4549-4550. doi:10.1128/jb.00415-11
2. Gedalanga, P. B., Pornwongthong, P., Mora, R., Chiang, S. D., Baldwin, B., Ogles, D., & Mahendra, S.
(2014). Identification of Biomarker Genes To Predict Biodegradation of 1,4-Dioxane. Applied and
Environmental Microbiology, 80(10), 3209-3218. doi:10.1128/aem.04162-13
3. 1,4-Dioxane (1,4-Diethyleneoxide). (n.d.). Retrieved April 30, 2016, from
https://www3.epa.gov/airtoxics/hlthef/dioxane.html
4. Kasai, T., Kano, H., Umeda, Y., Sasaki, T., Ikawa, N., Nishizawa, T., . . . Fukushima, S.
(2009). Two-year inhalation study of carcinogenicity and chronic toxicity of 1,4-dioxane in
male rats. Inhalation Toxicology, 21(11), 889-897. doi:10.1080/08958370802629610
5. Stevenson, E., & Turnbull, M. (2013, April 17). 1,4-Dioxane Pathway Map. Retrieved May
09, 2016, from http://eawag-bbd.ethz.ch/diox/diox_map.html
6. Kinne, M., Poraj-Kobielska, M., Ralph, S. A., Ullrich, R., Hofrichter, M., & Hammel, K. E.
(2009). Oxidative Cleavage of Diverse Ethers by an Extracellular Fungal
Peroxygenase. Journal of Biological Chemistry, 284(43), 29343-29349.
doi:10.1074/jbc.m109.040857
Attachments
Alcohol Dehydrogenase
Protein Multiple Sequence Alignment (See Attachment)