
MCB 432 Final Table PP 01.06.16


  1. Keegan McAuliffe, MCB 432: Computing in Molecular Biology. The following is my final presentation for MCB 432, detailing the process our group undertook to determine the identity of an unknown bacterium. We were provided with raw sequence reads from the bacterium, which we converted into contigs and scaffolds. We assembled the data into a genome, then annotated it for potential genes and identified the bacterium as Bacteroides vulgatus str. 3975.
  2. Keegan McAuliffe, Henry Chen, Andrew Storm, Dominic Gentile. Team 10 Results and Discussion.
     Introduction: The advent of high-throughput sequencing has increased our ability to analyze genetic information. In this project, we demonstrate how to use raw sequence data from a sampled organism for genetic and genomic analysis. Starting from the raw sequence reads provided by the PI, we assembled a genome for our unknown microorganism using the A5ud assembler program (Table 1). From the assembly we determined the total number of contigs and scaffolds, and we used these assemblies to predict and annotate genes (Table 2). With the assembled genome in hand, we predicted genes using the Prodigal algorithm in order to characterize our unknown organism. Prodigal generates gene and protein predictions, but it does not indicate what those predicted genes and proteins represent. We therefore needed other programs to annotate our predictions, and because genes are so complex, we had to be specific in choosing programs for gene analysis. For instance, EMBOSS programs search for alignments and patterns between an assembly and databases of well-known genes, HMM and BLAST searches compare protein homology, and other programs search for features such as tRNAs and signal peptides. Below we describe how we used these tools to analyze our genome and present our results.
  3. Results: (Optional tasks)
     The objective of Optional Task 1 was to determine the GC content of each gene. To obtain this information, we first had to assemble our reads into contigs and scaffolds, which was the objective of Mandatory Task 1. We decompressed the read data with the "gunzip" command and then ran the A5ud assembler on it. The assembler generated a quality trimming report, an assembly report, an initial scaffolding report, a final scaffold quality check, error-corrected reads, contigs, crude scaffolds, broken scaffolds, and final scaffolds. The assembly report contained the GC content for each contig, which we added to Table 3. The average GC content for all contigs is .407. Because G-C base pairs form three hydrogen bonds and A-T base pairs form only two, our genome is expected to be less thermally stable than a genome with a GC content greater than .500.
     The objective of Optional Task 3 was to determine the best blastp match for our proteins against the NR database. The first step was to determine the proper command to generate a single best match from the NR database for each predicted protein, with an E-value less than 1e-10, along with the organism to which it belongs, the accession number, and the percent identity. The command we used was:
     blastp -db nr -query TeamProject.faa -out TeamProject.br -evalue 1E-10 -outfmt 6 -max_target_seqs 1
     This command gave us the E-value, accession number, and percent identity of the best blastp match for each predicted protein. However, we still needed the organism name and the description of the gene. For this, we used the program efetch.pl. Given a list of accession names as input, efetch.pl generated the organism name and gene annotation for each gene of interest. These data were recorded in Table 5. This task was also instrumental in determining the genus, species, and strain most closely related to our scaffolds.
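The per-gene GC content described for Optional Task 1 can be illustrated with a minimal Python sketch. This is not the A5ud report or the team's actual procedure; the helper names `read_fasta` and `gc_content` and the toy sequences are assumptions made for illustration only.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Toy records (hypothetical, not taken from the team's data).
example = """>gene_a
ATGCGC
GGAT
>gene_b
ATATAT
""".splitlines()

gc_by_gene = {name: round(gc_content(seq), 3) for name, seq in read_fasta(example)}
print(gc_by_gene)
```

Running the same functions over a nucleotide FASTA of predicted genes would give one GC fraction per gene, on the same 0 to 1 scale used in Table 3.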
  4. The best blastp match for each predicted protein was of the genus Bacteroides, and the overwhelming majority were of the species Bacteroides vulgatus. More specifically, the strain Bacteroides vulgatus str. 3975 RP4 was the best match for 9 of the 104 predicted proteins. This represents 60% of the 15 blast results specific enough to indicate a strain. These data led us to conclude that Bacteroides vulgatus str. 3975 is the most closely related strain.
     The objectives of Optional Tasks 4 and 5 were to analyze the CDSs for possible proteins and genes. The predicted sequences were analyzed using PFAM to determine possible protein matches and TIGRFAM to determine possible gene matches. The hmmscan for the PFAM matches used the Pfam-A database and TeamProject.faa. The hmmscan for the TIGRFAM matches used the TIGRFAMs_14.0.HMM database and TeamProject.faa. The results were compiled into Table 6 and Table 7 from TeamProject_pfam.txt and TeamProject_tigrfam.txt. Only the best match for each CDS was added to Table 3.
     The PFAM hmmscan revealed that many of the CDSs had at least one related protein. The predicted proteins of CDSs with multiple matches were all closely related. For example, all the predicted proteins for the 1_83 CDS are from Glycosyl transferase family 2. The TIGRFAM search returned fewer matches: only 33, compared with 191 for the PFAM search. Most of the CDSs with TIGRFAM matches have only one match. Only CDS 1_15, 1_39, 1_82, and 1_85 have multiple matches, and these have only two each, whereas several CDSs have four or five PFAM matches. For the CDSs that had both TIGRFAM and PFAM matches, the two searches predicted similar functions.
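Selecting a single best domain match per CDS, as done when filling Table 3, can be sketched as follows. The sketch assumes that "best" means the lowest E-value, which the report does not state explicitly, and the function name `best_match_per_cds` is hypothetical. The example hits are rows from Table 6.

```python
from collections import Counter


def best_match_per_cds(hits):
    """Return {cds: (family_id, e_value)} keeping the lowest E-value per CDS.

    hits is an iterable of (cds_name, family_id, e_value) tuples.
    Assumption: the best match is the one with the smallest E-value.
    """
    best = {}
    for cds, family, evalue in hits:
        if cds not in best or evalue < best[cds][1]:
            best[cds] = (family, evalue)
    return best


# Rows copied from Table 6 (PFAM matches).
hits = [
    ("scaffold1.1_12", "PF01715.12", 7.7e-64),
    ("scaffold1.1_12", "PF01745.11", 3.0e-12),
    ("scaffold1.1_12", "PF04851.10", 0.00022),
    ("scaffold1.1_19", "PF03462.13", 3.4e-39),
    ("scaffold1.1_19", "PF00472.15", 2.6e-33),
]

best = best_match_per_cds(hits)
matches_per_cds = Counter(cds for cds, _, _ in hits)

for cds, (family, evalue) in best.items():
    print(cds, family, evalue, "of", matches_per_cds[cds], "matches")
```

The per-CDS counts from `matches_per_cds` are the kind of tally behind statements such as "most CDSs with TIGRFAM matches have only one match".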
  5. Optional Task 6 used PHYRE2 to analyze CDS 1.1_1, 1.1_4, 1.1_14, 1.1_19, 1.1_32, 1.1_54, 1.1_57, 1.1_60, 1.1_68, and 2.1_8. All CDSs had a confidence of 100.0 except 1.1_1 and 1.1_32, which had values of 61.1 and 49.4, respectively. The PHYRE2 predicted proteins agree with the PFAM predictions for all CDSs except 1.1_1, 1.1_32, 1.1_57, and 1.1_60. The other possible PHYRE2 matches for these CDSs also differed from the PFAM results. This may be because the structures of the PFAM matches are not in the PHYRE2 database.
     For Optional Task 7 we looked for more specific features such as signal peptides. By comparing our assembled scaffolds (team.fasta) to a reference database of gram-negative prokaryotes, we were able to identify potential signal peptides and determine their lengths. We compared our data to gram-negative prokaryotes because our previous blast analysis matched our genes and proteins to those found in the gram-negative genus Bacteroides. The output data (located in the file TeamProj_SigP_Summary.txt) denoted the presence or absence of signal peptides and the cutoff points of those peptides (C-value). This allowed us to determine the predicted lengths of the peptides. The results can be found in Table 3.
     The objective of Optional Task 8 was to analyze the presence of rho-independent transcriptional terminators. This is a particularly useful application, as intrinsic terminators typically denote genes that are actively transcribed. To accomplish this task, we ran our genome assembly (team.fasta) through a rho-independent terminator search while supplying the search with predicted gene coordinates. These coordinates were determined through our EMBOSS infoseq analysis of the predicted proteins on our assembly and restructured into the TeamProj.coords file for use with the terminator analysis program. The report generated can be found in the file TeamProj_tt + TeamProj_tt.txt, and the predicted genes with identifiable rho-independent terminators are listed in Table 3.
  6. Optional Task #9 determined whether we could find any homologous RNA secondary structures in our assembled genome. As with protein-coding genes, the structure of an RNA such as a tRNA can provide valuable information on its function and origin, which is useful when characterizing an unknown genome. With our assembled genome in hand (team.fasta), we searched for matches to conserved RNA structures from a handful of Rfam families: RF00005, RF00010, RF00023, RF00029, RF00059, RF00174, RF00177, RF01693, RF01694, RF01726, RF01998, and RF02001. The data can be found as TeamProj_RF*.txt. Our search found only 1 tRNA match, and we included that match with the information on the matched gene in Table 3.
     For Optional Task 14, we constructed an alignment of our scaffolds with the genome of the bacterial strain with the most sequence matches, which we determined to be Bacteroides vulgatus str. 3975 RP4. On NCBI, we found 184 contigs from a whole-genome sequencing project for this strain. We concatenated these contigs to create a whole genome, to which we compared our scaffolds using blastn. With that blast report as a reference, we aligned the genomes using "act" and saved a screenshot of part of the alignment as Figure 3.
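The concatenation step in Optional Task 14 can be sketched in Python. This is only an illustration under stated assumptions: the report does not say how the 184 contigs were joined, so the optional spacer, the function names `concatenate_contigs` and `to_fasta`, and the toy contigs are all hypothetical.

```python
def concatenate_contigs(records, spacer=""):
    """Join (name, sequence) pairs into one sequence.

    Returns (joined_sequence, offsets), where offsets maps each contig
    name to its 0-based start in the joined sequence, so that hits on the
    pseudo-genome can be traced back to the contig they came from.
    The spacer (for example a run of N) is an assumption, not something
    the report specifies.
    """
    offsets, parts, pos = {}, [], 0
    for i, (name, seq) in enumerate(records):
        if i > 0 and spacer:
            parts.append(spacer)
            pos += len(spacer)
        offsets[name] = pos
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy contigs (hypothetical).
contigs = [("contig_1", "ATGC"), ("contig_2", "GGTTAA")]
genome, offsets = concatenate_contigs(contigs, spacer="NN")
print(to_fasta("pseudo_genome", genome, width=5))
```

Keeping the offsets is a design choice: contig order in a draft assembly is arbitrary, so positions in the concatenated sequence are only meaningful once mapped back to individual contigs.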
  7. Discussion: As we alluded to when discussing the results of Optional Task 3, we used blastp to determine the best match of each predicted protein within the database "NR." These data, located in Table 5, clearly indicate that the genus of the closest relative is Bacteroides: according to our blastp results, the best match of every predicted protein belongs to the genus Bacteroides. We can further assert that the species is Bacteroides vulgatus. 43 of the 104 predicted proteins list Bacteroides vulgatus as their best match, and of the blast matches that were specific to species, 43 of 49 (87.76%) list Bacteroides vulgatus. We can narrow the identity of the closest relative even further: among the 104 best matches, the strain Bacteroides vulgatus str. 3975 RP4 occurred 9 times. Thus, 9 of the 15 blast results specific enough to indicate a strain list Bacteroides vulgatus str. 3975 RP4. These data led us to conclude that Bacteroides vulgatus str. 3975 is the most closely related strain.
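The proportions quoted in the Discussion can be checked with a short Python sketch. The counts (43 of 49, 9 of 15, 43 of 104) come from the text above, and the organism labels are those of entries 1_1 to 1_8 in Table 5; the helper `share` is a hypothetical name introduced for illustration.

```python
from collections import Counter


def share(part, whole):
    """Percentage of part in whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts taken from the Discussion text.
species_share = share(43, 49)   # species-level matches naming B. vulgatus
strain_share = share(9, 15)     # strain-level matches naming str. 3975 RP4
overall_share = share(43, 104)  # B. vulgatus among all best matches

# Organism column for entries 1_1 to 1_8 of Table 5.
organisms = [
    "Bacteroides sp. 3_1_40A",
    "Bacteroides vulgatus str. 3975 RP4",
    "Bacteroides vulgatus str. 3975 RP4",
    "Bacteroides",
    "Bacteroides vulgatus",
    "Bacteroides vulgatus",
    "Bacteroides vulgatus str. 3975 RP4",
    "Bacteroides vulgatus str. 3975 RP4",
]
tally = Counter(organisms)

print(species_share, strain_share, overall_share)
print(tally.most_common(1))
```

Applying the same `Counter` to the full organism column of the blastp results would reproduce the tallies used to pick the closest genus, species, and strain.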
  8. Appendix: Contains 7 tables with the raw data used to create our Results and Discussion sections, along with 1 figure showing our genome alignment.
  9. Table 1. Genome assembly statistics for Team 10
     No. of read pairs | 47893
     No. of low-quality reads | 1763
     No. of assembled reads | 102640
     No. of unassembled reads | 2382
     No. of contigs | 2
     No. of scaffolds | 2
     Total nt length of scaffolds | 126196

     Contig | Length | %G+C | No. of reads mapped | Coverage
     Contig 100.0 | 119,977 | 40.61% | 4851245 | 6065.0
     Contig 100.1 | 6,219 | 37.58% | 240956 | 5811.0
  10. Table 2. Gene annotation summary for scaffolds
     Scaffold | CDS/ORFs | tRNAs | Other RNAs
     scaffold1.1 | 95 | 0 | 0
     scaffold2.1 | 9 | 1 | 0
  11. Table 3. Predicted Gene Coordinates
     Scaffold | Name | Type | Start | Stop | Strand | NT Length | AA Length | GC content | Signal Peptide? | SP Length (AA) | Best Blast Hit | Blast description
     scaffold 1.1 | 1_1 | CDS | 3 | 611 | - | 609 | 202 | 0.406 | N |  | gi|496057719|ref|WP_008782226.1| | transposase, partial
     scaffold 1.1 | 1_2 | CDS | 845 | 3022 | - | 2178 | 725 | 0.405 | Y | 21 | gi|649547948|gb|KDS54658.1| | hypothetical protein M099_1756
     scaffold 1.1 | 1_3 | CDS | 3539 | 3766 | - | 228 | 75 | 0.403 | N |  | gi|649547946|gb|KDS54656.1| | glycoside hydrolase family 88 domain protein
     scaffold 1.1 | 1_4 | CDS | 3949 | 4905 | - | 957 | 318 | 0.383 | N |  | gi|492435030|ref|WP_005843062.1| | MULTISPECIES: transcriptional regulator
     scaffold 1.1 | 1_5 | CDS | 5062 | 6291 | + | 1230 | 409 | 0.408 | N |  | gi|492435027|ref|WP_005843060.1| | TonB-dependent receptor
     scaffold 1.1 | 1_6 | CDS | 6311 | 7198 | + | 888 | 295 | 0.429 | Y | 18 | gi|492435023|ref|WP_005843058.1| | hypothetical protein
     scaffold 1.1 | 1_7 | CDS | 7536 | 8942 | + | 1407 | 468 | 0.396 | Y | 21 | gi|649547942|gb|KDS54652.1| | ahpC/TSA family protein
     scaffold 1.1 | 1_8 | CDS | 9027 | 9767 | - | 741 | 246 | 0.396 | N |  | gi|649547941|gb|KDS54651.1| | ahpC/TSA family protein
     scaffold 1.1 | 1_9 | CDS | 10111 | 12657 | + | 2547 | 848 | 0.421 | N |  | gi|495945682|ref|WP_008670261.1| | MULTISPECIES: hypothetical protein
     scaffold 1.1 | 1_10 | CDS | 12750 | 15755 | - | 3006 | 1001 | 0.36 | N |  | gi|495945680|ref|WP_008670259.1| | MULTISPECIES: hypothetical protein
     scaffold 1.1 | 1_11 | CDS | 15884 | 16252 | + | 369 | 122 | 0.477 | Y | 19 | gi|492458337|ref|WP_005851052.1| | alpha-L-fucosidase
     scaffold 1.1 | 1_12 | CDS | 16394 | 17275 | - | 882 | 293 | 0.468 | N |  | gi|492434987|ref|WP_005843035.1| | tRNA dimethylallyltransferase 1
     scaffold 1.1 | 1_13 | CDS | 17363 | 18388 | - | 1026 | 341 | 0.429 | N |  | gi|492434984|ref|WP_005843033.1| | MULTISPECIES: hypothetical protein
     scaffold 1.1 | 1_14 | CDS | 18424 | 19740 | - | 1317 | 438 | 0.432 | N |  | gi|492434981|ref|WP_005843031.1| | MULTISPECIES: UDP-N-acetylglucosamine acyltransferase
     scaffold 1.1 | 1_15 | CDS | 19846 | 21519 | + | 1674 | 557 | 0.476 | N |  | gi|492458346|ref|WP_005851058.1| | MULTISPECIES: hydroxymyristoyl-ACP dehydratase
     scaffold 1.1 | 1_16 | CDS | 21680 | 21880 | + | 201 | 66 | 0.454 | N |  | gi|492458349|ref|WP_005851060.1| | MULTISPECIES: UDP-3-O-acylglucosamine N-acyltransferase
     scaffold 1.1 | 1_17 | CDS | 22035 | 22727 | + | 693 | 230 | 0.43 | N |  | gi|500644323|ref|WP_011964621.1| | phosphohydrolase
     scaffold 1.1 | 1_18 | CDS | 22796 | 23239 | - | 444 | 147 | 0.453 | N |  | gi|492434969|ref|WP_005843024.1| | MULTISPECIES: orotidine 5'-phosphate decarboxylase
     scaffold 1.1 | 1_19 | CDS | 23255 | 23524 | - | 270 | 89 | 0.47 | N |  | gi|492434967|ref|WP_005843023.1| | MULTISPECIES: peptide chain release factor 1
     scaffold 1.1 | 1_20 | CDS | 23527 | 23871 | - | 345 | 114 | 0.471 | N |  | gi|492458355|ref|WP_005851064.1| | MULTISPECIES: phosphoribosylformylglycinamidine cyclo-ligase
     scaffold 1.1 | 1_21 | CDS | 24081 | 24527 | + | 447 | 148 | 0.31 | N |  | gi|492434963|ref|WP_005843021.1| | hypothetical protein
     scaffold 1.1 | 1_22 | CDS | 24636 | 24818 | + | 183 | 60 | 0.409 | N |  | gi|492434961|ref|WP_005843020.1| | MULTISPECIES: toxin Fic
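The NT Length and AA Length columns of Table 3 follow from the Start and Stop coordinates. The sketch below assumes 1-based inclusive coordinates and a terminal stop codon that is not counted as an amino acid; these assumptions are not stated in the report but are consistent with the rows shown. The function name `cds_lengths` is hypothetical.

```python
def cds_lengths(start, stop):
    """Return (nt_length, aa_length) for a CDS with 1-based inclusive coordinates.

    Assumption: the CDS ends with a stop codon, which is not translated,
    so the protein has one residue fewer than the number of codons.
    """
    nt_length = stop - start + 1
    if nt_length % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    aa_length = nt_length // 3 - 1
    return nt_length, aa_length


# (start, stop) pairs copied from Table 3, entries 1_1, 1_2, 1_5, 1_10.
rows = [(3, 611), (845, 3022), (5062, 6291), (12750, 15755)]
lengths = [cds_lengths(start, stop) for start, stop in rows]
print(lengths)
```

The strand does not affect either length, so the same calculation applies to genes on the + and - strands.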
  12. Table 5. Single best blast hit of annotated ORFs from Team 10
     Name | Gene Identifier | Description | Organism | % identity | E-value
     1_1 | gi|496057719|ref|WP_008782226.1| | transposase, partial | Bacteroides sp. 3_1_40A | 100 | 8.00E-88
     1_2 | gi|649547948|gb|KDS54658.1| | hypothetical protein M099_1756 | Bacteroides vulgatus str. 3975 RP4 | 100 | 4.00E-62
     1_3 | gi|649547946|gb|KDS54656.1| | glycoside hydrolase family 88 domain protein | Bacteroides vulgatus str. 3975 RP4 | 100 | 6.00E-62
     1_4 | gi|492435030|ref|WP_005843062.1| | MULTISPECIES: transcriptional regulator | Bacteroides | 100 | 5.00E-82
     1_5 | gi|492435027|ref|WP_005843060.1| | TonB-dependent receptor | Bacteroides vulgatus | 100 | 0
     1_6 | gi|492435023|ref|WP_005843058.1| | hypothetical protein | Bacteroides vulgatus | 100 | 0
     1_7 | gi|649547942|gb|KDS54652.1| | ahpC/TSA family protein | Bacteroides vulgatus str. 3975 RP4 | 100 | 0
     1_8 | gi|649547941|gb|KDS54651.1| | ahpC/TSA family protein | Bacteroides vulgatus str. 3975 RP4 | 100 | 0
     1_9 | gi|495945682|ref|WP_008670261.1| | MULTISPECIES: hypothetical protein | Bacteroides | 99.61 | 0
     1_10 | gi|495945680|ref|WP_008670259.1| | MULTISPECIES: hypothetical protein | Bacteroides | 97.22 | 2.00E-16
     1_11 | gi|492458337|ref|WP_005851052.1| | alpha-L-fucosidase | Bacteroides vulgatus | 100 | 0
     1_12 | gi|492434987|ref|WP_005843035.1| | tRNA dimethylallyltransferase 1 | Bacteroides vulgatus | 100 | 0
     1_13 | gi|492434984|ref|WP_005843033.1| | MULTISPECIES: hypothetical protein | Bacteroides | 100 | 9.00E-131
     1_14 | gi|492434981|ref|WP_005843031.1| | MULTISPECIES: UDP-N-acetylglucosamine acyltransferase | Bacteroides | 100 | 3.00E-180
     1_15 | gi|492458346|ref|WP_005851058.1| | MULTISPECIES: hydroxymyristoyl-ACP dehydratase | Bacteroides | 100 | 0
     1_16 | gi|492458349|ref|WP_005851060.1| | MULTISPECIES: UDP-3-O-acylglucosamine N-acyltransferase | Bacteroides | 100 | 0
     1_17 | gi|500644323|ref|WP_011964621.1| | phosphohydrolase | Bacteroides vulgatus | 100 | 0
     1_18 | gi|492434969|ref|WP_005843024.1| | MULTISPECIES: orotidine 5'-phosphate decarboxylase | Bacteroides | 100 | 0
     1_19 | gi|492434967|ref|WP_005843023.1| | MULTISPECIES: peptide chain release factor 1 | Bacteroides | 100 | 0
     1_20 | gi|492458355|ref|WP_005851064.1| | MULTISPECIES: phosphoribosylformylglycinamidine cyclo-ligase | Bacteroides | 100 | 0
     1_21 | gi|492434963|ref|WP_005843021.1| | hypothetical protein | Bacteroides vulgatus | 100 | 6.00E-138
     1_22 | gi|492434961|ref|WP_005843020.1| | MULTISPECIES: toxin Fic | Bacteroides | 100 | 0
     1_23 | gi|492458359|ref|WP_005851066.1| | MULTISPECIES: hypothetical protein | Bacteroides | 100 | 6.00E-43
     1_24 | gi|492434958|ref|WP_005843019.1| | hypothetical protein | Bacteroides vulgatus | 99.64 | 0
     1_25 | gi|492458364|ref|WP_005851068.1| | MULTISPECIES: hypothetical protein | Bacteroides | 100 | 0
     1_26 | gi|492458366|ref|WP_005851069.1| | MULTISPECIES: membrane protein | Bacteroides | 100 | 2.00E-43
     1_27 | gi|492458368|ref|WP_005851070.1| | MULTISPECIES: hypothetical protein | Bacteroides | 100 | 9.00E-114
     1_28 | gi|492458370|ref|WP_005851071.1| | MULTISPECIES: beta-N-acetylhexosaminidase | Bacteroides | 100 | 0
     1_29 | gi|492434942|ref|WP_005843009.1| | MULTISPECIES: endonuclease | Bacteroides | 99.71 | 0
     1_30 | gi|511016443|ref|WP_016270813.1| | excinuclease ABC subunit A | Bacteroides vulgatus | 100 | 0
     1_31 | gi|492434935|ref|WP_005843004.1| | MULTISPECIES: hypothetical protein | Bacteroides | 100 | 0
     1_32 | gi|492434933|ref|WP_005843003.1| | MULTISPECIES: chromate transporter | Bacteroides | 100 | 1.00E-131
     1_33 | gi|492434930|ref|WP_005843001.1| | MULTISPECIES: chromate transporter | Bacteroides | 100 | 1.00E-105
     1_34 | gi|511016442|ref|WP_016270812.1| | hypothetical protein | Bacteroides vulgatus | 100 | 0
     1_35 | gi|511016441|ref|WP_016270811.1| | phosphoribosylformylglycinamidine synthase | Bacteroides vulgatus | 100 | 0
     1_36 | gi|492434921|ref|WP_005842995.1| | MULTISPECIES: translocator protein, LysE family | Bacteroides | 100 | 4.00E-150
     1_37 | gi|492434917|ref|WP_005842993.1| | MULTISPECIES: hypothetical protein | Bacteroides | 100 | 5.00E-127
     1_38 | gi|492458387|ref|WP_005851079.1| | MULTISPECIES: dTDP-4-dehydrorhamnose reductase | Bacteroides | 100 | 0
     1_39 | gi|492434911|ref|WP_005842989.1| | MULTISPECIES: peptide chain release factor 3 | Bacteroides | 100 | 0
     1_40 | gi|492434907|ref|WP_005842987.1| | MULTISPECIES: molecular chaperone DnaJ | Bacteroides | 100 | 0
     1_41 | gi|492434904|ref|WP_005842985.1| | dihydrofolate reductase | Bacteroides vulgatus | 100 | 0
     1_42 | gi|548318542|ref|WP_022508241.1| | hypothetical protein | Bacteroides vulgatus CAG:6 | 100 | 1.00E-174
     1_43 | gi|492434896|ref|WP_005842980.1| | hypothetical protein | Bacteroides vulgatus | 100 | 0
     1_44 | gi|492458409|ref|WP_005851092.1| | transcriptional regulator | Bacteroides vulgatus | 99.7 | 0
     1_45 | gi|492434890|ref|WP_005842976.1| | MULTISPECIES: hypothetical protein | Bacteroides | 100 | 1.00E-44
     1_46 | gi|492434887|ref|WP_005842974.1| | hypothetical protein | Bacteroides vulgatus | 100 | 0
     1_47 | gi|500644291|ref|WP_011964611.1| | hypothetical protein | Bacteroides vulgatus | 100 | 0
  13. Table 6. PFAM domain matches for annotated genes from Team 10
     Name | PFAM ID | Description | E value
     scaffold1.1_1 | PF01610.12 | Transposase | 2.90E-25
     scaffold1.1_2 | PF11396.3 | Protein of unknown function (DUF2874) | 7.80E-15
     scaffold1.1_4 | PF03965.11 | Penicillinase repressor | 2.40E-25
     scaffold1.1_5 | PF03544.9 | Gram-negative bacterial TonB protein C-termi | 2.50E-23
     scaffold1.1_5 | PF13715.1 | Domain of unknown function (DUF4480) | 1.50E-16
     scaffold1.1_5 | PF05569.6 | BlaR1 peptidase M56 | 1.00E-11
     scaffold1.1_5 | PF13620.1 | Carboxypeptidase regulatory-like domain | 2.90E-10
     scaffold1.1_5 | PF07715.10 | TonB-dependent Receptor Plug Domain | 2.10E-06
     scaffold1.1_6 | PF14559.1 | Tetratricopeptide repeat | 6.20E-13
     scaffold1.1_6 | PF13414.1 | TPR repeat | 6.70E-12
     scaffold1.1_6 | PF07719.12 | Tetratricopeptide repeat | 2.90E-11
     scaffold1.1_6 | PF13428.1 | Tetratricopeptide repeat | 2.00E-10
     scaffold1.1_6 | PF13432.1 | Tetratricopeptide repeat | 9.60E-10
     scaffold1.1_6 | PF13429.1 | Tetratricopeptide repeat | 5.30E-08
     scaffold1.1_6 | PF12895.2 | Anaphase-promoting complex, cyclosome, subun | 1.30E-07
     scaffold1.1_6 | PF13431.1 | Tetratricopeptide repeat | 6.80E-06
     scaffold1.1_7 | PF00578.16 | AhpC/TSA family | 1.30E-11
     scaffold1.1_7 | PF00255.14 | Glutathione peroxidase | 4.20E-08
     scaffold1.1_7 | PF14289.1 | Domain of unknown function (DUF4369) | 1.70E-06
     scaffold1.1_8 | PF13905.1 | Thioredoxin-like | 1.40E-14
     scaffold1.1_8 | PF13098.1 | Thioredoxin-like domain | 1.90E-14
     scaffold1.1_8 | PF00085.15 | Thioredoxin | 2.70E-11
     scaffold1.1_8 | PF08534.5 | Redoxin | 4.30E-11
     scaffold1.1_8 | PF00578.16 | AhpC/TSA family | 1.00E-07
     scaffold1.1_11 | PF01120.12 | Alpha-L-fucosidase | 2.60E-87
     scaffold1.1_12 | PF01715.12 | IPP transferase | 7.70E-64
     scaffold1.1_12 | PF01745.11 | Isopentenyl transferase | 3.00E-12
     scaffold1.1_12 | PF04851.10 | Type III restriction enzyme, res subunit | 0.00022
     scaffold1.1_13 | PF07929.6 | Plasmid pRiA4b ORF-3-like protein | 4.00E-11
     scaffold1.1_14 | PF13720.1 | Udp N-acetylglucosamine O-acyltransferase; D | 1.20E-28
     scaffold1.1_14 | PF00132.19 | Bacterial transferase hexapeptide (six repea | 1.10E-25
     scaffold1.1_15 | PF03331.8 | UDP-3-O-acyl N-acetylglycosamine deacetylase | 6.00E-74
     scaffold1.1_15 | PF07977.8 | FabA-like domain | 1.10E-35
     scaffold1.1_16 | PF00132.19 | Bacterial transferase hexapeptide (six repea | 1.10E-29
     scaffold1.1_16 | PF04613.9 | UDP-3-O-[3-hydroxymyristoyl] glucosamine N-a | 7.00E-17
     scaffold1.1_16 | PF14602.1 | Hexapeptide repeat of succinyl-transferase | 1.20E-10
     scaffold1.1_17 | PF01966.17 | HD domain | 2.90E-08
     scaffold1.1_18 | PF00215.19 | Orotidine 5'-phosphate decarboxylase / HUMPS | 9.20E-30
     scaffold1.1_19 | PF03462.13 | PCRF domain | 3.40E-39
     scaffold1.1_19 | PF00472.15 | RF-1 domain | 2.60E-33
     scaffold1.1_20 | PF02769.17 | AIR synthase related protein, C-terminal dom | 1.70E-12
     scaffold1.1_22 | PF13310.1 | Virulence protein RhuM family | 5.70E-110
     scaffold1.1_24 | PF02638.10 | Glycosyl hydrolase like GH101 | 1.80E-53
     scaffold1.1_24 | PF13200.1 | Putative glycosyl hydrolase domain | 3.40E-07
     scaffold1.1_25 | PF02554.9 | Carbon starvation protein CstA | 8.90E-79
     scaffold1.1_25 | PF13722.1 | C-terminal domain on CstA (DUF4161) | 2.30E-24
  14. Table 7. TIGRFAM domain matches for annotated genes from Team 10
     Name | TIGRFAM ID | Description | E value
     scaffold1.1_5 | TIGR04057 | SusC_RagA_signa: TonB-dependent outer membrane receptor, SusC/RagA subfamily, signature region | 2.70E-16
     scaffold1.1_5 | TIGR01352 | tonB_Cterm: TonB family C-terminal domain | 2.70E-12
     scaffold1.1_12 | TIGR00174 | miaA: tRNA dimethylallyltransferase | 5.90E-75
     scaffold1.1_14 | TIGR01852 | lipid_A_lpxA: acyl-[acyl-carrier-protein]-UDP-N-acetylglucosamine O-acyltransferase | 1.70E-92
     scaffold1.1_15 | TIGR00325 | lpxC: UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase | 2.50E-56
     scaffold1.1_15 | TIGR01750 | fabZ: beta-hydroxyacyl-(acyl-carrier-protein) dehydratase FabZ | 3.90E-49
     scaffold1.1_16 | TIGR01853 | lipid_A_lpxD: UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase LpxD | 3.60E-105
     scaffold1.1_18 | TIGR02127 | pyrF_sub2: orotidine 5'-phosphate decarboxylase | 3.60E-72
     scaffold1.1_19 | TIGR00019 | prfA: peptide chain release factor 1 | 1.10E-137
     scaffold1.1_30 | TIGR00630 | uvra: excinuclease ABC subunit A | 0
     scaffold1.1_38 | TIGR01214 | rmlD: dTDP-4-dehydrorhamnose reductase | 1.90E-89
     scaffold1.1_39 | TIGR00503 | prfC: peptide chain release factor 3 | 6.10E-207
     scaffold1.1_39 | TIGR00231 | small_GTP: small GTP-binding protein domain | 2.20E-25
     scaffold1.1_49 | TIGR02227 | sigpep_I_bact: signal peptidase I | 1.30E-19
     scaffold1.1_52 | TIGR01730 | RND_mfp: efflux transporter, RND family, MFP subunit | 8.80E-48
     scaffold1.1_56 | TIGR00221 | nagA: N-acetylglucosamine-6-phosphate deacetylase | 1.30E-81
     scaffold1.1_57 | TIGR00057 | TIGR00057: tRNA threonylcarbamoyl adenosine modification protein, Sua5/YciO/YrdC/YwlC family | 1.20E-44
     scaffold1.1_59 | TIGR00460 | fmt: methionyl-tRNA formyltransferase | 8.00E-81
     scaffold1.1_61 | TIGR02937 | sigma70-ECF: RNA polymerase sigma factor, sigma-70 family | 4.40E-29
     scaffold1.1_63 | TIGR01163 | rpe: ribulose-phosphate 3-epimerase | 1.00E-83
     scaffold1.1_64 | TIGR00360 | ComEC_N-term: ComEC/Rec2-related protein | 8.50E-27
     scaffold1.1_67 | TIGR03990 | Arch_GlmM: phosphoglucosamine mutase | 1.80E-160
     scaffold1.1_69 | TIGR00539 | hemN_rel: putative oxygen-independent coproporphyrinogen III oxidase | 4.50E-87
     scaffold1.1_71 | TIGR00231 | small_GTP: small GTP-binding protein domain | 1.10E-18
     scaffold1.1_76 | TIGR00166 | S6: ribosomal protein S6 | 2.00E-25
     scaffold1.1_77 | TIGR00165 | S18: ribosomal protein S18 | 1.90E-33
     scaffold1.1_78 | TIGR00158 | L9: ribosomal protein L9 | 1.00E-35
     scaffold1.1_82 | TIGR01579 | MiaB-like-C: MiaB-like tRNA modifying enzyme | 3.00E-122
     scaffold1.1_82 | TIGR00089 | TIGR00089: radical SAM methylthiotransferase, MiaB/RimO family | 1.10E-113
     scaffold1.1_85 | TIGR00525 | folB: dihydroneopterin aldolase | 5.10E-30
  15. Table 8. Phyre2 predicted best crystal structure matches for annotated genes from Team 10
     Name | PDB best match | Pct_identity | Confidence | Aligned region | Description
     1.1_1 | c3f9kV | 22 | 61.1 | 89-115 | two domain fragment of hiv-2 integrase in complex with ledgf ibd
     1.1_4 | d1sd4a | 19 | 100 | 3-120 | Penicillinase repressor
     1.1_14 | c3i3aC | 39 | 100 | 2-255 | transferase, structural basis for the sugar nucleotide and acyl chain selectivity of leptospira interrogans lpxa
     1.1_19 | c3d5cX | 43 | 100 | 8-369 | peptide chain release factor 1, structural basis for translation termination on the 70s ribosome
     1.1_32 | c3dboA | 29 | 49.4 | 36-67 | toxin/antitoxin, crystal structure of a member of the vapbc family of toxin-antitoxin systems, vapbc-5, from mycobacterium tuberculosis
     1.1_54 | c4mt4C | 12 | 100 | 27-478 | transport protein, crystal structure of the campylobacter jejuni cmec outer membrane channel
     1.1_57 | c2eqaA | 23 | 100 | 6-191 | rna binding protein, crystal structure of the hypothetical sua5 protein from sulfolobus tokodaii
     1.1_60 | c3k6oA | 24 | 100 | 29-237 | structural genomics, unknown function, crystal structure of protein of unknown function duf1344 (yp_001299214.1) from bacteroides vulgatus atcc 8482
     1.1_68 | c1upsB | 16 | 100 | 21-262 | glycosyl hydrolase, glcnac[alpha]1-4gal releasing endo-[beta]-galactosidase from clostridium perfringens
  16. Figure 3 is a screenshot of the whole-genome alignment of our scaffolds against the genome of Bacteroides vulgatus str. 3975 RP4, which we determined to be the strain with the most blastp matches against our contigs.
