1. Kaitlin Hart
Protein Function Prediction Using ProMOL,
PyMOL, and other Bioinformatics Tools
Rochester Institute of Technology
2. Introduction
● 3,454 entries of unknown function in the Protein Data Bank
(PDB) as of 7/25/2014
● The purpose of this project is to hypothesize the function of
these proteins in a timely and cost-effective manner
● In silico methods allow us to rapidly narrow candidate functions
● In vitro characterization alone is expensive and time-consuming
9. AutoDock Vina
O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a
new scoring function, efficient optimization and multithreading, Journal of
Computational Chemistry 31 (2010) 455-461.
11. Conclusion and Future Plans
Conclusions:
● 4EZI is hypothesized to be a hydrolase, or
more specifically, an esterase
● 4EZI shows little activity with p-nitrophenyl acetate
● AutoDock suggests that inositol would be a good
substrate candidate
Future Plans:
● Continue work with AutoDock to find other possible
substrates
● Test substrates with kinetics assays
12. Acknowledgements
Thank you to all of the present and past members of the SBEVSL (Structural Biology Extensible
Visualization Scripting Language) project.
This project was funded by RIT, Dowling College, the National Institutes of Health
(2R15GM078077 & 3R15GM078077-02S1), and the National Science Foundation (NSF ATE
040208).
Dr. Paul Craig
Dr. Lea Michel
Greg Dodge
Dr. Herbert Bernstein
Dr. Jeff Mills
13. References
Altschul, S. F., T. L. Madden, A. A. Schäffer, J Zhang, Z Zhang, W Miller, and D J Lipman. “Gapped BLAST and
PSI-BLAST: a New Generation of Protein Database Search Programs.” Nucleic Acids Research 25, no. 17
(September 1, 1997): 3389–3402.
Bernstein, F. C., T. F. Koetzle, G. J. Williams, E. F. Meyer Jr, M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi,
and M. Tasumi. “The Protein Data Bank: a Computer-based Archival File for Macromolecular Structures.” Journal
of Molecular Biology 112, no. 3 (May 25, 1977): 535–542.
Furnham N., G.L. Holliday, T.A. de Beer, J.O. Jacobsen, W.R. Pearson, J.M. Thornton. “The Catalytic Site Atlas 2.0:
cataloging catalytic sites and residues identified in enzymes.” Nucleic Acids Research 42. (January
2014):D485-D489. doi: 10.1093/nar/gkt1243.
Hanson, B., C. Westin, M. Rosa, A. Grier, M. Osipovitch, M. MacDonald, G. Dodge, P. Boli, C. Corwin, H. Kessler, T.
McKay, H. Bernstein, P. Craig. “Estimation of Protein Function Using Template-Based Alignment of Enzyme
Active Sites.” BMC Bioinformatics 15, no. 87. (March 27, 2014): doi:10.1186/1471-2105-15-87
Holm, L., and P. Rosenström. “Dali Server: Conservation Mapping in 3D.” Nucleic Acids Research 38, no. suppl 2
(July 1, 2010): W545–W549. doi:10.1093/nar/gkq366.
Trott, O., and A. J. Olson. “AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function,
Efficient Optimization and Multithreading.” Journal of Computational Chemistry 31 (2010): 455–461.
Punta M., P.C. Coggill, R.Y. Eberhardt, J. Mistry, J. Tate, C. Boursnell, N. Pang, K. Forslund, G. Ceric, J. Clements, A.
Heger, L. Holm, E.L.L. Sonnhammer, S.R. Eddy, A. Bateman, R.D. Finn. “The Pfam Protein Families
Database.” Nucleic Acids Research 40, no. D1 (November 29, 2011): D290-D301. doi:10.1093/nar/gk
-The Protein Data Bank (PDB) is a repository of protein structures; all publicly released structures are deposited there
-There are almost 3,500 proteins in the PDB with no assigned function; basically, we know the structure, but we do not know what these proteins actually do
-the purpose of our project was to analyze these proteins in an attempt to find their function
-we can do this in silico (meaning using computers), which allows us to rapidly narrow candidate functions
-we can also do this in vitro (testing in the wet lab), which is accurate but costly and time-consuming
-we need to use both methods when characterizing proteins
-we analyzed 4EZI in silico using three additional bioinformatics tools (BLAST, Dali, and Pfam), which categorize proteins based on entire sequence and/or 3D structure
-after we compared the results of all four computer programs, we performed literature searches to determine the best methods for testing the function of the protein in the wet lab
-In vitro (meaning in the wet lab) characterization involved mixing the enzyme with different substrates and measuring the rate at which the enzyme breaks down each substrate
-ProMOL is a computer program used to visualize structural alignment among proteins
-developed by RIT students in 2008
-ProMOL looks at a specific part of proteins whose function we know, called the active site. This is where substrates bind to the protein so it can catalyze chemical reactions (hand demo)
-ran most of the 3,000+ proteins of unknown function through ProMOL
-found a promising hit, 4EZI, which we decided to pursue
-started by running 4EZI through ProMOL
-best match was 1ORV, a hydrolase. Hydrolases are enzymes that cut bonds by adding water molecules
-in this image, 4EZI is red and 1ORV is blue. In ProMOL, this is considered an excellent match
-BLAST compares sequences of proteins. When we ran the sequence of 4EZI through BLAST, the best hit was 3H2H, which is a specific type of hydrolase called an esterase. Esterases cut ester bonds in molecules with carbon chains of fewer than 10 carbons.
-Dali compares 3D structures of proteins. When we ran 4EZI through the Dali database, the best hit was again the esterase 3H2H
-query cover: percent of 3H2H that is aligned with 4EZI; 3H2H has an extra loop that might have a different function
-E value: expectation value, the number of alignments this good expected to occur by chance. The smaller the value (the more negative its exponent), the better, because the match would not normally be expected to happen at random
-identical: percent of aligned positions that are identical
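The E value note can be made concrete with the standard relation between a BLAST bit score S' and the expectation value, E = m * n * 2^(-S'), where m is the query length and n is the database length. The sketch below is only an illustration: the function name and every number in it are hypothetical, not values from the 4EZI search, and BLAST itself uses effective (edge-corrected) lengths.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with raw lengths as a simplifying assumption.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical query and database sizes, chosen only for illustration.
QUERY_LEN = 300
DB_LEN = 100_000_000

weak_hit = expect_value(40, QUERY_LEN, DB_LEN)
strong_hit = expect_value(200, QUERY_LEN, DB_LEN)

# A higher bit score gives a much smaller E value (a more negative exponent).
print(f"bit score 40:  E = {weak_hit:.2e}")
print(f"bit score 200: E = {strong_hit:.2e}")
```

Each additional bit of score halves E, which is why strong hits are reported with exponents such as e-50 rather than as percentages.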
-the image on the left is from a previous slide; remember that 1ORV is a hydrolase
-the image on the right was generated by running the esterase (a type of hydrolase) 3H2H (the best hit from BLAST and Dali) through ProMOL
-as you can see, the two alignments are very similar
4EZI and 1ORV
RMSD: 0.36
RMSD Alpha: 0.19
RMSD Alpha & Beta: 0.21
4EZI and 3H2H
RMSD All: 0.24
RMSD Alpha: 0.17
RMSD Alpha & Beta: 0.21
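For readers unfamiliar with RMSD, the values above measure how far matched atoms sit from each other after two structures are overlaid; smaller is better. The following is a minimal sketch of the calculation, assuming the two atom sets are already superimposed and paired one-to-one. The coordinates are hypothetical, and this is not necessarily how ProMOL computes its reported numbers.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one matched pair of atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical, already-superimposed atom positions (same units for both sets).
motif = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
query = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (0.0, 1.5, 0.2)]

print(round(rmsd(motif, query), 3))
```

Restricting the atom lists to alpha carbons, or to alpha and beta carbons, would give the analogues of the "RMSD Alpha" and "RMSD Alpha & Beta" rows.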
[Kinetics plot; label: acetate. Fitted curve: y = 0.0011 ln(x) + 0.0089, R² = 0.924. Slope: 0.02692]
PNP is a chromogenic substrate
Dr. Oleg Trott
Figure 7. Binding of ITP in the 4EZI active site. Serine-195 is highlighted in lime green within the active site of 4EZI. The phosphate group of the ITP interacts with the hydroxyl group of the serine at a distance of 2.9 Å. ITP bound to 4EZI with an affinity of -6.9 kcal/mol.
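One rough way to read the -6.9 kcal/mol score is to convert it to an equivalent dissociation constant using ΔG = RT ln(Kd). The sketch below assumes the docking score can be treated as a binding free energy at 298.15 K, which is an approximation and not a claim made in the slides. The function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_to_kd(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by a binding free energy.

    Rearranges delta_G = R * T * ln(Kd) to Kd = exp(delta_G / (R * T)).
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


kd = affinity_to_kd(-6.9)
print(f"Kd is roughly {kd * 1e6:.1f} micromolar")
```

Under these assumptions the score corresponds to micromolar-range binding. Docking scores are estimates, so this is best used to rank candidate substrates rather than to predict a measured Kd.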