This talk presents an online decision support system for structural biologists who are interested in performing multiple protein structure comparisons, via multiple methods, in one go.
Protein Structure Alignment and Comparison
1. The ProCKSI-Server
An on-line Decision Support System for
Protein Structure Comparison
Natalio Krasnogor
www.cs.nott.ac.uk/~nxk
Natalio.Krasnogor@Nottingham.ac.uk
Interdisciplinary Optimisation Laboratory
Automated Scheduling, Optimisation & Planning Research Group
School of Computer Science and Information Technology
Centre for Integrative Systems Biology
School of Biology
Centre for Healthcare Associated Infections
Institute of Infection, Immunity & Inflammation
University of Nottingham
27th November 2008, University of Warwick 1
2. Outline
Introduction
− Brief introduction to proteins
− Protein structure comparison
− Methods
ProCKSI
− Motivation
− External Methods
− USM & MAX-CMO
− Consensus building
Results
− From a structural bioinformatics perspective
− From a Computational perspective
Conclusions
Acknowledgement
3. Introduction
www.procksi.org
4. What are Proteins?
Proteins are
biological molecules
of primary
importance to the
functioning of living
organisms
Perform many and
varied functions
5. Structural Proteins: the organism's basic building blocks, e.g.
collagen, nails, hair, etc.
Enzymes: biological engines that mediate a multitude of biochemical
reactions. Usually enzymes are very specific and catalyse only a
single type of reaction, but they can play a role in more than one
pathway.
Transmembrane proteins: they are the cell's housekeepers, e.g. by
regulating cell volume, extracting and concentrating small
molecules from the extracellular environment, and generating the ionic
gradients essential for muscle and nerve cell function (the sodium/
potassium pump is an example)
6. Protein Structures
Varying: size, shape, structure
Structure determines their
biological activity
“Nature's Robots”
Understanding protein structure
is key to understanding function
and dysfunction
7. Components of Proteins
Building Blocks:
− Amino Acids
− Common Basic Unit
Livingstone and Barton (1993)
• Distinct “side chains”
• 20 Amino Acid Types
9. Components of Proteins
•Thousands of different physicochemical and biochemical properties (AAIndex)
• Thus proteins are beautiful combinatorial beasts!
10. Protein Synthesis
Amino Acid Sequences
− AAs polymerised into
Chains (Residues)
− Gene sequence
determines Protein
sequence
Protein Structure
− Chains fold into
specific compact
structures
Structure formation (folding)
is spontaneous
Sequence determines
Structure
Structure determines
function
11. Determining Protein Structures
Protein Structure
determination is
slow and difficult
Determining protein
sequence is
relatively easy
(Genomics)
PDB vs Genbank
Thomas Splettstoesser
12. Comparing Protein Structures
• Proteins build the majority of cellular structures
and perform most life functions
• Extend knowledge about the protein universe:
– Understand interrelations
between structures and functions of proteins
through measured similarities
– Group (cluster) proteins by
structural similarities so as to infer commonalities
• Goal is to predict functions of proteins
from their structure, or design new
proteins for specific functions
• Considering any two objects:
What does “similar” mean?
Similar or not? How / Where similar?
13. Protein Structure Comparison
Similarity comparison of protein structures is not trivial even though it is
obvious that proteins may share certain common patterns (motifs)
Many different similarity comparison
methods available, each with its own
strengths and weaknesses
Different concepts of similarity:
sequence vs. structural, local vs. global,
chemical role vs. biological function vs. evolution
sequence vs. …
Different algorithms and implementations:
exact vs. approximation vs. heuristic,
local vs. global search
Maximum Contact Map Overlap
using e.g. memetic algorithms,
Variable Neighbourhood Search, Tabu Search
Picture source: http://www.cathdb.info
15. Computational Underpinning
• Dynamic programming (Taylor, 99)
• Comparison of distance matrices (Holm & Sander, 93, 96)
• Maximal common sub-graph detection (Artymiuk, Poirrette, Rice & Willett, 95)
• Geometrical matching (Wu, Schmidler, Hastie & Brutlag, 98)
• Root-mean-square distances (Maiorov & Crippen, 94; Cohen & Sternberg, 80)
• Other methods (e.g. Lackner, Koppensteiner, Domingues & Sippl, 99;
Zemla, Vendruscolo, Moult & Fidelis, 2001)
A survey of various similarity measures can be found in (Koehl P:
Protein structure similarities. Curr Opin Struct Biol 2001, 11:348-353)
16. Some Observations
•No agreement on which of these is the best method
• Various difficulties are associated with each.
• They assume that a suitable scoring function can be defined for which
optimum values correspond to the best possible structural match between
two structures (clearly not always true, e.g. RMSD)
• Some methods cannot produce a proper ranking due to:
• ambiguous definitions of the similarity measures or
• neglect of alternative solutions with equivalent similarity values.
Structure comparison is at its core a multi-competence (multi-objective)
problem, but it is seldom treated as such, e.g.:
ProSup (Feng & Sippl, 96) optimizes the number of equivalent residues with the RMSD being an
additional constraint (and not another search dimension).
DALI (Holm & Sander, 93) combines various derived measures into one value, effectively
transforming a multi-objective problem into a (weighted) single objective one.
17. What/How are we comparing?
Models, Measures, Metrics & Methods
or other tasks...
18. Until very recently researchers would:
Focus on steps 1-4, often collapsed into one single
step
Compare one algorithm against others on a given
data set
Conclude that their algorithm “is best” for that data
set and write a paper
Meanwhile, in the real world…
No method is best in all data sets.
The biologist will only use the method (s)he is most
familiar with! Regardless of the suitability to his/her
problem.
19. Q: How do we change this reality?
A: We make it easy for the biologist to use the correct method (and more)
21. ProCKSI
www.procksi.org
22. The ProCKSI-Server
ProCKSI: Protein Comparison, Knowledge,
Similarity, and Information
Web Server for protein structure comparison
Workbench / portal for established
methods and repositories for
protein structure information
– Integrates results from many
comparison methods in one place
– Home-grown comparison methods,
Max-CMO and USM (using contact
maps as their input)
Decision Support System / analysis tool
– Visualises, compares and clusters all similarity measure results
– Incorporates all results and suggests a similarity consensus
23. The ProCKSI-Server
Minimise the Management Overhead for Experiments
• Upload your own dataset or download structures from the PDB repository
• Validate your PDB file, and extract desired models and chains
• Choose from multiple similarity comparison methods in one place (including
your own similarities), or don't choose and use them all!
• Submit and monitor the progress of your experiment
• Integrate results from all pair-wise comparisons
• Analyse and visualise results from different similarity comparison methods
• Combine results and produce a similarity consensus profile
• Download desired results
[Architecture diagram: Dataset, Calculation, Results, Overview, Analysis and Task Managers; local methods (USM, MaxCMO) and external methods; similarity comparison, task/job scheduling, structure requests and results via a database/filesystem]
24. Protein Comparison Methods United
Home-grown methods:
− USM
− Max-CMO
External methods:
− DaliLite
− FAST
− CE
− TMalign
− Vorolign
− URMS
Additional informational sources:
− CATH, iHOP, RCSB, SCOP
25. Home-Grown Methods
• Representation of 3D protein structures as 2D contact maps
- Atoms that are far away in the linear chain,
come close together in the folded state
- If the distance between two atoms
i,j is below a threshold t, they are
said to form a contact
• Mathematical description of contact maps
- Calculation of all pairwise Euclidean distances between atoms i, j
- Translation into a binary, symmetric matrix, called the contact map C
• Contact maps in ProCKSI
Input for the two main similarity measures:
- Universal Similarity Metric (USM)
- Maximum Contact Map Overlap (MaxCMO)
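The contact-map construction described above can be sketched in a few lines; the 8 Å threshold and the toy coordinates below are illustrative assumptions, not ProCKSI's actual defaults.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Binary, symmetric contact map from an (N, 3) array of atom
    coordinates: C[i, j] = 1 iff the Euclidean distance between
    atoms i and j is below the threshold t (here in Angstroms)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist < threshold).astype(np.uint8)

# Toy chain: four residues spaced 3.8 A apart along a line
coords = np.array([[i * 3.8, 0.0, 0.0] for i in range(4)])
C = contact_map(coords)
print(C)  # residues 0 and 3 are 11.4 A apart, so C[0, 3] == 0
```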
26. An Example of a contact map
1C7W.PDB
27. Protein Structure Comparison
• Secondary structure elements can
be identified in the contact map:
− α-helix: wide bands on main diagonal
− β-sheet: parallel or perpendicular bands to main
diagonal
• Comparison of contact maps
- using different similarity measures, e.g.
number of alignments, overlap values,
information content, …
• Protein relationships
- Pair-wise comparison of multiple proteins
results in a (standardised) similarity matrix
- Comparison of all possible proteins describes
the protein universe
Protein 1NAT with α-helices and β-sheets
28. Protein Structure Comparison
• Maximum Contact Map Overlap (MaxCMO) method
is a specific measure of equivalence
- Number of aligned residues (dashed lines) and equivalent contacts
(aligned bows, called overlap)
- Overlap gives strong indication for topological similarity taking the
local environment into account
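A minimal sketch of counting the overlap under a given alignment; the contact maps and the identity alignment below are made-up toy data, and real MaxCMO solvers (e.g. the memetic algorithms mentioned earlier) search over alignments rather than taking one as input.

```python
def cmo_overlap(cm1, cm2, alignment):
    """Count equivalent contacts (the overlap): contacts (i, j) in
    protein 1 whose aligned residues (alignment[i], alignment[j])
    are also in contact in protein 2."""
    overlap = 0
    for i in range(len(cm1)):
        for j in range(i + 1, len(cm1)):
            if cm1[i][j] and i in alignment and j in alignment:
                if cm2[alignment[i]][alignment[j]]:
                    overlap += 1
    return overlap

cm1 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # contacts (0,1), (0,2), (1,2)
cm2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # contacts (0,1), (1,2)
align = {0: 0, 1: 1, 2: 2}                # identity alignment
print(cmo_overlap(cm1, cm2, align))       # → 2
```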
29. 1ash 1hlm
Two related proteins taken from the PDB which share a six-helix structural motif.
Two locally and globally similar contact maps.
34. Protein Structure Comparison
• Universal Similarity Metric (USM) is the most concept/domain
independent measure in ProCKSI
- detects similarities between (quite) divergent structures
- based on the concept of Kolmogorov complexity
- compares the information content of two contact maps by compression
(NCD)
35. Protein Structure Comparison
• Contact maps are the input to Universal Similarity Metric
(USM)
• Basic concept is Kolmogorov Complexity:
- Prior Kolmogorov complexity K(o):
Measures the amount of information contained in a given object o
- Conditional Kolmogorov complexity K(o1|o2):
How much (more) information is needed to produce object o1 if one
knows object o2 (as input)
• Calculation of the Normalized Information Distance (NID),
which is a proper, universal and normalized similarity metric
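In symbols, the Normalized Information Distance built from the conditional Kolmogorov complexities above is usually written as:

```latex
\mathrm{NID}(o_1, o_2) = \frac{\max\{K(o_1 \mid o_2),\; K(o_2 \mid o_1)\}}{\max\{K(o_1),\; K(o_2)\}}
```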
36. Protein Structure Comparison
• Kolmogorov complexity is not computable directly, but can be heuristically approximated
• Approximation of the Normalised Information Distance (NID) by the Normalised Compression Distance (NCD):
– Objects are represented as bit strings s (or files) that can be concatenated (.)
– Objects are compressed by any lossless real-world compressor (e.g. zip, bzip2, …)
– Length of the compressed string/file approximates the Kolmogorov complexity
– Compression of the second string/file using the dictionary of the first one gives the conditional Kolmogorov complexity
– NCD takes values in [0 + ε; 1 + ε]
[Figure: two contact maps written out as binary strings, compressed individually and as a concatenation]
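The NCD recipe above can be sketched with a standard lossless compressor; zlib stands in here for the "any real-world compressor" of the slide, and the bit strings are toy stand-ins for serialised contact maps.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance: (C(xy) - min(C(x), C(y)))
    / max(C(x), C(y)), where C(.) is the compressed length."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"0001100011" * 50                  # repetitive "contact map"
b = b"0001100011" * 49 + b"1110011100"  # same pattern, one change
c = bytes(range(256)) * 2               # unrelated high-entropy data
print(round(ncd(a, b), 2), round(ncd(a, c), 2))
```

Similar strings compress well together, so ncd(a, b) comes out well below ncd(a, c).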
37. Protein Structure Comparison
• Analysis of similarity matrices by hierarchical clustering:
– Similarity matrices not easy to analyse,
especially for very large datasets
– Similar proteins (with small values)
are grouped together (clustered)
– Many clustering algorithms available,
e.g. Ward’s Minimum Variance
• Results of the hierarchical clustering
can be visualised as linear or
hyperbolic tree
– Hyperbolic tree is favourable for
large sets of proteins
– Fish-eye perspective
– Navigation through the tree
possible
– Tree comparison across
methods/data sets
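A naive single-linkage agglomeration illustrates the clustering idea (ProCKSI uses proper algorithms such as Ward's Minimum Variance; the 4×4 distance matrix below is invented for illustration):

```python
def agglomerate(dist, n_clusters):
    """Greedy hierarchical clustering on a symmetric distance matrix:
    repeatedly merge the two closest clusters (single linkage) until
    n_clusters remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters.pop(b)
    return clusters

# Standardised dissimilarities for four proteins: 0/1 and 2/3 are similar
D = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.7, 0.9],
     [0.9, 0.7, 0.0, 0.2],
     [0.8, 0.9, 0.2, 0.0]]
print(agglomerate(D, 2))  # → [{0, 1}, {2, 3}]
```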
38. Total Evidence Consensus
• Comparison of a pair of proteins P1 and P2 with a given similarity method 1M results in a similarity score 1S12
• Comparison of a dataset with multiple proteins P1 … Pn with the same similarity method 1M results in a similarity matrix 1S with entries 1S11 … 1Snn
• Comparison of the same dataset with multiple similarity methods 1M … mM results in multiple similarity matrices 1S … mS, providing multiple similarity measures
[Figure: similarity matrix 1S for proteins P1 … Pn under method 1M]
39. Consensus Analysis
Consensus/Greedy
– Standardisation of similarity distances: [0;1]
– Assumption: For a given pair of structures,
the best method produces the best similarity values
– Compilation of a similarity matrix including the
best values from the best similarity method for each pair
Consensus/Average
– Expert user selects similarity measures; included measures contribute equally to the
consensus
– The intelligent combination of similarity comparison measures leads to better results
than any single one can provide!
Consensus/Weighted
– Assign weights to similarity measures according to
preference by ranking, e.g. Z-score > N-Align > RMSD
– Optimise weights: Determine minimum, average and
maximum weights by solving linear programming problem
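The three consensus flavours share the same skeleton: standardise each matrix to [0;1], then average with equal weights (Average), per-pair best values (Greedy), or ranked weights (Weighted). A minimal sketch of the equal-weights case, with made-up 2×2 matrices:

```python
import numpy as np

def standardise(S):
    """Rescale a similarity matrix into [0, 1]."""
    S = np.asarray(S, dtype=float)
    lo, hi = S.min(), S.max()
    return (S - lo) / (hi - lo) if hi > lo else np.zeros_like(S)

def consensus(matrices, weights=None):
    """Weighted average of standardised similarity matrices;
    equal weights give Consensus/Average."""
    ms = [standardise(S) for S in matrices]
    w = np.array(weights if weights else [1.0] * len(ms), dtype=float)
    w /= w.sum()
    return sum(wi * Si for wi, Si in zip(w, ms))

S1 = [[0.0, 2.0], [2.0, 0.0]]   # e.g. a raw DaliLite-style score
S2 = [[0.0, 0.4], [0.4, 0.0]]   # e.g. a raw USM-style score
C = consensus([S1, S2])
print(C)  # both standardise to [[0, 1], [1, 0]], so C does too
```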
40. Total Evidence Consensus
• Each similarity matrix must be standardised to [0;1] as different methods produce different qualities and ranges of measures
• Integration of multiple similarity matrices 1S … mS in order to build a consensus similarity matrix C (entries C11 … Cnn)
• The consensus operator determines how the different similarity matrices are weighted and averaged
[Figure: m standardised similarity matrices combined into the consensus matrix C]
41. Results
www.procksi.org
42. Evaluation of CASP6 Results
• Evaluation of CASP6 competition results
• Prediction of protein structure against a given target
– Evaluation of predictions with similarity comparison methods
[Figure: CASP target T0196 with ProCKSI CONSENSUS, MaxCMO overlap, and CASP GDT-TS evaluations]
• Similarity ranking with different methods
– CONSENSUS =
Unweighted arithmetic average of
USM + MaxCMO/Overlap + DaliLite/Z
– Comparable results between ProCKSI's CONSENSUS method and the
community's gold standard GDT-TS supplemented with expert curation
– CONSENSUS detects a better model for target T0196
43. Clustering of Protein Kinases
Comparison of sequence-based classification with structure-based
clustering from single similarity comparison methods and ProCKSI's
consensus method
• Biological background:
– Kinases are enzymes that catalyse the transfer of a phosphate to a protein substrate
– They play an essential role in most cellular processes,
e.g. cellular differentiation and repair, cell proliferation
• Kinases dataset:
http://www.nih.go.jp/mirror/Kinases
− 45 structures published at the Protein Kinase Resource (PKR) web site
• Hanks' and Hunter's (HH) classification as gold standard:
– Based on sequence information
– HH-Clusters: Mainly 9 different groups (super-families)
– Sub-Clusters: Common features according to the SCOP database
• Experiments with 3 different comparison methods (USM, MaxCMO, DaliLite), 3 different
contact map thresholds, 7 different clustering methods (e.g. Ward's, UPGMA)
44. Clustering of Protein Kinases
Single Similarity Measures DaliLite/Z USM/USM MaxCMO/Overlap
• Best results obtained with Ward's Minimum Variance clustering method
• Each method/measure has
its own strengths and flaws
Strengths:
• Green: Classification on
Class level, e.g. α+β/PK-like
• Blue: Detect similarities
up to Species level with e.g.
mice, pigs, cows
• Red: Produce mixed bag of proteins
being least similar in Blue
Flaws:
• MaxCMO/Overlap only distinguishes proteins at the Class level
• DaliLite/Z wrongly adds protein 1IAN to Green
• USM/USM reverses the order of the last two clustering steps (Blue and Green)
45. Clustering of Protein Kinases
Similarity Consensus USM/USM + DaliLite/Z USM/USM + DaliLite/Z
+ MaxCMO/Overlap
• Exhaustive combination of all
available similarity measures
Best Results:
● Correct clustering with USM/USM + DaliLite/Z,
compensating for each other's flaws
General Trends:
● Including similarity measures derived from the number of
alignments (e.g. MaxCMO/Align, DaliLite/Align) partially destroys
good clustering outside Green
● Adding noisier measures (e.g. MaxCMO/Overlap) still produces
comparably good and robust results
46. Consensus Analysis
Comparison of the influence of the combination of different similarity
measures on the quality of the consensus method
• Rost/Sander dataset:
– Designed for secondary structure prediction
– Pairwise sequence similarity of less than 25%
– 126 globular proteins incl. 18 multi-domain proteins
• SCOP classification as gold standard:
– Manually curated database containing expert knowledge
– Hierarchical classification levels:
Class, Fold, Superfamily, Family, Protein, Species
• Analyse performance of each established comparison method against
consensus method using ROC analysis
– Compare true positives against false positives
– Performance measure is Area under the Curve (AUC)
47. Consensus Analysis - Technique
ROC = Receiver Operating Characteristic
– Technique for comparing the overall performance of
different methods / algorithms / tests on the same dataset
– Widely employed e.g. in signal detection theory,
machine learning, and diagnostic testing in medicine
• ROC curves depict the relative trade-off between benefits
(True Positives) and costs (False Positives)
• Confusion matrix of a binary test:

                  True p   True n
  Test Y          TP       FP
  Test N          FN       TN
  Column totals   P        N

– Hit rate: True Positive rate TPr = TP / P
– False alarm: False Positive rate FPr = FP / N
48. Consensus Analysis - Technique
Important points in ROC space
(0,1) : high TPr and low FPr;
perfect classification
(0,0) : never issue positive
classifications; useless
(1,1) : always issue positive
classifications; useless
{y=x} : randomly guessing a
classification; useless
ROC curves for methods with continuous output
– Not a simple binary (discrete) decision problem (yes/no)
– Ranking or scoring output estimates the class membership probability
of an instance [0;1]
– Application of a variable threshold in order to produce and validate
discrete classifiers
– The best method has an uppermost (north-western) curve
– Area Under the Curve (AUC) quantifies the performance
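The AUC of a continuous-output method can be computed without drawing the curve at all, via the rank-sum formulation (the probability that a randomly chosen positive pair scores above a randomly chosen negative one); the scores and labels below are invented:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic: fraction of
    positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Similarity scores; label 1 = the pair is in the same SCOP class
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # → 1.0 (perfect)
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # → 0.75
```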
49. Consensus Analysis
Analysis of SCOP’s Class level (as example for all levels)
- RMSD values are not good similarity measures (except for DaliLite)
- Best performance with FAST/SN and FAST/Align (Class level),
and with CE/Z, DaliLite/Z, and DaliLite/Align (all other levels)
- Consensus/All gives a worse AUC value than the best method, but very close to it
50. Consensus Analysis
Results from Comparisons/Singles
rating ranking
*** first
** second
* third
51. Consensus Analysis
Results from Consensus/Average
rating ranking
*** first
** second
* third
52. Consensus Analysis
Analysis of SCOP’s Superfamily level (exemplary for all levels)
Consensus/
Average-Best3
- Consensus/Average-Best3 gives better AUC values than any of
the contributing similarity measures (except at the Protein level)
- Further reduction to Consensus/Average-Best2 improved performance
only for the Protein and Superfamily levels
53. Distributed Computing
Similarity comparison of proteins with multiple methods and
large datasets is very time consuming and needs to be
parallelised / distributed / gridified
– Simple automated scheduling system for job distribution
works well on dedicated ProCKSI cluster (5 nodes, dual)
– Research on how to bundle jobs including fast/slow
methods and small/large dataset
► Optimise the ratio between calculation time and
overhead (data transfer time, waiting time, ...)
– Generalised scheduler for usage of clusters on the GRID
and/or the University of Nottingham's cluster (> 1000
nodes)
54. Problem / Solution Space
All-against-all comparison of a dataset of S protein structures
using M different similarity comparison methods can be
represented as a 3D cube (Structures × Structures × Methods).
Heterogeneity:
1. Each structure has a different length, i.e. number of residues
2. Each method has a different execution time, even for the same pair of structures
3. Back-end computational nodes may have different speeds, etc.
55. Possible Strategies
1. Comparison of one pair of proteins using one method
in the task list => SxSxM jobs, each performing 1 comparison
>> far too fine-grained
2. All-against-all comparison of the entire dataset with one
method => M jobs, each performing SxS comparisons
>> currently running, valid only for |S| < 500 proteins
3. Comparison of one pair of proteins using all methods in the
task list => SxS jobs, each performing M comparisons
>> Slightly different from 1, does not allow intelligent load
balancing
4. Intelligent partitioning of the 3D problem space, comparing a
subset of proteins with a set/subset of methods
>> under investigation
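The granularity trade-off between strategies 1-3 is easy to see numerically; the dataset size and method count below are hypothetical:

```python
S, M = 200, 6  # hypothetical: 200 structures, 6 comparison methods
strategies = {
    "1: one pair, one method":  (S * S * M, 1),  # (jobs, comparisons/job)
    "2: all pairs, one method": (M, S * S),
    "3: one pair, all methods": (S * S, M),
}
for name, (jobs, per_job) in strategies.items():
    print(f"{name}: {jobs} jobs x {per_job} comparisons")
```

Strategy 4 interpolates between these extremes by cutting the cube into blocks of intermediate size.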
56. Distributed (grid-enabled) architecture
• p = number of nodes
• N1, N2, …, Np = cluster or Grid nodes
• The system is able to run both in a parallel environment using the MPI libraries and in a grid computing environment using the MPICH-G2 libraries.
• The complexity of proteins is estimated, and bags of proteins are distributed to different nodes.
61. Experimental results: overall speed-up
Speed-up = Ts / Tp
where
Ts: sequential execution time
Tp: parallel execution time on p processors
Ideal speed-up = p, where p is the number of processors
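The definitions above as a two-line sketch; the timings are invented for illustration:

```python
def speedup(t_seq, t_par):
    """Speed-up = Ts / Tp; dividing by p gives parallel efficiency."""
    return t_seq / t_par

p = 5                   # hypothetical number of processors
Ts, Tp = 100.0, 25.0    # hypothetical sequential / parallel times
S = speedup(Ts, Tp)
print(S, S / p)  # → 4.0 0.8 (ideal speed-up would equal p = 5)
```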
62. Conclusions
www.procksi.org
63. Conclusions
• ProCKSI is a workbench for protein structure comparison
– Implements multiple different similarity comparison methods with different
similarity concepts and algorithms
– Facilitates the comparison and analysis of large datasets of protein
structures through a single, user-friendly interface
• ProCKSI is a decision-support system
– Integrates many different similarity measures and suggests a consensus
similarity profile, taking their strengths and weaknesses into account
The combination of multi-competence similarity comparison measures
leads to better results than any single one can provide!
• Additional Tools:
• One of the most tested PDB parsers out there
• Very flexible tool for generating contact maps under a variety of definitions
and parameters
• Flexible contact maps visualisation
• Tree comparison and visualisation
• You can add your own distance matrix
64. Conclusions
• ProCKSI keeps expanding:
• More methods are being added.
• If you have a method and want it included contact us!
• More sophisticated data fusion and visualisation are on their
way!
• Hardware is evolving.
• ProCKSI is publicly available at:
http://www.procksi.net
65. Literature
Journal Papers
– The ProCKSI Server: a decision support system for Protein (Structure)
Comparison, Knowledge, Similarity and Information
Daniel Barthel, Jonathan D. Hirst, Jacek Błażewicz, Edmund K. Burke, Natalio
Krasnogor. BMC Bioinformatics 2007, 8, 416.
– Web and Grid Technologies in Bioinformatics, Computational and Systems
Biology: A Review
Azhar A. Shah, Daniel Barthel, Piotr Lukasiak, Jacek Błażewicz, Natalio
Krasnogor. Current Bioinformatics 2008, 3, 10-31.
Conference Papers
– Grid and Distributed Public Computing Schemes for Structural Proteomics: A Short
Overview
Azhar A. Shah, Daniel Barthel, Natalio Krasnogor. In Frontiers of High Performance Computing and
Networking (ISPA2007), Lecture Notes in Computer Science 4743, 424-434. Springer-Verlag, Niagara Falls,
Canada, August 2007.
– Protein Structure Comparison, Clustering and Analysis:
An Overview of the ProCKSI Decision Support System
Azhar Ali Shah, Daniel Barthel, Natalio Krasnogor. In Proceedings of the 4th International Symposium on
Biotechnology (IBS) and 1st Pakistan-China-Iran International Conference on Biotechnology, Bioengineering
and Biophysical Chemistry (ICBBB'07), Jamshoro, Pakistan, November 2007.