BgN-Score & BsN-Score: Bagging & Boosting Based Ensemble Neural Networks Scoring Functions for Accurate Binding Affinity Prediction of Protein-Ligand Complexes
Motivation
Binding affinity (BA) is the principal determinant of many vital biological processes.
Accurate prediction of BA is challenging and remains an unresolved problem.
Conventional scoring functions (SFs) have limited predictive power.
Scoring Challenges
Lack of accurate accounting of intermolecular physicochemical interactions.
Using more descriptors invites the curse of dimensionality.
The relationship between descriptors and BA can be highly nonlinear.
Multi-Layer Neural Network
Theoretically, it can model any nonlinear continuous function; however:
It is hard to tune its weights to optimal values.
It does not handle high-dimensional data well.
It has high variance errors.
It is a black-box model that lacks descriptive power.
Our Approach & Scope of Work
Collect a large number of PLCs with known BA.
Extract a diverse set of features.
Train ensemble NN models based on boosting and bagging.
Evaluate the resulting SFs on diverse and homogeneous protein families.
Compound Database: PDBbind [3]
Protein-ligand complexes were obtained from PDBbind 2007.
PDBbind is a selective compilation of the Protein Data Bank (PDB) database.
Filters applied to PDB complexes to obtain the refined set of PDBbind:
Ligand molecular weight ≤ 1000
Number of non-hydrogen atoms in the ligand ≥ 6
Only one ligand is bound to the protein
Protein and ligand are non-covalently bound
Resolution of the complex crystal structure ≤ 2.5 Å
Elements in the complex must be C, N, O, P, S, F, Cl, Br, I, H
Known Kd or Ki
The filtered complexes are then hydrogenated and protonated/deprotonated to form the refined set of PDBbind.
PDBbind: Refined Set
PDBbind: Core Set
Starting from the refined set, a similarity search using BLAST with a 90% similarity cutoff groups the proteins into clusters.
Clusters with ≥ 4 complexes are retained when the binding affinity of the highest-affinity complex is 100-fold that of the lowest-affinity one.
The highest-, middle-, and lowest-affinity complexes from each cluster form the core set of PDBbind.
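The cluster-based selection above can be written as a short procedure. The sketch below is illustrative only: the function name, the data layout (clusters mapping to lists of PDB codes with measured Kd/Ki values), and the assumption that affinities are given as dissociation constants (lower Kd means tighter binding) are not taken from the original protocol.

```python
# Illustrative sketch of the core-set selection described above (assumed data layout).
def select_core_set(clusters):
    """clusters: dict mapping cluster_id -> list of (pdb_code, kd) pairs.

    kd is the measured Kd/Ki; lower Kd means higher binding affinity.
    """
    core = []
    for members in clusters.values():
        if len(members) < 4:                           # keep clusters with >= 4 complexes
            continue
        members = sorted(members, key=lambda m: m[1])  # tightest binder first
        tightest, weakest = members[0], members[-1]
        if weakest[1] < 100 * tightest[1]:             # require a 100-fold affinity spread
            continue
        middle = members[len(members) // 2]            # intermediate-affinity representative
        core.extend([tightest, middle, weakest])       # highest-, middle-, lowest-affinity
    return core
```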
Compound Characterization
Extracted features are calculated using the following scoring functions:
X-Score (6 features)
AffiScore (30 features)
RF-Score (36 features)
GOLD (14 features)
Protein-ligand complexes from the PDB (identified by PDB keys such as 1a30, 9hvp, 1d5j, 1f9g, 2qtg, and 2usn) pass through the PDBbind filters and binding-affinity collection, followed by feature calculation (using the X-Score, AffiScore, RF-Score, and GOLD tools) alongside the experimental data.
The result is the primary training set Pr (1105 complexes) and the core test set Cr (195 complexes).
Training and Test Datasets
The primary training set Pr and the core test set Cr are each characterized by features computed with the X-Score (X), AffiScore (A), RF-Score (R), and GOLD (G) tools, yielding the datasets X, A, R, and G and their combinations (XA, ..., XARG), as sketched below.
Ensemble NN boosting and bagging algorithms are then trained on these datasets and used to score test complexes.
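A minimal sketch of how such combined datasets could be assembled, assuming each tool's descriptors live in a per-complex table keyed by PDB code; the file names and column layout are hypothetical and not taken from the original work.

```python
import pandas as pd

# Hypothetical per-tool feature tables, each indexed by PDB code (file names assumed).
x = pd.read_csv("xscore_features.csv", index_col="pdb")      # 6 X-Score descriptors
a = pd.read_csv("affiscore_features.csv", index_col="pdb")   # 30 AffiScore descriptors
r = pd.read_csv("rfscore_features.csv", index_col="pdb")     # 36 RF-Score descriptors
g = pd.read_csv("gold_features.csv", index_col="pdb")        # 14 GOLD descriptors
y = pd.read_csv("binding_affinity.csv", index_col="pdb")     # experimental Kd/Ki values

xa = x.join(a)                              # XA dataset: X-Score + AffiScore descriptors
xarg = x.join([a, r, g])                    # XARG dataset: all 86 descriptors
training_table = xarg.join(y, how="inner")  # attach the measured affinities for training
```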
Base Learner: A Neural Network
The prediction of each network is calculated as follows (formula given on the slide).
Network weights are optimized to minimize the fitting criterion E (formula given on the slide).
Figure: a feed-forward network with an input layer holding the P features x1, ..., xP of a complex, a hidden layer, and an output layer producing the binding affinity; w_{i,h} and w_{h,o} denote the input-to-hidden and hidden-to-output weights, and the +1 nodes are bias units.
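The equations themselves are embedded in the slide image. A plausible reconstruction, assuming the standard single-hidden-layer form implied by the figure and a weight-decay (sum-of-squares) fitting criterion with the regularization parameter λ that is tuned on later slides, is:

$$\hat{y}(\mathbf{x}) \;=\; \phi_o\!\left( w_{0,o} + \sum_{h=1}^{H} w_{h,o}\,\phi_h\!\left( w_{0,h} + \sum_{i=1}^{P} w_{i,h}\,x_i \right) \right),
\qquad
E \;=\; \sum_{n}\bigl(y_n - \hat{y}(\mathbf{x}_n)\bigr)^2 \;+\; \lambda \sum_{w} w^2,$$

where φ_h and φ_o are the hidden- and output-layer activation functions, the w_{0,·} terms are the bias weights (the +1 nodes in the figure), and the second sum in E runs over all network weights.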
BgN-Score
An ensemble of MLP ANNs is grown.
The inputs to each ANN are a random subset of p features.
Each ANN is trained on a bootstrap dataset randomly sampled, with replacement, from the training data.
After building the ensemble model, the BA of a new protein-ligand complex x is computed by applying the formula given on the slide:
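The formula itself appears only in the slide graphic; the standard bagging estimate, consistent with the "Average" node in the figure and assuming N networks f_1, ..., f_N in the ensemble, would be:

$$\widehat{\mathrm{BA}}(\mathbf{x}) \;=\; \frac{1}{N}\sum_{t=1}^{N} f_t(\mathbf{x}).$$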
Figure: several networks, each receiving a different random subset of the complex's features (e.g., x3, x21, x13; x8, x51, x6; x6, x2, x37) together with +1 bias nodes and weights w_{i,h}, w_{h,o}; each network produces a binding-affinity prediction, and the ensemble output is the average of the individual predictions.
SF Construction & Application Workflow
Figure: scoring function building and evaluation proceeds through collecting data, feature generation, training set formation, and model building, producing the BsN-Score and BgN-Score models.
During model building, build and validate steps driven by parameter tuning on the training data yield the optimal parameters.
At application time, a protein 3D structure and a ligand go through feature generation, and the trained model outputs a binding affinity.
Parameter Tuning: BgN-Score
Candidate parameter sets are generated; an example parameter set is H = 23, p = 15, λ = 0.031.
For each parameter set, a BgN-Score ensemble (networks 1 through N, e.g., N = 2000) is built and tested on its out-of-bag (OOB) examples, giving an OOB MSE (e.g., 1.56, 1.04, 3.17 for different parameter sets).
The parameter set corresponding to the minimum OOB MSE is chosen.
Optimal parameters: H ≈ 20, p ≈ P/3, λ ≈ 0.001, N ≈ 2000.
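The sketch below illustrates this OOB-based search under stated assumptions: scikit-learn's MLPRegressor stands in for the base network, the grid values and the synthetic stand-in for the Pr training data are invented for illustration, and the ensemble size is kept small so the snippet runs quickly.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def oob_mse(X, y, H, p, lam, n_nets=200, rng=np.random.default_rng(0)):
    """OOB MSE of a bagged ensemble of n_nets networks with H hidden units,
    p randomly chosen features per network, and weight decay lam."""
    n, P = X.shape
    preds, counts = np.zeros(n), np.zeros(n)
    for _ in range(n_nets):
        boot = rng.integers(0, n, n)                   # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), boot)         # complexes left out of the bag
        feats = rng.choice(P, size=p, replace=False)   # random subset of p features
        net = MLPRegressor(hidden_layer_sizes=(H,), alpha=lam, max_iter=2000)
        net.fit(X[np.ix_(boot, feats)], y[boot])
        preds[oob] += net.predict(X[np.ix_(oob, feats)])
        counts[oob] += 1
    seen = counts > 0
    return float(np.mean((y[seen] - preds[seen] / counts[seen]) ** 2))

# Toy stand-in for the Pr training data (86 descriptors per complex, synthetic affinities).
rng0 = np.random.default_rng(1)
X_train = rng0.normal(size=(300, 86))
y_train = X_train[:, :5].sum(axis=1) + rng0.normal(scale=0.1, size=300)

# Choose the parameter set with the minimum OOB MSE (grid values are illustrative).
grid = [(23, 15, 0.031), (20, 25, 0.001), (30, 40, 0.01)]
best_H, best_p, best_lam = min(grid, key=lambda t: oob_mse(X_train, y_train, *t))
```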
Parameter Tuning: BsN-Score
Candidate parameter sets are generated; an example parameter set is H = 23, p = 15, λ = 0.031.
For each parameter set, 10 BsN-Score models are built and each is tested on its respective validation examples; the resulting MSE values are averaged into a cross-validation (CV) MSE (e.g., 1.56, 1.04, 3.17 for different parameter sets).
The parameter set corresponding to the minimum average CV MSE is chosen.
Optimal parameters: H ≈ 20, p ≈ P/3, λ ≈ 0.001, N ≈ 2000.
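Written out (assuming the 10 models correspond to 10 cross-validation folds), the quantity being minimized for a candidate parameter set θ is:

$$\overline{\mathrm{MSE}}_{\mathrm{CV}}(\theta) \;=\; \frac{1}{10}\sum_{k=1}^{10} \frac{1}{|V_k|} \sum_{i \in V_k} \Bigl( y_i - \hat{f}_{\theta}^{(-k)}(\mathbf{x}_i) \Bigr)^2,$$

where V_k is the k-th validation fold and \(\hat{f}_{\theta}^{(-k)}\) is the BsN-Score model trained with parameters θ on the complexes outside fold k.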
Ensemble NN vs. Conventional SFs on Diverse Complexes
Figure: bar chart of the Pearson correlation coefficient Rp on the diverse core test set for the scoring functions SYBYL::F-Score, SYBYL::PMF-Score, GOLD::GoldScore, DS::Jain, SYBYL::D-Score, GOLD::ChemScore, DS::PMF, GlideScore-XP, DS::LigScore2, DS::LUDI3, SYBYL::G-Score, GOLD::ASP, DS::PLP1, SYBYL::ChemScore, DrugScoreCSD, X-Score::HMScore, SNN-Score::X, BgN-Score::XARG, and BsN-Score::XARG.
Labeled Rp values include 0.644 (X-Score::HMScore), 0.657, 0.804 (BgN-Score::XARG), and 0.816 (BsN-Score::XARG).
Ensemble NN vs. Conventional SFs on HIV Complexes
Figure: two bar charts of the correlation coefficient Rp (scale 0 to 1) across the scoring functions for HIV complexes, one for disjoint training and test complexes and one for overlapping training and test complexes.
Ensemble NN vs. Conventional SFs on Trypsin Complexes
Figure: two bar charts of the correlation coefficient Rp (scale 0 to 1) across the scoring functions for trypsin complexes, one for disjoint training and test complexes and one for overlapping training and test complexes.
Ensemble NN vs. Conventional SFs on Carbonic Anhydrase Complexes
Figure: two bar charts of the correlation coefficient Rp (scale 0 to 1) across the scoring functions for carbonic anhydrase complexes, one for disjoint training and test complexes and one for overlapping training and test complexes.
Ensemble NN vs. Conventional SFs on Thrombin Complexes
Figure: two bar charts of the correlation coefficient Rp (scale 0 to 1) across the scoring functions for thrombin complexes, one for disjoint training and test complexes and one for overlapping training and test complexes.
Concluding Remarks
BsN-Score & BgN-Score are the most accurate SFs.
BsN-Score & BgN-Score are ~20% more accurate (0.804 & 0.816 vs. 0.675) than SNN-Score.
BsN-Score & BgN-Score are ~25% more accurate (0.804 & 0.816 vs. 0.644) than the best conventional SF, X-Score.
Moreover, their accuracies are even higher when they are used to predict the BAs of protein-ligand complexes that are related to their training sets.
Future Work
Collect more PLCs from other databases.
Consider other techniques to extract more descriptors.
Analyze variable importance and descriptor interactions.
Consider other types and topologies of ANNs, such as recurrent NNs and deep NNs.
We used the same complex database that Cheng et al. used as a benchmark in their comparative assessment of 16 popular SFs. This database, PDBbind, is a popular benchmark that has been cited and used to evaluate SFs in hundreds of other studies (based on Google Scholar).
PDBbind is a high-quality and comprehensive compilation of biomolecular complexes deposited in the Protein Data Bank (PDB).
Boosting is an ensemble machine-learning technique based on stage-wise fitting of base learners.
The technique attempts to minimize the overall loss by concentrating on the complexes with the highest prediction errors, i.e., by fitting NNs to the (accumulated) residuals left by the previous networks in the ensemble model.
The algorithm starts by fitting the first network to all training complexes. A small fraction (ν < 1) of the first network's predictions is used to calculate the first set of residuals Y1.
The network f1 is the first term in the boosting additive model. In each subsequent stage, a network fl is trained on a bootstrap sample of the training complexes described by a random subset of p < P features. The dependent variable of the training data for network fl consists of the current residuals of the sampled protein-ligand complexes. The residuals after each network are the differences between the previous residuals and a small fraction of that network's predictions.
This fraction is controlled by the shrinkage parameter ν < 1 to avoid overfitting. Network generation continues as long as the number of networks does not exceed a predefined limit L. Each network joins the ensemble as a shrunk version of itself. In our experiments, we fixed the shrinkage parameter to 0.001, which gave the lowest out-of-sample error.
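A minimal sketch of this stage-wise procedure, assuming scikit-learn's MLPRegressor as a stand-in for the base network and a shrunk-sum prediction form; the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_bsn(X, y, L=2000, p=None, H=20, lam=0.001, nu=0.001, seed=0):
    """Stage-wise (boosted) ensemble of networks fitted to shrunk residuals."""
    rng = np.random.default_rng(seed)
    n, P = X.shape
    p = p or P // 3
    residuals = np.array(y, dtype=float)      # running residuals, initialized to the targets
    ensemble = []                             # list of (network, feature indices)
    for l in range(L):
        if l == 0:
            boot, feats = np.arange(n), np.arange(P)      # first network: all complexes, all features
        else:
            boot = rng.integers(0, n, n)                  # bootstrap sample of complexes
            feats = rng.choice(P, size=p, replace=False)  # random subset of p < P features
        net = MLPRegressor(hidden_layer_sizes=(H,), alpha=lam, max_iter=2000)
        net.fit(X[np.ix_(boot, feats)], residuals[boot])  # fit to the current residuals
        residuals -= nu * net.predict(X[:, feats])        # subtract a shrunk fraction of its predictions
        ensemble.append((net, feats))
    return ensemble

def predict_bsn(ensemble, X, nu=0.001):
    """Final prediction: the shrunk sum of all stage predictions."""
    return sum(nu * net.predict(X[:, feats]) for net, feats in ensemble)
```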
The final prediction of a PLC described by a feature vector x is: [read the formula given in the slide]
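The formula is not reproduced in these notes; a reconstruction consistent with the shrinkage description above (an assumption, since the slide image holds the actual expression) is:

$$Y_0 = Y, \qquad Y_l = Y_{l-1} - \nu\, f_l(X) \quad (l = 1, \ldots, L), \qquad \hat{f}(\mathbf{x}) \;=\; \nu \sum_{l=1}^{L} f_l(\mathbf{x}),$$

where f_l is the network fitted at stage l to the residuals Y_{l-1} and ν is the shrinkage parameter.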
A total of sixteen popular SFs are compared to the NN SFs in this study. The sixteen SFs are either used in mainstream commercial docking tools or have been developed in academia (or both). The functions were recently compared against each other in a study conducted by Cheng et al. The set includes 9 empirical SFs, 4 knowledge-based SFs, and 3 force-field SFs.