2. Neural Network Method
• Prediction uses information drawn from different databases
• Linear sequence → 3D structure of the polypeptide
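3. Neural network
• Input signals are summed and turned into zero or one
• Feed-forward multilayer network: input layer, hidden layer, output layer
[Figure: feed-forward multilayer network showing the input, hidden and output layers, with neurons labelled J1 to J4]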
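The "summed and turned into zero or one" behaviour can be sketched as a simple threshold unit. This is only an illustration: the weights, the threshold value and the function name `threshold_neuron` are assumptions, not taken from the slides.

```python
def threshold_neuron(inputs, weights, threshold=0.0):
    """Sum the weighted inputs; return 1 if the sum exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


# Illustrative weights (assumed): with a threshold of 1.5 the unit
# fires only when both binary inputs are 1, i.e. it acts like logical AND.
and_weights = [1.0, 1.0]
outputs = [threshold_neuron([a, b], and_weights, threshold=1.5)
           for a in (0, 1) for b in (0, 1)]
print(outputs)
```

In a trained network the hard threshold is usually replaced by a smooth activation such as a sigmoid, so that the outputs fall in the range 0.0 to 1.0.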
4. Neural Networks
• Neural networks are trained rather than programmed to carry out
the chosen information-processing tasks
• Training a neural network involves adjusting the network so that it is
able to produce a specific output for each of a given set of input
patterns
• Since the desired outputs are known in advance, training a
feed-forward network is supervised learning.
• Back-propagation algorithm: each recognition error at the output
triggers a backward correction of the parameters (weights) feeding the
activation functions.
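As a rough illustration of supervised training with back-propagation, the sketch below trains a tiny feed-forward network on a toy problem (logical OR). It is not the network from these slides: the layer sizes, learning rate, sigmoid activation and the function names `forward` and `train` are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass; a constant 1.0 is appended to each layer's input as a bias."""
    h = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w_hidden]
    y = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, y


def train(patterns, n_hidden=2, epochs=500, lr=0.5, seed=0):
    """Online back-propagation for one output unit; returns weights and MSE per epoch."""
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in patterns:
            h, y = forward(x, w_hidden, w_out)
            err = y - target                      # error at the output
            total += err * err
            delta_out = err * y * (1.0 - y)       # error signal at the output unit
            # Propagate the error back to the hidden units (using the old output weights).
            delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                            for j in range(n_hidden)]
            for j, v in enumerate(h + [1.0]):
                w_out[j] -= lr * delta_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * v
        history.append(total / len(patterns))
    return w_hidden, w_out, history


# Toy supervised task: the desired output for every input pattern is known in advance.
or_patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w_hidden, w_out, history = train(or_patterns)
print(f"MSE first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

The error recorded per epoch should fall as training proceeds, which is the same quantity a plot of MSE against epoch would show.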
5. Training a feed-forward net
• Training was performed using the SNNS (Stuttgart Neural Network
Simulator) package
• Network architecture and weights were exported using the snns2c
program from the SNNS package
• Our own Perl programs were used to prepare the data and
benchmark the network
8. Algorithm
• A binary encoding scheme is used for network input. In this
scheme each amino acid at each window position is encoded
by a group of 21 inputs: one for each possible amino acid type
at that position and one to provide a null input, used when the
moving window overlaps the amino- or carboxyl-terminal end of
the protein.
9. • In each group of 21 inputs, the input corresponding to the amino
acid type at that window position is set to 1 and all other inputs are
set to 0.
• Thus, the input layer consists of 17 groups of 21 inputs each
(17 × 21 = 357 inputs in total); for any given 17-amino-acid window,
17 network inputs are set to 1 and the rest are set to 0.
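The encoding described above can be sketched as follows. The ordering of the amino acids within each group of 21, the one-letter alphabet string and the function name `encode_window` are assumptions made for illustration; the slides do not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues (ordering is an assumption)
NULL_INDEX = len(AMINO_ACIDS)          # 21st input: window position beyond a chain end
GROUP_SIZE = len(AMINO_ACIDS) + 1      # 21 inputs per window position
WINDOW = 17                            # window length used in the slides


def encode_window(sequence, center, window=WINDOW):
    """Return the binary input vector for the window centred on `center`.

    Non-standard residue letters raise ValueError (via str.index).
    """
    half = window // 2
    inputs = []
    for pos in range(center - half, center + half + 1):
        group = [0] * GROUP_SIZE
        if 0 <= pos < len(sequence):
            group[AMINO_ACIDS.index(sequence[pos])] = 1
        else:
            group[NULL_INDEX] = 1      # window overlaps the N- or C-terminal end
        inputs.extend(group)
    return inputs


vec = encode_window("MKTAYIAKQR", center=0)
print(len(vec), sum(vec))
```

For the first residue of a chain, the eight window positions before it fall off the amino-terminal end, so their null inputs are the ones set to 1.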
10. • The hidden layer consists of two units. The output layer also
consists of two units. Secondary structure is encoded in these output
units as follows: (1,0) = helix, (0,1) = sheet, and (0,0) = coil. Actual
computed output values are in the range 0.0-1.0
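Because the computed outputs are real values between 0.0 and 1.0, they have to be mapped back to one of the three states. The slides do not say how this is done; the rule below (coil when both outputs fall below a threshold, otherwise the larger output wins) is only one plausible assumption, and `decode_output` and the 0.5 threshold are hypothetical.

```python
def decode_output(helix_out, sheet_out, threshold=0.5):
    """Map two output activations to 'H' (helix), 'E' (sheet) or 'C' (coil).

    Assumed rule: (1,0) = helix, (0,1) = sheet, (0,0) = coil, with the
    real-valued outputs compared against a threshold.
    """
    if max(helix_out, sheet_out) < threshold:
        return "C"
    return "H" if helix_out >= sheet_out else "E"


print(decode_output(0.9, 0.1), decode_output(0.2, 0.8), decode_output(0.1, 0.3))
```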
11. Network Architecture
Input   Hidden   Output   Q3 (%)
300     20       3        71.277
260     20       3        72.742
220     20       3        73.428
180     20       3        70.083
Q3: percentage of correctly predicted residues.
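As a small worked example of the Q3 measure, the sketch below compares a predicted state string with an observed one, residue by residue. The one-letter state codes (H, E, C) and the function name `q3` are assumptions for illustration.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# 7 of the 8 residues agree.
print(q3("HHHCCEEC", "HHHCCEEE"))
```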
MSE = Mean Square Error.
\[
\mathrm{MSE}(t) = \frac{1}{n} \sum_{i=1}^{n} \sum_{k} \bigl( f_k(x_i, t) - p_k(x_i, t) \bigr)^2
\]
[Figure: MSE versus training epoch for the networks with 180, 220, 260 and 300 inputs]
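A minimal sketch of how a mean square error over a set of patterns might be computed is given below. The normalisation (sum over output units, average over patterns) is an assumption, and `mean_square_error` is a hypothetical helper, not part of SNNS.

```python
def mean_square_error(outputs, targets):
    """Average over patterns of the summed squared error across output units."""
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and of equal length")
    total = 0.0
    for out, tgt in zip(outputs, targets):
        total += sum((o - t) ** 2 for o, t in zip(out, tgt))
    return total / len(outputs)


# Two patterns with two output units each, using the (1,0) = helix, (0,1) = sheet coding.
network_outputs = [(0.5, 0.0), (0.0, 1.0)]
desired_outputs = [(1.0, 0.0), (0.0, 1.0)]
print(mean_square_error(network_outputs, desired_outputs))
```

Evaluating this once per epoch gives the kind of curve that a plot of MSE against epoch shows.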
13. Strategies to increase accuracy
• Add new types of biological information
• Change the way that information is presented to the network
• Post-process the network predictions
• Change the network architecture
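One possible form of post-processing is a filter that removes implausibly short helix or sheet segments from the predicted state string. The slides do not say which post-processing was considered; the minimum lengths and the function name `remove_short_segments` below are assumptions chosen only to illustrate the idea.

```python
import re


def remove_short_segments(states, min_helix=3, min_sheet=2):
    """Replace helix (H) or sheet (E) runs shorter than the minimum with coil (C)."""
    def fix(match):
        run = match.group(0)
        limit = min_helix if run[0] == "H" else min_sheet
        return "C" * len(run) if len(run) < limit else run

    return re.sub(r"H+|E+", fix, states)


raw = "CCHCCHHHHCECEEC"
print(remove_short_segments(raw))
```

Here the isolated single H and single E are turned into coil, while the four-residue helix and two-residue sheet are kept.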
14. Strategies to increase accuracy
Biological information:
• Hydrophobicity, charge, backbone properties
• Length of chain as an additional input
• Distance to the N- and C-terminal amino acids
• Non-local information (all-alpha, all-beta, etc.)
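To illustrate how such information could be presented to the network, the sketch below builds a few extra real-valued inputs for one residue. It assumes the Kyte-Doolittle hydropathy scale as the hydrophobicity measure and simple linear scaling to the range 0 to 1; the scale choice, the `max_length` cap and the function name `extra_features` are all assumptions, not details from the slides.

```python
# Kyte-Doolittle hydropathy values (range -4.5 to 4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def extra_features(sequence, center, max_length=1000):
    """Return [hydrophobicity, distance to N-terminus, distance to C-terminus, chain length].

    All four values are scaled to the range 0..1 (the scaling is an assumption).
    """
    n = len(sequence)
    hydro = (KYTE_DOOLITTLE[sequence[center]] + 4.5) / 9.0
    dist_n = center / (n - 1) if n > 1 else 0.0
    dist_c = (n - 1 - center) / (n - 1) if n > 1 else 0.0
    length = min(n, max_length) / max_length
    return [hydro, dist_n, dist_c, length]


print(extra_features("IR", center=0))
```

These values would be appended to the binary window encoding, so the input layer grows by a few units per residue.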