A Kernel Based Framework for Predicting Interactions Between Methanotrophs an...
Bioinformatics kernels relations
1. Kernel Methods and Relational Learning in
Bioinformatics
ir. Michiel Stock
Dr. Willem Waegeman
Prof. dr. Bernard De Baets
Faculty of Bioscience Engineering
Ghent University
November 2012
KERMIT
ir. Michiel Stock (KERMIT) Kernels for Bioinformatics November 2012 1 / 40
2. Outline
1 Introduction
2 Kernel methods
3 Learning relations
4 Case studies
Enzyme function prediction
Protein-ligand interactions
Microbial ecology
5 Conclusions
3. Introduction
Introductory example
Problem statement
Predict protein-protein interactions based on high-throughput data.
Based on a gold standard
Typical features that can be
used:
Yeast two-hybrid
Pfam profile
Phylogenetic profile
Localization
PSI-BLAST
Expression
...
4. Introduction
Machine learning is widely used in bioinformatics
Figure 1: Classification of the topics where machine learning methods are applied (from Larrañaga et al.).
5. Introduction
Bioinformatics deals with complex data
Bioinformatics data is typically:
in large dimension (e.g., microarrays or proteomics data)
structured (e.g., gene sequences, small molecules, interaction
networks, phylogenetic trees...)
heterogeneous (e.g., vectors, sequences, graphs to describe
the same protein)
in large quantities (e.g., more than $10^6$ known protein
sequences)
noisy (e.g., many features are not relevant)
6. Kernel methods
Formal definition of a kernel
Kernels are non-linear functions defined over objects x ∈ X .
Definition
A function $k : X \times X \to \mathbb{R}$ is called a positive definite kernel if it is
symmetric, that is, $k(x, x') = k(x', x)$ for any two objects $x, x' \in X$, and
positive semi-definite, that is,
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j k(x_i, x_j) \geq 0$$
for any $N > 0$, any choice of $N$ objects $x_1, \ldots, x_N \in X$, and any choice of
real numbers $c_1, \ldots, c_N \in \mathbb{R}$.
Can be seen as generalized covariances.
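The definition can be checked numerically on a finite sample: the kernel matrix must be symmetric and have no negative eigenvalues. The sketch below is only an illustration. The Gaussian kernel, the random data and the tolerance are assumptions that do not appear on the slides. Passing the check on one sample is a necessary condition, not a proof that a function is a kernel.

```python
import numpy as np

def is_positive_semidefinite_kernel_matrix(K, tol=1e-8):
    """Check symmetry and non-negative eigenvalues of a kernel (Gram) matrix."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T, atol=tol):
        return False
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues
    eigenvalues = np.linalg.eigvalsh(K)
    return bool(eigenvalues.min() >= -tol)

def gaussian_kernel_matrix(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix, used here only as an example kernel."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

# Hypothetical data: 20 objects described by 3 real-valued features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = gaussian_kernel_matrix(X)

print(is_positive_semidefinite_kernel_matrix(K))
# A symmetric matrix with eigenvalues +1 and -1 is not a valid kernel matrix
print(is_positive_semidefinite_kernel_matrix(np.array([[0.0, 1.0], [1.0, 0.0]])))
```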
7. Kernel methods
Interpretation of kernels
Suppose an object x has an implicit feature representation φ(x) ∈ F.
A kernel function can be seen as a dot product in this feature space:
$$k(x, x') = \langle \phi(x), \phi(x') \rangle$$
[Figure: the map φ from the input space X to the feature space F, where k corresponds to ⟨φ(x), φ(x')⟩]
Linear models in this feature space F can be made:
$$y(x) = w^T \phi(x) = \sum_n a_n k(x_n, x)$$
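A small worked example of the dot-product view, assuming a homogeneous polynomial kernel of degree 2 on two-dimensional vectors. This kernel is chosen only for illustration and is not mentioned on the slides. Its feature map can be written out explicitly, so both sides of the identity can be compared.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel in 2D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Same quantity computed without ever building the feature vectors."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # dot product in feature space F
implicit = poly2_kernel(x, z)             # kernel evaluated in input space X
print(explicit, implicit)
```

For richer kernels the feature space can be very high- or infinite-dimensional, which is why only the kernel values are computed in practice.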
8. Kernel methods
Many kernel methods exist
Examples of popular kernel methods:
Support vector machine (SVM)
Regularized least squares (RLS)
Kernel principal component analysis (KPCA)
The learning algorithm is independent of the kernel representation!
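To illustrate that the learner only sees kernel values, here is a minimal sketch of regularized least squares in its dual form. It assumes the standard closed-form solution a = (K + λI)⁻¹ y. The function names, the synthetic data and the choice of a linear kernel are hypothetical. Any other valid kernel matrix could be passed in without changing the learning code.

```python
import numpy as np

def fit_kernel_rls(K_train, y, regularization=1.0):
    """Dual RLS coefficients a, solving (K + lambda * I) a = y."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + regularization * np.eye(n), y)

def predict_kernel_rls(K_test_train, dual_coefficients):
    """Prediction y(x) = sum_n a_n k(x_n, x) for every test object."""
    return K_test_train @ dual_coefficients

def linear_kernel(A, B):
    """Example kernel; it can be replaced by a sequence or graph kernel."""
    return A @ B.T

# Hypothetical regression data
rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 4))
y_train = X_train @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)
X_test = rng.normal(size=(5, 4))

a = fit_kernel_rls(linear_kernel(X_train, X_train), y_train, regularization=0.5)
predictions = predict_kernel_rls(linear_kernel(X_test, X_train), a)
print(predictions)
```

With a linear kernel the predictions coincide with ordinary ridge regression, which gives a convenient sanity check.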
9. Kernel methods
Kernels for (protein) sequences
Spectrum kernel (SK)
The SK considers the number of k-mers m that two sequences $s_i$ and $s_j$ have in
common.
$$SK_k(s_i, s_j) = \sum_{m \in \Sigma^k} N(m, s_i) \, N(m, s_j)$$
with $N(m, s)$ the number of occurrences of k-mer m in sequence s.
To predict structure, function...
of DNA, RNA or proteins.
A discriminative alternative for
Hidden Markov Models.
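The spectrum kernel can be sketched in a few lines by counting k-mers in both sequences and summing the products of the counts. The function names and the toy sequences are illustrative assumptions. Practical implementations typically use more efficient data structures for long sequences.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """N(m, s): number of occurrences of every k-mer m in sequence s."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_kernel(s_i, s_j, k):
    """Sum over all k-mers of N(m, s_i) * N(m, s_j)."""
    counts_i = kmer_counts(s_i, k)
    counts_j = kmer_counts(s_j, k)
    # k-mers absent from s_j contribute zero (Counter returns 0 for them)
    return sum(count * counts_j[kmer] for kmer, count in counts_i.items())

# Toy sequences: "ABAB" holds AB twice and BA once, "BABA" the reverse
print(spectrum_kernel("ABAB", "BABA", k=2))
```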
10. Kernel methods
Kernels for graphs (1)
Graph
A graph is a set of objects, called vertices (or nodes), that
are connected through edges.
Graphs can show the structure of an object or interactions between
different objects.
Graphs are important in bioinformatics!
11. Kernel methods
Kernels for graphs (2)
Graph kernel
Constructing a similarity between graphs.
Based on performing a random walk on both graphs and counting the number of
matching walks.
Usually very computationally demanding!
[Figures: example graphs in chemoinformatics and in structural bioinformatics]
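One way to count matching walks is through the direct product graph, whose adjacency matrix for unlabeled graphs is the Kronecker product of the two adjacency matrices. The sketch below rests on several assumptions: the graphs are unlabeled, walks are truncated at a maximum length, and longer walks are down-weighted by a decay factor. The slides do not state which graph kernel variant was used.

```python
import numpy as np

def random_walk_kernel(A1, A2, decay=0.1, max_length=5):
    """Weighted count of matching walks of length 0..max_length in two graphs."""
    # Adjacency matrix of the direct product graph (n1 * n2 vertices)
    A_product = np.kron(A1, A2)
    ones = np.ones(A_product.shape[0])
    walk_vector = ones.copy()
    value = 0.0
    for length in range(max_length + 1):
        # ones @ walk_vector is the number of common walks of this length
        value += decay ** length * float(ones @ walk_vector)
        walk_vector = A_product @ walk_vector
    return value

# Hypothetical toy graphs: a triangle and a path with three vertices
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(random_walk_kernel(triangle, path))
```

The product graph grows with the product of the graph sizes, which is one reason for the high computational cost.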
12. Kernel methods
Kernels for graphs (3)
Diffusion kernel
Constructing a similarity between vertices within the same graph.
Also based on performing a
random walk on a graph.
Captures the long-range
relationships between
vertices.
Inspired by the heat
equation. The kernel
quantifies how quickly ‘heat’
can spread from one node to
another.
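A minimal sketch of a diffusion kernel, assuming the common formulation K = exp(−βL) with L the graph Laplacian and β a diffusion parameter. Both the formula and the parameter name are assumptions, since the slides give no equation. The matrix exponential is computed through the eigendecomposition of L.

```python
import numpy as np

def diffusion_kernel(adjacency, beta=1.0):
    """Diffusion kernel exp(-beta * L) between the vertices of one graph."""
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so exp(-beta * L) = V diag(exp(-beta * lambda)) V^T
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return (eigenvectors * np.exp(-beta * eigenvalues)) @ eigenvectors.T

# Hypothetical toy graph: a path 0 - 1 - 2 - 3
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
K_diffusion = diffusion_kernel(path, beta=0.5)
print(np.round(K_diffusion, 3))
```

Vertices 0 and 3 are not adjacent, yet they receive a non-zero similarity: heat reaches vertex 3 through the intermediate vertices, which is the long-range effect described above.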
13. Kernel methods
Kernels for fingerprints
Objects that can be described by a long binary vector x can be compared using the
Tanimoto kernel:
$$K_{Tan}(x_m, x_n) = \frac{\langle x_m, x_n \rangle}{\langle x_m, x_m \rangle + \langle x_n, x_n \rangle - \langle x_m, x_n \rangle}$$
[Figure: fingerprint representation of an object]
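For binary fingerprints the Tanimoto kernel is the number of shared bits divided by the number of bits set in at least one of the two vectors. A minimal sketch follows. The example fingerprints are made up, and returning 0 for two all-zero fingerprints is a convention assumed here, not taken from the slides.

```python
import numpy as np

def tanimoto_kernel(x_m, x_n):
    """Tanimoto kernel between two binary fingerprint vectors."""
    x_m = np.asarray(x_m, dtype=float)
    x_n = np.asarray(x_n, dtype=float)
    shared = x_m @ x_n
    denominator = x_m @ x_m + x_n @ x_n - shared
    # Assumed convention: two empty fingerprints get similarity 0
    return float(shared / denominator) if denominator > 0 else 0.0

# Hypothetical fingerprints: 2 shared bits, 3 bits set in at least one vector
print(tanimoto_kernel([1, 1, 0, 1], [1, 0, 0, 1]))
```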
14. Learning relations
Kernels for pairs of objects
Problem statement
Predict the binding interaction between a given protein and a ligand
(small molecule). Learning molecular docking.
The problem deals with two types of objects:
Proteins (graph kernel of structure, sequence kernel, fingerprints...)
Ligands (fingerprints, graph kernel...)
The label is for a pair of objects.
15. Learning relations
Kernels for pairs of objects
Pairwise kernel
Combine the kernel matrices of the individual objects to construct a kernel
matrix for pairs of objects.
Starting from individual kernels for the proteins and ligands:
[Figure: data set of (protein, ligand) pairs → object kernels → pairwise kernel → learning algorithm (SVM, RLS, ...)]
By optimizing a ranking loss, our algorithms can also be used for conditional ranking.
In short, our framework is ideally suited for bioinformatics challenges:
- efficient learning process
- can handle complex objects (graphs, trees, sequences...)
- ability to deal with information retrieval problems
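One common way to build a kernel for pairs from two object kernels is the Kronecker product kernel, in which the similarity between two (protein, ligand) pairs is the product of the protein similarity and the ligand similarity. The sketch below assumes that construction. The kernel matrices are made-up values and the helper `pair_index` is a hypothetical name.

```python
import numpy as np

def kronecker_pairwise_kernel(K_proteins, K_ligands):
    """Kernel matrix over all (protein, ligand) pairs."""
    return np.kron(K_proteins, K_ligands)

def pair_index(protein_index, ligand_index, n_ligands):
    """Row/column of the pair (protein, ligand) in the pairwise matrix."""
    return protein_index * n_ligands + ligand_index

# Hypothetical object kernels: 2 proteins and 3 ligands
K_proteins = np.array([[1.0, 0.5],
                       [0.5, 1.0]])
K_ligands = np.array([[1.0, 0.2, 0.0],
                      [0.2, 1.0, 0.3],
                      [0.0, 0.3, 1.0]])

K_pairs = kronecker_pairwise_kernel(K_proteins, K_ligands)  # shape (6, 6)

# Similarity between pair (protein 0, ligand 1) and pair (protein 1, ligand 2)
row = pair_index(0, 1, n_ligands=3)
col = pair_index(1, 2, n_ligands=3)
print(K_pairs[row, col], K_proteins[0, 1] * K_ligands[1, 2])
```

The resulting matrix can be handed to any kernel method, such as an SVM or RLS, in the same way as a kernel matrix over single objects.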
16. Learning relations
Conditional ranking (1)
Motivation
Suppose one is not particularly interested in the exact value of the
interaction but in the order of the proteins for a given ligand.
[Figure: database objects ordered by relevance ("More relevant") for Query 1 and Query 2]
17. Learning relations
Conditional ranking (2)
Based on a graph description, with e a pair of objects.
Train the model:
$$h(e) = \langle w, \Phi(e) \rangle = \sum_{\bar{e} \in E} a_{\bar{e}} K^{\Phi}(e, \bar{e})$$
using the algorithm:
$$A(T) = \operatorname{argmin}_{h \in H} \; L(h, T) + \lambda \|h\|_H^2.$$
Where we use a ranking loss:
$$L(h, T) = \sum_{v \in V} \sum_{e, \bar{e} \in E_v} \left( y_e - y_{\bar{e}} - h(e) + h(\bar{e}) \right)^2.$$
[Figure: example of a multi-graph. Figure reproduced from (Pahikkala et al., 2010).]
[Text excerpt, truncated: the proposed framework is based on the Kronecker product kernel, a construction proposed for modeling pairwise inputs in different application domains (Basilico et al. 2004, Ben-Hur et al. 2005).]
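The ranking loss compares differences rather than absolute values: for every pair of edges that share the same conditioning vertex, the difference in predictions should match the difference in labels. A minimal sketch of evaluating this loss is given below. The array layout, with one label, one prediction and one conditioning vertex per edge, is an assumption made for illustration.

```python
import numpy as np

def conditional_ranking_loss(labels, predictions, conditions):
    """Sum over vertices v and edge pairs (e, e') in E_v of
    (y_e - y_e' - h(e) + h(e'))^2."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    conditions = np.asarray(conditions)
    loss = 0.0
    for v in np.unique(conditions):
        mask = conditions == v
        # All pairwise differences among the edges conditioned on v
        label_diffs = labels[mask][:, None] - labels[mask][None, :]
        prediction_diffs = predictions[mask][:, None] - predictions[mask][None, :]
        loss += float(np.sum((label_diffs - prediction_diffs) ** 2))
    return loss

# Hypothetical edges: three conditioned on vertex "C", two on vertex "D"
labels = [1.0, 0.0, 2.0, 0.5, 0.1]
predictions = [3.0, 2.0, 4.0, 0.0, 0.0]
conditions = ["C", "C", "C", "D", "D"]
print(conditional_ranking_loss(labels, predictions, conditions))
```

Predictions that are off by a constant within one condition, as for vertex "C" here, incur no loss, because only the differences between edges matter.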
18. Case studies Enzyme function prediction
Predicting enzyme function
Problem statement
Predict the function (EC number) of an enzyme using structural
information of the active site.
Data:
1730 enzymes with 21 different functions
four different structural similarities:
CavBase
maximum common subgraph
labeled point cloud superposition
fingerprints
[Figure: active site of an enzyme]
19. Case studies Enzyme function prediction
EC numbers
EC number
A functional label of an enzyme, based on the reaction that is catalyzed.
Example: EC 2.7.6.1 = ribose-phosphate diphosphokinase
20. Case studies Enzyme function prediction
Defining catalytic similarity
Catalytic similarity
The catalytic similarity is the number of successive equal digits in the EC
number between two enzymes, starting from the first digit.
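[Figure: graph of enzymes (EC 2.7.7.34, EC 4.2.3.90, EC 4.6.1.11, EC 2.7.1.12, EC 2.7.7.12 and an enzyme with unknown number EC ?.?.?.?) with edges labeled by catalytic similarity (values 0 to 3)]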
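The catalytic similarity defined on this slide can be computed by comparing the EC digits from left to right and stopping at the first mismatch. A minimal sketch follows. The function name and the handling of an optional "EC" prefix are assumptions.

```python
def catalytic_similarity(ec_a, ec_b):
    """Number of successive equal EC digits, starting from the first digit."""
    digits_a = ec_a.replace("EC", "").strip().split(".")
    digits_b = ec_b.replace("EC", "").strip().split(".")
    similarity = 0
    for digit_a, digit_b in zip(digits_a, digits_b):
        if digit_a != digit_b:
            break  # later digits no longer count once one level differs
        similarity += 1
    return similarity

# EC numbers taken from the slide
print(catalytic_similarity("EC 2.7.7.34", "EC 2.7.7.12"))
print(catalytic_similarity("EC 2.7.1.12", "EC 2.7.7.12"))
print(catalytic_similarity("EC 4.2.3.90", "EC 4.6.1.11"))
print(catalytic_similarity("EC 2.7.7.34", "EC 4.2.3.90"))
```

In the second comparison the last digit (12) is equal in both numbers, but it does not count because the third digit already differs.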