The role of machine learning in modelling the cell
1. The Role of Machine Learning in Modelling the Cell
John Hawkins
ARC Centre for Complex Systems
University of Queensland
Australia
2. Overview of Talk
Overview of cell biology
Modelling the cell
Subcellular localisation signals
Machine Learning in General
Neural networks
Feed Forward versus Recurrent
3. Cell Biology – Quick and Dirty
Membrane-bound organelles
Nucleus
DNA -> RNA -> Protein
Transport, e.g.
Mitochondria
Peroxisome
Modification, e.g.
Disulphide bond formation
Glycosylation
4. Cell Feedback
At a particular time point a set of genes will be expressed.
These do not remain constant; instead, the emerging picture is that:
There is some essential cycle of gene expression
With a capacity to follow alternative pathways of expression under external stimulus.
The pattern of expression is
5. Modelling the cell
Ideally we would like to model the cell as a full 3D physical simulation.
Currently this is infeasible.
So numerous approaches are taken to form abstractions:
Gene Regulatory Networks
Differential equation models of particular pathways
Machine learning models of particular
6. Biological Sequences
Many important biological molecules are polymers.
They can therefore be represented as a sequence of discrete symbols.
Sequence M = [m_1, m_2, …, m_n] where:
DNA: m_i ∈ { A, T, G, C }
RNA: m_i ∈ { A, U, G, C }
Protein: m_i ∈ { G, A, V, L, I, P, S, C, T, M, D, E, H, K, R, N, Q, F, Y, W }
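The alphabets above can be written down directly as sets. The following is a minimal sketch, not part of the original talk; the function name sequence_types is hypothetical. It reports which of the three alphabets can represent a given string. Note that a short DNA string is also a valid protein string, since A, T, G and C are all amino acid codes.

```python
# Alphabets exactly as listed on the slide.
ALPHABETS = {
    "DNA": set("ATGC"),
    "RNA": set("AUGC"),
    "Protein": set("GAVLIPSCTMDEHKRNQFYW"),
}


def sequence_types(seq: str) -> list[str]:
    """Return the name of every alphabet whose symbols cover `seq`."""
    symbols = set(seq.upper())
    # A sequence fits an alphabet when its symbols are a subset of it.
    return [name for name, alphabet in ALPHABETS.items() if symbols <= alphabet]


if __name__ == "__main__":
    for example in ("ATGGCA", "AUGGCA", "MKVLSKL"):
        print(example, sequence_types(example))
```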
7. Information Content
How much information in a linear sequence?
Two crucial elements to function
Physical/chemical properties
Molecular shape
Each residue has well known properties
Denaturation (Anfinsen, 1973).
Sequence defines arrangement of chemical properties, which in turn defines folding.
8. Biological Patterns
Motifs – General term for patterns
Numerous Definitions & Visualisations
PROSITE Patterns – Regular Expression
PROSITE Profiles – Probability Matrix
LOGOs
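To illustrate the "PROSITE pattern as regular expression" idea, here is a rough sketch that translates the common parts of PROSITE pattern syntax into a Python regular expression. The function name prosite_to_regex and the example patterns are illustrative assumptions, not taken from the talk, and the converter only handles the basic syntax (x, [..], {..}, repeat counts, and the < and > anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                  # x means any residue
            core = "."
        elif core.startswith("{"):       # {..} lists forbidden residues
            core = "[^" + core[1:-1] + "]"
        # [..] and single letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[ST]-x(2)-[DE]")   # illustrative pattern
    print(regex, bool(re.search(regex, "AASAAEA")))
```

A profile (probability matrix) cannot be expressed this way, because it scores every residue at every position rather than simply allowing or forbidding it.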
9. Peroxisomal Localisation
Predominantly controlled by a C-terminal sequence called the PTS1 signal.
Roughly 12 residues long
Known dependencies between locations
10. Nuclear Export
Some proteins move continuously between the nucleus and cytoplasm of the cell.
Either as:
Transporters
Regulators
11. Machine Learning
Requires a set of examples, each with:
Raw input (sequence data), and
Known classes that the machine should predict
In essence, function approximation:
Start with a general parametrised function over the input data
Adjust the parameters until the output of the function is a good approximation to the known classes of the examples.
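The two steps above can be sketched with the smallest possible case: a logistic function with two parameters, fitted by gradient descent. This is an illustrative assumption rather than the method used in the talk; the names predict and fit, the toy data, and the step count and learning rate are all hypothetical.

```python
import math


def predict(params, x):
    """General parametrised function: a logistic curve over one input."""
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def fit(examples, steps=2000, rate=0.5):
    """Adjust (w, b) by gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in examples:
            error = predict((w, b), x) - y   # output minus known class
            grad_w += error * x
            grad_b += error
        w -= rate * grad_w / len(examples)
        b -= rate * grad_b / len(examples)
    return w, b


# Toy examples (made up): one numeric feature and a known class, 0 or 1.
EXAMPLES = [(0.0, 0), (0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (1.0, 1)]

if __name__ == "__main__":
    params = fit(EXAMPLES)
    for x, y in EXAMPLES:
        print(x, y, round(predict(params, x), 3))
```

Real sequence classifiers differ in the form of the function and the size of the input, but the loop has the same shape: compute the output, compare it with the known class, nudge the parameters.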
12. Bias
Bias is generally unavoidable (Mitchell, 1980)
Three Sources of Bias
Input Encoding
Function Structure (Architecture)
Parameter adjustment algorithm (learning)
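As a concrete case of the first source of bias, the sketch below shows one common input encoding for protein sequences: one-hot (orthogonal) encoding. It is offered as an example only; the talk does not say which encoding was used, and the function name one_hot_encode is hypothetical. This encoding treats every pair of residues as equally different, which is itself a bias. An encoding based on physicochemical properties would instead build in the assumption that similar residues behave similarly.

```python
AMINO_ACIDS = "GAVLIPSCTMDEHKRNQFYW"   # the 20-letter alphabet from slide 6


def one_hot_encode(seq: str) -> list[list[int]]:
    """Encode each residue as a 20-element vector with a single 1."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        row[index[residue]] = 1        # KeyError for symbols outside the alphabet
        rows.append(row)
    return rows


if __name__ == "__main__":
    for residue, row in zip("SKL", one_hot_encode("SKL")):
        print(residue, row)
```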
13. Neural Networks
Graphical model consisting of layers of nodes connected by weights
Feed forward neural networks
Fixed input window
Signal propagates in a single pass through the layers
Recurrent Neural Networks
Signal processed in parts
Recurrent connections maintain a memory state
Output generated after processing the last piece of the input signal
14. Simple Neural Networks
FFNN: O_h = S(W_1 ∙ I_1 + W_2 ∙ I_2 + b)
RNN: O_h = S(W_1 ∙ I_2 + W_2 ∙ S(W_1 ∙ I_1 + b) + b)
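The two formulas can be checked with a few lines of code. This is a minimal sketch under two assumptions not stated on the slide: S is taken to be the logistic sigmoid, and the weights and inputs are scalars (in practice they would usually be matrices and vectors). The function names are hypothetical.

```python
import math


def S(x):
    """Squashing function; assumed here to be the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def ffnn_hidden(i1, i2, w1, w2, b):
    """Feed forward: both inputs are seen at once, each with its own weight."""
    return S(w1 * i1 + w2 * i2 + b)


def rnn_hidden(inputs, w1, w2, b):
    """Recurrent: inputs arrive one at a time; w2 feeds back the previous state."""
    state = 0.0                      # empty memory before the first input
    for x in inputs:
        state = S(w1 * x + w2 * state + b)
    return state                     # output after the last piece of input


if __name__ == "__main__":
    print(ffnn_hidden(0.5, -1.0, 0.3, 0.7, 0.1))
    print(rnn_hidden([0.5, -1.0], 0.3, 0.7, 0.1))
```

With two inputs, rnn_hidden reproduces the nested expression on the slide. The same function also accepts three or more inputs without any new parameters, which is the practical difference from the fixed input window of the feed forward network.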
16. Applications
We have applied these techniques to subcellular localisation to:
Endoplasmic Reticulum
Mitochondria
Chloroplast
Peroxisome
http://pprowler.imb.uq.edu.au
We are working with whole genome data and wet lab biologists to use these tools for data mining.