The document describes a computer science practical assignment to create a command-line application in Objective-C that simulates chemical reactions stochastically. It explains that the simulation tracks the molecules of different chemical species and fires reactions according to reaction rates defined by the law of mass action. It provides an example simulation script specifying reaction constants, initial molecule counts, and reactions to simulate an enzyme-substrate system over time.
1. Computer Science Large Practical
Stephen Gilmore
School of Informatics
Friday 28th September, 2012
Stephen Gilmore (School of Informatics) Computer Science Large Practical Friday 28th September, 2012 1 / 45
2. Introduction
The requirement for the Computer Science Large Practical is to
create a command-line application implemented in Objective-C.
The purpose of the application is to implement a stochastic
discrete-event simulator for chemical reactions.
3. Turing completeness
In theory all programming languages are equally powerful so why not
implement this simulator in Java instead of Objective-C?
In practice certain programming languages are better suited to some
tasks than others so it is necessary to learn more than one.
In reality having additional language skills brings additional
opportunities (it turns out that everyone can program in Java).
4. Simulation
Stochastic simulation is an important tool to help understand the
chemical and biochemical processes which are constantly underway in
all living things.
Simulation is leading the way in scientific breakthroughs in medicine,
farming and veterinary science, with demonstrable benefits for the
health and wellbeing of humans, plants and animals.
Simulation is important in many other fields also, such as aerospace,
transport, logistics, healthcare, and many others.
5. Chemical reactions
Chemical reactions change the concentration of chemical species such
as proteins because molecules are created, altered, combined, broken
apart, or destroyed.
The simulation of these processes takes place one reaction event at a
time.
The simulation tracks the number of molecules of each species and
terminates when no other reaction can fire, or when a pre-defined
simulation end-time has been reached.
6. Exact versus approximate simulation
We are concerned here with exact stochastic simulation, in which
each reaction event is simulated.
Other stochastic simulation algorithms exist, called approximate
simulation algorithms.
These estimate the effect of numerous reaction events in order to
speed up a simulation, but we will not be concerned with these here.
7. Chemical reactions
Reaction events change one chemical species into another. For example,
the reaction event f,
f : E + S → C
describes a collision between a molecule of enzyme (E) and a molecule of
substrate (S) to produce a molecule of a compound (C).
Note
We write E + S to mean “E and S”, not “E plus S”.
8. Law of Mass Action
The rate at which a reaction takes place depends on the number of
molecules of the reactants which are available (E and S are the
reactants here).
The Law of Mass Action for such systems states that the rate at
which a reaction occurs is proportional to the product of the number
of molecules of the reactants needed.
9. Law of Mass Action (Example)
For example, if there are 5 molecules of E and 5 of S then the
reaction proceeds at rate 25f where f is the kinetic constant for this
reaction.
More generally, we would say that the reaction proceeds at rate
f × E × S.
Note
We write “E × S” here to mean “the number of molecules of E multiplied
by the number of molecules of S”.
10. Reactions and molecule counts
After this reaction has fired, the molecule count of E is decreased
by 1, as is the molecule count of S.
The molecule count of C is increased by 1.
For example, if E and S were 5 before, and C was 0 then afterwards
E and S are 4 and C is 1.
The reaction rate is correspondingly less (now 16f, not 25f).
11. Molecule counts can never become negative
Note that this means that when any reactant is exhausted
(e.g. S = 0) then the reaction cannot fire.
The rate f × E × S is then 0, because one of the factors in the product is 0.
Thus, molecule counts can never become negative.
Note
If your simulator generates negative molecule numbers on any simulation
run then you have a bug in your simulator.
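The rate calculation and the count update for the forward reaction can be sketched in a few lines. This is an illustrative Python sketch, not the Objective-C that the practical requires, and the names `forward_rate` and `fire_forward` are hypothetical.

```python
def forward_rate(f, counts):
    """Mass-action rate of f : E + S -> C, i.e. f * E * S."""
    return f * counts["E"] * counts["S"]


def fire_forward(counts):
    """Return the molecule counts after one firing of f : E + S -> C."""
    if counts["E"] == 0 or counts["S"] == 0:
        # The rate is 0 here, so a correct simulator never reaches this point.
        raise ValueError("reaction f cannot fire: a reactant is exhausted")
    updated = dict(counts)
    updated["E"] -= 1
    updated["S"] -= 1
    updated["C"] += 1
    return updated


state = {"E": 5, "S": 5, "C": 0, "P": 0}
rate_before = forward_rate(1.0, state)  # 25f with f = 1.0
state = fire_forward(state)
rate_after = forward_rate(1.0, state)   # 16f with f = 1.0
```

Raising an error on an exhausted reactant is one possible design choice; it turns a negative-count bug into an immediate failure instead of a silently wrong time series.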
12. Compounds can break apart
Having formed, compounds can break apart again. For example, the
reaction event b,
b : C → E + S
describes a molecule of C breaking up into a molecule of E and a molecule
of S.
Law of Mass Action
This time C is the reactant and the Law of Mass Action states that this
reaction proceeds at rate b × C, where b is the kinetic constant for this
reaction.
13. Forming a product
Finally, consider an alternative product reaction which produces a new
species P (a product).
p : C → E + P
This describes a molecule of C breaking up into a molecule of E and a
molecule of P.
Law of Mass Action
Again, C is the reactant and the Law of Mass Action states that this
reaction proceeds at rate p × C, where p is the kinetic constant for this
reaction.
14. Simulation
The simulation of such a system depends on:
the values of the kinetic constants f, b and p,
the initial molecule counts of the species E, S, C and P,
and the stop-time, when the simulation should terminate.
Given this information in the form of a script, a simulator has enough
information to simulate the model, moving from state to state as
allowed by the reactions.
15. Simulation script (1/2)
# The simulation stop time (t) is 200 seconds
t = 200
# The kinetic real-number constants of the three reactions:
# forward (f), backward (b) and produce (p)
f = 1.0
b = 0.5
p = 0.01
# The initial integer molecule counts of the four species,
# Enzyme, Substrate, Compound and Product
# (E, S, C, P) = (5, 5, 0, 0)
E = 5
S = 5
C = 0
P = 0
16. Simulation script (2/2)
# The three reactions. The ‘forward’ reaction (f)
# makes the compound C, and the ‘backward’
# reaction (b) breaks it apart. The ‘produce’
# reaction (p) makes the product P and releases
# the enzyme E.
f : E + S -> C
b : C -> E + S
p : C -> E + P
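The assessment criteria name the Gillespie algorithm. A minimal sketch of its direct method, applied to this script, might look as follows. It is written in Python for illustration only (the practical requires Objective-C), the model is hard-coded instead of being parsed, and the function name `simulate` and its data layout are assumptions. It also assumes the reactants of each reaction are distinct species.

```python
import math
import random


def simulate(stop_time, constants, counts, reactions, rng):
    """Gillespie direct method: simulate one reaction event at a time.

    `reactions` maps a reaction name to (reactants, products), each a list
    of species names. Returns a list of (time, counts) observations.
    Assumes the reactants of a reaction are distinct species.
    """
    counts = dict(counts)
    time = 0.0
    trace = [(time, dict(counts))]
    while True:
        # Propensity of each reaction: constant times the reactant counts.
        propensities = {}
        for name, (reactants, _products) in reactions.items():
            rate = constants[name]
            for species in reactants:
                rate *= counts[species]
            propensities[name] = rate
        total = sum(propensities.values())
        if total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next event is exponential with rate `total`.
        time += -math.log(1.0 - rng.random()) / total
        if time > stop_time:
            break  # the stop time has been reached
        # Choose a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        chosen = None
        for name, rate in propensities.items():
            if rate == 0.0:
                continue  # a reaction with rate 0 must never fire
            chosen = name
            threshold -= rate
            if threshold < 0.0:
                break
        reactants, products = reactions[chosen]
        for species in reactants:
            counts[species] -= 1
        for species in products:
            counts[species] += 1
        trace.append((time, dict(counts)))
    return trace


constants = {"f": 1.0, "b": 0.5, "p": 0.01}
initial = {"E": 5, "S": 5, "C": 0, "P": 0}
reactions = {
    "f": (["E", "S"], ["C"]),
    "b": (["C"], ["E", "S"]),
    "p": (["C"], ["E", "P"]),
}
trace = simulate(200.0, constants, initial, reactions, random.Random(2012))
```

The seed 2012 is arbitrary. Passing the random number generator in as a parameter makes a run repeatable, which helps when comparing output across changes to the simulator.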
17. Possible forwards, backwards and produce reactions
— starting from the state E = S = 5, C = P = 0.
[Figure: state-transition diagram. From any state (E, S, C, P), reaction f leads to (E − 1, S − 1, C + 1, P), reaction b leads to (E + 1, S + 1, C − 1, P), and reaction p leads to (E + 1, S, C − 1, P + 1). The reachable states are:]
(5, 5, 0, 0)
(4, 4, 1, 0) (5, 4, 0, 1)
(3, 3, 2, 0) (4, 3, 1, 1) (5, 3, 0, 2)
(2, 2, 3, 0) (3, 2, 2, 1) (4, 2, 1, 2) (5, 2, 0, 3)
(1, 1, 4, 0) (2, 1, 3, 1) (3, 1, 2, 2) (4, 1, 1, 3) (5, 1, 0, 4)
(0, 0, 5, 0) (1, 0, 4, 1) (2, 0, 3, 2) (3, 0, 2, 3) (4, 0, 1, 4) (5, 0, 0, 5)
24. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 5) of E, S, C and P against time (0 to 200), initially (5, 5, 0, 0).]
25. The problem with randomness
How do you know when you’ve got it right?
26. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 10) of E, S, C and P against time (0 to 200), initially (10, 10, 0, 0).]
27. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 50) of E, S, C and P against time (0 to 200), initially (50, 50, 0, 0).]
28. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 100) of E, S, C and P against time (0 to 200), initially (100, 100, 0, 0).]
29. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 500) of E, S, C and P against time (0 to 200), initially (500, 500, 0, 0).]
30. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 1000) of E, S, C and P against time (0 to 200), initially (1000, 1000, 0, 0).]
31. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 5000) of E, S, C and P against time (0 to 200), initially (5000, 5000, 0, 0).]
32. Plotting the simulation output as a time-series plot
[Figure: time-series plot of the molecule counts (0 to 10000) of E, S, C and P against time (0 to 200), initially (10000, 10000, 0, 0).]
33. Other ways to display the output
Here we plot the state after each reaction event (columns: time, E, S, C, P)
0.00000 5 5 0 0
0.08231 4 4 1 0
0.56568 3 3 2 0
0.67220 2 2 3 0
0.83951 1 1 4 0
1.12897 2 2 3 0
1.24503 1 1 4 0
1.84901 0 0 5 0
2.34504 1 1 4 0
2.47735 2 2 3 0
2.57656 1 1 4 0
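One partial answer to "How do you know when you've got it right?" is to check properties that every run must satisfy, however the random numbers fall. In this model each of the three reactions leaves E + C and S + C + P unchanged; this follows from the reactions themselves, although the slides do not state it. The Python sketch below (illustrative only, with the hypothetical name `check_trace`) applies those checks to the observations listed above.

```python
# Observations copied from the listing above: (time, E, S, C, P).
trace = [
    (0.00000, 5, 5, 0, 0),
    (0.08231, 4, 4, 1, 0),
    (0.56568, 3, 3, 2, 0),
    (0.67220, 2, 2, 3, 0),
    (0.83951, 1, 1, 4, 0),
    (1.12897, 2, 2, 3, 0),
    (1.24503, 1, 1, 4, 0),
    (1.84901, 0, 0, 5, 0),
    (2.34504, 1, 1, 4, 0),
    (2.47735, 2, 2, 3, 0),
    (2.57656, 1, 1, 4, 0),
]


def check_trace(trace):
    """Return a list of human-readable problems found in a trace."""
    problems = []
    _, e0, s0, c0, p0 = trace[0]
    previous_time = trace[0][0]
    for time, e, s, c, p in trace:
        if min(e, s, c, p) < 0:
            problems.append(f"negative molecule count at t={time}")
        if e + c != e0 + c0:
            problems.append(f"E + C not conserved at t={time}")
        if s + c + p != s0 + c0 + p0:
            problems.append(f"S + C + P not conserved at t={time}")
        if time < previous_time:
            problems.append(f"time goes backwards at t={time}")
        previous_time = time
    return problems


problems = check_trace(trace)
```

Checks of this kind can show that a simulator is wrong, but passing them does not show that reactions fire at the correct rates.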
34. Removing molecules
In addition to the kinds of reactions we have seen so far, chemical species
can decay, or otherwise be removed from the simulation.
d : X →
The molecule count of species X is decreased by 1 when this reaction fires,
and no other species count is changed.
Reaction rate
The reaction proceeds at rate d × X.
35. Adding molecules
Chemical species can be created, or otherwise be introduced into the
simulation.
c : → Y
The molecule count of species Y is increased by 1 when this reaction fires,
and no other species count is changed.
Reaction rate
This reaction has no reactants and hence fires at a constant rate,
determined by the kinetic constant for this reaction.
36. Types of reactions considered
In this practical we will never be concerned with reactions which
involve a collision between more than two molecules.
Tri-molecular collisions occur only vanishingly rarely in dilute fluids.
Hence, the number of reactants which a reaction has will either be
zero, or one, or two.
Similarly, the number of chemical species produced by a reaction will
either be zero, or one, or two.
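Because a reaction has at most two reactants, a single rate function can cover creation, decay and two-species collisions. The following Python sketch is illustrative only; the name `propensity` is hypothetical, the constant 2.0 used for the creation reaction is an arbitrary example value, and the function deliberately rejects a repeated reactant species because that case needs a different formula.

```python
def propensity(constant, reactants, counts):
    """Mass-action rate for a reaction with zero, one or two reactants.

    Assumes the reactant species are distinct.
    """
    if len(reactants) > 2:
        raise ValueError("at most two reactants are supported")
    if len(set(reactants)) != len(reactants):
        raise ValueError("repeated reactant species are not handled here")
    rate = constant
    for species in reactants:
        rate *= counts[species]
    return rate


counts = {"X": 100, "E": 5, "S": 5}
creation = propensity(2.0, [], counts)           # c : -> Y, constant rate c
decay = propensity(1.0, ["X"], counts)           # d : X ->, rate d * X
collision = propensity(1.0, ["E", "S"], counts)  # f : E + S -> C, rate f * E * S
```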
37. A special case: dimerisation
Reactions in which two molecules of the same species collide constitute a
special case (called dimerisation). In a reaction such as:
d : A + A → B
the reaction proceeds at rate d × (1/2 × A × (A − 1)), not d × A × A.
38. A special case: dimerisation (explanation)
This adjusted rate reflects the fact that there are fewer ways for two
molecules of the same type to collide with one another.
Firstly, a molecule cannot collide with itself: hence the subtraction
of 1.
Secondly, a collision between A1 and A2 is exactly the same as a
collision between A2 and A1: hence the factor of 1/2.
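The adjusted rate counts the distinct unordered pairs of A molecules. A short Python sketch (illustrative only, with the hypothetical name `dimerisation_rate`) shows the formula and compares it with the standard-library pair count `math.comb(A, 2)`.

```python
import math


def dimerisation_rate(d, a):
    """Rate of d : A + A -> B, i.e. d * (1/2 * A * (A - 1))."""
    return d * (a * (a - 1) / 2)


pairs = dimerisation_rate(1.0, 5)  # 10 distinct pairs among 5 molecules

# With d = 1 the rate equals the number of ways to choose 2 molecules from A.
matches_pair_count = all(
    dimerisation_rate(1.0, n) == math.comb(n, 2) for n in range(50)
)
```

The rate is 0 when A is 0 or 1, so a reaction that consumes two molecules of A cannot drive the count of A negative.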
39. Summary
Given a reaction script, produce a time-series output
model.txt:
# simulate to a limit
# of 10 seconds
t = 10

# decay rate d is 1
d = 1.0

# 100 molecules of X
X = 100

# X decays to nothing
d : X ->
⇒ Your simulator written in Objective-C ⇒
model.csv:
# t X
0.00, 100
0.01, 98
0.02, 97
0.03, 96
0.04, 95
0.05, 95
0.06, 93
0.07, 91
0.08, 90
...
40. Requirements (1/2)
It should be possible to read and parse a reaction script
Reaction scripts can be formatted with comments and blank lines
Comments are indicated by a hash symbol (#) and continue until the
end of the line
It should be possible to specify the simulation stop-time, reaction
kinetic constants and species initial concentrations
The simulation stop-time is always denoted by t
Reaction kinetic constants are positive real values and their identifiers
always begin with a lowercase letter (e.g. a, b, c, ..., but not t, which
is reserved).
Species initial molecule counts are non-negative integer values and their
identifiers always begin with an uppercase letter (e.g. A, B, C, ...).
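The parsing rules above can be sketched as a small line-by-line parser. This Python version is an illustration under stated assumptions and not the required Objective-C implementation: the function name `parse_script` and the returned data layout are hypothetical, and error handling is kept minimal.

```python
def parse_script(text):
    """Parse a reaction script into (stop_time, constants, counts, reactions)."""
    stop_time = None
    constants = {}
    counts = {}
    reactions = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # blank or comment-only line
        if "->" in line:
            # Reaction, e.g. "f : E + S -> C"; either side may be empty.
            name, colon, body = line.partition(":")
            if not colon or not name.strip():
                raise ValueError(f"reaction needs a name: {raw_line!r}")
            left, _, right = body.partition("->")
            reactants = [s.strip() for s in left.split("+") if s.strip()]
            products = [s.strip() for s in right.split("+") if s.strip()]
            reactions[name.strip()] = (reactants, products)
        elif "=" in line:
            # Definition, e.g. "t = 200", "f = 1.0" or "E = 5".
            name, _, value = line.partition("=")
            name, value = name.strip(), value.strip()
            if not name:
                raise ValueError(f"definition needs a name: {raw_line!r}")
            if name == "t":
                stop_time = float(value)
            elif name[0].islower():
                constants[name] = float(value)  # kinetic constant
            else:
                counts[name] = int(value)       # species molecule count
        else:
            raise ValueError(f"cannot parse line: {raw_line!r}")
    return stop_time, constants, counts, reactions


script = """
# decay rate d is 1
t = 10
d = 1.0
X = 100   # 100 molecules of X
d : X ->
"""
stop_time, constants, counts, reactions = parse_script(script)
```

Python dictionaries keep insertion order, so `counts` records the species in the order in which they appear in the file.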
41. Requirements (2/2)
Reaction events should fire according to their propensity, as
determined by the reaction kinetic constants and the species
molecular populations.
The application should produce a comma-separated time-series of
molecule counts for the species in the reaction script up to the
simulation stop time.
The first column should be the time points of the observations.
The time series should be preceded by a header listing the species
identifiers in the order in which they appear in the file.
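The output requirements can be sketched with a standard CSV writer. This Python example is illustrative only; the function name `write_csv`, the header label `t` for the time column, and the five-decimal time format are assumptions and are not prescribed by the requirements.

```python
import csv
import io


def write_csv(trace, species, out):
    """Write a header row, then one row per observation: time, then counts."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["t"] + list(species))
    for time, counts in trace:
        writer.writerow([f"{time:.5f}"] + [counts[name] for name in species])


trace = [
    (0.0, {"E": 5, "S": 5, "C": 0, "P": 0}),
    (0.08231, {"E": 4, "S": 4, "C": 1, "P": 0}),
]
buffer = io.StringIO()
write_csv(trace, ["E", "S", "C", "P"], buffer)
output = buffer.getvalue()
```

Passing the species list explicitly keeps the column order tied to the order of appearance in the script, independent of how the counts are stored.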
42. Extra credit
The requirements listed in the section above illustrate the core
functionality which is required from your application.
A well engineered solution which addresses all of the above
requirements should expect to attract a very good or excellent mark.
Additional credit will be awarded for additional useful features which
are not on the above list.
Thus, if you have time remaining before the submission deadline and
you have already met all the requirements listed above then you can
attract additional marks by being creative, conceiving of new features
which can helpfully be added to the application, and implementing
these.
43. Examples of additional features
Examples of features which you might like to consider adding could be the
following:
good diagnostic error messages for non-well-formed input scripts;
static analysis to generate warnings for kinetic constants or species
populations which are defined but never used;
support for descriptive identifiers for reactions and species (e.g.
forward instead of f, and backward instead of b; and Enzyme
instead of E, and Substrate instead of S);
support for numbers in scientific notation (e.g. writing 1e-10 instead
of 0.0000000001 and 1e12 instead of 1000000000000).
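As one example, the static analysis for unused definitions could be sketched as a pair of set differences. This Python sketch is illustrative only: the name `unused_definitions` is hypothetical, and it assumes that each reaction is named after its kinetic constant, as in the sample scripts.

```python
def unused_definitions(constants, counts, reactions):
    """Return (unused constant names, unused species names), each sorted."""
    used_constants = set(reactions)  # a reaction is named after its constant
    used_species = set()
    for reactants, products in reactions.values():
        used_species.update(reactants)
        used_species.update(products)
    return (
        sorted(set(constants) - used_constants),
        sorted(set(counts) - used_species),
    )


constants = {"f": 1.0, "b": 0.5, "k": 3.0}
counts = {"E": 5, "S": 5, "C": 0, "Z": 7}
reactions = {
    "f": (["E", "S"], ["C"]),
    "b": (["C"], ["E", "S"]),
}
unused_constants, unused_species = unused_definitions(constants, counts, reactions)
```

Here the constant `k` and the species `Z` are defined but never used, so a simulator could report both as warnings before the run starts.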
44. Early submission credit (1/2)
In software development, timing and delivery of completed applications
and projects can be crucial in gaining a commercial or strategic advantage
over competitors. Being the first to market means that your product has
the opportunity to become known and similar products which come later
may struggle to make the same impact, simply because they became
available second.
45. Early submission credit (2/2)
In order to motivate good project management, planning, and efficient
software development, the CSLP reserves marks above 90% for work which
is submitted early (specifically, one week before the deadline for Part 2).
To achieve a mark above 90%, a practical submission must be excellent in
all technical and functional aspects, correctly implement the Gillespie
algorithm, be implemented in idiomatic Objective-C, and otherwise meet
all requirements of the practical, and in addition to this it must be
submitted early.
46. Assessment criteria (1/2)
This practical exercise will be assessed in terms of the completeness,
correctness and efficiency of the simulator, and the quality of the
Objective-C code, including how well it uses features of the language.
For example, all other things being equal, a simulator which correctly
implements dimerisation will get more marks than one which does
not.
A more accurate simulator will get more marks.
Efficiency is very important in simulators which may need to deal with
millions of reaction events. The main loop of the simulation should
be carefully coded to avoid unnecessary object allocations and other
instructions.
A more efficient simulator will get more marks.
47. Assessment criteria (2/2)
A well-structured Objective-C application with classes, interfaces and
protocols will be preferred to one which is not structured.
Writing idiomatic Objective-C code will gain marks.
All else being equal, an application which reports static analysis
errors when compiled should expect to attract fewer marks than one
which does not.
Sloppy development style ignoring compiler warnings will lose marks.
Additionally, all else being equal, an application whose code contains
examples of poor programming style (such as unused variables, dead
code, blocks of commented-out code etc) should expect to attract
fewer marks than an application which does not have these problems.
Poor programming style will lose marks.
48. Marking process (1/3)
1. A new, previously-unseen simulation script is created.
The application will be tested on both seen and unseen simulation
scripts.
2. The accompanying documentation is read for instructions on how to
use the application.
Submissions with insufficient documentation will lose marks here.
3. The Objective-C project is compiled and inspected for errors or
warnings
Submissions with errors or static analysis warnings will lose marks here
49. Marking process (2/3)
4. The compiled code is run on previously-seen sample simulation scripts
Submissions which fail to execute will lose marks here
Submissions which produce incorrect output will lose marks here
5. The compiled code is run on previously-unseen sample simulation
scripts
Submissions which fail to execute will lose marks here
Submissions which produce incorrect output will lose marks here
6. The submitted simulation scripts are inspected as evidence of
developer testing.
Submissions which have had insufficient testing will lose marks here
50. Marking process (3/3)
7. Other additional features of the application will be explored
Submissions with useful additional features will gain marks here
8. The Objective-C source code will be inspected for good programming
style
Submissions with insufficient structure will lose marks here
Submissions which do not use Objective-C features will lose marks here
Submissions with too few comments will lose marks here
Submissions with blocks of commented-out code will lose marks here