Peter Lawrey gave a presentation on deterministic behavior and performance in trading. Some key points:
- Using lambda functions and state machines can help make systems more deterministic and easy to reason about.
- Recording all inputs and outputs allows systems to be replayed and upgraded deterministically. This supports testing.
- Little's Law relates throughput, latency, and number of workers. For trading systems, reducing latency increases throughput.
- Avoiding "coordinated omission" is important for accurate latency testing.
- In Java 8, escape analysis and inlining can avoid object creation with lambdas, improving performance.
- Systems using Chronicle Queue can achieve low 25 microsecond latency while ensuring data is
Deterministic behaviour and performance in trading systems
1. Peter Lawrey
CEO of Higher Frequency Trading
Google Developer Group 2015
Deterministic Behaviour and
Performance in Trading
2. Peter Lawrey
Java Developer/Consultant for hedge fund and trading firms for 6
years.
Most answers for Java and JVM on stackoverflow.com
Founder of the Performance Java User’s Group.
Architect of Chronicle Software
3. Agenda
• Lambda functions and state machines
• Record every input
• Determinism by Design
• Record every output
• The consequences of Little’s Law
• Java 8 and Garbage
• Chronicle Queue demo
4. Lambda Functions
• No mutable state
• Easy to reason about
• Easy to componentize
• But … no mutable state.
5. State Machine
• Local mutable state
• Easier to reason about,
than shared state
• Easier to componentize
• Not as simple as Lambda
Functions
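The contrast between the two styles can be sketched in a few lines of Java. This is a minimal illustration only; the names NOTIONAL and PositionTracker are hypothetical and do not come from the talk.

```java
import java.util.function.LongBinaryOperator;

/** Hypothetical example: a pure function next to a small state machine. */
class LambdaVsStateMachine {
    // Lambda function: no mutable state, the result depends only on the arguments.
    static final LongBinaryOperator NOTIONAL = (price, quantity) -> price * quantity;

    // State machine: mutable state, but local to one instance and never shared.
    static final class PositionTracker {
        private long position;

        /** Applies one fill event and returns the new position. */
        long onFill(long signedQuantity) {
            position += signedQuantity;
            return position;
        }
    }

    public static void main(String[] args) {
        System.out.println(NOTIONAL.applyAsLong(101, 5)); // 505

        PositionTracker tracker = new PositionTracker();
        tracker.onFill(10);
        System.out.println(tracker.onFill(-4)); // 6
    }
}
```

The lambda can be tested with a single call, while the state machine needs a sequence of events, which is why it is "not as simple" even though its state stays local.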
7. Record every input
• By recording every input you can recreate the state of the
system at any point and recreate bugs, test rare conditions, and
test the latency distribution of your system.
But
• This approach doesn’t support software upgrades.
• A replay facility which is implemented after the fact might not
recreate your system completely.
8. Determinism by design
• You want a system where producers write every event, and
consumers are continuously in replay. This way you can be
sure that you have this facility early in the development cycle
and you know that you have recorded every event/input.
• This facility can help you in the testing of your system by
allowing you to build anything from small simple tests to huge
complex data driven tests.
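A minimal sketch of this idea, assuming an in-memory list in place of a persisted queue. The Journal and Position classes are hypothetical and are not the Chronicle Queue API; the point is only that the live consumer and a later replay go through the same code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical sketch: an in-memory journal standing in for a persisted queue. */
class ReplayDemo {
    /** Producers append every event; consumers only ever read by replaying. */
    static final class Journal {
        private final List<Long> events = new ArrayList<>();

        void write(long event) {
            events.add(event);
        }

        /** Replays events from fromIndex onwards and returns the next index to read. */
        int replay(int fromIndex, LongConsumer consumer) {
            for (int i = fromIndex; i < events.size(); i++) {
                consumer.accept(events.get(i));
            }
            return events.size();
        }
    }

    /** A consumer whose state is fully determined by the events it has seen. */
    static final class Position implements LongConsumer {
        long value;

        @Override
        public void accept(long signedQuantity) {
            value += signedQuantity;
        }
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        Position live = new Position();
        int next = 0;
        for (long fill : new long[] {10, -4, 7}) {
            journal.write(fill);               // record the input first
            next = journal.replay(next, live); // the live consumer is just a replay
        }
        Position rebuilt = new Position();
        journal.replay(0, rebuilt);            // recreate the state from scratch
        System.out.println(live.value + " == " + rebuilt.value); // 13 == 13
    }
}
```

Because the live path is itself a replay, a test can feed the same journal to a fresh consumer and expect identical state.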
9. Record every output
• Supports live software upgrades. By recording and replaying
outcomes you can have a system which commits to any decision
the previous one made, i.e. you can change the software to
make different decisions.
• This can be tested at the API level by having two state
machines, where the input of one is the output of the other.
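One way to picture an upgrade that honours earlier decisions is sketched below. It is an illustration under assumptions: the run method and the two pricing rules v1 and v2 are hypothetical, and the recorded outputs are held in a plain list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch: upgraded logic replays recorded outputs before deciding anything new. */
class OutputReplayDemo {
    static List<Long> run(List<Long> inputs, List<Long> recordedOutputs, LongUnaryOperator logic) {
        List<Long> outputs = new ArrayList<>(recordedOutputs); // honour earlier decisions
        for (int i = recordedOutputs.size(); i < inputs.size(); i++) {
            outputs.add(logic.applyAsLong(inputs.get(i)));     // decide only what is new
        }
        return outputs;
    }

    public static void main(String[] args) {
        LongUnaryOperator v1 = mid -> mid + 1; // hypothetical old pricing rule
        LongUnaryOperator v2 = mid -> mid + 2; // hypothetical upgraded rule

        List<Long> beforeUpgrade = run(Arrays.asList(100L, 101L), Collections.emptyList(), v1);
        List<Long> afterUpgrade = run(Arrays.asList(100L, 101L, 102L), beforeUpgrade, v2);

        System.out.println(beforeUpgrade); // [101, 102]
        System.out.println(afterUpgrade);  // [101, 102, 104]
    }
}
```

Replaying only the inputs through v2 would have produced 102 and 103 for the first two events, contradicting what the old version already published; replaying the outputs avoids that.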
10. Little’s law
Little’s law states:
The long-term average number of customers in a stable system L
is equal to the long-term average effective arrival rate, λ,
multiplied by the (Palm-)average time a customer spends in the
system, W; or expressed algebraically: L = λW
11. Little’s law as work.
The number of active workers must be at least the average arrival
rate of tasks multiplied by the average time to complete those
tasks.
workers >= tasks/second * seconds to perform task.
Or
throughput <= workers / latency.
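The two inequalities can be turned into a small calculation. The sketch below uses illustrative figures only (100,000 tasks per second and latencies of 25 and 10 microseconds); the helper names are hypothetical.

```java
/** Hypothetical helper: Little's law with latency expressed in microseconds. */
class LittlesLaw {
    /** workers >= tasks/second * seconds to perform task, rounded up to a whole worker. */
    static long minWorkers(long tasksPerSecond, long latencyMicros) {
        return (tasksPerSecond * latencyMicros + 999_999) / 1_000_000;
    }

    /** throughput <= workers / latency, returned in tasks per second. */
    static double maxThroughput(int workers, long latencyMicros) {
        return workers * 1_000_000.0 / latencyMicros;
    }

    public static void main(String[] args) {
        System.out.println(minWorkers(100_000, 25)); // 3 (2.5 workers busy on average)
        System.out.println(maxThroughput(1, 25));    // 40000.0
        System.out.println(maxThroughput(1, 10));    // 100000.0
    }
}
```

The last two lines show the single-worker case: with no extra workers to add, cutting latency from 25 to 10 microseconds is what raises the throughput ceiling.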
12. Consequences of Little’s law
• If you have a problem with a high degree of independent tasks,
you can throw more workers at the problem to handle the
load. E.g. web services
• If you have a problem with a low degree of independent tasks,
adding more workers will mean more will be idle. E.g. many
trading systems. The solution is to reduce latency to increase
throughput.
13. Consequences of Little’s law
• Average latency is a function, sometimes the inverse, of the
throughput.
• Throughput focuses on the average experience. The worst cases
are often the ones which will hurt you, but averages are very
good at hiding your worst cases. E.g. from long GC pauses.
• Testing with Co-ordinated omission also hides worst case
latencies.
14. Co-ordinated omission
• A term coined by Gil Tene.
• Co-ordinated omission occurs when the system being tested is
allowed to apply back pressure on the system doing the
testing. When the system being tested is slow, it can
effectively pause the test, esp. when averages or latency
percentiles are considered.
15. Co-ordinated omission: Example
• A shop is open 10 hours a day between 8 AM and 6 PM.
• A customer comes every 5 minutes, waits to be served and
leaves.
• When the shop keeper is there, he takes 1 minute to serve.
• But if he takes a 2 hour lunch break, how does this affect the
average latency or the 98th percentile?
16. How not to measure latency.
• You have one person go to the shop and time how long she has
to wait. Once per day she has to wait 2 hours and 1 minute,
but the rest of the day it only takes 1 minute.
• The average of 97 tests is 2.2 minutes. Without the lunch break
there would have been 120 tests, but one test took over 2 hours.
Not great, but it doesn’t sound much worse than 1 minute.
• The 98th percentile is 1 minute.
17. Avoiding co-ordinated omission
• You have as many people as you need. Most of the time, only
one is waiting, however over the lunch break, there are 30
people delayed by 121, 117, 113, 109 … 5 mins.
• The average of 120 tests is 16.5 minutes wait time. This is much
higher than the 2.2 minutes calculated previously.
• The 98th percentile is 111 minutes, instead of 1 minute in the
previous test.
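The shop example can be simulated to compare both ways of measuring. The sketch below rests on assumptions that are not stated on the slides: the lunch break runs from noon to 2 PM, a delayed single tester makes her next visit as soon as she has been served, and percentiles are interpolated linearly between the two nearest ranks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical simulation of the shop example, in minutes since opening at 8 AM. */
class ShopLatency {
    static final int OPEN_MINUTES = 600, INTERVAL = 5, SERVICE = 1;
    static final int LUNCH_START = 240, LUNCH_END = 360; // assumed: noon to 2 PM

    /** Minute at which a customer who is ready to be served at `ready` has been served. */
    static int finishTime(int ready) {
        int start = (ready >= LUNCH_START && ready < LUNCH_END) ? LUNCH_END : ready;
        return start + SERVICE;
    }

    /** One tester: the next visit cannot start until the previous one has finished. */
    static List<Integer> coordinated() {
        List<Integer> waits = new ArrayList<>();
        for (int arrival = 0; arrival < OPEN_MINUTES; ) {
            int done = finishTime(arrival);
            waits.add(done - arrival);
            arrival = Math.max(arrival + INTERVAL, done);
        }
        return waits;
    }

    /** As many testers as needed: arrivals keep to the schedule and queue up. */
    static List<Integer> uncoordinated() {
        List<Integer> waits = new ArrayList<>();
        int free = 0; // minute at which the shop keeper is next free
        for (int arrival = 0; arrival < OPEN_MINUTES; arrival += INTERVAL) {
            free = finishTime(Math.max(arrival, free));
            waits.add(free - arrival);
        }
        return waits;
    }

    static double average(List<Integer> waits) {
        return waits.stream().mapToInt(Integer::intValue).average().orElse(0);
    }

    /** Percentile with linear interpolation between the two nearest ranks. */
    static double percentile(List<Integer> waits, double pct) {
        List<Integer> sorted = new ArrayList<>(waits);
        Collections.sort(sorted);
        double rank = pct / 100 * (sorted.size() - 1);
        int lower = (int) Math.floor(rank);
        int upper = Math.min(lower + 1, sorted.size() - 1);
        return sorted.get(lower) + (rank - lower) * (sorted.get(upper) - sorted.get(lower));
    }

    public static void main(String[] args) {
        for (List<Integer> waits : Arrays.asList(coordinated(), uncoordinated())) {
            System.out.printf("tests=%d average=%.1f p98=%.1f%n",
                    waits.size(), average(waits), percentile(waits, 98));
        }
    }
}
```

Under these assumptions the single tester records 97 tests with an average of about 2.2 minutes and a 98th percentile of 1 minute, while the scheduled arrivals record 120 tests with an average of 16.5 minutes and a 98th percentile of roughly 111 minutes. A different percentile definition would shift the last figure slightly.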
18. Doesn’t the GC stop the world?
• The GC only pauses the JVM when it has some work to do.
Produce less garbage and it will pause less often
• Produce less than 1 GB/hour of garbage and you can get less
than one pause per day. (With a 24 GB Eden)
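The arithmetic behind the second bullet is a simple division. The sketch below is a simplification that assumes a young collection is triggered only when Eden fills and ignores any other cause of a pause; the method name is hypothetical.

```java
/** Hypothetical helper: how long an Eden space lasts at a given allocation rate. */
class EdenBudget {
    /** Hours until Eden fills, assuming a steady garbage rate. */
    static double hoursUntilEdenFills(double edenGb, double garbageGbPerHour) {
        return edenGb / garbageGbPerHour;
    }

    public static void main(String[] args) {
        System.out.println(hoursUntilEdenFills(24, 1.0)); // 24.0
        System.out.println(hoursUntilEdenFills(24, 0.5)); // 48.0
    }
}
```

At exactly 1 GB/hour a 24 GB Eden lasts one day, so anything below that rate gives less than one collection per day.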
19. Do I need to avoid all objects?
• In Java 8 you can have very short lived objects placed on the
stack. This requires your code to be inlined and escape analysis
to kick in. When this happens, no garbage is created and the
code is faster.
• You can have very long lived objects, provided you don’t have
too many.
• The rest of your data you can place in native memory (off
heap)
• You can create 1 GB/hour of garbage and still not GC
21. How does Java 8 avoid creating objects?
One way to think of Java 8 lambdas is the ability to pass behaviour
to a library. With inlining, an alternative view is the ability to
template your code. Consider this locking example
    lock.lock();
    try {
        doSomething();
    } finally {
        lock.unlock();
    }
22. How does Java 8 avoid creating objects?
This boilerplate can be templated
    public static void withLock(Lock lock, Runnable runnable) {
        lock.lock();
        try {
            runnable.run();
        } finally {
            lock.unlock();
        }
    }
23. How does Java 8 avoid creating objects?
This simplifies the code to be
    withLock(lock, () -> doSomething());
Doesn’t using a Runnable create an object?
With inlining and escape analysis the Runnable can be placed on
the stack and eliminated (as it has no fields)
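A self-contained version of the template is sketched below so it can be run and checked. The counter field and the runDemo method are hypothetical additions for illustration. Whether the JIT actually removes the Runnable depends on inlining decisions at run time, so this shows the shape of the code, not a guarantee of zero allocation.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo of the withLock template with a non-capturing lambda. */
class WithLockDemo {
    static int counter; // only modified while holding the lock

    static void withLock(Lock lock, Runnable runnable) {
        lock.lock();
        try {
            runnable.run();
        } finally {
            lock.unlock();
        }
    }

    /** Two threads increment the counter under the lock; returns the final count. */
    static int runDemo() {
        counter = 0;
        Lock lock = new ReentrantLock();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                withLock(lock, () -> counter++); // lambda captures nothing, so it has no fields
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 200000
    }
}
```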
24. Low Latency with lots of Lambdas
Chronicle Wire is an API for generic serialization and
deserialization. You determine what you want to read/write, but
the exact wire format can be injected. This works for Yaml, Binary
Yaml, and raw data. It will support XML, FIX, JSON and BSON.
This uses lambdas extensively but the objects associated can be
eliminated.
25. Low Latency with lots of Lambdas
    wire.writeDocument(false, out ->
        out.write(() -> "put")
           .marshallable(m ->
               m.write(() -> "key").int64(n)
                .write(() -> "value").text(words[n])));
As Yaml
--- !!data
put: { key: 1, value: hello }
As Binary Yaml
⒗٠٠٠Ãputu0082⒎٠٠٠⒈åhello
26. Isn’t writing to disk slow?
• Uncommitted synchronous writes can be extremely fast.
Typically around a microsecond. The writes are synchronous
to the application so data is not lost if the application dies, but
not actually committed to disk.
• To prevent loss of data on power failure, you can use
replication.
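One common way to get this kind of write in plain Java is a memory-mapped file. The sketch below is an assumption made for illustration and is not taken from the talk or from Chronicle's code. It uses only java.nio: the value is stored into the mapping, the mapping is not forced to disk, and the value is then read back through an ordinary file read. The read-back relies on the operating system sharing its page cache between mapped and regular I/O, which mainstream systems do but the Java specification does not promise.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo: an uncommitted write through a memory-mapped file. */
class MappedWriteDemo {
    /** Writes through a memory mapping, then reads the value back with a plain file read. */
    static long writeThenRead(Path file, long value) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            mapped.putLong(0, value); // a store into the mapping, not yet committed to disk
            // mapped.force();        // would ask the OS to commit the change to the device
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer in = ByteBuffer.allocate(Long.BYTES);
            while (in.hasRemaining() && channel.read(in, in.position()) > 0) {
                // keep reading until all 8 bytes have arrived
            }
            in.flip();
            return in.getLong();
        }
    }

    /** Runs the round trip against a temporary file. */
    static long runDemo() {
        try {
            Path file = Files.createTempFile("uncommitted", ".dat");
            file.toFile().deleteOnExit();
            return writeThenRead(file, 42L);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 42
    }
}
```

The data is owned by the operating system once it is in the mapping, so it outlives the process, but only force() or replication protects it against a power failure.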
27. A low latency with fail over
• Data sent between servers is half round trip.
• Inputs are written on both servers.
• Outputs are written on both servers.
• The end to end latency can be 25 µs, 99% of the time.
29. Next Steps
• Chronicle is open source so you can start right away!
• Working with clients to produce Chronicle Enterprise
• Support contract for Chronicle and consultancy
30. Q & A
Peter Lawrey
@PeterLawrey
http://chronicle.software
http://vanillajava.blogspot.com