11. Mechanical Sympathy
[Chart: access latencies on scales from ms to μs to ns to ps: cross-continental round trip, Ethernet ping, 1MB disk/Ethernet transfer, RDMA over InfiniBand, 1MB main memory read, main memory ref, L2 cache ref, L1 cache ref.]
* An L1 ref is about 2 clock cycles, or 0.7ns: the time it takes light to travel 20cm.
12. Key Point #1
Simple computer programs, operating in a single address space, are extremely fast.
14. Why are there so many types of database these days?
…because we need different architectures for different jobs.
33. Each machine is responsible for a subset of the records. Each record exists on only one machine.
[Diagram: a client connected to several machines, each owning a distinct range of keys: 1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…]
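For illustration, a minimal sketch (hypothetical code, not ODC's) of this kind of key-to-machine routing: every key hashes to exactly one node, so every record has exactly one owner.

```java
// A minimal sketch of key-based routing: each record key maps to
// exactly one node, so each record lives on exactly one machine.
import java.util.List;

public class Partitioner {
    private final List<String> nodes; // e.g. ["node-0", "node-1", "node-2"]

    public Partitioner(List<String> nodes) {
        this.nodes = nodes;
    }

    // Deterministic: the same key always routes to the same node.
    public String nodeFor(Object key) {
        int hash = key.hashCode() & Integer.MAX_VALUE; // force non-negative
        return nodes.get(hash % nodes.size());
    }
}
```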
34. #3: The In Memory Database
(single address-space)
39. Memory is at least 100x faster than disk
[Chart, repeated from slide 11: cross-continental round trip, network round trip, 1MB disk/network transfer, 1MB main memory read, main memory ref, L2 cache ref, L1 cache ref, on scales from ms to ps.]
* An L1 ref is about 2 clock cycles, or 0.7ns: the time it takes light to travel 20cm.
51. But at the cost of losing the single address space.
52. [Diagram: database architectures on a spectrum: Traditional (Shared Disk), Shared Nothing, In Memory, Distributed In Memory, with the contract becoming simpler along the way.]
53. Key Point #4
There are three key forces:
- Distribution: gain scalability through a distributed architecture.
- No Disk: all data is held in RAM.
- Simplify the contract: improve scalability by picking appropriate ACID properties.
54. These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse.
59. Which is best for latency?
[Diagram: Traditional Database vs In-Memory Database vs Shared Nothing (Distributed), compared on latency.]
60. Which is best for throughput?
[Diagram: the same three architectures, compared on throughput.]
61. So why do we use distributed in-memory?
In-memory gives us latency; plentiful hardware gives us throughput.
62. ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB
450 processes, 2TB of RAM
Messaging (topic based) as a system of record (persistence)
63. The Layers
[Diagram: Java client APIs sit on an Access Layer, above a Query Layer, above a Data Layer holding Transactions, Mtms, and Cashflows, with a Persistence Layer underneath.]
64. Three Tools of Distributed Data Architecture:
Indexing, Partitioning, Replication
77. …and all the duplication means
you run out of space really
quickly
78. Space issues are exacerbated further when data is versioned.
[Diagram: the same Trade/Trader/Party graph duplicated in full as Version 1, Version 2, Version 3, Version 4.]
…and you need versioning to do MVCC.
79. And reconstituting a previous time slice becomes very difficult.
[Diagram: Trades, Traders, and Parties from different versions scattered across the store.]
80. So we want to hold
entities separately
(normalised) to alleviate
concerns around
consistency and space
usage
81. Remember this means the object graph will be split across multiple machines, with each entity held as an independently versioned singleton.
[Diagram: Trades, Traders, and Parties spread over different machines.]
82. Binding them back together involves a "distributed join" => lots of network hops.
[Diagram: the Trade/Trader/Party graph joined back together across machines.]
84. So what we want is the advantages
of a normalised store at the speed
of a denormalised one!
This is what using Snowflake Schemas and
the Connected Replication pattern is all
about!
85. Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
95. Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys
96. We remember we are a grid. We
should avoid the distributed
join.
97. …so we only want to 'join' data that is in the same process.
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence).
[Diagram: Trades and MTMs collocated via a common key.]
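A minimal sketch, assuming a Coherence deployment, of what such a key assignment policy can look like. MtmKey and its fields are hypothetical (the deck does not show ODC's key classes); the idea is that each fact key returns the common trade key as its associated key, so a Trade and its MTMs land in the same partition.

```java
// A minimal sketch of a key-assignment policy using Coherence's
// KeyAssociation interface. MtmKey is a hypothetical key class.
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId; // the common key shared with the parent Trade

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    // Coherence routes this entry to the partition that owns the
    // associated key, collocating each MTM with its parent Trade.
    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }

    // A real key also needs equals() and hashCode(); omitted here.
}
```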
98. So we prescribe different physical storage for Facts and Dimensions.
[Diagram: Trader and Party are replicated; Trade is partitioned.]
99. Facts are partitioned, dimensions are replicated.
[Diagram: the Query Layer holds the replicated Trader and Party dimensions; the Data Layer holds Transactions, Mtms, and Cashflows in partitioned Fact Storage.]
100. Facts are partitioned, dimensions are replicated.
[Diagram: Dimensions (replicated) sit above the Facts: Transactions, Mtms, and Cashflows (distributed/partitioned) in Fact Storage.]
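To make the payoff concrete, a minimal sketch with hypothetical cache names, assuming the code runs on a storage-enabled cluster member such as a query-layer node: the fact lookup may cross the network to the owning partition, but the dimension lookup is served in-process from the replicated cache.

```java
// A minimal sketch of why the split pays off (hypothetical cache names).
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class FactDimensionJoin {
    public static void main(String[] args) {
        // The caches' topologies (partitioned vs replicated) are declared
        // in the Coherence cache configuration, not in code.
        NamedCache trades  = CacheFactory.getCache("trades");  // partitioned facts
        NamedCache traders = CacheFactory.getCache("traders"); // replicated dimensions

        // Fetching a fact may cross the network to the owning partition...
        Object trade = trades.get(42L);
        // ...but the dimension lookup is local, because the replicated
        // cache holds a full copy on this member.
        Object trader = traders.get("trader-7");
        System.out.println(trade + " traded by " + trader);
    }
}
```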
101. The data volumes back this up as a sensible hypothesis:
Facts => big => distribute
Dimensions => small => replicate
102. Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
104. So how does this help us to run queries without distributed joins?
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
105. What would this look like without this pattern?
[Diagram: a chain of sequential lookups, Get Cost Centers, Get Ledger Books, Get Source Books, Get Transactions, Get MTMs, Get Legs, Get Cost Centers, each paying the network time.]
106. But by balancing Replication and Partitioning we don't need all those hops.
[Diagram: the same lookups, with the intermediate network hops eliminated.]
107. Stage 1: Focus on the where clause:
Where Cost Centre = 'CC1'
108. Stage 1: Get the right keys to query the Facts.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: dimensions are joined in the Query Layer; Transactions, Mtms, and Cashflows stay partitioned below.]
109. Stage 2: Cluster Join to get Facts.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: dimensions joined in the Query Layer; Transactions, Mtms, and Cashflows joined across the cluster in the partitioned layer.]
110. Stage 2: Join the facts together
efficiently as we know they are
collocated
111. Stage 3: Augment raw Facts with relevant Dimensions.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: dimensions joined in the Query Layer, facts joined across the cluster in the partitioned layer, then the results augmented with dimensions back in the Query Layer.]
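Putting the three stages together, a minimal, self-contained sketch using plain maps as stand-ins for the caches; all names are hypothetical, as the deck does not show ODC's query-layer code.

```java
// A minimal sketch of the three query stages.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ThreeStageQuery {
    private final Map<String, Set<Long>> costCentreIndex; // replicated dimension index
    private final Map<Long, String> trades;  // partitioned facts
    private final Map<Long, String> mtms;    // partitioned facts, collocated by trade id
    private final Map<Long, String> refData; // replicated dimensions

    public ThreeStageQuery(Map<String, Set<Long>> costCentreIndex,
                           Map<Long, String> trades,
                           Map<Long, String> mtms,
                           Map<Long, String> refData) {
        this.costCentreIndex = costCentreIndex;
        this.trades = trades;
        this.mtms = mtms;
        this.refData = refData;
    }

    public List<String> select(String costCentre) {
        List<String> results = new ArrayList<>();

        // Stage 1: join dimensions in-process (replicated caches) to turn
        // the where clause into a set of fact keys.
        Set<Long> tradeIds = costCentreIndex.getOrDefault(costCentre, Set.of());

        for (Long id : tradeIds) {
            // Stage 2: join the facts; on the grid this runs on the node
            // that owns the key, since trades and MTMs are collocated.
            String trade = trades.get(id);
            String mtm = mtms.get(id);

            // Stage 3: augment the raw facts with dimensions from the
            // local replicated cache, again with no network hop.
            results.add(trade + " | " + mtm + " | " + refData.get(id));
        }
        return results;
    }
}
```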
130. One recent independent study
from the database community
showed that 80% of data
remains unused
131. So we only replicate
‘Connected’ or ‘Used’
dimensions
132. As data is written to the data store we keep our 'Connected Caches' up to date.
[Diagram: as new Facts are added to the partitioned Fact Storage (Transactions, Mtms, Cashflows) in the Data Layer, the Dimensions they reference are moved to the replicated Dimension Caches in the Processing Layer.]
133. The Replicated Layer is updated
by recursing through the arcs
on the domain model when facts
change
134. Saving a trade causes all its 1st-level references to be triggered.
[Diagram: a Trade is saved into the partitioned Data Layer (all normalised), triggering cache updates for Party Alias, Source Book, and Ccy; the Query Layer above holds the connected dimension caches.]
135. This updates the connected caches.
[Diagram: Party Alias, Source Book, and Ccy now appear in the Query Layer's connected dimension caches.]
136. The process recurses through the object graph.
[Diagram: the recursion continues from the first-level dimensions out to Party and Ledger Book.]
137. ‘Connected Replication’
A simple pattern which
recurses through the foreign
keys in the domain
model, ensuring only
‘Connected’ dimensions are
replicated
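A minimal sketch of how such a recursion might look; the types and structure are hypothetical, as the deck does not show ODC's implementation.

```java
// A minimal sketch of Connected Replication: when a fact is written,
// walk its foreign keys recursively and copy each dimension reached
// into the replicated cache, so only 'connected' dimensions replicate.
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConnectedReplicator {
    // A dimension knows which other dimensions it references.
    public interface Dimension {
        Object key();
        List<Object> foreignKeys();
    }

    private final Map<Object, Dimension> dimensionStore;  // normalised store
    private final Map<Object, Dimension> replicatedCache; // the connected cache

    public ConnectedReplicator(Map<Object, Dimension> dimensionStore,
                               Map<Object, Dimension> replicatedCache) {
        this.dimensionStore = dimensionStore;
        this.replicatedCache = replicatedCache;
    }

    // Called when a fact (e.g. a Trade) is saved, with its
    // first-level dimension references.
    public void onFactSaved(List<Object> dimensionKeys) {
        Set<Object> visited = new HashSet<>();
        dimensionKeys.forEach(k -> replicate(k, visited));
    }

    private void replicate(Object key, Set<Object> visited) {
        if (!visited.add(key)) return;   // already walked this arc
        Dimension d = dimensionStore.get(key);
        if (d == null) return;
        replicatedCache.put(key, d);     // push into the replicated layer
        d.foreignKeys().forEach(k -> replicate(k, visited)); // recurse
    }
}
```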
138. With ‘Connected
Replication’ only
1/10th of the data
needs to be replicated
(on average).
I started a project back in 2004. It was a trading system at BarCap. When it came to persisting our data there were three choices: Oracle, Sybase, or SQL Server. A lot has changed since then. Today we are far more likely to look at one of a variety of technologies to satisfy our need to store and retrieve data. So how many of you use a traditional database? What about a distributed database like Oracle RAC? NoSQL? ...do you use it with a database or standalone? What about an in-memory database, in production? Finally, what about distributed in-memory?

This talk is about an in-memory database. It's not really a distributed cache, despite being implemented in Coherence, although you could call it one if you preferred. In truth it has a variety of elements that make it closer to what you might perceive to be a database. It is normalised: that is to say, it holds entities independently from one another and versions them as such. It has some basic guarantees of atomicity when writing certain groups of objects that are collocated. Most importantly, it is both fast and scalable regardless of the join criteria you impose on it, something fairly elusive in the world of distributed data storage.

I have a few aims for today. I hope you will leave with a broader view of what stores are available to you and what is coming in the future. I hope you'll see the benefits that niche storage solutions can provide through simpler contracts between client and data store. And I'd like you to understand the benefits of memory over disk.
A better example is Amazon: partition by user so orders and basket are held together; products will be shared by multiple users.
Big data sets are held distributed and only joined on the grid, to collocated objects. Small data sets are held in replicated caches so they can be joined in-process (only 'active' data is held).