Key-value stores are widely used in applications that only require primary key data access, which is common in many web applications. Because developing an industrial-grade key-value store is expensive, the conventional solution is to use one of the existing key-value stores and layer application semantics on top of the primitives provided by the store. This approach leads to potential inefficiencies, because application-specific semantics can often allow optimizations in the implementation of the store.
We present an alternative approach, using the TACC platform to provide a key-value store implementation that is both performant and easily customizable. The TACC programming model separates state from logic: state is stored in a collection of distributed in-memory database instances, while logic is performed by distributed agents that react asynchronously to changes in objects stored in the database instances. Agents can selectively subscribe to updates using a fine-grained hierarchical directory system to mount objects into a local namespace. TACC provides performance comparable to hand-coded C while reducing source code size to a fraction of the C equivalent. We describe the implementation and performance of a scalable and fault-tolerant key-value store using TACC, pointing out the benefits realized by using TACC's strong, user-defined types and triggering/notification.
http://sdec.kr/
SDEC2011 Going by TACC
1. Going by TACC: Beyond Key-Value to Fault-Tolerant Stores with Easily Customizable Semantics
Henk Goosen, CEO
goosen@optumsoft.com
2. Key-value stores rule the Web
Many applications only need primary key data access
  Examples: catalogs, shopping carts, web session state
  No need for the complexity, performance overhead, and lack of scalability of a full database
Hence: Key-value stores are everywhere
  Dynamo, CouchDB, Cassandra, Project Voldemort, Riak, Redis, memcached, MongoDB, …
Improving key-value stores is important
OptumSoft, Inc. Proprietary and Confidential
3. Key-value stores in practice
Developing a key-value store from scratch using conventional languages is expensive: scalability, performance, and fault tolerance
Conventional solution: use existing key-value store
  Layer on get() and put() semantics
  Mismatches between application requirements and library: either accept or extensively modify library code
Applications are more complex, performance suffers
4. TACC provides a different model
Use a very high-level language to specify the key-value store
Then customize the store, applying application-specific semantics
Benefits:
  Simplifies the application business logic
  Improves the performance of both store and application
TACC model is better!
5. TACC is an object-oriented, strongly typed language
User-defined type: a list of attributes (nouns)
  Read or write attributes (there are no methods/verbs)
Logic primarily implemented via constraints
  Imperative code is also supported
Compact code
  First-class, high-level data types (e.g., queues, hash tables)
  Several design patterns directly supported in the language (e.g., observer pattern)
Compact code: fewer bugs, quicker to market
6. TACC: efficient development of distributed systems
Reduce development time by a factor of 2 to 3
Reduce lines of code by a factor of 10 or more
Eliminate most synchronization and concurrency bugs
High, predictable performance using optimized code generation
Fault tolerance built into the model, and easy to implement
TACC is a general-purpose language, focused on distributed systems
7. Stateful remote proxy objects
[Figure: agents LR 1 and LR 2 and SysDB; step 1: object added to collection; SysDB collection]
Proxy: local copy of data
Writes are asynchronously copied to SysDB
SysDB changes are copied to "interested" agents
R/W access is local, fast
No remote access exceptions
Simple semantics, and fast
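
The proxy behaviour listed on this slide can be modelled in a few lines of Python. This is only an illustrative sketch, not TACC code: the names `SysDB`, `Proxy`, `flush`, and `deliver` are hypothetical, plain queues stand in for the asynchronous transport, and every attached proxy is treated as "interested" in every key.

```python
from collections import deque

class SysDB:
    """Toy stand-in for a SysDB instance: holds state and fans out changes."""
    def __init__(self):
        self.state = {}
        self.proxies = []

    def attach(self, proxy):
        self.proxies.append(proxy)
        proxy.local.update(self.state)  # a new proxy starts with a full copy

    def apply(self, writer, key, value):
        self.state[key] = value
        for proxy in self.proxies:
            if proxy is not writer:
                proxy.inbox.append((key, value))  # queued, applied later

class Proxy:
    """Local copy of the data: reads and writes never leave the process."""
    def __init__(self, db):
        self.db = db
        self.local = {}
        self.outbox = deque()  # writes waiting to be copied to SysDB
        self.inbox = deque()   # SysDB changes waiting to be applied locally
        db.attach(self)

    def write(self, key, value):
        self.local[key] = value           # local and immediate
        self.outbox.append((key, value))  # copied to SysDB later

    def read(self, key):
        return self.local.get(key)        # local lookup, no remote call

    def flush(self):
        """Copy queued writes to SysDB (hypothetical transport step)."""
        while self.outbox:
            self.db.apply(self, *self.outbox.popleft())

    def deliver(self):
        """Apply queued SysDB changes to the local copy."""
        while self.inbox:
            key, value = self.inbox.popleft()
            self.local[key] = value

db = SysDB()
lr1, lr2 = Proxy(db), Proxy(db)
lr1.write("obj", 1)
seen_before = lr2.read("obj")  # None: the write has not propagated yet
lr1.flush()
lr2.deliver()
seen_after = lr2.read("obj")   # 1: lr2's local copy has caught up
```

In this model a reader can briefly see stale data, which is the price of keeping every read and write local.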
8. SysDB: a hierarchical in-memory object database
[Figure: Agents and SysDB]
Stores state (ideally no logic)
  Minimizes risk of program logic bugs, hence reliable
Concise specification of user-defined types
TACC compiler automatically generates all required code for remote access
Agents receive automatic notification when values change
9. Distributed, hierarchical name space
SysDB defines and exports a hierarchical name space (similar to a distributed file system)
Remote agents can "mount" remote directories into a local namespace
Each object is instantiated into a directory; state is made available remotely via proxy objects
Updates propagate asynchronously; notifications are delivered on changes
Simple, powerful, proven way to provide a large, structured name space
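
To make the mount-and-notify idea concrete, here is a minimal Python sketch. It is not TACC or SysDB code: `Namespace`, `Agent`, `mount`, `put`, and `deliver` are hypothetical names, the mounted directory keeps its remote path in the local namespace (a simplification), and a pending list stands in for asynchronous propagation.

```python
class Agent:
    def __init__(self):
        self.local = {}     # local namespace holding proxied state
        self.notified = []  # paths for which a change notification arrived

class Namespace:
    """Toy hierarchical name space: paths map to values, agents mount prefixes."""
    def __init__(self):
        self.objects = {}  # e.g. "/users/smith" -> value
        self.mounts = []   # (directory prefix, agent)
        self.pending = []  # updates not yet delivered

    def mount(self, directory, agent):
        prefix = directory.rstrip("/") + "/"
        self.mounts.append((prefix, agent))
        # The agent starts with a copy of everything already under the directory.
        for path, value in self.objects.items():
            if path.startswith(prefix):
                agent.local[path] = value

    def put(self, path, value):
        self.objects[path] = value
        for prefix, agent in self.mounts:
            if path.startswith(prefix):
                self.pending.append((agent, path, value))

    def deliver(self):
        """Deliver queued updates; stands in for asynchronous propagation."""
        pending, self.pending = self.pending, []
        for agent, path, value in pending:
            agent.local[path] = value
            agent.notified.append(path)

ns = Namespace()
users_agent, rooms_agent = Agent(), Agent()
ns.mount("/users", users_agent)
ns.mount("/rooms", rooms_agent)
ns.put("/users/smith", "R5")
ns.put("/rooms/R5", {"capacity": 10})
ns.deliver()
```

Each agent only receives updates under the directory it mounted, so an agent's traffic scales with what it is interested in rather than with the size of the whole store.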
10. Fault-tolerance is built in
[Figure: agents A1, A2, A3, and A4 connected to SysDBs SP and SB]
When an agent restarts, it recovers its state from SysDB
Agents implement invariants, therefore can be restarted at any time, on any server
Any number of backup SysDBs are supported
Fast recovery for high availability
11. Example: Location Service as customized key-value store
Application needs to track real-time location of user
User allowed in only one location at a time
Three operations:
  ENTER <user id> <session id> <location id>
  LEAVE <user id>
  QUERY <user id>
Throughput > 10,000 requests/sec, latency < 1 ms
High throughput, low latency required
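
As a small illustration of the three operations, the sketch below parses them into a typed request. The whitespace-separated text format, the `Request` type, and the `parse` function are assumptions made for this example; the slides do not specify the wire format used by the service.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Request:
    op: str
    user_id: str
    session_id: Optional[str] = None
    location_id: Optional[str] = None

# Number of arguments each operation takes, as listed on the slide.
ARITY = {"ENTER": 3, "LEAVE": 1, "QUERY": 1}

def parse(line: str) -> Request:
    """Parse one request line; raises ValueError on malformed input."""
    op, *args = line.split()
    op = op.upper()
    if op not in ARITY:
        raise ValueError(f"unknown operation: {op}")
    if len(args) != ARITY[op]:
        raise ValueError(f"{op} takes {ARITY[op]} argument(s), got {len(args)}")
    return Request(op, *args)
```

For example, `parse("ENTER u1 s9 r5")` yields a request with all three identifiers set, while `parse("QUERY u1")` leaves the session and location fields as `None`.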
12. Location Service Overview
[Figure: GS applications send Get location, Leave, and Enter requests through a load balancer to several LR servers]
HTTP access to service
Application (GS) contacts any LR server via load balancer
LR servers replicated for scalability and for fault tolerance
Challenge: ensure responses from multiple LR servers are handled correctly
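13. Key-value store tracks location for each user
[Figure: GS applications send Enter requests through the load balancer to LR servers, which call get() and put() on a key-value store split into Shard A-J, Shard K-R, and Shard S-Z; two Enter requests, labeled Smith,1 and Smith,2, arrive via different LR servers and both touch the entry for Smith]
Has to be atomic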
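
Why the get()/put() pair "has to be atomic" can be shown with a deterministic interleaving. The sketch below is an illustration only: a Python dict stands in for the shared key-value store, and reading the figure's labels Smith,1 and Smith,2 as "Smith enters location 1" and "Smith enters location 2" is an assumption.

```python
store = {}  # user -> location, standing in for the shared key-value store

def get(user):
    return store.get(user)

def put(user, location):
    store[user] = location

# Two LR servers handle ENTER for the same user at almost the same time.
# Each performs "get, check, put" as separate steps, so the steps can interleave.
seen_by_lr1 = get("Smith")  # LR 1 sees no location for Smith
seen_by_lr2 = get("Smith")  # LR 2 also sees no location for Smith

replies = []
if seen_by_lr1 is None:
    put("Smith", 1)
    replies.append(("LR1", "OK"))
if seen_by_lr2 is None:
    put("Smith", 2)  # silently overwrites LR 1's entry
    replies.append(("LR2", "OK"))
```

Both callers are told OK, yet the store records only one location, so the "one location at a time" rule is broken from the application's point of view. With a plain get()/put() interface the application has to add its own locking or compare-and-set logic to prevent this.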
14. TACC allows easy customization of key-value update semantics
Each partition stores a unique subset of the user state
We directly implement ENTER, LEAVE, and QUERY semantics, using a TACC Constrainer
  No locking or inter-agent synchronization required
Requests and responses sent asynchronously
  High performance: there is no waiting or blocking
Specializing the key-value store semantics simplifies the application and improves performance
15. Single-writer collections: no need for synchronization
[Figure: each LR server shares a pair of collections, R and S, with each shard (Shard A-J, Shard K-R)]
R: Request Collection
S: Response Collection
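
One way to picture single-writer collections is a request list and a response list per (LR server, shard) pair, each appended to by exactly one party. The Python sketch below is a model under stated assumptions, not the TACC implementation: routing by the first letter of the user name, the server names `LR1` and `LR2`, and the helpers `shard_for`, `send`, and `serve` are all hypothetical; the shard ranges are the ones shown in the deck's figures.

```python
SHARDS = {"A-J": "ABCDEFGHIJ", "K-R": "KLMNOPQR", "S-Z": "STUVWXYZ"}
LR_SERVERS = ["LR1", "LR2"]

def shard_for(user):
    """Route a user to a shard by the first letter of the name (assumed rule)."""
    first = user[0].upper()
    for shard, letters in SHARDS.items():
        if first in letters:
            return shard
    raise ValueError(f"no shard for {user!r}")

# One request list (R) and one response list (S) per (LR server, shard) pair.
# Only the LR server appends to its R lists; only the shard appends to S lists.
requests = {(lr, sh): [] for lr in LR_SERVERS for sh in SHARDS}
responses = {(lr, sh): [] for lr in LR_SERVERS for sh in SHARDS}
cursors = {key: 0 for key in requests}  # read positions, owned by the shards

def send(lr, user, op):
    """LR side: append a request to the R list shared with the user's shard."""
    shard = shard_for(user)
    requests[(lr, shard)].append((user, op))
    return shard

def serve(shard):
    """Shard side: read new requests in order and append one response each."""
    for lr in LR_SERVERS:
        key = (lr, shard)
        while cursors[key] < len(requests[key]):
            user, op = requests[key][cursors[key]]
            cursors[key] += 1
            responses[key].append((user, op, "handled"))

send("LR1", "Smith", "ENTER")
send("LR2", "Smith", "ENTER")
send("LR2", "Adams", "QUERY")
serve("S-Z")
```

Because no list ever has two writers, neither side needs a lock; the shard still processes all requests for a given user one at a time, since every request for that user lands on the same shard.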
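16. The Serializer
[Figure: the Constrainer Logic is notified of new entries in the Request Collection (Notify), writes a result to the Response Collection (Write result), and updates the Status Collection (Update user status)]
Request Collection     Response Collection
A: Enter U1, R5        A: OK
K: Enter U1, R5        K: NOT ALLOWED
D: Enter U8, R9        D: OK
Status Collection
U1: R5
U8: R9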
Really simple!
17. Details of Constrainer implementation
Code for the Serializer constrainer defines three collections:
  Input collection: requests
  Output collections: responses and user status
A dependency constraint causes imperative code to be executed when a new request arrives from an LR server
The imperative code in the constrainer implements the application-specific semantics
This code is a minor tweak on put() implementation
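
The structure described here (one input collection, two output collections, and a handler that runs when a request arrives) can be sketched in Python as follows. This is a model, not the TACC constrainer: the class and method names are hypothetical, a direct method call stands in for the dependency constraint firing, the session id is omitted, and the rule that ENTER is refused while the user already has a recorded location is inferred from the Serializer example on the previous slide. The LEAVE and QUERY responses are assumptions.

```python
class Serializer:
    """Toy model of the Serializer: one input and two output collections."""
    def __init__(self):
        self.requests = {}   # input: request key -> (op, user, location)
        self.responses = {}  # output: request key -> result
        self.status = {}     # output: user -> current location

    def add_request(self, key, op, user, location=None):
        self.requests[key] = (op, user, location)
        self._on_request(key)  # stands in for the dependency constraint firing

    def _on_request(self, key):
        op, user, location = self.requests[key]
        if op == "ENTER":
            if user in self.status:
                # Check and update happen in one step: the whole "tweak" on put().
                self.responses[key] = "NOT ALLOWED"
            else:
                self.status[user] = location
                self.responses[key] = "OK"
        elif op == "LEAVE":
            self.status.pop(user, None)
            self.responses[key] = "OK"
        elif op == "QUERY":
            self.responses[key] = self.status.get(user)
        else:
            self.responses[key] = "UNKNOWN OPERATION"

s = Serializer()
s.add_request("A", "ENTER", "U1", "R5")
s.add_request("K", "ENTER", "U1", "R5")
s.add_request("D", "ENTER", "U8", "R9")
```

Replaying the three requests from the Serializer slide gives OK, NOT ALLOWED, and OK for keys A, K, and D, with U1 in R5 and U8 in R9. Because one handler owns the status collection, the check and the update cannot interleave with another writer.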
18. Constraints and strong typing improve event-handling code
Constraint-handling code is automatically inserted by the compiler
  No need to manually maintain invariants at many call sites
User-defined types organize constraint-handling code and protect against mistakes
TACC coroutines further simplify event handling
TACC changes event-handling spaghetti into well-structured, type-safe code
19. Instrumentation and Measurements
Stress Agent and SysDB instrumented to collect timestamps (stored in memory, I/O after test)
tcpdump run on Stress Agent and SysDB servers
Correlate timestamps with tcpdump
20. Low latency pitfalls to avoid
Network and TCP behavior
  Many TCP settings have a dramatic and non-linear performance impact
Memory management
  Memory allocation/deallocation
  Avoid garbage collection
“The devil is in the details”
21. Zero-load Latency (μs)
End-to-end
Step  Event              Time   Latency
1     Request created       0
2     Request packet       48        48
7     Response packet     248       200
8     Notification        288        40
SysDB
Step  Event              Time   Latency
3     Receive request     0.0
4     Notification       42.3      42.3
5     Response enqueued  75.1      32.8
6     Response packet   108.5      33.4
Latencies are low and predictable
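
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes the numbers form two timelines in microseconds, an end-to-end one (steps 1, 2, 7, 8) and a SysDB one (steps 3 to 6), and that each latency is the difference between consecutive timestamps; this grouping is a reading of the slide, not something the deck states explicitly.

```python
def step_latencies(timestamps):
    """Latency of each step: its timestamp minus the previous one, in μs."""
    return [round(b - a, 1) for a, b in zip(timestamps, timestamps[1:])]

end_to_end = [0, 48, 248, 288]    # steps 1, 2, 7, 8
sysdb = [0.0, 42.3, 75.1, 108.5]  # steps 3, 4, 5, 6

end_to_end_steps = step_latencies(end_to_end)  # [48, 200, 40]
sysdb_steps = step_latencies(sysdb)            # [42.3, 32.8, 33.4]

total_end_to_end = end_to_end[-1] - end_to_end[0]  # 288 μs overall
time_in_sysdb = sysdb[-1] - sysdb[0]               # 108.5 μs inside SysDB
```

Under that reading, the per-step latencies match the values on the slide, and the 288 μs end-to-end total sits well inside the 1 ms latency target stated for the Location Service.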
22. Latency, throughput vs SysDBs
[Figure: two plots of latency and throughput against the number of SysDBs]
High scalability under strict latency bound
Latency converges to zero-load latency
23. Summary
TACC enables developers to efficiently create predictably high-performance, scalable, fault-tolerant distributed applications
  Eliminates synchronization and locking bugs
Fewer lines of code
  Faster to develop, shorter time to market
  Easier to maintain
  Fewer bugs
24. Contact me for more information about TACC and OptumSoft!
goosen@optumsoft.com