ScyllaDB recently announced Project Alternator, a new open source project that will enable Amazon DynamoDB users to easily migrate to an open-source database that runs anywhere — on most cloud platforms, on-premises, on bare-metal, virtual machines or via Kubernetes — all while preserving their investments in their existing application code.
Project Alternator will help DynamoDB users achieve much better and more reliable performance, reduce database costs by 80-90%, support large items (tens of MBs) and large partitions (multiple GBs), control the number of replicas, balance cost vs. redundancy, and much more.
Join ScyllaDB founders Avi Kivity and Dor Laor and lead engineer Nadav Har’El for a live webinar on September 25th, where they will share an overview of Project Alternator, including:
Alternator’s design, implementation, and goals
How to configure Alternator (OK, it's just adding alternator_port: 8000 to your scylla.yaml)
A demo of how to run it from Docker or RPM
Several example runs:
A Tic-Tac-Toe DynamoDB example running on Alternator
How to benchmark Scylla Alternator with YCSB, and considerations around it
How to run a serverless application alongside Alternator
How to migrate DynamoDB data to Alternator using the Spark migrator
The current limitations of Alternator
Plus, we will describe the different consistency options and the active-active vs. leader models, share the project roadmap, and answer your questions at the end.
Dor Laor is the CEO of ScyllaDB. Previously, Dor was part of the founding team of the KVM hypervisor under Qumranet, which was acquired by Red Hat. At Red Hat, Dor managed KVM and Xen development for several years. Dor holds an MSc from the Technion and a PhD in snowboarding.
Avi Kivity, CTO of ScyllaDB, is best known for starting the Kernel-based Virtual Machine (KVM) project, the hypervisor underlying many production clouds. He worked for Qumranet and Red Hat as KVM maintainer until December 2012. He now leads technology at ScyllaDB, a company that seeks to bring the same kind of innovation to the public cloud space.
Nadav Har’El has had a diverse 20-year career in computer programming and computer science. In the past he worked on scientific computing, networking software, and information retrieval, but in recent years his focus has been on virtualization and operating systems. Among other things, he has worked on nested virtualization and exit-less I/O in KVM. Today he maintains the OSv kernel and also works on Seastar and ScyllaDB.
+ The Real-Time Big Data Database
+ Drop-in replacement for Cassandra
+ 10X the performance & low tail latency
+ New: Scylla Cloud, DBaaS
+ Open source and enterprise editions
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA; Herzliya, Israel
4. What is Alternator
+ Why a DynamoDB-compatible API?
Live demos
+ Get started in 5 minutes in docker
+ CLI access to the DynamoDB API
+ Play a game on Alternator
+ Monitoring Alternator
Alternator implementation
+ How Scylla differs from DynamoDB
+ Current compatibility and limitations
Migrating from DynamoDB to Alternator
6. + Scylla is an efficient NoSQL data store, announced September 2015.
+ It is open source, with enterprise support and cloud (SaaS) options.
+ Compatible with Cassandra and its APIs (CQL, Thrift).
+ Alternator: Adding a DynamoDB API to Scylla.
7. + Efficient implementation for modern hardware
+ Throughput 10x higher than Cassandra
+ Linear scalability to many-core machines
+ Focused on modern fast SSDs
+ Low tail latency
+ Reliability
+ Autonomous database (minimal configuration)
We can apply these advantages to more than just Cassandra compatibility!
8. DynamoDB is similar in design and data model to Scylla
More details on the similarities and differences later.
(Diagram: design lineage from the Amazon Dynamo paper (2007) and the Google Bigtable paper (2006).)
9. DynamoDB is SaaS
+ SaaS is easy to get started with
+ SaaS is a trend in the industry
+ DynamoDB popularity is growing
(Chart: DynamoDB vs. Cassandra popularity.)
11. Vendor lock-in
+ Users want to move their DynamoDB application to
+ a different cloud provider,
+ a private datacenter,
+ or a hybrid of several clouds or datacenters.
+ Scylla can be run on any cloud or datacenter.
13. + Running Alternator is simply running Scylla with the “alternator-port” parameter set, so that it listens for DynamoDB API requests on that port.
+ You can get it running on your local machine in 5 minutes using Docker:
docker run --name scylla -d -p 8000:8000 \
  scylladb/scylla-nightly:alternator --alternator-port=8000
14. + We can then run Amazon’s DynamoDB CLI tools against port 8000:
+ aws --endpoint-url 'http://172.17.0.1:8000' dynamodb create-table \
    --table-name mytab \
    --attribute-definitions AttributeName=key,AttributeType=S \
    --key-schema AttributeName=key,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
+ The first attempt works; the second fails (as expected, since the table already exists).
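Under the hood, the CLI call is an HTTP POST. Its JSON body is the CreateTable request, and its X-Amz-Target header names the operation. The sketch below builds the same request with only the Python standard library. It reuses the endpoint from the command above and assumes the preview server accepts requests without AWS signature headers, since authentication is still listed as a GA item. The helper name dynamodb_request is hypothetical, and the sketch only constructs the request.

```python
import json
import urllib.request

# Endpoint from the CLI example above (Docker bridge address); adjust as needed.
ENDPOINT = "http://172.17.0.1:8000"


def dynamodb_request(operation, payload, endpoint=ENDPOINT):
    """Build (but do not send) a DynamoDB JSON-protocol request.

    Assumption: the server accepts unsigned requests, as the Alternator
    preview did before authentication was implemented.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # DynamoDB's JSON protocol: the operation name goes in X-Amz-Target.
            "Content-Type": "application/x-amz-json-1.0",
            "X-Amz-Target": f"DynamoDB_20120810.{operation}",
        },
    )


# Same table as the `aws dynamodb create-table` call above.
create_table = dynamodb_request("CreateTable", {
    "TableName": "mytab",
    "AttributeDefinitions": [{"AttributeName": "key", "AttributeType": "S"}],
    "KeySchema": [{"AttributeName": "key", "KeyType": "HASH"}],
    "BillingMode": "PAY_PER_REQUEST",
})
```

To actually send it, pass the request to urllib.request.urlopen(create_table). As with the CLI, a second CreateTable for the same name should come back as an HTTP error, which urlopen raises as HTTPError.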
15. https://github.com/awsdocs/amazon-dynamodb-developer-guide/blob/master/doc_source/TicTacToe.Phase1.md
+ An open-source Python application using DynamoDB
+ Written by Amazon, demonstrates many DynamoDB features
+ Written in Python, using Amazon’s AWS client library (boto3)
+ python application.py --mode local --port 8000
Implements a multiplayer Tic-Tac-Toe game server.
Many users can connect, invite each other to games, play against each other, and keep score.
19. + Live workload
+ Cluster of three 30-core nodes in AWS, each in a separate AZ.
+ 1.1 TB of data: 1 billion items, 1.1 KB each.
+ YCSB workload: 50% reads, 50% writes, Zipfian request distribution.
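A Zipfian request distribution means a few "hot" keys receive most of the traffic, which tests how well the cluster handles skew. The sketch below generates a workload of that shape: roughly half reads and half updates, with keys drawn from Zipf-like weights. It uses theta = 0.99, the default Zipfian constant in YCSB. It is a plain weighted sampler for illustration, not YCSB's generator, and the "userN" key names are hypothetical.

```python
import random
from collections import Counter


def zipfian_weights(n, theta=0.99):
    """Weight of the i-th most popular key is 1 / (i + 1) ** theta."""
    return [1.0 / (i + 1) ** theta for i in range(n)]


def generate_ops(n_keys, n_ops, read_fraction=0.5, seed=42):
    """Yield (operation, key) pairs: ~50% reads, ~50% updates, Zipfian keys."""
    rng = random.Random(seed)
    keys = rng.choices(range(n_keys), weights=zipfian_weights(n_keys), k=n_ops)
    for key in keys:
        op = "read" if rng.random() < read_fraction else "update"
        yield op, f"user{key}"


# Example: the few most popular keys dominate the sampled workload.
sample = list(generate_ops(n_keys=1000, n_ops=20000))
hottest = Counter(key for _, key in sample).most_common(3)
```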
21. Let’s survey some of the similarities and differences between Scylla and DynamoDB
+ How did we handle the differences?
+ What still needs to be done?
A much more detailed survey can be found in this document.
22. + Fast, scalable NoSQL databases with real-time response at huge volumes
+ Key/value and wide-row stores
+ Eventual and configurable consistency
+ Hashed partition keys, plus a sort key for items inside a partition
23. + DynamoDB is Cloud Native
+ Only available as a service, part of a huge deployment
+ Scylla has different options: OSS, Enterprise, as-a-service
+ More flexible: runs everywhere
+ Many configuration options: number of replicas, different consistency levels, etc.
+ Scylla has node operations, CLI tools, etc.
+ Scylla integrates with many OSS projects, such as Prometheus, Kafka, and Spark (as first-class citizens)
+ Scylla’s units are servers (CPU shards); DynamoDB’s units are IOPS/tablets
+ DynamoDB uses HTTP(S)/JSON; Scylla uses CQL
24. + Data is divided into tables.
+ Data is composed of items in partitions,
the same as Scylla’s rows in partitions.
+ An item’s key has hash and range parts, like Scylla’s partition and clustering keys
(in the DynamoDB API, only one of each).
+ The types of the key columns are defined in the table’s schema (string, number, bytes).
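The key correspondence above can be sketched as a small validator. It takes the KeySchema and AttributeDefinitions structures that DynamoDB's CreateTable uses and reports which attribute plays Scylla's partition-key and clustering-key roles. The function key_roles is hypothetical and is not Alternator's code. It encodes only the rules stated on this slide: one HASH key is required, there is at most one RANGE key, and key types are string (S), number (N), or bytes (B).

```python
# Hypothetical helper; the mapping mirrors the slide, not Alternator's internals.
DYNAMO_TO_SCYLLA_ROLE = {"HASH": "partition key", "RANGE": "clustering key"}


def key_roles(key_schema, attribute_definitions):
    """Map a DynamoDB KeySchema onto Scylla's partition/clustering key roles."""
    types = {d["AttributeName"]: d["AttributeType"] for d in attribute_definitions}
    roles = {}
    for element in key_schema:
        name, kind = element["AttributeName"], element["KeyType"]
        if kind not in DYNAMO_TO_SCYLLA_ROLE:
            raise ValueError(f"unknown KeyType {kind!r}")
        if types.get(name) not in ("S", "N", "B"):
            raise ValueError(f"key attribute {name!r} needs type S, N or B")
        role = DYNAMO_TO_SCYLLA_ROLE[kind]
        if role in roles:
            # The DynamoDB API allows only one hash key and one range key.
            raise ValueError(f"only one {kind} key is allowed")
        roles[role] = (name, types[name])
    if "partition key" not in roles:
        raise ValueError("a HASH key is required")
    return roles
```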
25. + But DynamoDB items can have additional attributes, not defined in the schema
+ Attributes may be scalars (string, number, etc.), lists, sets, or documents.
+ Similar to a JSON document.
+ Needs to be emulated in Scylla:
+ The table holds one map for the top-level attributes
(mapping each attribute name to a JSON value).
+ A map instead of a single JSON value allows concurrent updates to
different top-level attributes of the same item.
+ TODO: support updates to deep (nested) attributes.
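The emulation described above can be sketched as a pair of conversion functions. Key attributes become ordinary columns, and every other top-level attribute becomes one entry of a map from attribute name to JSON text. The function names and the use of DynamoDB's typed wire values (such as {"N": "7"}) are illustrative assumptions, not Alternator's storage format. Each top-level attribute is a separate map entry, so an update to one attribute writes one entry and leaves the others alone.

```python
import json


def to_row(item, key_names):
    """Split a DynamoDB-style item into key columns plus one attribute map."""
    keys = {name: item[name] for name in key_names}
    # Each non-key top-level attribute becomes one map entry: name -> JSON text.
    attrs = {name: json.dumps(value, sort_keys=True)
             for name, value in item.items() if name not in key_names}
    return keys, attrs


def from_row(keys, attrs):
    """Rebuild the item from its key columns and attribute map."""
    item = dict(keys)
    item.update({name: json.loads(text) for name, text in attrs.items()})
    return item
```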
26. ● DynamoDB natively supports Read-Modify-Write (RMW) updates:
○ Conditional updates (set a = 2 if a == 1)
○ Counters (set a = a + 1)
○ Attribute copy (set a = b)
● Scylla natively supports independent writes to different columns (CRDT):
○ Efficient updates to different columns, not requiring a read.
We are adding support for read-modify-write operations using LWT (lightweight transactions).
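The CRDT point can be illustrated with a toy per-column last-write-wins merge. A write is a new timestamped cell and never reads the existing row. Replicas merge cell by cell, keeping the newer cell, so writes to different columns never conflict and the result does not depend on arrival order. This is a simplified model. Scylla's real cell format and tie-breaking rules are not shown. Here ties are broken by comparing values, which is an assumption made only to keep the merge order-independent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; the higher one wins


def write(column, value, timestamp):
    """A blind write: just a new one-cell row, no read of existing data."""
    return {column: Cell(value, timestamp)}


def newer(a, b):
    """Pick the winning cell; ties are broken by value so order doesn't matter."""
    return a if (a.timestamp, a.value) >= (b.timestamp, b.value) else b


def merge(row_a, row_b):
    """Merge two versions of a row column by column (commutative, idempotent)."""
    merged = dict(row_a)
    for column, cell in row_b.items():
        merged[column] = newer(merged[column], cell) if column in merged else cell
    return merged
```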
27. Scylla vs. DynamoDB
Scylla: Log-Structured Merge (LSM) tree
■ Efficient writes, without a prior read
DynamoDB: B-tree
■ A write includes a free read, but writes are slow.
Scylla: Quorum-based consistency
■ Writes are done on several replicas independently.
■ Concurrent read-modify-write operations are not serialized.
DynamoDB: Leader model
■ One replica (“leader”) is responsible for a write operation, so it can serialize read-modify-write operations.
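Why serialization matters can be shown with the conditional update from slide 26 ("set a = 2 if a == 1"), using two toy stores. In the first store, the read and the write of a read-modify-write are separate steps, so two clients can both read a == 1 before either writes. Both conditions then pass, and one update is silently lost. In the second store, a single "leader" performs the read, check, and write as one step under a lock. Both classes are hypothetical simplifications, not models of Scylla's or DynamoDB's replication.

```python
import threading


class UnserializedStore:
    """Toy store: the read and the write of an RMW are separate steps."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value = value


class LeaderStore:
    """Toy store: one leader runs read-check-write as a single step."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        with self._lock:  # serializes concurrent RMW operations
            if self.value != expected:
                return False
            self.value = new
            return True


def both_read_then_write(store, expected, new_a, new_b):
    """One legal interleaving: clients A and B both read before either writes."""
    ok_a = store.read() == expected
    ok_b = store.read() == expected
    if ok_a:
        store.write(new_a)
    if ok_b:
        store.write(new_b)
    return ok_a, ok_b
```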
28. + Each node in the Scylla cluster also answers DynamoDB API requests on this port.
+ So there is no need to size a separate API translation cluster.
+ The same nodes can serve both CQL and the DynamoDB API.
+ As needed, a node forwards the request to the other nodes holding the requested data.
+ Uses internal Scylla function calls and RPC, with no translation to CQL.
+ Clients need to spread their requests across the different Scylla nodes. This can be done via
DNS or an HTTP load balancer.
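Any node can accept a request and forward it, so a client can also spread load itself without DNS or a load balancer. The sketch below rotates through a fixed list of node endpoints in round-robin order. The node addresses are hypothetical, and a real client would also need to skip nodes that are down. The sketch leaves that out.

```python
import itertools


class RoundRobinEndpoints:
    """Cycle through Scylla node endpoints (addresses are hypothetical)."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("need at least one node")
        self._cycle = itertools.cycle(nodes)

    def next_endpoint(self):
        """Return the endpoint to use for the next request."""
        return next(self._cycle)


nodes = RoundRobinEndpoints([
    "http://10.0.0.1:8000",
    "http://10.0.0.2:8000",
    "http://10.0.0.3:8000",
])
```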
30. + See the detailed current status in alternator.md, and open issues in the bug tracker:
+ https://github.com/scylladb/scylla/blob/master/docs/alternator/alternator.md
+ https://github.com/scylladb/scylla/issues
+ Several DynamoDB applications already work unmodified.
+ Some of the issues we will address before GA:
+ Safe concurrent read-modify-write operations
+ A few operations, and some sub-cases of operations, are not yet supported
+ Authentication
+ SaaS on Scylla Cloud
+ DynamoDB streams (CDC)
32. + Install Scylla and a load balancer (or wait for Scylla Cloud SaaS availability)
+ Point your application, written to use DynamoDB, at Scylla’s endpoint address
+ This is a preview release. Watch out for unsupported features and unsafe
concurrent RMW operations.
+ Migrate existing data from DynamoDB to Scylla using the DynamoDB API
+ E.g., the Spark migrator:
https://www.scylladb.com/2019/09/12/migrating-from-dynamodb-to-scylla
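For a boto3 application such as the Tic-Tac-Toe example, "telling it Scylla's endpoint" can be as small as passing endpoint_url when the client is created. The sketch builds those keyword arguments from the environment, so the same code can target AWS or Alternator. The variable name DYNAMODB_ENDPOINT is a hypothetical convention. boto3 still expects some credentials to be configured, and the sketch assumes the preview, without authentication, accepts any values.

```python
import os


def dynamodb_client_kwargs(env=None):
    """Keyword arguments for boto3.client("dynamodb", ...).

    Without DYNAMODB_ENDPOINT (hypothetical name), boto3 uses the regular
    AWS endpoint; with it set, requests go to that address instead.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("AWS_DEFAULT_REGION", "us-east-1")}
    endpoint = env.get("DYNAMODB_ENDPOINT")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs

# Usage (requires boto3 to be installed):
#   import boto3
#   dynamodb = boto3.client("dynamodb", **dynamodb_client_kwargs())
```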
33. + Scylla is a very efficient, reliable, low latency NoSQL data store,
that began with Cassandra compatibility.
+ The Alternator project adds DynamoDB API compatibility to Scylla.
+ Can run existing applications designed for DynamoDB,
+ On any cloud or data center, not just on AWS.
+ Open source.
+ Currently a preview release, with some limitations, but GA expected soon.