The CloudStack European User group met on Thursday 11th for our quarterly meeting.
Stuart Mcall from Basho talked about their RiakCS technology & community
6. Riak CS
is... enterprise cloud storage built on top of Riak
S3-compatibility
multi-tenancy
per user reporting
large object storage
7. Enabling you to host your own
PUBLIC &
PRIVATE CLOUDS
or….
Reliable Storage Behind Apps
8. Basho's Commits
@john_burwell 's contribution:
S3-backed secondary storage feature in 4.1.0
Uses S3 to sync secondary storage across zones
Long term: (shhhhhh!)
Native S3 Support
Federated authentication and authorization
9. DataPipe
blog.datapipe.com/datapipe-cloudstack
“Riak CS provides the high-performance,
distributed datastore
we need to deliver a sound foundation for
our cloud storage needs now
and for many years into the future”
- Ed Laczynski, VP Cloud Strategy, Datapipe.
10. Yahoo!
“Today, Yahoo! leverages Riak CS Enterprise to offer an
S3-compatible public cloud storage service,
as well as dedicated hosting options ...
Yahoo! is highly supportive of open source software
and we view Basho’s (OSS) announcement as
a positive move that will work
to accelerate its ability to innovate
and ultimately strengthen our cloud platform.”
- Shingo Saito, cloud product manager, Yahoo!
12. Riak
Dynamo-inspired key/value store
Written in Erlang with C/C++
Open source under Apache 2 license
Thousands of production deployments
13. Riak
High availability
Low-latency
Horizontal scalability
Fault-tolerance
Ops friendliness
14. Riak
Masterless
• No master/slave or different roles
• All nodes are equal
• Write availability and scalability
• All nodes can accept/route requests
15. Riak
No Sharding
• Consistent hashing
• Prevents “hot spots”
• Lowers operational burden of scale
• Data rebalanced automatically
16. Riak
Availability and Fault-Tolerance
• Automatically replicates data
• Read and write data during hardware
failure and network partition
• Hinted handoff
20. Large Object
1. User uploads an object
2. Riak CS breaks object into 1 MB chunks
3. Riak CS streams chunks to Riak nodes
4. Riak replicates and stores chunks
[Diagram: Riak CS instances, each exposing an S3 API and a Reporting API, in front of Riak nodes holding 1 MB chunks]
21. BASIC CONCEPTS
USERS
multi-tenancy:
Riak CS will track
individual usage/stats
users identified by access_key
users authenticated by secret_key
22. BASIC CONCEPTS
BUCKETS
users create buckets.
buckets are like folders.
store objects in buckets.
names are globally unique.
23. BASIC CONCEPTS
OBJECTS
stored in buckets.
objects are opaque.
store any file type.
25. Riak CS
Large Object Support
• Started with 5GB / object
• Now have multipart upload
• Content agnostic
26. Riak CS
S3-Compatible API
• Use existing S3 libraries and tools
• RESTful operations
• Multipart upload
• S3-style ACLs for object/bucket
permissions
• S3 authentication scheme
27. Riak CS
Administration and Users
• Interface for user creation, deletion,
and credentials
• Configure so only admins can create
users
28. Riak CS
New Stuff in Riak 1.3
• Multipart upload: parts between 5MB
and 5GB
• Support for GET range queries
• Restrict access to buckets based on
source IP
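The part-size limits on this slide can be illustrated with a short sketch. It assumes "MB" and "GB" mean binary units and that, as in S3, only the final part may be smaller than the minimum; `plan_parts` is a hypothetical helper, not part of any Riak CS client.

```python
MIN_PART = 5 * 1024 ** 2   # 5 MB lower bound from the slide (binary units assumed)
MAX_PART = 5 * 1024 ** 3   # 5 GB upper bound from the slide


def plan_parts(object_size, part_size=2 * MIN_PART):
    """Return (part_number, offset, length) tuples covering the whole object.

    Assumption: as in S3, the last part may be shorter than MIN_PART.
    """
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MB and 5 GB")
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    return parts
```

Each tuple would correspond to one part upload request in a multipart upload.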
34. THE
“USAGE”
BUCKET
TRACK INDIVIDUAL USER’S
ACCESS AND STORAGE
35. QUERY USAGE STATS
Storage and access statistics tracked on
per-user basis, as rollups for slices of time
• Operations, Count, BytesIn, BytesOut, + system and user error
• Objects, Bytes
37. Multi-Datacenter Replication
• For active backups, availability zones,
disaster recovery, global traffic
• Real-time or full-sync
• 24/7 support
• Per-node or storage-based pricing
38. SIGN UP FOR AN
ENTERPRISE DEVELOPER
TRIAL
basho.com
http://docs.basho.com/
Very high level discussion, segue into brief discussion of Riak
What you get is a platform on which you can host your own public and private clouds.
You can think of Riak in many ways as a distributed filesystem. Riak is awesome because all nodes are equal. Its distribution protocols allow straightforward scaling and an even balance of load by tokenizing a huge keyspace and using consistent hashing. This, however, is not conducive to large object storage because of latency and other network limitations when moving files around during topology changes. Also, developing on top of Riak is non-trivial, so interactions with the database can be a pain. Riak CS abstracts away the complexity of those interactions behind a simple, S3-compatible API, and uses Riak’s inherent functionality to provide a solution for large object storage.
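To make the "tokenized keyspace plus consistent hashing" idea concrete, here is a minimal, generic sketch of a hash ring. It is not Riak's actual implementation; the class name `HashRing`, the `tokens_per_node` parameter, and the choice of SHA-1 for token placement are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: every node claims several tokens on the ring."""

    def __init__(self, nodes, tokens_per_node=64):
        ring = []
        for node in nodes:
            for i in range(tokens_per_node):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring                      # sorted (token, node) pairs
        self._tokens = [t for t, _ in ring]    # tokens only, for bisect

    @staticmethod
    def _hash(value):
        # Map a string to a position in a 160-bit keyspace.
        return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")

    def preference_list(self, key, n=3):
        """Return the first n distinct nodes clockwise from the key's position."""
        start = bisect.bisect_right(self._tokens, self._hash(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners
```

Because a new node only adds tokens to the ring, a key either keeps its current owner or moves to the new node, which is why load rebalances without resharding everything.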
A Riak CS stack is composed of 3 critical components. Riak CS exposes an API to the users and is responsible for logging/tracking stats. All the data is stored in and retrieved from Riak. Run multiple instances of Riak and Riak CS for scale. There’s a third component, a single instance of a piece of software called Stanchion, that is responsible for tying it all together. Stanchion in essence provides the S3-like behavior at an architectural level, ensures user and bucket uniqueness globally, etc.
1-to-1 pairing, and why.
1. User PUTs an object into Riak CS. The request goes through the S3 API and is signed with the user’s credentials.
2. Once authenticated, the object is chunked (remind why this is important).
3. As the object is chunked, the chunks are sent to Riak (you can use haproxy in the middle here).
4. Riak stores the chunks, yay!
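The chunk-and-stream steps can be sketched as follows. This is only an illustration: `store_chunk` is a hypothetical callback standing in for a write to a Riak node, and the manifest format is invented for the example.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size quoted on the slide


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (sequence_number, data) pairs read from a binary file-like object."""
    seq = 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield seq, block
        seq += 1


def upload(stream, store_chunk):
    """Send each chunk to store_chunk as it is read and return a simple manifest.

    The object is never held in memory as a whole; only one chunk at a time.
    """
    manifest = []
    for seq, block in iter_chunks(stream):
        store_chunk(seq, block)            # hypothetical: write the chunk to Riak
        manifest.append((seq, len(block)))
    return manifest
```

For example, `upload(open("video.mp4", "rb"), store_chunk)` would hand the file to `store_chunk` one 1 MB piece at a time, which keeps each stored value small enough to replicate and move around cheaply.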
Riak CS is multi-tenant. Each user is assigned an access_key and a secret_key. Users are authenticated by signing requests using a combination of both keys. If the keys are valid, the request is allowed; otherwise it is denied. User details are stored in the “user” bucket, identified by access_key. Furthermore, every user’s activity is tracked by Riak CS and stored for billing/metering purposes (more later).
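As a rough illustration of signing with the key pair, the sketch below follows the classic S3 request-signing layout (HMAC-SHA1 over a canonical string, sent as an `Authorization: AWS access_key:signature` header). That Riak CS expects exactly this variant is an assumption based on the "S3 authentication scheme" bullet; the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_request(access_key, secret_key, method, resource, date,
                 content_md5="", content_type="", amz_headers=""):
    """Build an S3-style Authorization header value for one request."""
    string_to_sign = (
        "\n".join([method, content_md5, content_type, date])
        + "\n" + amz_headers + resource
    )
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key}:{signature}"


def verify_request(secret_key, authorization, **request):
    """Server side: recompute the signature with the stored secret and compare."""
    access_key = authorization.split(" ", 1)[1].rsplit(":", 1)[0]
    expected = sign_request(access_key, secret_key, **request)
    return hmac.compare_digest(expected, authorization)
```

The secret_key never travels over the wire: the access_key identifies the user, and only someone holding the matching secret_key can produce a signature the server will accept.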
Objects are stored in buckets. Users can create and remove buckets as well as list their contents. Buckets are essentially a namespace, and are very much like folders. Bucket names must be globally unique, so if two users both try to create a bucket named “kittens”, whoever creates that bucket first will own it.
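The "first creator owns the name" rule can be sketched with a single serialized registry. This is a toy model of the idea only, not how Stanchion is implemented; `BucketRegistry` and its methods are hypothetical names.

```python
import threading


class BucketRegistry:
    """Toy global bucket namespace: one place where all creations are serialized."""

    def __init__(self):
        self._owners = {}               # bucket name -> owning access_key
        self._lock = threading.Lock()

    def create_bucket(self, name, access_key):
        """Return True if access_key owns the bucket after the call, else False."""
        with self._lock:
            # setdefault only records the owner if the name is still free.
            owner = self._owners.setdefault(name, access_key)
        return owner == access_key

    def owner_of(self, name):
        with self._lock:
            return self._owners.get(name)
```

Funnelling every creation through one lock is what makes the check-and-claim step atomic, so two simultaneous requests for “kittens” cannot both succeed.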
Put objects in buckets. Objects are chunked and replicated, but that all happens behind the scenes and is not exposed to the user.
Riak CS provides stats on user activity and total cluster operations, and ships with DTrace probes you can use to inspect/debug a live system. So at any given time you can monitor a Riak CS cluster for both expected behavior and anomalies. From an administrative perspective (as mentioned earlier), Riak CS will track each individual user’s activity, so that you can define usage limits and billing policies if necessary.
Riak CS, just like Riak, uses Boundary’s Folsom stats library for monitoring cluster operations. These stats start when Riak CS starts and are not persisted to disk. Get stats with an HTTP request to /riak-cs/stats. You’ll get back counters and histograms that track the total number of operations performed on blocks, buckets and objects. For instance, you can see the total number of GET or PUT operations on objects in the Riak CS cluster. These stats are going to be most useful if you’re trying to diagnose unexpected behavior. Hopefully that’s never the case, but shit happens.
Riak CS has a reserved namespace for tracking user activity. This is the “usage” bucket, and it is the foundation for metering and building custom billing policies in Riak CS. Every time a user performs an operation, Riak CS stores this data in an object in the usage bucket identified by that user’s access_key. You can configure how often these reports are persisted, as well as whether users can request their own usage statistics.
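A minimal sketch of per-user rollups over slices of time is shown below. The event fields (`access_key`, `timestamp`, `operation`, `bytes_in`, `bytes_out`) and the one-hour slice are assumptions for illustration; only the Count, BytesIn and BytesOut names come from the slides, and this is not the actual layout of the usage bucket.

```python
from collections import defaultdict

SLICE_SECONDS = 3600  # assumed slice length; the real interval is configurable


def rollup(events, slice_seconds=SLICE_SECONDS):
    """Aggregate raw access events into per-user, per-slice, per-operation totals."""
    totals = defaultdict(lambda: {"Count": 0, "BytesIn": 0, "BytesOut": 0})
    for event in events:
        # Align the timestamp to the start of its time slice.
        slice_start = event["timestamp"] - event["timestamp"] % slice_seconds
        entry = totals[(event["access_key"], slice_start, event["operation"])]
        entry["Count"] += 1
        entry["BytesIn"] += event.get("bytes_in", 0)
        entry["BytesOut"] += event.get("bytes_out", 0)
    return dict(totals)
```

A billing job could then sum the entries for one access_key across the slices in a billing period.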