4. Problems & Observations
● Needed a data store that is:
o Scalable & highly available
o High throughput, low latency
o Active-active (the Netflix use case)
● Master-slave storage engines:
o Do not support bi-directional replication
o Cannot withstand a Chaos Monkey attack
o Cannot easily be taken down for maintenance
5. What is Dynomite?
● A framework that makes non-distributed data stores distributed.
● Features: highly available, automatic failover, node warmup, tunable consistency, backups/restores
6. Dynomite @ Netflix
● Running in production for around 2.5 years
● 70 clusters (100% year-over-year growth)
● ~1000 nodes used by internal microservices
● Microservices based on Java, Python, and Node.js
7. Pluggable Storage Engines
● A layer on top of a non-distributed key-value data store
○ Peer-to-peer, shared-nothing
○ Auto-sharding
○ Multi-datacenter
○ Linear scale
○ Replication (encrypted)
○ Gossiping
8. Replication
● A client can connect to any node in the Dynomite cluster when sending requests.
o If the node owns the data, the data are written to the local data store and asynchronously replicated.
o If the node does not own the data, it acts as a coordinator: it sends the data to the owning node in the same rack and replicates it to nodes in the other racks and data centers.
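To make the ownership and coordination logic concrete, here is a minimal Python sketch of token-based routing on a hash ring. It illustrates the technique only, not Dynomite's actual implementation; all names (`token`, `Rack`, `forward`, `coordinated_write`) are made up for this example.

```python
import bisect
import hashlib

def token(key: str, ring_size: int = 2**32) -> int:
    """Hash a key onto the token ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

class Rack:
    """One rack = one full copy of the data, partitioned across its nodes."""
    def __init__(self, name, node_tokens):
        self.name = name
        self.ring = sorted(node_tokens)  # list of (token, node) pairs

    def owner(self, key: str) -> str:
        """The first node clockwise from the key's token owns the key."""
        tokens = [t for t, _ in self.ring]
        idx = bisect.bisect_right(tokens, token(key)) % len(self.ring)
        return self.ring[idx][1]

def forward(node, key, value):
    """Stand-in for sending the write to a peer node over the network."""
    print(f"replicating {key} -> {node}")

def coordinated_write(key, value, local_node, racks, write_local):
    """If this node owns the key, write locally; either way the write is
    replicated to the owner in every rack (asynchronously in real Dynomite)."""
    for rack in racks:
        target = rack.owner(key)
        if target == local_node:
            write_local(key, value)   # local data store, e.g. Redis
        else:
            forward(target, key, value)
```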
9. Topology
● Each rack contains one copy of the data, partitioned across multiple nodes in that rack
● Multiple racks == higher availability (HA)
10. Dynomite on the Cloud
[Diagram: Dynomite ecosystem components: Discovery Service, Insights (Metrics), Continuous Delivery, Healthcheck, Backups & Restores, Dynomite Manager. Protocols: RESP (Redis Serialization Protocol) for clients, REST/HTTP for management.]
12. Dyno Load Balancing
● The Dyno client employs token-aware load balancing.
● The Dyno client is aware of the Dynomite cluster topology within the region, so it can write to the specific node that owns a key using consistent hashing.
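Reusing the `Rack` ring sketch from the replication section above, client-side token awareness amounts to hashing the key locally and talking straight to the owning node, skipping the coordinator hop. This is a sketch of the idea, not Dyno's actual (Java) API; `send` is a hypothetical transport helper.

```python
def send(node, key, value):
    """Stand-in for a network call to a specific Dynomite node."""
    print(f"sending {key} -> {node}")

def token_aware_write(key, value, local_rack):
    """Client-side routing: hash the key, pick the owning node directly."""
    send(local_rack.owner(key), key, value)
```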
15. Netflix Data Benchmark for Redis
● Dynamically change the benchmark configurations
○ perform tests alongside our production microservices
● Be able to integrate with platform cloud services
○ dynamic configurations, discovery, metrics, etc.
● Run for an infinite duration in order to introduce failure scenarios
● Provide pluggable patterns and loads.
● Support different client APIs.
● Deploy, manage and monitor multiple instances from a single entry point.
18. Netflix Data Explorer - Dynomite
● Exploring Netflix Data Sources
● Providing a UI for Dynomite and Redis
19. Netflix Data Explorer - Use Cases
Netflix needed a client to satisfy the following requirements:
● Support Redis API
● Avoid blocking calls (e.g. Redis KEYS *; see the SCAN sketch after this list)
● UI needs to scale to millions of keys
● Customizable UI
● Ability to share UI components amongst Netflix projects
● Provide extensive logging for audit trail purposes
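A minimal redis-py sketch of the non-blocking alternative to KEYS *: cursor-based SCAN, which walks the keyspace in small batches and is what a UI over millions of keys has to use. The key pattern is illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def list_keys(match="*", count=1000):
    """Yield keys incrementally; SCAN never blocks the server like KEYS does."""
    yield from r.scan_iter(match=match, count=count)

for key in list_keys(match="session:*"):
    print(key)
```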
28. Orchestration - Use Cases
● Content Ingest & Delivery
● Title Setup
● Studio Deliveries
● Content Quality Checks
● Content Localization
29. Once Upon A Time...
● Peer-to-peer messaging
● Tens of millions of messages per day
● Process flows embedded in applications
● Lack of control (STOP deployment!)
● Lack of visibility into progress
30. Peer to Peer
[Diagram: Applications A, B, and C chained peer-to-peer via events / API calls; steps: Request Content → Content Inspection → Encode → Publish.]
31. Peer to Peer
[Same peer-to-peer diagram as the previous slide.]
● Logical flow is not easily trackable
● Modifying steps is not easy (tightly coupled)
● Controlling flow is not possible
● Reusing tasks is not trivial
32. Conductor
● BYO Task (Reuse existing code)
● REST/HTTP support
● Extensible and Hackable
● JSON-based DSL to define blueprints (see the sketch after this list)
● Scale Out Horizontally
● Visibility, Traceability & Control
● UI to monitor and manage workflows (node.js/react)
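A minimal sketch of a blueprint in Conductor's JSON DSL for the flow shown on the next slide, written as a Python dict for readability. The field names follow Conductor's documented workflow-definition format; the workflow and task names are illustrative.

```python
blueprint = {
    "name": "encode_and_publish",
    "version": 1,
    "schemaVersion": 2,
    "tasks": [
        # Each SIMPLE task is executed by a worker owned by an application.
        {"name": "request_content",    "taskReferenceName": "request", "type": "SIMPLE"},
        {"name": "content_inspection", "taskReferenceName": "inspect", "type": "SIMPLE"},
        {"name": "encode",             "taskReferenceName": "encode",  "type": "SIMPLE"},
        {"name": "publish",            "taskReferenceName": "publish", "type": "SIMPLE"},
    ],
}
```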
33. Same Flow - New Flavor
[Diagram: Conductor drives the same flow: Start → Request Content (Application A task) → Content Inspection (Application B task) → Encode (Application C task) → Publish (Application B task) → Stop. Conductor owns the orchestration; the applications own the execution.]
35. High Level Architecture
[Diagram: three layers.
API: Workflows, Metadata, Tasks. Clients start and manage workflows, and define blueprints and tasks.
SERVICE: Workflow Service, Task Service, Decider Service, Queue Service.
STORE: Storage (Redis/Dynomite) and Index (Elasticsearch).
Workers get tasks from the queue and execute them.]
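The "gets tasks from queue and execute" path can be sketched as a simple poll/execute/update loop against Conductor's REST API. The endpoint paths below follow Conductor's documented task API, but verify them against your server version; the server location, task type, and worker id are illustrative.

```python
import time
import requests

BASE = "http://conductor:8080/api"  # hypothetical server location

def do_work(task):
    """Stand-in for the application's actual task logic ("BYO Task")."""
    return {"encoded": True}

def work_loop():
    """Poll for a task, execute it, report the result back to Conductor."""
    while True:
        resp = requests.get(f"{BASE}/tasks/poll/encode",
                            params={"workerid": "worker-1"})
        if resp.status_code != 200 or not resp.content:
            time.sleep(1)  # nothing to do yet
            continue
        task = resp.json()
        requests.post(f"{BASE}/tasks", json={
            "workflowInstanceId": task["workflowInstanceId"],
            "taskId": task["taskId"],
            "status": "COMPLETED",
            "outputData": do_work(task),
        })
```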
36. Conductor - Scale
● Peer-to-peer: scales horizontally
● Stateless server: state is persisted in Redis
● Storage scalability: Dynomite
● Workload scale: Dyno-Queues
37. Storage Layer
● Dynomite
○ Generic Dynamo implementation (Redis, Memcached)
○ Multi-datacenter
○ Highly available
○ Peer-to-Peer
● Elasticsearch
○ Indexing workflow and task executions
○ Verbose logging of worker executions
38. Dyno-Queues
● Distributed lock-free queues used by Conductor (see the sketch after this list)
● OSS
○ Apache 2.0 License
○ https://github.com/Netflix/dyno-queues
● Delayed queues
● Loose priorities and FIFO
● Redis based
● At-least-once delivery
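The core idea behind a Redis-backed delayed queue can be sketched with a sorted set scored by earliest delivery time, plus an "unack" set for at-least-once redelivery. This illustrates the technique, not dyno-queues' actual (Java) API; all key names are made up.

```python
import time
import uuid
import redis

r = redis.Redis()

def push(queue, payload, delay_seconds=0):
    """Enqueue a message that becomes visible after delay_seconds."""
    msg_id = str(uuid.uuid4())
    r.hset(f"{queue}:payloads", msg_id, payload)
    r.zadd(queue, {msg_id: time.time() + delay_seconds})  # score = delivery time
    return msg_id

def pop(queue, unack_timeout=30):
    """Claim the earliest due message (loose FIFO). It moves to an 'unack'
    set until acked, so an unacked message gets redelivered: at-least-once."""
    due = r.zrangebyscore(queue, 0, time.time(), start=0, num=1)
    if not due:
        return None
    msg_id = due[0]
    if r.zrem(queue, msg_id):  # atomic claim; the loser of a race gets 0
        r.zadd(f"{queue}:unack", {msg_id: time.time() + unack_timeout})
        return msg_id, r.hget(f"{queue}:payloads", msg_id)
    return None

def ack(queue, msg_id):
    """Acknowledge successful processing; the message is gone for good."""
    r.zrem(f"{queue}:unack", msg_id)
    r.hdel(f"{queue}:payloads", msg_id)
```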
40. Conductor @ Netflix
● In production for > 1.5 years
● Used by Content Platform Engineering
○ Content Ingest & Encoding
○ Content Processing
● ~150 process flows & ~300 tasks/services
● 1+ million executions / month
41. More information
● Dynomite Ecosystem:
o https://github.com/Netflix/dynomite
o https://github.com/Netflix/dyno
o https://github.com/Netflix/dyno-queues
o https://github.com/Netflix/dynomite-manager
● NDBench:
o https://github.com/Netflix/ndbench
● Conductor:
o https://github.com/Netflix/Conductor
● Chat:
o https://gitter.im/Netflix/dynomite
o https://gitter.im/Netflix/conductor
Editor's notes
By a show of hands, how many people have seen this? Great. We have 60-90 seconds before you ditch us and do something else.
Obviously the choice of content plays a big role, but just as important is a seamless user experience. Netflix has both.
Our job is to deliver that to the members and keep them happy and streaming.
The use cases for data caching include session storage, viewing history, bookmark tracking, playlists, ratings, and personalized recommendations, to name a few.
Our business use case is to stream movies at any cost, hence we moved from SQL to Cassandra in order to get high availability.
We are very sensitive to 99th-percentile latencies.
Cassandra:
Started migrating to NoSQL; Cassandra quickly became the de facto standard for data storage.
Scaled out Cassandra to reduce data per node and reduce latency.
Definitely not economical; needed something in-memory to meet the throughput and latency targets.
Typical deployment is in 3 data centers with 3 availability zones in each.
Redis:
Kong exercises: Chaos Monkey, Chaos Gorilla, and Chaos Kong.
Two types of use cases: as a cache and as a datastore.
Use the master branch on GitHub, since that is the stable one and it's what we run in production.
All nodes know the topology of the system.
Dynomite customer base was growing rapidly and introducing new users to Redis
Dynomite supports most native Redis commands
Users new to Redis might not follow best practices and perform KEYS *.
Scalable UI mandatory. Storing session data is a common use case.
Within the Netflix Cloud Engineering organization we have many projects where we try to share Web Components.
The UI leverages Polymer to build reusable Web Components that can be shared among other Netflix projects.
The Server is a Node.js Express app
Supports pluggable discovery modules: Netflix Discovery, a local Redis environment, or file-system-based configuration.
Supports pluggable authentication using Passport. Netflix uses Meechum for authentication; this can be extended to other Passport-based auth providers such as Facebook.
Supports pluggable ACL modules
Currently integrates with other Netflix services to get access control information for all Dynomite clusters
Currently supports the Redis/Dynomite API
The list of visible clusters is restricted by user group membership
Hashes are great for representing objects
Commands for manipulating hashes: HMSET, HSET, HGET, HGETALL
As we mentioned, session storage is a common use case for Dynomite.
Using Redis’ TTL provides a convenient way to expire data.
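A minimal redis-py sketch of the session-storage pattern these notes describe: a hash per session object, expired via TTL. Key and field names are illustrative.

```python
import redis

r = redis.Redis()

session_key = "session:user:42"
r.hset(session_key, mapping={          # a hash models an object's named fields
    "user_id": "42",
    "plan": "premium",
    "last_title": "tt4574334",
})
r.expire(session_key, 1800)            # TTL: session expires after 30 minutes

profile = r.hgetall(session_key)       # fetch the whole object
plan = r.hget(session_key, "plan")     # or a single field
```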
App is deployed in multiple regions and multiple availability zones for resiliency
Wanted to consolidate logging across all app instances to provide an audit trail
Filebeat runs on each instance and ships logs to an Elasticsearch cluster, which allows us to create a variety of dashboards in Kibana.