Dyn delivers exceptional Internet Performance. Enabling high-quality services requires data centers around the globe, and managing those services requires timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple data centers, enabling sub-50 ms query responses over hundreds of billions of data points. From granular DNS traffic data to aggregated counts across a variety of report dimensions, DSE at Dyn has been in production since 2013 and has shined through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements that led them to choose DSE as their go-to Big Data solution, the path that led them to Spark, and the lessons they learned in the process.
1. Dyn + DataStax: Helping Companies Deliver
Exceptional End-User Experience
May 17, 2016
Tim Chadwick, Principal Engineer, Infrastructure, Dyn
Rick Bross, Principal Engineer, Scalability, Dyn
2. The Story at Dyn
The Road to Production
Lessons and Direction
Journey to DataStax Enterprise
3. The Story at Dyn
Dyn is a cloud-based
Internet Performance Management (IPM)
company that provides unrivaled visibility and
control into cloud and public Internet resources.
Dyn’s platform monitors, controls and optimizes
applications and infrastructure through Data,
Analytics, and Traffic Steering, ensuring traffic
gets delivered faster, safer, and more reliably
than ever.
http://techcrunch.com/2016/05/10/dyn-series-b/
6. tchadwick@piedmont:~$ dig SOA ifc.com | grep -A 1 "ANSWER SECT"
;; ANSWER SECTION:
ifc.com. 7175 IN SOA ns1.p28.dynect.net. postmaster.ifc.com. 2016042900 3600 600 604800 1800

Build a sustainable system that can track usage by customer and zone (domain).
The consumers are our customers, our billing department, and Chris Baker.
Who Needs These Data?
7. For each five-minute interval of an invoice period, determine the Queries per Second (QPS) and sort the values in descending order.
Discard the top 5%; the maximum value remaining is the customer's 95th percentile, or monthly bill rate.
http://dyn.com/blog/the-95th-percentile-burstable-billing-model-managed-dns/
https://en.wikipedia.org/wiki/Burstable_billing#95th_percentile
Traffic Telemetry
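The billing rule above is easy to express in code. Here is a minimal sketch, in plain Python; the function name and sample numbers are illustrative, not Dyn's actual implementation:

```python
# Sketch of the 95th-percentile ("burstable") billing calculation:
# sort the per-interval QPS values descending, discard the top 5%,
# and bill at the maximum remaining value.

def percentile_95_bill_rate(interval_qps):
    """interval_qps: QPS for each 5-minute interval in the invoice period.
    Returns the customer's 95th-percentile bill rate."""
    ordered = sorted(interval_qps, reverse=True)   # descending
    discard = int(len(ordered) * 0.05)             # top 5% of samples
    return ordered[discard]                        # max of what remains

# A 30-day month has 30 * 24 * 12 = 8640 five-minute intervals,
# so the 432 highest samples are discarded.
samples = [100.0] * 8208 + [5000.0] * 432          # bursts fill exactly 5%
print(percentile_95_bill_rate(samples))            # bursts ignored -> 100.0
```

The practical effect, as the linked blog post explains, is that short traffic bursts (up to 5% of the month, about 36 hours) do not raise the bill.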
8. 1. Operations
-Flexible Topology
-Resilient Clusters
-Visibility and Administration
2. Data Model
-Idempotent Writes
-Low Concurrency
-Application Redundancy
Oh, and it must perform well.
Benchmarking Cassandra
Scalability on AWS
Over a million writes per second
Priorities that Led to DataStax Enterprise
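Why insist on idempotent writes? A toy model (not Dyn's code; Cassandra INSERTs behave like the upsert below) shows why they make retries after ambiguous timeouts safe:

```python
# Idempotent vs. non-idempotent writes: a retried upsert converges to
# the same state, while a retried increment double-counts.

table = {}

def upsert(key, value):
    """Idempotent: writing the same (key, value) twice is harmless."""
    table[key] = value

def increment(key, delta):
    """Not idempotent: a retry after an ambiguous timeout double-counts."""
    table[key] = table.get(key, 0) + delta

upsert(("cust1", "2016-05-17T10:05"), 4200)
upsert(("cust1", "2016-05-17T10:05"), 4200)   # retry: state unchanged
print(table[("cust1", "2016-05-17T10:05")])   # 4200

increment(("cust2", "qcount"), 4200)
increment(("cust2", "qcount"), 4200)          # retry: now wrong
print(table[("cust2", "qcount")])             # 8400
```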
16. • Customers
• Zones (Domains)
• Zone Record Types
• Fully Qualified Domain Names
(qnames)
• Regions (ANYCAST)
• Data Centers
• Nameservers
• “Top 10s”
Many, many more customers.
Many, many more dimensions.
I Want More From You....
20. How DataStax Enterprise Provided Value
● Support in Every Phase
○ Proof of Concept
○ Design
○ Operations
○ Optimization
● Integrated Toolkit
○ OpsCenter
○ Spark
We get the value of many, many
people at the cost of about 1/2
FTE.
22. Top Lessons Learned
1. Include all teams in planning, deployment and implementation.
2. Consult knowledgeable people before making decisions and “optimizations”.
3. Understand compaction strategies to immediately eliminate those that are not a fit.
4. Ensure that client load balancing policies and consistency levels match DC
topology and schema replication factors.
5. Model and understand all failure scenarios.
6. Use Spark to aggregate data in order to save storage and improve performance.
23. #1: Include all teams
● Product management
● Application engineering
● DBAs
● Operations
● Network engineering
● System engineering
● Finance and Management
24. #2: Consult knowledgeable people . . .
● Schema
● Cluster topology and tuning
● Compaction algorithms
● Client interaction
Talk to DataStax! They’ve probably seen it before!
25. #3: Understand Compaction Strategies!
DTCS was our first choice. It didn’t work . . . .
Tim Goodaire September 02, 2015 17:10
We have changed the compaction strategy,
concurrent_compactors, compaction_throughput, and
heap size. It took a while for the cluster to complete
the compactions, but it's done now. The cluster is up
and appears to be healthy.
Today, we've been adding a few more nodes and
resetting the heap size back to 8 GB.
26. #4: Ensure client and cluster settings match
Load balancing policies, read and write consistency, schema replication
factor, cluster topology . . .
27. #5: Model Failure Scenarios
What happens when a node fails? Two? The DC?
Will the client fail? How will queries be satisfied?
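The node-failure questions above reduce to simple quorum arithmetic. A back-of-envelope sketch, assuming LOCAL_QUORUM reads/writes (the RF values are illustrative, not necessarily Dyn's settings):

```python
# How many replica failures can a LOCAL_QUORUM query survive, given the
# per-DC replication factor? Quorum is floor(RF/2) + 1 replicas.

def local_quorum(rf):
    return rf // 2 + 1

def tolerable_failures(rf):
    """Replicas that can be down while LOCAL_QUORUM still succeeds."""
    return rf - local_quorum(rf)

for rf in (1, 2, 3, 5):
    print(rf, local_quorum(rf), tolerable_failures(rf))
# RF=3: quorum needs 2 replicas, so one node per DC can fail.
# Losing a whole DC requires the client's load balancing policy to
# fail over to another DC (and a consistency level that allows it).
```

Note that RF=2 tolerates zero local failures at quorum, which is why odd replication factors are the common choice.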
28. #6: Use DSE Spark to aggregate
700 rows for a single 5-minute interval → 20 rows for an hour interval
Daily billing went from 14 hours, to 2 hours on DSE/C*, and to 12 minutes with DSE/Spark
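The effect of that aggregation job can be sketched in plain Python (the real job ran on DSE Spark; the row shape and field names here are illustrative):

```python
# Rollup sketch: collapse per-5-minute telemetry rows into per-hour rows
# by summing query counts, the same shape of aggregation the slide describes.
from collections import defaultdict

def rollup_hourly(rows):
    """rows: (customer, zone, minute_of_day, queries) tuples for one day.
    Returns (customer, zone, hour, queries) with counts summed per hour."""
    hourly = defaultdict(int)
    for customer, zone, minute, queries in rows:
        hourly[(customer, zone, minute // 60)] += queries
    return [(c, z, h, q) for (c, z, h), q in sorted(hourly.items())]

five_min = [("cust1", "ifc.com", m, 100) for m in range(0, 120, 5)]
print(rollup_hourly(five_min))
# 24 five-minute rows collapse into 2 hourly rows of 1200 queries each
```

Storing the hourly rollup instead of the raw 5-minute rows is what cuts both storage and billing-query time.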
31. Coming Soon!
● June 8: How to Half Hour - Building Data Pipelines with SMACK: Storage
Strategy using Cassandra and DSE
● July 6: How to Half Hour - Building Data Pipelines with SMACK: Analyzing
Data with Spark
● For the latest schedule of webinars, check out our Webinars page:
http://www.datastax.com/resources/webinars.
33. Client
● Client cluster and session object configuration
○ Cluster seeds (DCAwareRoundRobinPolicy implications)
○ Other load balancing policies to wrap
○ Read and write consistency setting
○ Number of connections per host
○ Number of requests per connection
○ Pool timeout
● Client query settings
○ Read and write consistency (may override the default for a specific query)
○ Batches (rarely, if ever, should be used)
○ Stored procedures (usually a best practice for groups of queries; e.g., we use them for high-velocity inserts)
○ Sync or async? Depends on the specific query, but async is usually a best practice with stored procedures.
○ Write with a consistent TTL per table.
○ How many threads should share the client session object? We’ve found that balancing DC capabilities, client latency, and a (native) thread pool turbocharges inserts.
Cassandra Cluster
● Network topology
○ Colocated latency? Inter-DC latency?
○ Replication factor per DC per schema
● Schema
○ Don’t mix schemas with different use cases!
○ Dyn’s usage pattern:
■ Optimize INSERTs.
■ Ensure READs succeed.
■ Avoid UPDATEs (“out of order” TTLs).
■ Ban DELETEs (turn off the repair service).
○ Attempt to have all (voluminous) tables use the same compaction strategy.
○ Use consistent TTLs for writes. If you override the default, always override with the same value.
● Compaction algorithms
○ With ✓ time-series data, ✓ no deletes, ✓ no updates, and ✓ consistent TTLs, you can use DTCS, which will simply drop old sstables.
Client/Cluster Settings - Must Work Together!
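The "consistent TTLs" rule and DTCS's ability to drop whole sstables are two sides of the same coin. A toy model (numbers illustrative) shows why one long-TTL straggler pins an entire file:

```python
# An sstable can only be dropped wholesale once EVERY cell in it has
# expired. With a consistent TTL, the file expires shortly after its
# newest write; one cell with a longer TTL pins the whole file.

def droppable_at(cells):
    """cells: (write_time, ttl) pairs in one sstable, times in seconds.
    Returns the time at which the whole sstable becomes droppable."""
    return max(t + ttl for t, ttl in cells)

uniform = [(t, 86400) for t in range(0, 300, 60)]   # consistent 1-day TTL
mixed = uniform + [(120, 7 * 86400)]                # one 7-day straggler
print(droppable_at(uniform))   # 86640: expires with its newest write
print(droppable_at(mixed))     # 604920: pinned for ~6 extra days
```

This is why the schema bullets above pair "use consistent TTLs" with "avoid out-of-order UPDATEs": both keep each time-window sstable uniformly expirable.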