From the StampedeCon 2015 Big Data Conference: There is an adage: “If you fail to plan, you plan to fail.” When developing systems, the adage can be taken a step further: “If you fail to plan FOR FAILURE, you plan to fail.” At the Huffington Post, data moves between a number of systems to provide statistics for our technical, business, and editorial teams. Due to the mission-critical nature of our data, considerable effort is spent building resiliency into our processes.
This talk will focus on designing for failure. Some material will focus on understanding the traits of specific distributed systems, such as message queues or NoSQL databases, and the consequences of different types of failures. Other parts of the presentation will focus on how systems and software can be designed to make re-processing batch data simple, and on how to determine which failure-mode semantics are important for a real-time event processing system.
Resilience: the key requirement of a [big] [data] architecture - StampedeCon 2015
1. 1
Resilience: The key requirement of a [big]
[data] architecture
Let 'em know
@HuffPostCode @edwardcapriolo
2. 2
About me
Data Architect @ HuffPo
Apache Hive
Committer/PMC
Author: Programming Hive
− 2nd edition coming. Save
up!
Husband & dad
Crazed inventor:
github.com/edwardcapriolo
3. 3
The Huffington Post & Me
What is the Huffington Post?
− News, blogs, and video
− Desktop and mobile
− Multiple editions worldwide
What do I do there?
− Provide APIs, dashboards, reports
− Crunch BigData using uber tech
− Say no to bad tech decisions
via 'ed says no' meme
4. 4
For the next hour...
I am going to pretend
that everything I have
designed and used is
perfect and it never
breaks!
5. 5
Reality check: Things break
all the time.
Anomalous cloud outages
External software bugs
Internal software bugs
− a.k.a. 'anomalous cloud outage' in the
post mortem
Fat fingers
Preventable failures
6. 6
To be resilient, design a system
that causes minimal panic
when something does break
7. 7
What does Resilience
not sound like?
'HADOOP IS DOWN'
− “We are losing data!
Call OPS!”
“One of the 10
NoSQL nodes is
down”
− “Users are seeing
inaccurate numbers,
and requests are
failing!”
Why is this bad?
8. 8
What does Resilience sound like?
'Hadoop is down'
− No problem. The
process loading
hadoop can queue
messages for up to
40 hours
'One of the 10
NoSQL nodes is
down'
− No problem. We can
tolerate multiple node
failures with minimal
impact
9. 9
Agenda
Software stacks (especially our 'Fright stack')
Planning for building a resilient service
Redundancy
Component Overview
Case study: Building the Lifetime API
Questions
11. 11
Don't be scurred!
Components are named after horror movies
Batch & Realtime aka 'Lamb Duh' architecture
− Handle low-hanging fruit in real time
− Expensive/complex processing in batch
Designed for throughput
Designed for horizontal scale
Less is more
12. 12
Components of streaming stack
Kafka : The strong silent type
− Persistent, distributed commit log
− Massive Throughput without GC issues at scale
Cassandra : In my Column Family
− Cells : Columns that hold values (last update wins)
− Counters : Columns that support Increment
operation
Teknek: KISS stream platform
− No Single Point of Failure
− Simple: take data off a feed, apply a function
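The Teknek model above, take data off a feed and apply a function, can be sketched in a few lines of Python. This is an illustrative sketch of the pattern only, not the actual Teknek API; the names `run_operator`, `feed`, and `sink` are invented:

```python
from typing import Any, Callable, Iterable, List

def run_operator(feed: Iterable[Any], fn: Callable[[Any], Any], sink: List[Any]) -> None:
    """Pull each message off the feed, apply the function, emit the result."""
    for message in feed:
        sink.append(fn(message))

# Usage: project a feed of raw events down to (entry, event-type) pairs.
events = [{"entry": 5656, "type": "view"}, {"entry": 5656, "type": "click"}]
out: List[Any] = []
run_operator(events, lambda e: (e["entry"], e["type"]), out)
print(out)  # [(5656, 'view'), (5656, 'click')]
```

The appeal of keeping the platform this simple (KISS) is that failure handling reduces to re-reading the feed and re-applying the function.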
13. 13
Key components of batch stack
Hive / Hadoop : The big hammer
− SQL on Hadoop
− Flexibility of formats
− UDF / Streaming
− MetaStore
Impala: The scalpel
− Interactive speeds for reasonable datasets
− Avoids having to bulk load into an OLAP datastore
18. 18
After the initial excitement
of X, everyone:
Expects someone else to manage X
Is more excited to work with Y
Will preach to you that X is a backwards
technology holding everything back
− Even though they were a staunch advocate for X
months ago
Everyone includes you
Source:
http://www.chrisunderwoodsblog.com/2014/01/new-deal-trough-or-plateau.html
19. 19
Planning the service life-cycle
Build a playbook of setup/administration tasks
Get multiple groups buy in
Determine who carries out schema changes,
plans upgrades, etc.
Build monitoring and determine escalations
20. 20
Performance demands
on the service
Acceptable performance
− Request latency
− 99th percentile
− Job time
Requests per second
Storage requirements
Acceptable caching/delay
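The 99th-percentile target above is easy to compute with the nearest-rank method; a minimal sketch (the sample latencies are invented for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering at least p% of samples."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[k]

# Invented request latencies in milliseconds:
latencies_ms = [12, 15, 11, 240, 13, 14, 16, 12, 13, 500]
print(percentile(latencies_ms, 50))  # 13
print(percentile(latencies_ms, 99))  # 500
```

Note how the median looks healthy while the p99 exposes the outliers, which is exactly why the slide calls out the 99th percentile rather than the average.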
24. 24
What redundancy does for you
Less chance that single event causes panic
Fewer manuals/wikis about what to do if...
Fewer user-facing issues
More peace of mind
Availability of N services
Active/Passive is old school
Active/Active/Active + scalable is hip
25. 25
Do not agile your redundancy
Be very afraid if someone tries to convince you of
anything that sounds like this:
− For MVP we do not need Namenode HA. We can
get it running now and add the HA later.
− For MVP we need to get solution X working. We
can worry about scaling it later.
− For MVP it does not have to respond quickly. We
won't have much load and can speed it up
later.
28. 28
Criteria for software selection
Initial setup and ongoing administration
General Utility (duct tape vs star screwdriver)
'Web Scale' design effort
Customizable/pluggable
No 'at scale' gotchas
Insane specialty superpower
29. 29
Apache Kafka
Replication: set per topic (2)
Scale: Partitions dictate clients (10, 100)
Durability: sync vs async producers
Idempotence: Messages persisted to disk
Idempotence: Messages are multiplexed
Performance: Insane throughput
30. 30
Message Queues
without persistence
Producer might be too fast for consumers and
messages are dropped
− You would need 100%+ extra capacity to safely deal
with all surges
Consumer crash results in dropped message
− You cannot stop for anything, not even an update,
without losing data
31. 31
Apache Kafka
with persistence
Can handle traffic surges
Can safely queue data for upgrades
− Disk is cheap
Can replay data (bad release/backfill)
Multiplex data to multiple consumer groups
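Disk is cheap, but the retention buffer still needs to be sized; this is where a figure like the "40 hours" quoted earlier comes from. A back-of-the-envelope sketch, with assumed throughput and message-size numbers (10,000 msg/s and 500-byte messages are invented for illustration, not HuffPost's actual traffic):

```python
def retention_disk_bytes(msgs_per_sec, avg_msg_bytes, hours, replication):
    """Disk needed across the cluster to retain `hours` worth of traffic."""
    return msgs_per_sec * avg_msg_bytes * hours * 3600 * replication

# Assumed figures: 10,000 msg/s, 500-byte messages,
# a 40-hour buffer, replication factor 2.
need = retention_disk_bytes(10_000, 500, 40, 2)
print(need / 1e12)  # 1.44 (terabytes)
```

Kafka's per-topic retention settings can then be sized from a calculation like this, with headroom for surges.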
32. 32
Apache Cassandra
Replication: At the keyspace level (3)
Redundant: No Single Point of Failure
Durability: Self healing with quorum
Idempotence: Cell writes
Idempotence: Compare and Swap
Performance: Lightning fast writes
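The idempotence of cell writes is what makes replays safe: a cell write carries a timestamp, last update wins, so applying the same write twice leaves the same state. A toy model of that semantic (plain Python dicts standing in for a table; this is not real Cassandra client code):

```python
def lww_write(table, key, column, value, timestamp):
    """Last-write-wins cell: the write lands only if its timestamp is newest."""
    cell = table.setdefault(key, {}).get(column)
    if cell is None or timestamp >= cell[1]:
        table[key][column] = (value, timestamp)

t = {}
lww_write(t, "Entry:5555", "Views", 30, 100)
lww_write(t, "Entry:5555", "Views", 30, 100)  # replayed write: state unchanged
lww_write(t, "Entry:5555", "Views", 25, 90)   # stale write: ignored
print(t["Entry:5555"]["Views"])  # (30, 100)
```

Counter increments do not have this property, since replaying an increment double-counts, which is why bulk loading cells is the idempotent path.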
33. 33
Cassandra @ Work
Counters and Column Families can model a
good number of low-latency stats problems
BatchMutations and stream save round trips
Clients do not need shard awareness
Masterless design ideal for high availability
To read it you have to be able to write it first
34. 34
Apache Hadoop
Replication: Set per file (3)
Scale: Storage is incremental
Durability: Limited semantics
Performance: Typically brute force
Tuning: Too many tunes
Redundancy: Too many parts
36. 36
Lifetime API
Result data per entry
− GET /api/lifetime/5656
− { "views": 45454545, "clicks": 343434 }
Provide the total lifetime sum
− Views
− Facebook shares
− Etc
Also provide 28 day counts
37. 37
Planning
Acceptable performance
− Used in edit dashboards via web service call
Request per second
− Hundreds to thousands
Storage requirements
− Single value for each column *
Freshness
− Update hourly
38. 38
Previous Vertica implementation
Runs some queries sick fast
Enforces primary key on read
− If you double insert, later reads fail
Query slots limiting (OLAP)
Many projections can be problematic
Updates and deletes are a PITA
Stonebraker and I have beef
− http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/hadoop_is_the_best_thing
39. 39
Let's NoSQL it!
Design for the read path
− Only fetch one entry at a time
Fetch entire history
Fetch last 28 days
Many entries have short shelf life
Do not store a single value, store a by-day
timeline instead!
40. 40
Data modeling:
'Fixed' columns by day
Key = Entry:5555
− [2015-09-01:Views] = 30
− [2015-09-01:Clicks] = 10
− [2015-09-02:Views] = 2
Sparse data
Ordered by time
− Allows us to efficiently ask for ranges of data
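The by-day column layout above makes "fetch the last 28 days" a contiguous range of one row. A rough Python sketch of that read path (a dict stands in for the time-ordered row; in Cassandra this would be a column slice, and `range_sum` is an invented helper):

```python
from datetime import date, timedelta

# Toy row for Entry:5555, keyed by "YYYY-MM-DD:Metric" as on the slide.
row = {
    "2015-09-01:Views": 30,
    "2015-09-01:Clicks": 10,
    "2015-09-02:Views": 2,
}

def range_sum(row, metric, start, end):
    """Sum one metric over the closed day range [start, end]."""
    total, d = 0, start
    while d <= end:
        total += row.get(f"{d.isoformat()}:{metric}", 0)
        d += timedelta(days=1)
    return total

# Last-28-days style query ending 2015-09-28:
print(range_sum(row, "Views", date(2015, 9, 1), date(2015, 9, 28)))  # 32
```

Because the data is sparse, days with no traffic cost nothing to store, and the sum-on-read stays cheap because each entry's row is fetched in one round trip.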
41. 41
Data modeling:
Dealing with $hipforaday
social networks
Key = Entry:5555
− [2015-09-01:networks/zintrest/zshares] = 22
− [2015-09-01:networks/zintrest/zlikes] = 10
− [2015-09-01:networks/dug/dougs] = 2
Two level dynamic:
− network/type
− True old schoolers mash strings bra
Schema-less is elegant with the social
networks!
− Explain ire of schema and social networks
43. 43
Data Modeling: Multiple granularity
in same row with TTL
Compute daily data once a day
Hourly data with time-to-live during the day
Entry:5555
− [2015-09-01-01:Views] = 30 *ttl 24 hours
− [2015-09-01-01:Clicks] = 10 *ttl 24 hours
− [2015-09-01-02:Views] = 2 *ttl 24 hours
API needs some intelligence not to count
hourly data if the daily column exists
− Could have named these columns so that they
always appear at the beginning or end of the data
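That "some intelligence" can be a simple preference order: use the daily rollup column if it exists, otherwise sum whatever hourly cells are still alive. A sketch under the column-naming scheme shown above (`day_total` is an invented helper name, and a dict stands in for the row):

```python
def day_total(row, day, metric):
    """Prefer the daily rollup column; fall back to summing hourly cells."""
    daily = row.get(f"{day}:{metric}")
    if daily is not None:
        return daily
    return sum(row.get(f"{day}-{h:02d}:{metric}", 0) for h in range(1, 25))

row = {"2015-09-01-01:Views": 30, "2015-09-01-02:Views": 2}
print(day_total(row, "2015-09-01", "Views"))  # 32 (from hourly cells)
row["2015-09-01:Views"] = 32  # nightly batch writes the daily rollup
print(day_total(row, "2015-09-01", "Views"))  # 32 (daily column now wins)
```

Once the daily column lands and the hourly cells expire via TTL, the row shrinks back down on its own, with no delete pass needed.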
44. 44
Compute in batch write to NoSQL
Hive Queries from
scheduler produce
hourly data
Hourly data
aggregated into day
table
TheRing: HCat API
[table] -> Cassandra
45. 45
Results
Entry data divided
evenly across cluster
Survive multiple node
failures
API sums data on
read path
Horizontally scalable
http://sparkletechthoughts.blogspot.com/2013/03/how-to-setup-cassandra-cluster-using.html
46. 46
How Resilient is this service?
Hourly/Daily processing can easily be re-run
Bulk loading cells is idempotent
NoSQL (Cassandra) has fault tolerance
NoSQL can take massive load
API server is stateless and easily load balanced