3. Scenario
• Situation:
– You have hundreds of services producing logs in a datacenter.
– They produce a lot of logs that you want to analyze.
– You have Hadoop, a system for processing large volumes of data.
• Problem:
– How do you reliably ship all of these logs to a place where Hadoop can analyze them?
4. Use cases
• Collecting logs from nodes in a Hadoop cluster
• Collecting logs from services such as httpd, mail, etc.
• Collecting impressions from custom apps for an ad network
• But wait, there's more!
– Basic metrics on available data
– Basic online in-stream analysis
It's log, log… Everyone wants a log!
6. You need a “Flume”
• Flume is a distributed system that gets your logs from their source and aggregates them to where you want to process them.
• Open source, Apache v2.0 License
• Goals:
– Reliability
– Scalability
– Extensibility
– Manageability
Columbia Gorge, Broughton Log Flume
7. Key abstractions
• Data path and control path
• Nodes are in the data path
– Nodes have a source and a sink (see the configuration sketch below)
– They can take different roles
• A typical topology has agent nodes and collector nodes; optionally it has processor nodes.
• Masters are in the control path.
– Centralized point of configuration
– Specify sources and sinks
– Can control flows of data between nodes
– Use one master, or use many with a ZooKeeper-backed quorum
[Diagram: agent and collector nodes on the data path; master on the control path]
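As a concrete illustration, here is a minimal agent/collector pair in the flume-og configuration language (the same syntax as the output-bucketing slide later in this deck); the node names, host, port, and log path are hypothetical:

  agent1 : tail("/var/log/app.log") | agentSink("collector1", 35853) ;
  collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/logs", "data") ;

The agent tails a local file and forwards events; the collector listens on the agent port and writes them to HDFS.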
10. Outline
• What is Flume?
– Goals and architecture
• Reliability
– Fault tolerance and high availability
• Scalability
– Horizontal scalability of all nodes and masters
• Extensibility
– Unix principle, all kinds of data, all kinds of sources, all kinds of sinks
• Manageability
– Centralized management supporting dynamic reconfiguration
11. RELIABILITY
The logs will still get there.
12. Failures
• Faults can happen at many levels
– Software applications can fail
– Machines can fail
– Networking gear can fail
– Excessive network congestion or machine load
– A node goes down for maintenance.
• How do we make sure that events make it to a permanent store?
13. Tunable data reliability levels
• Best effort
– Fire and forget
• Store on failure + retry
– Local acks; local errors detectable
– Failover when faults are detected
• End-to-end reliability
– End-to-end acks
– Data survives compound failures, and may be retried multiple times (see the sketch below)
[Diagram: Agent → Collector → HDFS pipeline shown for each level]
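In flume-og the three levels correspond to three agent sink variants; a minimal sketch, with a hypothetical log path and a collector assumed at collector1:35853:

  Best effort (fire and forget):
    agent1 : tail("/var/log/app.log") | agentBESink("collector1", 35853) ;
  Store on failure + retry (local disk failover):
    agent1 : tail("/var/log/app.log") | agentDFOSink("collector1", 35853) ;
  End-to-end acks:
    agent1 : tail("/var/log/app.log") | agentE2ESink("collector1", 35853) ;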
14. Dealing with Agent failures
• We do not want to lose data
• Make events durable at the generation point.
– If a log generator goes down, it is not generating logs.
– If the event generation point fails and recovers, data will reach the endpoint.
• Data is durable and survives machine crashes and reboots.
– Allows for synchronous writes in log-generating applications.
• A watchdog program restarts the agent if it fails.
15. Dealing with Collector Failures
• Data is durable at the agent:
– Minimize the amount of state and possible data loss
– Not necessary to durably keep intermediate state at collector
– Retry if collector goes down.
• Use hot failover so agents can use alternate paths (see the sketch below):
– The master predetermines failovers to rebalance load when collectors go down.
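Failover paths can be written directly in the configuration language; a sketch assuming two hypothetical collectors, where < primary ? backup > means "use the primary sink, and fail over to the backup on error" (rpcSink is recalled from the flume-og library):

  agent1 : tail("/var/log/app.log") | < rpcSink("collector1", 35853) ? rpcSink("collector2", 35853) > ;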
16. Master Service Failures
• A master machine should not be a single point of failure!
• Masters keep two kinds of information:
• Configuration information (node/flow configuration)
– Kept in a ZooKeeper ensemble: a persistent, highly available metadata store
– Failures are easily recovered from
• Ephemeral information (heartbeat info, acks, metrics reports)
– Kept in memory
– Failures will lose this data
– This information can be lazily replicated
19. Data path is horizontally scalable
[Diagram: four agents fanning in to one collector, which writes to HDFS]
• Add collectors to increase availability and to handle more data
– Assumes a single agent will not dominate a collector
– Fewer connections to HDFS
– Larger, more efficient writes to HDFS
• Agents have mechanisms for machine resource tradeoffs (see the sketch below)
– Write the log locally to avoid a collector disk I/O bottleneck and catastrophic failures
– Compression and batching (trade CPU for network)
– Push computation into the event collection pipeline (balance I/O, memory, and CPU resource bottlenecks)
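Batching and compression are expressed as decorators wrapped around a sink; a sketch in the flume-og syntax, assuming a batch decorator that takes an event count and a gzip decorator (host, port, and path hypothetical):

  agent1 : tail("/var/log/app.log") | { batch(100) => { gzip => agentDFOSink("collector1", 35853) } } ;

Each group of 100 events is batched, compressed, and then shipped, trading agent CPU for network bandwidth.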
20. Load balancing
[Diagram: six agents partitioned across three collectors]
• Agents are logically partitioned and send to different collectors
• Use randomization to pre-specify failovers when many collectors exist (see the sketch below)
• Spread load if a collector goes down.
• Spread load if new collectors are added to the system.
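In flume-og the master can compute these randomized failover chains for each agent automatically; a sketch using the automatic chain sinks (sink name recalled from the flume-og library; node name and path hypothetical):

  agent1 : tail("/var/log/app.log") | autoDFOChain ;

The master fills in a randomized list of available collectors behind autoDFOChain, and recomputes it as collectors come and go.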
22. Control plane is horizontally scalable
[Diagram: nodes talking to masters, masters backed by a ZooKeeper ensemble (ZK1, ZK2, ZK3)]
• A master controls dynamic configurations of nodes
– Uses a consensus protocol to keep state consistent
– Scales well for configuration reads
– Allows for adaptive repartitioning in the future
• Nodes can talk to any master.
• Masters can talk to any ZK member.
25. EXTENSIBILITY
Turn raw logs into something useful…
26. Flume is easy to extend
• Simple source and sink APIs
– Event-granularity streaming design
– Have many simple operations and compose them for complex behavior (see the sketch below)
• End-to-end principle
– Put smarts and state at the endpoints. Keep the middle simple.
• Flume deals with reliability.
– Just add a new source or a new sink, and Flume has primitives to deal with reliability.
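Composition is visible directly in the configuration language, where decorators nest around sinks; a sketch (the value and intervalSampler decorator names are recalled from the flume-og sink library and should be treated as assumptions; names, port, and path hypothetical):

  agent1 : tail("/var/log/app.log") | { value("dc", "ewr01") => { intervalSampler(10) => agentBESink("collector1", 35853) } } ;

This tags every event with a datacenter attribute, keeps every 10th event, and ships the result best effort.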
27. Variety of data sources
• Can deal with push and pull sources.
• Supports many legacy event sources (a few are sketched below):
– Tailing a file
– Output from a periodically exec'ed program
– Syslog, syslog-ng
– Experimental: IRC / Twitter / Scribe / AMQP
[Diagram: push (app → agent), poll (agent ← app), and embedded (agent inside app) integration styles]
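A few of these sources in the flume-og syntax (source names recalled from the flume-og library; node names, ports, paths, and the execPeriodic interval hypothetical):

  webnode : tail("/var/log/httpd/access_log") | autoDFOChain ;
  statnode : execPeriodic("uptime", 60000) | autoDFOChain ;
  syslognode : syslogTcp(5140) | autoDFOChain ;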
28. Variety of Data output
• Send data to many sinks (see the fanout sketch below)
– Files, HDFS, console, RPC
– Experimental: HBase, Voldemort, S3, etc.
• Supports an extensible variety of output formats and destinations
– Output to language-neutral and open data formats (JSON, Avro, text)
– Compressed output files in development
• Uses decorators to process event data in flight.
– Sampling, attribute extraction, filtering, projection, checksumming, batching, wire compression, etc.
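Fanout to several sinks is a one-line composition using the flume-og fanout brackets; a sketch that writes each event both to the console and to time-bucketed HDFS files (port and path hypothetical):

  collector1 : collectorSource(35853) | [ console, collectorSink("hdfs://namenode/logs/%Y/%m%d", "data") ] ;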
30. Centralized data flow management
• One place to specify node sources, sinks, and data flows.
– Simply specify the role of the node: collector or agent
– Or specify a custom configuration for a node
• Control interfaces (a shell session is sketched below):
– Flume Shell
– Basic web interface
– HUE + Flume Manager App (Enterprise users)
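A short Flume Shell session sketch (command spellings recalled from the flume-og shell and worth double-checking; the master address and node names are hypothetical):

  $ flume shell -c masterhost:35873
  > exec config agent1 'tail("/var/log/app.log")' 'autoE2EChain'
  > exec config collector1 'collectorSource(35853)' 'collectorSink("hdfs://namenode/logs", "data")'
  > getconfigs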
31. Output bucketing
Collector writes to HDFS:
  /logs/web/2010/0715/1200/data-xxx.txt
  /logs/web/2010/0715/1200/data-xxy.txt
  /logs/web/2010/0715/1300/data-xxx.txt
  /logs/web/2010/0715/1300/data-xxy.txt
  /logs/web/2010/0715/1400/data-xxx.txt
  ...

node : collectorSource | collectorSink("hdfs://namenode/logs/web/%Y/%m%d/%H00", "data")

• Automatic output file management
– Writes HDFS files into time-based buckets using escape tags in the path
32. Simplified configurations
• To configure Flume nodes at a higher level, we use logical nodes (see the sketch below).
– The Flume node process is a physical node
– Each Flume node process can host multiple logical nodes
• This:
– Reduces the amount of detail required in configurations
– Reduces process-centric management overhead
– Allows for finer-grained resource control and isolation within flows
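Logical nodes are mapped onto physical machines from the shell; a sketch (the exec map command is recalled from the flume-og shell; host and node names hypothetical):

  > exec map hostA agent1
  > exec map hostA agent2
  > exec config agent1 'tail("/var/log/httpd/access_log")' 'autoBEChain'
  > exec config agent2 'tail("/var/log/app/app.log")' 'autoE2EChain'

One physical process on hostA now hosts two independently configured logical nodes.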
33. Flow isolation
[Diagram: two separate flows from six agents routed to different collectors]
• Isolate different kinds of data when and where it is generated (sketched below)
– Have multiple logical nodes on a machine
– Each has its own data source
– Each has its own data sink
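Continuing the logical-node sketch from the previous slide, the two logical nodes on hostA form isolated flows by sending to different collectors (collector names and port hypothetical):

  > exec config agent1 'tail("/var/log/httpd/access_log")' 'agentDFOSink("webcollector", 35853)'
  > exec config agent2 'tail("/var/log/app/app.log")' 'agentDFOSink("appcollector", 35853)'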
35. For advanced users
• A concise and precise configuration language for specifying arbitrary data paths (see the sketch below).
– Dataflows are essentially DAGs
– Control specific event flows
• Enable durability and failover mechanisms
• Tune the parameters of these mechanisms
– Dynamic updates of configurations
• Allows for live failover changes
• Allows for handling newly provisioned machines
• Allows for changing analytics
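A multi-node DAG can be submitted in one spec; a sketch (the exec multiconfig command, logicalSource, and logicalSink are recalled from flume-og; all node names and paths hypothetical). Events from agent1 fan out to a sampling processor and an archival collector:

  > exec multiconfig 'agent1 : tail("/var/log/app.log") | [ logicalSink("proc1"), logicalSink("arch1") ] ; proc1 : logicalSource | { intervalSampler(100) => collectorSink("hdfs://namenode/samples", "s") } ; arch1 : logicalSource | collectorSink("hdfs://namenode/archive/%Y/%m%d", "data") ;'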
37. Summary
• Flume is a distributed, reliable, scalable system for collecting and delivering high-volume continuous event data such as logs
– Tunable data reliability levels for data delivery
– Reliable master backed by ZooKeeper
– Writes data to HDFS into buckets ready for batch processing
– Dynamically configurable nodes
– Simplified, automated management for agent+collector topologies
• Open source, Apache v2.0 license.