SlideShare une entreprise Scribd logo
1  sur  16
Télécharger pour lire hors ligne
March 2012
Apache Flume (NG)
Alexander Lorenz | Customer Operations Engineer
Overview

    • Stream data (events, not files) from clients
      to sinks
    • Clients: files, syslog, avro, …
    • Sinks: HDFS files, HBase, …
    • Configurable reliability levels
      – Best effort: “Fast and loose”
      – Guaranteed delivery: “Deliver no matter what”
    • Configurable routing / topology

2                                    ©2012
                       Cloudera, Inc. All Rights Reserved.
Architecture
    Component   Function

    Agent       The JVM running Flume. One per machine. Runs
                many sources and sinks.
    Client      Produces data in the form of events. Runs in a
                separate thread.
    Sink        Receives events from a channel. Runs in a separate
                thread.
    Channel     Connects sources to sinks (like a queue).
                Implements the reliability semantics.
    Event       A single datum; a log record, an avro object, etc.
                Normally around ~4KB.




3                                 ©2012
                    Cloudera, Inc. All Rights Reserved.
Agent

    • Runs many clients and sinks
    • Java properties-based configuration
    • Low overhead (-Xmx20m)
      – But adding RAM increases performance




4                                   ©2012
                      Cloudera, Inc. All Rights Reserved.
Sources

    • Plugin interface
    • Managed by a SourceRunner that controls
      threading and execution model (e.g. polling
      vs. event-based)
    • Included: exec, avro, syslog, …




5                                   ©2012
                      Cloudera, Inc. All Rights Reserved.
Sources
    public class MySource implements PollableSource {
      public Status process() {
        // Do something to create an Event..
        Event e = EventBuilder.withBody(…).build();
        // A channel instance is injected by Flume.
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          channel.put(e);
          tx.commit();
        } catch (ChannelException ex) {
          tx.rollback();
          return Status.BACKOFF;
        } finally {
          tx.close();
        }
        return Status.READY;
      }
    }




6                                          ©2012
                             Cloudera, Inc. All Rights Reserved.
Channel

    •   Plugin interface
    •   Transactional
    •   Provide queuing between source / sink
    •   Provide reliability semantics
        – MemoryChannel: Basically a Java
          BlockingQueue.
        – JDBC
        – WAL


7                                     ©2012
                        Cloudera, Inc. All Rights Reserved.
Sinks

    • Plugin interface
    • Managed by a SinkRunner that controls
      threading and execution model
    • Included: HDFS files (various formats)




8                                  ©2012
                     Cloudera, Inc. All Rights Reserved.
Sinks Code
    public class MySink implements PollableSink {
      public Status process() {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          Event e = channel.take();
          if (e != null) {
            // …
            tx.commit();
          } else {
            return Status.BACKOFF;
          }
        } catch (ChannelException ex) {
          tx.rollback();
          return Status.BACKOFF;
        } finally {
          tx.close();
        }
        return Status.READY;
      }
    }




9                                          ©2012
                             Cloudera, Inc. All Rights Reserved.
Tiered collection

 • Send events from agents to another tier of
   agents to aggregate
 • Use an Avro sink (really just a client) to
   send events to an Avro source (really just a
   server) in another machine
 • Failover supported
 • Load balancing (soon)
 • Transactions guarantee handoff

10                               ©2012
                   Cloudera, Inc. All Rights Reserved.
Tiered collection – The handoff

 •   Agent 1: Tx begin
 •   Agent 1: Channel take event
 •   Agent 1: Sink send
 •   Agent 2: Tx begin
 •   Agent 2: Channel put
 •   Agent 2: Tx commit, respond OK
 •   Agent 1: Tx commit (or rollback)


11                                 ©2012
                     Cloudera, Inc. All Rights Reserved.
Configuration

 • done in a single file
 • identifier for flows
 • <identifier>.type.subtype.parameter.config,
   where <identifier> is the name of the agent
 • flow has a client, type, channel, sink
 • can have multiple channels in one flow




12                                ©2012
                    Cloudera, Inc. All Rights Reserved.
Simple Configuration Example
   syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = Console

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140

syslog-agent.sources.Syslog.channels =
MemoryChannel-1
syslog-agent.sinks.Console.channel = MemoryChannel-1

syslog-agent.sinks.Console.type = logger
syslog-agent.channels.MemoryChannel-1.type = memory



13                                 ©2012
                     Cloudera, Inc. All Rights Reserved.
HDFS Configuration Example
   syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140

syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1

syslog-agent.sinks.HDFS-LAB.type = hdfs

syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/
flumetest/'%{host}''
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles
syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60
syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile
syslog-agent.channels.MemoryChannel-1.type = memory


 14                                     ©2012
                          Cloudera, Inc. All Rights Reserved.
Features

 •   Fan out: One source, many channels
 •   Fan in: Many sources, one channel
 •   Processors (aka: decorators)
 •   Auto-batching of events in RPCs…
 •   Multiplexing channels for datamining
 •   Avro implementation in both ways



15                                 ©2012
                     Cloudera, Inc. All Rights Reserved.
Thank You

 • Web: https://cwiki.apache.org/FLUME/
   getting-started.html
 • ML: flume-user@incubator.apache.org

 • Mail: alexander@cloudera.com
 • Blog: mapredit.blogspot.com
 • Twitter: @mapredit


16                              ©2012
                  Cloudera, Inc. All Rights Reserved.

Contenu connexe

Tendances

Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
DataWorks Summit
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
Swapnil Dubey
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
Jayesh Thakrar
 

Tendances (20)

Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutes
 
Apache flume
Apache flumeApache flume
Apache flume
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in Manufacturing
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with Flume
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Flume intro-100715
Flume intro-100715Flume intro-100715
Flume intro-100715
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafka
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
Highlights Of Sqoop2
Highlights Of Sqoop2Highlights Of Sqoop2
Highlights Of Sqoop2
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and Processing
 

Similaire à Apache Flume (NG)

Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server Internals
Praveen Gollakota
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps Approach
Akshaya Mahapatra
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
Docker, Inc.
 

Similaire à Apache Flume (NG) (20)

Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Play Framework
Play FrameworkPlay Framework
Play Framework
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server Internals
 
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud ManagementOracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
 
How we scale DroneCi on demand
How we scale DroneCi on demandHow we scale DroneCi on demand
How we scale DroneCi on demand
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Lessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networkingLessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networking
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
signalr
signalrsignalr
signalr
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps Approach
 
Hands on with CoAP and Californium
Hands on with CoAP and CaliforniumHands on with CoAP and Californium
Hands on with CoAP and Californium
 
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
 
Weblogic
WeblogicWeblogic
Weblogic
 
Play Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and ScalaPlay Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and Scala
 
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)
 

Plus de Alexander Alten

BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)
Alexander Alten
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache Hadoop
Alexander Alten
 

Plus de Alexander Alten (10)

Is big data dead?
Is big data dead?Is big data dead?
Is big data dead?
 
Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoT
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern Enterprise
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of Energy
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduce
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache Hadoop
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Apache Flume (NG)

  • 1. March 2012 Apache Flume (NG) Alexander Lorenz | Customer Operations Engineer
  • 2. Overview • Stream data (events, not files) from clients to sinks • Clients: files, syslog, avro, … • Sinks: HDFS files, HBase, … • Configurable reliability levels – Best effort: “Fast and loose” – Guaranteed delivery: “Deliver no matter what” • Configurable routing / topology 2 ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. Architecture Component Function Agent The JVM running Flume. One per machine. Runs many sources and sinks. Client Produces data in the form of events. Runs in a separate thread. Sink Receives events from a channel. Runs in a separate thread. Channel Connects sources to sinks (like a queue). Implements the reliability semantics. Event A single datum; a log record, an avro object, etc. Normally around ~4KB. 3 ©2012 Cloudera, Inc. All Rights Reserved.
  • 4. Agent • Runs many clients and sinks • Java properties-based configuration • Low overhead (-Xmx20m) – But adding RAM increases performance 4 ©2012 Cloudera, Inc. All Rights Reserved.
  • 5. Sources • Plugin interface • Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based) • Included: exec, avro, syslog, … 5 ©2012 Cloudera, Inc. All Rights Reserved.
  • 6. Sources public class MySource implements PollableSource { public Status process() { // Do something to create an Event.. Event e = EventBuilder.withBody(…).build(); // A channel instance is injected by Flume. Transaction tx = channel.getTransaction(); tx.begin(); try { channel.put(e); tx.commit(); } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } } 6 ©2012 Cloudera, Inc. All Rights Reserved.
  • 7. Channel • Plugin interface • Transactional • Provide queuing between source / sink • Provide reliability semantics – MemoryChannel: Basically a Java BlockingQueue. – JDBC – WAL 7 ©2012 Cloudera, Inc. All Rights Reserved.
  • 8. Sinks • Plugin interface • Managed by a SinkRunner that controls threading and execution model • Included: HDFS files (various formats) 8 ©2012 Cloudera, Inc. All Rights Reserved.
  • 9. Sinks Code public class MySink implements PollableSink { public Status process() { Transaction tx = channel.getTransaction(); tx.begin(); try { Event e = channel.take(); if (e != null) { // … tx.commit(); } else { return Status.BACKOFF; } } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } } 9 ©2012 Cloudera, Inc. All Rights Reserved.
  • 10. Tiered collection • Send events from agents to another tier of agents to aggregate • Use an Avro sink (really just a client) to send events to an Avro source (really just a server) in another machine • Failover supported • Load balancing (soon) • Transactions guarantee handoff 10 ©2012 Cloudera, Inc. All Rights Reserved.
  • 11. Tiered collection – The handoff • Agent 1: Tx begin • Agent 1: Channel take event • Agent 1: Sink send • Agent 2: Tx begin • Agent 2: Channel put • Agent 2: Tx commit, respond OK • Agent 1: Tx commit (or rollback) 11 ©2012 Cloudera, Inc. All Rights Reserved.
  • 12. Configuration • done in a single file • identifier for flows • <identifier>.type.subtype.parameter.config, where <identifier> is the name of the agent • flow has a client, type, channel, sink • can have multiple channels in one flow 12 ©2012 Cloudera, Inc. All Rights Reserved.
  • 13. Simple Configuration Example syslog-agent.sources = Syslog syslog-agent.channels = MemoryChannel-1 syslog-agent.sinks = Console syslog-agent.sources.Syslog.type = syslogTcp syslog-agent.sources.Syslog.port = 5140 syslog-agent.sources.Syslog.channels = MemoryChannel-1 syslog-agent.sinks.Console.channel = MemoryChannel-1 syslog-agent.sinks.Console.type = logger syslog-agent.channels.MemoryChannel-1.type = memory 13 ©2012 Cloudera, Inc. All Rights Reserved.
  • 14. HDFS Configuration Example syslog-agent.sources = Syslog syslog-agent.channels = MemoryChannel-1 syslog-agent.sinks = HDFS-LAB syslog-agent.sources.Syslog.type = syslogTcp syslog-agent.sources.Syslog.port = 5140 syslog-agent.sources.Syslog.channels = MemoryChannel-1 syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1 syslog-agent.sinks.HDFS-LAB.type = hdfs syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/ flumetest/'%{host}'' syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60 syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile syslog-agent.channels.MemoryChannel-1.type = memory 14 ©2012 Cloudera, Inc. All Rights Reserved.
  • 15. Features • Fan out: One source, many channels • Fan in: Many sources, one channel • Processors (aka: decorators) • Auto-batching of events in RPCs… • Multiplexing channels for datamining • Avro implementation in both ways 15 ©2012 Cloudera, Inc. All Rights Reserved.
  • 16. Thank You • Web: https://cwiki.apache.org/FLUME/ getting-started.html • ML: flume-user@incubator.apache.org • Mail: alexander@cloudera.com • Blog: mapredit.blogspot.com • Twitter: @mapredit 16 ©2012 Cloudera, Inc. All Rights Reserved.