SlideShare une entreprise Scribd logo
1  sur  10
Télécharger pour lire hors ligne


Final Project


Apache Flume




Rapheephan Thongkham-Uan (Nancy)

cscie90 Cloud Computing
Harvard University Extension School
Prof. Zoran B. Djordjević

@TakeshiDemonkey

1
What is Apache Flume?
▪ Flume is distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data from
many different sources to a centralised data store. (http://
flume.apache.org/FlumeUserGuide.html)

!
!
!
!
!
!
!
▪ Currently available versions are 0.9.x and 1.x
▪ I want to focus on Flume use cases in manufacturing.

@Rapheephan

2
Applying Flume to Manufacturing Process
▪ In the factory, there are many machines used in the production.

!
!
!
!
!
▪ If a machine produces 1 log data file when 1 lot of product finishes
processing. In one day, there will be a big amount of log data stored
in the server.
▪ For the quality control and the production control improvement,
analysing these log files in a real time is our objective.
▪ First, we need to collect these log data files from the production
lines into the HDFS, then pass them through the analysis process.

@Rapheephan

3
Multi-agent flow image in the production system

AGENT 1

consolidation

AGENT 2

AGENT 4
HDFS

AGENT 3

@Rapheephan

4
My Sample
agent 1
CHANNEL

SOURCE

SINK

HDFS

▪ My system
▪ Java Runtime Environment (Java1.6.0_31)
▪ Cloudera's Distribution Including Apache Hadoop (CDH4.3)

▪ Working steps
1. Install Apache Flume on the Host machine
(Flume installation guide for CDH4: http://www.cloudera.com/content/cloudera-content/
cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_12.html)

2. Create 2 log generation java applications for machine1 and machine2
3. Configure Flume agent
4. Start Flume agent and test the system

@Rapheephan

5
Prepare the log generation application
▪ Create 2 Virtual Machines for generating machine1’s and machine2’s
log data.
▪ Create a simple socket java program to producing log events to
agent’s source with specific port (11111)

!
!
!
!
!
!
!
▪ Export it as an executable JAR file, and move it to the virtual
machine1
▪ Copy and move the other to the virtual machine2
@Rapheephan

6
Configuration Flume-ng agent on Host
▪ We have to configure all sink, channel, and source in the flow. My
agent name is hdfs-agent
▪ First, name the components in the agent.

! hdfs-agent.sources = log-collect
= memoryChannel
! hdfs-agent.channelshdfs-write
hdfs-agent.sinks =
!
▪ Next, define the source’s properties as follow

! hdfs-agent.sources.log-collect.type = netcat
! hdfs-agent.sources.log-collect.bind = 133.196.211.209
hdfs-agent.sources.log-collect.port = 11111
!!
hdfs-agent.sources.log-collect.channels = memoryChannel
!
▪ My source type is netcat-like source, that listens on the port ‘11111’
▪ Don’t forget to define the channel used by the source.
@Rapheephan

7
Configuration Flume-ng agent on Host (2)
▪ We want to collect the log data and write to the ‘testFlume’
directory on the HDFS cluster. Therefore, the sink should be defined
as follow.

! hdfs-agent.sinks.hdfs-write.type = hdfs
! hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/
<myusername>/testflume
= Text
! hdfs-agent.sinks.hdfs-write.hdfs.writeFormatDataStream
hdfs-agent.sinks.hdfs-write.hdfs.fileType =
!!
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
!
▪ Don’t forget to specify the channel used by the sink.
▪ Finally, configure the channel

! hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 1000
!
▪ The channel will store the log data in-memory with the maximum
1000 events.
@Rapheephan

8
Start the Flume agent and get result
▪ My configuration file name is ‘flume.conf’, and my agent name is
‘hdfs-agent’.
▪ Start the Flume agent using the following command.
$! flume-ng agent --conf-file flume.conf --name hdfs-agent

▪ Execute the genLog.jar on both machines
▪ On Flume master, you will be able to see something like this

!
!
!
!

13/12/17 14:36:13 INFO hdfs.BucketWriter: Creating hdfs://<namenode>:
8020/user/<my userid>/testflume/FlumeData.1387258572230.tmp
13/12/17 14:36:19 INFO hdfs.BucketWriter: Renaming hdfs://cmccldULL6400.toshiba.co.jp:8020/user/g0092010/testflume/FlumeData.
1387258572230.tmp to hdfs://<namenode>:8020/user/<my userid>/testflume/
FlumeData.1387258572230

▪ Verify that the log data has stored as events on the HDFS
g0092010@cmc-cldULL6400:~$ hadoop fs -cat
2013-12-17 14:32:19: This is a sample log
2013-12-17 14:32:24: This is a sample log
2013-12-17 14:32:27: This is a sample log
2013-12-17 14:32:29: This is a sample log
@Rapheephan

testflume/*30
file from machine
file from machine
file from machine
file from machine

1.
1.
2.
1.

9
Next steps
▪ Analyse the log data and Visualise it in the (near) real time.

!
!
!
!
!
!
!
!
!

AGENT 1

MapReduce
Hive

AGENT 2

AGENT 4

Mahout

Visualisation
tools

HDFS

Impala

AGENT 3

▪ Improving throughputs of the system.
▪ Analysing and Predicting the future trend.
▪ etc.
@Rapheephan

10

Contenu connexe

Tendances

ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016Jayesh Thakrar
 
Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Cloudera, Inc.
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudDataWorks Summit
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperAnandMHadoop
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEkawamuray
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEkawamuray
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLCloudera, Inc.
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...kawamuray
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...Yahoo Developer Network
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkStreamNative
 
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketCloudera, Inc.
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...JinfengHuang3
 

Tendances (20)

ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Flume intro-100715
Flume intro-100715Flume intro-100715
Flume intro-100715
 
Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and Zookeeper
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
ApacheCon-HBase-2016
ApacheCon-HBase-2016ApacheCon-HBase-2016
ApacheCon-HBase-2016
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
 
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 

En vedette

Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveIMC Institute
 
Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutesdwmclary
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil DubeySwapnil Dubey
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexasArvind Prabhakar
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Cloudera, Inc.
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with FlumeRatnakar Pawar
 
Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015Steve Hoffman
 
DataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - DevnestDataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - DevnestOllie Parsley
 
Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011ZoeMM
 
The Asset Consultancy_PPT _final
The Asset Consultancy_PPT _finalThe Asset Consultancy_PPT _final
The Asset Consultancy_PPT _finalRushin Naik
 
Apache Flume NG
Apache Flume NGApache Flume NG
Apache Flume NGhuguk
 
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester Hortonworks
 
Tweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion miningTweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion miningSngular Meaning
 
Demo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product designDemo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product designChristine Outram
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
 

En vedette (20)

Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and Hive
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
'Flume' Case Study
'Flume' Case Study'Flume' Case Study
'Flume' Case Study
 
Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutes
 
Flume Case Study
Flume Case StudyFlume Case Study
Flume Case Study
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with Flume
 
Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015
 
Impalaチューニングポイントベストプラクティス
ImpalaチューニングポイントベストプラクティスImpalaチューニングポイントベストプラクティス
Impalaチューニングポイントベストプラクティス
 
Hadoop ~Yahoo! JAPANの活用について~
Hadoop ~Yahoo! JAPANの活用について~Hadoop ~Yahoo! JAPANの活用について~
Hadoop ~Yahoo! JAPANの活用について~
 
DataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - DevnestDataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - Devnest
 
Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011
 
The Asset Consultancy_PPT _final
The Asset Consultancy_PPT _finalThe Asset Consultancy_PPT _final
The Asset Consultancy_PPT _final
 
Apache Flume NG
Apache Flume NGApache Flume NG
Apache Flume NG
 
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
 
Tweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion miningTweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion mining
 
Demo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product designDemo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product design
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
 

Similaire à Apache Flume and its use case in Manufacturing

Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015Chris Tankersley
 
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Chris Tankersley
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - FlumeAnandMHadoop
 
Php through the eyes of a hoster confoo
Php through the eyes of a hoster confooPhp through the eyes of a hoster confoo
Php through the eyes of a hoster confooCombell NV
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기NAVER D2
 
Why we choose Symfony2
Why we choose Symfony2Why we choose Symfony2
Why we choose Symfony2Merixstudio
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Amin Astaneh
 
Setting up a big data platform at kelkoo
Setting up a big data platform at kelkooSetting up a big data platform at kelkoo
Setting up a big data platform at kelkooFabrice dos Santos
 
Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015Chris Tankersley
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101Adam Muise
 
Installing and Getting Started with Alfresco
Installing and Getting Started with AlfrescoInstalling and Getting Started with Alfresco
Installing and Getting Started with AlfrescoWildan Maulana
 
Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0venkatakrishnan k
 
Deepa ppt about lamp technology
Deepa ppt about lamp technologyDeepa ppt about lamp technology
Deepa ppt about lamp technologyDeepa
 
lamp technology
lamp technologylamp technology
lamp technologyDeepa
 

Similaire à Apache Flume and its use case in Manufacturing (20)

Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015
 
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - Flume
 
Php through the eyes of a hoster confoo
Php through the eyes of a hoster confooPhp through the eyes of a hoster confoo
Php through the eyes of a hoster confoo
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Running php on nginx
Running php on nginxRunning php on nginx
Running php on nginx
 
Why we choose Symfony2
Why we choose Symfony2Why we choose Symfony2
Why we choose Symfony2
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)
 
Setting up a big data platform at kelkoo
Setting up a big data platform at kelkooSetting up a big data platform at kelkoo
Setting up a big data platform at kelkoo
 
Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
 
Installing and Getting Started with Alfresco
Installing and Getting Started with AlfrescoInstalling and Getting Started with Alfresco
Installing and Getting Started with Alfresco
 
Its3 Drupal
Its3 DrupalIts3 Drupal
Its3 Drupal
 
Its3 Drupal
Its3 DrupalIts3 Drupal
Its3 Drupal
 
Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0
 
Linux presentation
Linux presentationLinux presentation
Linux presentation
 
Deepa ppt about lamp technology
Deepa ppt about lamp technologyDeepa ppt about lamp technology
Deepa ppt about lamp technology
 
lamp technology
lamp technologylamp technology
lamp technology
 

Dernier

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 

Dernier (20)

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 

Apache Flume and its use case in Manufacturing

  • 1. 
 Final Project
 Apache Flume
 
 Rapheephan Thongkham-Uan (Nancy) cscie90 Cloud Computing Harvard University Extension School Prof. Zoran B. Djordjević @TakeshiDemonkey 1
  • 2. What is Apache Flume? ▪ Flume is distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralised data store. (http:// flume.apache.org/FlumeUserGuide.html) ! ! ! ! ! ! ! ▪ Currently available versions are 0.9.x and 1.x ▪ I want to focus on Flume use cases in manufacturing. @Rapheephan 2
  • 3. Applying Flume to Manufacturing Process ▪ In the factory, there are many machines used in the production. ! ! ! ! ! ▪ If a machine produces 1 log data file when 1 lot of product finishes processing. In one day, there will be a big amount of log data stored in the server. ▪ For the quality control and the production control improvement, analysing these log files in a real time is our objective. ▪ First, we need to collect these log data files from the production lines into the HDFS, then pass them through the analysis process. @Rapheephan 3
  • 4. Multi-agent flow image in the production system AGENT 1 consolidation AGENT 2 AGENT 4 HDFS AGENT 3 @Rapheephan 4
  • 5. My Sample agent 1 CHANNEL SOURCE SINK HDFS ▪ My system ▪ Java Runtime Environment (Java1.6.0_31) ▪ Cloudera's Distribution Including Apache Hadoop (CDH4.3) ▪ Working steps 1. Install Apache Flume on the Host machine (Flume installation guide for CDH4: http://www.cloudera.com/content/cloudera-content/ cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_12.html) 2. Create 2 log generation java applications for machine1 and machine2 3. Configure Flume agent 4. Start Flume agent and test the system @Rapheephan 5
  • 6. Prepare the log generation application ▪ Create 2 Virtual Machines for generating machine1’s and machine2’s log data. ▪ Create a simple socket java program to producing log events to agent’s source with specific port (11111) ! ! ! ! ! ! ! ▪ Export it as an executable JAR file, and move it to the virtual machine1 ▪ Copy and move the other to the virtual machine2 @Rapheephan 6
  • 7. Configuration Flume-ng agent on Host ▪ We have to configure all sink, channel, and source in the flow. My agent name is hdfs-agent ▪ First, name the components in the agent. ! hdfs-agent.sources = log-collect = memoryChannel ! hdfs-agent.channelshdfs-write hdfs-agent.sinks = ! ▪ Next, define the source’s properties as follow ! hdfs-agent.sources.log-collect.type = netcat ! hdfs-agent.sources.log-collect.bind = 133.196.211.209 hdfs-agent.sources.log-collect.port = 11111 !! hdfs-agent.sources.log-collect.channels = memoryChannel ! ▪ My source type is netcat-like source, that listens on the port ‘11111’ ▪ Don’t forget to define the channel used by the source. @Rapheephan 7
  • 8. Configuration Flume-ng agent on Host (2) ▪ We want to collect the log data and write to the ‘testFlume’ directory on the HDFS cluster. Therefore, the sink should be defined as follow. ! hdfs-agent.sinks.hdfs-write.type = hdfs ! hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/ <myusername>/testflume = Text ! hdfs-agent.sinks.hdfs-write.hdfs.writeFormatDataStream hdfs-agent.sinks.hdfs-write.hdfs.fileType = !! hdfs-agent.sinks.hdfs-write.channel = memoryChannel ! ▪ Don’t forget to specify the channel used by the sink. ▪ Finally, configure the channel ! hdfs-agent.channels.memoryChannel.type = memory hdfs-agent.channels.memoryChannel.capacity = 1000 ! ▪ The channel will store the log data in-memory with the maximum 1000 events. @Rapheephan 8
  • 9. Start the Flume agent and get result ▪ My configuration file name is ‘flume.conf’, and my agent name is ‘hdfs-agent’. ▪ Start the Flume agent using the following command. $! flume-ng agent --conf-file flume.conf --name hdfs-agent ▪ Execute the genLog.jar on both machines ▪ On Flume master, you will be able to see something like this ! ! ! ! 13/12/17 14:36:13 INFO hdfs.BucketWriter: Creating hdfs://<namenode>: 8020/user/<my userid>/testflume/FlumeData.1387258572230.tmp 13/12/17 14:36:19 INFO hdfs.BucketWriter: Renaming hdfs://cmccldULL6400.toshiba.co.jp:8020/user/g0092010/testflume/FlumeData. 1387258572230.tmp to hdfs://<namenode>:8020/user/<my userid>/testflume/ FlumeData.1387258572230 ▪ Verify that the log data has stored as events on the HDFS g0092010@cmc-cldULL6400:~$ hadoop fs -cat 2013-12-17 14:32:19: This is a sample log 2013-12-17 14:32:24: This is a sample log 2013-12-17 14:32:27: This is a sample log 2013-12-17 14:32:29: This is a sample log @Rapheephan testflume/*30 file from machine file from machine file from machine file from machine 1. 1. 2. 1. 9
  • 10. Next steps ▪ Analyse the log data and Visualise it in the (near) real time. ! ! ! ! ! ! ! ! ! AGENT 1 MapReduce Hive AGENT 2 AGENT 4 Mahout Visualisation tools HDFS Impala AGENT 3 ▪ Improving throughputs of the system. ▪ Analysing and Predicting the future trend. ▪ etc. @Rapheephan 10