Apache Flume and its use case in Manufacturing

3,407 views

Published

CSCI E90 Cloud Computing
Harvard University Extension School

Published in: Education, Technology
  1. Final Project: Apache Flume
     Rapheephan Thongkham-Uan (Nancy), CSCI E90 Cloud Computing, Harvard University Extension School, Prof. Zoran B. Djordjević (@TakeshiDemonkey)
  2. What is Apache Flume?
     ▪ Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralised data store (http://flume.apache.org/FlumeUserGuide.html).
     ▪ Currently available versions are 0.9.x and 1.x.
     ▪ I want to focus on Flume use cases in manufacturing. @Rapheephan
  3. Applying Flume to the Manufacturing Process
     ▪ In the factory, many machines are used in production.
     ▪ If each machine produces one log data file when one lot of product finishes processing, then in one day a large amount of log data will be stored on the server.
     ▪ For quality control and production-control improvement, our objective is to analyse these log files in real time.
     ▪ First, we need to collect these log data files from the production lines into HDFS, then pass them through the analysis process.
  4. Multi-agent flow image in the production system
     [Diagram: AGENT 1, AGENT 2, and AGENT 3 send events to AGENT 4 (consolidation), which writes to HDFS.]
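The slides don't show the configuration for this consolidated topology. In Flume NG, the usual pattern is an Avro sink on each upstream agent pointed at an Avro source on the consolidating agent. A hypothetical sketch (the agent names, consolidator host, port 4545, and channel names are assumptions, not taken from the deck):

```properties
# On each of agent1..agent3: forward events to the consolidating agent
agent1.sinks.avro-forward.type = avro
agent1.sinks.avro-forward.hostname = <consolidator-host>
agent1.sinks.avro-forward.port = 4545
agent1.sinks.avro-forward.channel = memoryChannel

# On agent4 (consolidation): receive events from the upstream agents
agent4.sources.avro-collect.type = avro
agent4.sources.avro-collect.bind = 0.0.0.0
agent4.sources.avro-collect.port = 4545
agent4.sources.avro-collect.channels = memoryChannel
```

Agent 4 would then attach an HDFS sink, as in the single-agent sample on the following slides.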
  5. My Sample
     [Diagram: agent 1 (SOURCE → CHANNEL → SINK) → HDFS]
     ▪ My system
       ▪ Java Runtime Environment (Java 1.6.0_31)
       ▪ Cloudera's Distribution Including Apache Hadoop (CDH4.3)
     ▪ Working steps
       1. Install Apache Flume on the host machine (Flume installation guide for CDH4: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_12.html)
       2. Create 2 log-generation Java applications, for machine1 and machine2
       3. Configure the Flume agent
       4. Start the Flume agent and test the system
  6. Prepare the log-generation application
     ▪ Create 2 virtual machines for generating machine1's and machine2's log data.
     ▪ Create a simple Java socket program that produces log events to the agent's source on a specific port (11111).
     ▪ Export it as an executable JAR file, and move it to virtual machine1.
     ▪ Copy and move the other to virtual machine2.
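The deck doesn't include the source of genLog.jar; the following is a minimal sketch of what such a generator might look like, given the netcat source on port 11111 and the log-line format shown on slide 9. The class name, event count, and 2-second delay are assumptions.

```java
import java.io.PrintWriter;
import java.net.Socket;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LogGenerator {

    // Build one log line in the format shown on slide 9,
    // e.g. "2013-12-17 14:32:19: This is a sample log file from machine 1."
    static String formatLogLine(int machineId, Date ts) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        return fmt.format(ts) + ": This is a sample log file from machine " + machineId + ".";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost"; // agent host (assumed default)
        int machineId = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        // Connect to the agent's netcat source and send newline-terminated events;
        // the netcat source treats each line as one Flume event.
        Socket socket = new Socket(host, 11111);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        try {
            for (int i = 0; i < 10; i++) { // emit a few sample events a few seconds apart
                out.println(formatLogLine(machineId, new Date()));
                Thread.sleep(2000);
            }
        } finally {
            out.close();
            socket.close();
        }
    }
}
```

Running `java -jar genLog.jar <agent-host> 1` on machine1 and `... 2` on machine2 would then produce the two interleaved event streams seen on slide 9.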
  7. Configuring the Flume-ng agent on the host
     ▪ We have to configure every sink, channel, and source in the flow. My agent name is hdfs-agent.
     ▪ First, name the components in the agent:
       hdfs-agent.sources = log-collect
       hdfs-agent.channels = memoryChannel
       hdfs-agent.sinks = hdfs-write
     ▪ Next, define the source's properties as follows:
       hdfs-agent.sources.log-collect.type = netcat
       hdfs-agent.sources.log-collect.bind = 133.196.211.209
       hdfs-agent.sources.log-collect.port = 11111
       hdfs-agent.sources.log-collect.channels = memoryChannel
     ▪ My source is a netcat-like source that listens on port 11111.
     ▪ Don't forget to define the channel used by the source.
  8. Configuring the Flume-ng agent on the host (2)
     ▪ We want to collect the log data and write it to the 'testFlume' directory on the HDFS cluster. Therefore, the sink should be defined as follows:
       hdfs-agent.sinks.hdfs-write.type = hdfs
       hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/<myusername>/testflume
       hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
       hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
       hdfs-agent.sinks.hdfs-write.channel = memoryChannel
     ▪ Don't forget to specify the channel used by the sink.
     ▪ Finally, configure the channel:
       hdfs-agent.channels.memoryChannel.type = memory
       hdfs-agent.channels.memoryChannel.capacity = 1000
     ▪ The channel will store the log data in memory, with a maximum of 1000 events.
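Putting slides 7 and 8 together, the complete flume.conf should look like the following (the bind address and the `<namenode>`/`<myusername>` placeholders are carried over from the slides as-is):

```properties
# flume.conf — single-agent flow: netcat source -> memory channel -> HDFS sink
hdfs-agent.sources = log-collect
hdfs-agent.channels = memoryChannel
hdfs-agent.sinks = hdfs-write

hdfs-agent.sources.log-collect.type = netcat
hdfs-agent.sources.log-collect.bind = 133.196.211.209
hdfs-agent.sources.log-collect.port = 11111
hdfs-agent.sources.log-collect.channels = memoryChannel

hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/<myusername>/testflume
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.channel = memoryChannel

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 1000
```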
  9. Start the Flume agent and get the result
     ▪ My configuration file name is 'flume.conf', and my agent name is 'hdfs-agent'.
     ▪ Start the Flume agent using the following command:
       $ flume-ng agent --conf-file flume.conf --name hdfs-agent
     ▪ Execute genLog.jar on both machines.
     ▪ On the Flume master, you will be able to see something like this:
       13/12/17 14:36:13 INFO hdfs.BucketWriter: Creating hdfs://<namenode>:8020/user/<my userid>/testflume/FlumeData.1387258572230.tmp
       13/12/17 14:36:19 INFO hdfs.BucketWriter: Renaming hdfs://cmc-cldULL6400.toshiba.co.jp:8020/user/g0092010/testflume/FlumeData.1387258572230.tmp to hdfs://<namenode>:8020/user/<my userid>/testflume/FlumeData.1387258572230
     ▪ Verify that the log data has been stored as events on HDFS:
       g0092010@cmc-cldULL6400:~$ hadoop fs -cat testflume/*30
       2013-12-17 14:32:19: This is a sample log file from machine 1.
       2013-12-17 14:32:24: This is a sample log file from machine 1.
       2013-12-17 14:32:27: This is a sample log file from machine 2.
       2013-12-17 14:32:29: This is a sample log file from machine 1.
  10. Next steps
      ▪ Analyse the log data and visualise it in (near) real time.
      [Diagram: AGENT 1, AGENT 2, and AGENT 3 → AGENT 4 → HDFS, feeding MapReduce, Hive, Mahout, Impala, and visualisation tools.]
      ▪ Improve the throughput of the system.
      ▪ Analyse and predict future trends.
      ▪ etc.