Introduction to Flume
Flume
• A distributed, reliable tool/service for collecting large
amounts of streaming data into centralized storage.
• Put simply, Flume is useful when we need to load/collect data
continuously, in real time, unlike a traditional RDBMS that
loads data in batches periodically.
Rupak Roy
• One of its biggest advantages: when the rate of incoming
data exceeds the rate at which it can be written to the
destination, Flume acts as an intermediary between the data
source and the data store, providing a steady flow of data
between them.
• Example: a log file is a record of system events; software, for
instance, writes to a log file whenever one of its operations fails.
By analyzing such data one can understand the behavior of the
software and locate its failures.
• When we transfer data to HDFS using the -put or
-copyFromLocal command, we can only transfer one file at a
time. Flume was created to overcome this limitation and
transfer streaming data without delay (see the short contrast
sketched after this list).
• Another advantage of Flume is reliable delivery to HDFS.
During a plain file transfer to HDFS, the file size stays zero until
the transfer finishes, so if a network issue or power failure
occurs midway, the data being written to HDFS is lost; Flume's
channel-based, event-by-event delivery avoids this problem.
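For contrast, a minimal sketch (the file name and paths are only
illustrative assumptions): a one-shot copy into HDFS versus a Flume
agent that keeps streaming new events as they arrive.

# one-time, single-file copy into HDFS
hadoop fs -put /var/log/app/app.log /user/cloudera/logs/

# Flume, once configured (see the Configuration slides), streams continuously
bin/flume-ng agent -n test -f conf/flumepractice.conf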
Apache Flume - Architecture
• Source: extracts data from the clients and transfers it to one or
more Flume channels. Source types include avro, netcat, seq,
exec, syslogudp, http, twitter, etc.
• Channel: acts as a mediator between the source and the sink.
It temporarily stores and buffers the data from the source until
it is consumed by the sinks. Channels come in many types, such
as the memory channel, file channel, JDBC channel, custom
channel, etc.
• Sink: consumes the data from the channel and transfers it to
centralized storage such as HBase or HDFS. Some of the sink
types are logger, avro, hdfs, irc, file_roll, etc.
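To illustrate how these three components are declared, here is a
minimal sketch (the agent name demo and the netcat/logger types
are chosen only for the example):

demo.sources = s1
demo.channels = c1
demo.sinks = k1

# source: listen for lines of text on a local port
demo.sources.s1.type = netcat
demo.sources.s1.bind = localhost
demo.sources.s1.port = 44444

# channel: buffer events in memory
demo.channels.c1.type = memory

# sink: write each event to the Flume log
demo.sinks.k1.type = logger

# wiring
demo.sources.s1.channels = c1
demo.sinks.k1.channel = c1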
• Agent: an independent daemon process. It is a collection of
sources, channels, and sinks that receives data from clients or
from other Flume agents and forwards it to its destination.
A Flume agent can have multiple sources, sinks, and channels,
as sketched below.
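For example, a single agent can fan one source out to two sinks
over two channels; only the wiring is shown here, and all of the
names are assumptions made for the sketch:

agent1.sources = src1
agent1.channels = ch1 ch2
agent1.sinks = sink1 sink2

# each event from src1 is replicated to both channels (default selector)
agent1.sources.src1.channels = ch1 ch2
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink2.channel = ch2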
Flume Architecture (diagram)
Channel Types:
• Memory Channel
Memory channels are volatile (RAM-based) and limit Flume's
throughput to the available RAM. Whenever there is an
interruption such as a power failure or network issue, any data
not yet transferred is lost.
However, they carry the universal advantage of volatile memory:
speed. Memory channels are faster than file-based channels.
• File Channel: a robust channel that uses disk instead of RAM
to store events. It is a bit slower than the memory channel, but
comes with a solid advantage: events are not lost even if
Flume's operation is interrupted by a power failure or a
network issue.
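A sketch of how the two channel types might be configured (the
property names come from the standard Flume channels; the sizes
and directories shown are arbitrary examples):

# Option A - memory channel: fast, but in-flight events are lost on failure
test.channels.tc1.type = memory
test.channels.tc1.capacity = 10000
test.channels.tc1.transactionCapacity = 1000

# Option B - file channel: slower, but events survive a crash or restart
# test.channels.tc1.type = file
# test.channels.tc1.checkpointDir = /home/cloudera/flume/checkpoint
# test.channels.tc1.dataDirs = /home/cloudera/flume/data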
Configuration
First move into Flume's conf folder and create the configuration file:
: cd conf
: ls
: vi flumepractice.conf   then press i (insert mode) and type:
#Name the components
test.sources = ts1
test.sinks = tk1
test.channels = tc1
#Describe/Configure the source
test.sources.ts1.type = exec
test.sources.ts1.command = tail -F /home/cloudera/hadoop/logs/ ………..
#Describe the sink
test.sinks.tk1.type = hdfs
test.sinks.tk1.hdfs.path = hdfs://localhost:9001/flume
#Describe the channel
test.channels.tc1.type = memory
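Optionally, the HDFS sink is often given a few extra properties so
the output is written as plain text and rolled by size rather than by
time (a hedged sketch; the values below are only illustrative):

test.sinks.tk1.hdfs.fileType = DataStream
test.sinks.tk1.hdfs.rollInterval = 0
test.sinks.tk1.hdfs.rollSize = 1048576
test.sinks.tk1.hdfs.rollCount = 0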
Configuration (continued)
#Join/Bind the source and sink to the channel
#Joining the source to the channel
test.sources.ts1.channels = tc1
#Join the sinks to the channel
test.sinks.tk1.channel = tc1
Press 'Esc' and then type ':wq!' to save and exit.
Then run the following command to start the Flume job:
Flume$ bin/flume-ng agent -n test -f conf/flumepractice.conf
Where,
flume-ng is the Flume executable,
agent specifies that a Flume agent should be run,
-n gives the name of the agent defined in the configuration file,
-f gives the path to the configuration file.
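Once the agent is running, one way to check that events are
arriving (assuming the HDFS path configured above and the HDFS
sink's default FlumeData file prefix) is to list and read the target
directory:

hadoop fs -ls /flume
hadoop fs -cat /flume/FlumeData.*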
Next
• Sqoop, to transfer bulk data between HDFS and structured
databases.