Danairat T., 2013, danairat@gmail.com, Big Data Hadoop – Hands On Workshop
Analyse Tweets using Flume 1.4,
Hadoop 2.7 and Hive
May 2015
Dr.Thanachart Numnonda
Certified Java Programmer
thanachart@imcinstitute.com
Danairat T.
Certified Java Programmer, TOGAF – Silver
danairat@gmail.com
Danairat T., danairat@gmail.com; Thanachart Numnonda, thanachart@imcinstitute.com. Apr 2015. Big Data using Hadoop workshop
Lecture: Understanding Flume
Introduction
Apache Flume is:
● A distributed data transport and aggregation system for event- or log-structured data
● Principally designed for continuous data ingestion into Hadoop… but more flexible than that
Architecture Overview
[Architecture diagram; source: Odiago]
Flume terminology
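● Every machine in Flume is a node
● Each node has a source and a sink
● Some sinks send data to collector nodes, which aggregate data from many agents before writing to HDFS
● All Flume nodes heartbeat to, and receive configuration from, the master
● Events enter Flume within seconds of generation
● Note: these terms describe the original Flume 0.9.x design; a Flume 1.x agent has no master, reads a local configuration file, and connects each source to its sink through a channel
Source: Odiago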
Flume isn’t an analytic system
● No ability to inspect message bodies
● No notion of aggregates, rolling counters, etc.
Source: Odiago
Hands-On: Loading Twitter Data to
Hadoop HDFS
Exercise Overview
Source: hive.apache.org
1. Installing Flume
$ wget http://apache.mirrors.hoobly.com/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
$ tar -xvzf apache-flume-1.4.0-bin.tar.gz
$ sudo mv apache-flume-1.4.0-bin flume
$ sudo mv flume /usr/local
$ rm apache-flume-1.4.0-bin.tar.gz
Install Flume binary file
1. Installing Flume (cont.)
Edit $HOME/.bashrc
$ sudo vi $HOME/.bashrc
$ exec bash
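The slide shows the edited file only as a screenshot, so the exact lines are not recoverable. A minimal sketch, assuming the goal is to put the `flume-ng` launcher on the `PATH`; the variable name `FLUME_HOME` is a common convention, not something the slides state:

```shell
# Assumed additions to $HOME/.bashrc (not taken from the slides).
# FLUME_HOME points at the install directory created in step 1.
export FLUME_HOME=/usr/local/flume
# Append Flume's bin directory so `flume-ng` resolves without a full path.
export PATH="$PATH:$FLUME_HOME/bin"
```

After `exec bash` reloads the shell, `which flume-ng` should print a path under `/usr/local/flume/bin` if the install in step 1 succeeded.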
2. Installing a jar file
$ wget http://files.cloudera.com/samples/flume-sources-1.0-SNAPSHOT.jar
$ sudo mv flume-sources-1.0-SNAPSHOT.jar /usr/local/flume/lib/
$ cd /usr/local/flume/conf/
$ sudo cp flume-env.sh.template flume-env.sh
$ sudo vi flume-env.sh
Copy a jar file and edit conf file
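The edit to `flume-env.sh` is likewise shown only as a screenshot. A plausible sketch, assuming the file sets `JAVA_HOME` and adds the downloaded JAR to `FLUME_CLASSPATH` (both variables appear, commented out, in the shipped template); the JDK path below is a placeholder for wherever Java is installed on your machine:

```shell
# Assumed contents for /usr/local/flume/conf/flume-env.sh (not from the slides).
# Placeholder JDK location: replace with your actual Java install path.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Make the Twitter source JAR from this step visible to the Flume agent.
FLUME_CLASSPATH="/usr/local/flume/lib/flume-sources-1.0-SNAPSHOT.jar"
```

The `flume-ng` launcher sources this file at start-up, so no shell restart is needed after editing it.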
3. Create a new Twitter App
Log in to your Twitter account at twitter.com
3. Create a new Twitter App (cont.)
Create a new Twitter App @ apps.twitter.com
3. Create a new Twitter App (cont.)
Enter all the details in the application:
3. Create a new Twitter App (cont.)
Your application will be created:
3. Create a new Twitter App (cont.)
Click on Keys and Access Tokens:
3. Create a new Twitter App (cont.)
Your access token is created:
4. Configuring the Flume Agent
Copy the flume.conf file from the following URL:
https://github.com/cloudera/cdh-twitter-example/blob/master/flume-sources/flume.conf
$ vi /usr/local/flume/conf/flume.conf
flume.conf file
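For orientation, the linked file has roughly the following shape. This is a sketch modelled on the Cloudera example, not a verbatim copy: the four credential values are placeholders for the keys generated in step 3, and the keyword list, the NameNode address `hdfs://localhost:9000` and the scratch file name `flume.conf.sketch` are assumptions to adapt to your setup.

```shell
# Write a sketch of the agent configuration to a scratch file for review.
# Copy it to /usr/local/flume/conf/flume.conf once the placeholders are filled in.
cat > flume.conf.sketch <<'EOF'
# One source, one channel and one sink, all owned by the agent "TwitterAgent".
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Custom source shipped in flume-sources-1.0-SNAPSHOT.jar.
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
# Assumed keyword filter; pick any terms you want to track.
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics

# HDFS sink writing plain-text JSON events; the NameNode address is an assumption.
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# In-memory channel buffering events between the source and the sink.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
EOF
```

The agent name `TwitterAgent` must match the `-n` argument passed to `flume-ng` when the agent is started.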
Fixing a bug when running Flume 1.4 on Hadoop 2.x
Remove the files guava-10.0.1.jar and protobuf-java-2.4.1.jar from Flume's lib directory; they conflict with the newer versions used by Hadoop 2.x
$ cd /usr/local/flume/lib
$ rm guava-10.0.1.jar
$ rm protobuf-java-2.4.1.jar
5. Fetching the data from Twitter
$ flume-ng agent -n TwitterAgent -c conf -f /usr/local/flume/conf/flume.conf
Wait 60-90 seconds while Flume streams the data to HDFS, then press Ctrl-C to stop the agent and end the streaming. (Ignore the exceptions.)
6. View the streaming data
$ hdfs dfs -ls /user/flume/tweets
$ hdfs dfs -cat /user/flume/tweets/FlumeData.1431058050787
$ hdfs dfs -rm /user/flume/tweets/*.tmp
7. Analyse data using Hive
$ wget http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
$ mv hive-serdes-1.0-SNAPSHOT.jar /usr/local/apache-hive-1.1.0-bin/lib/
$ hive
hive> ADD JAR /usr/local/apache-hive-1.1.0-bin/lib/hive-serdes-1.0-SNAPSHOT.jar;
Get a SerDe JAR file for parsing JSON files
Register the JAR file
7. Analyse data using Hive (cont.)
Run the Hive command given at:
http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html
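The linked post creates an external Hive table over the Flume output directory. A trimmed sketch of such a statement, assuming the `com.cloudera.hive.serde.JSONSerDe` class from the hive-serdes JAR and keeping only the columns needed to query screen names and follower counts; the file name `create_tweets.hql` is hypothetical:

```shell
# Save the table definition to a script file (hypothetical name).
cat > create_tweets.hql <<'EOF'
-- External table: Hive reads the JSON files in place and does not move them.
-- Only a subset of the tweet fields is declared here.
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  text STRING,
  user STRUCT<screen_name:STRING, followers_count:INT>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';
EOF
# Run it inside the Hive session that already executed ADD JAR:
#   hive> source create_tweets.hql;
```

Newer Hive releases treat `user` as a reserved word, so the column name may need backticks there; the Hive 1.1.0 used in this workshop should accept it unquoted.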
7. Analyse data using Hive (cont.)
hive> select user.screen_name, user.followers_count c from tweets order by c desc;
Find the user with the most followers
Thank you
www.imcinstitute.com
www.facebook.com/imcinstitute

Contenu connexe

Tendances

Big data processing using Cloudera Quickstart
Big data processing using Cloudera QuickstartBig data processing using Cloudera Quickstart
Big data processing using Cloudera QuickstartIMC Institute
 
Thailand Hadoop Big Data Challenge #1
Thailand Hadoop Big Data Challenge #1Thailand Hadoop Big Data Challenge #1
Thailand Hadoop Big Data Challenge #1IMC Institute
 
Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveIMC Institute
 
Big Data Hadoop using Amazon Elastic MapReduce: Hands-On Labs
Big Data Hadoop using Amazon Elastic MapReduce: Hands-On LabsBig Data Hadoop using Amazon Elastic MapReduce: Hands-On Labs
Big Data Hadoop using Amazon Elastic MapReduce: Hands-On LabsIMC Institute
 
Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2IMC Institute
 
Realtimeanalyticsattwitter strata2011-110204123031-phpapp02
Realtimeanalyticsattwitter strata2011-110204123031-phpapp02Realtimeanalyticsattwitter strata2011-110204123031-phpapp02
Realtimeanalyticsattwitter strata2011-110204123031-phpapp02matrixvn
 
Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]knowbigdata
 
Reproducible datascience [with Terraform]
Reproducible datascience [with Terraform]Reproducible datascience [with Terraform]
Reproducible datascience [with Terraform]David Przybilla
 
API analytics with Redis and Google Bigquery. NoSQL matters edition
API analytics with Redis and Google Bigquery. NoSQL matters editionAPI analytics with Redis and Google Bigquery. NoSQL matters edition
API analytics with Redis and Google Bigquery. NoSQL matters editionjavier ramirez
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかKouhei Sutou
 
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARNOne Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARNDataWorks Summit
 
New developments in open source ecosystem spark3.0 koalas delta lake
New developments in open source ecosystem spark3.0 koalas delta lakeNew developments in open source ecosystem spark3.0 koalas delta lake
New developments in open source ecosystem spark3.0 koalas delta lakeXiao Li
 
Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
Data 360 Conference: Introduction to Big Data, Hadoop and Big Data AnalyticsData 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
Data 360 Conference: Introduction to Big Data, Hadoop and Big Data AnalyticsAvkash Chauhan
 
Apache Storm
Apache StormApache Storm
Apache StormEdureka!
 
How does that PySpark thing work? And why Arrow makes it faster?
How does that PySpark thing work? And why Arrow makes it faster?How does that PySpark thing work? And why Arrow makes it faster?
How does that PySpark thing work? And why Arrow makes it faster?Rubén Berenguel
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Jeffrey Breen
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 

Tendances (20)

Big data processing using Cloudera Quickstart
Big data processing using Cloudera QuickstartBig data processing using Cloudera Quickstart
Big data processing using Cloudera Quickstart
 
Thailand Hadoop Big Data Challenge #1
Thailand Hadoop Big Data Challenge #1Thailand Hadoop Big Data Challenge #1
Thailand Hadoop Big Data Challenge #1
 
Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and Hive
 
Big Data Hadoop using Amazon Elastic MapReduce: Hands-On Labs
Big Data Hadoop using Amazon Elastic MapReduce: Hands-On LabsBig Data Hadoop using Amazon Elastic MapReduce: Hands-On Labs
Big Data Hadoop using Amazon Elastic MapReduce: Hands-On Labs
 
Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2
 
Setting up Hadoop YARN Clustering
Setting up Hadoop YARN ClusteringSetting up Hadoop YARN Clustering
Setting up Hadoop YARN Clustering
 
Linux intermediate level
Linux intermediate levelLinux intermediate level
Linux intermediate level
 
Realtimeanalyticsattwitter strata2011-110204123031-phpapp02
Realtimeanalyticsattwitter strata2011-110204123031-phpapp02Realtimeanalyticsattwitter strata2011-110204123031-phpapp02
Realtimeanalyticsattwitter strata2011-110204123031-phpapp02
 
Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]
 
Reproducible datascience [with Terraform]
Reproducible datascience [with Terraform]Reproducible datascience [with Terraform]
Reproducible datascience [with Terraform]
 
API analytics with Redis and Google Bigquery. NoSQL matters edition
API analytics with Redis and Google Bigquery. NoSQL matters editionAPI analytics with Redis and Google Bigquery. NoSQL matters edition
API analytics with Redis and Google Bigquery. NoSQL matters edition
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのか
 
Aws r
Aws rAws r
Aws r
 
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARNOne Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
 
New developments in open source ecosystem spark3.0 koalas delta lake
New developments in open source ecosystem spark3.0 koalas delta lakeNew developments in open source ecosystem spark3.0 koalas delta lake
New developments in open source ecosystem spark3.0 koalas delta lake
 
Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
Data 360 Conference: Introduction to Big Data, Hadoop and Big Data AnalyticsData 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
How does that PySpark thing work? And why Arrow makes it faster?
How does that PySpark thing work? And why Arrow makes it faster?How does that PySpark thing work? And why Arrow makes it faster?
How does that PySpark thing work? And why Arrow makes it faster?
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 

En vedette

Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIMC Institute
 
Big Data Maturity Model and Governance
Big Data Maturity Model and GovernanceBig Data Maturity Model and Governance
Big Data Maturity Model and GovernanceIMC Institute
 
Big Data as a Service
Big Data as a ServiceBig Data as a Service
Big Data as a ServiceIMC Institute
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in ChinaIMC Institute
 
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016IMC Institute
 
Big data project management
Big data project managementBig data project management
Big data project managementIMC Institute
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015IMC Institute
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibIMC Institute
 
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษา
เทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษาเทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษา
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษาIMC Institute
 
บทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-Magazineบทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-MagazineIMC Institute
 
Big Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopBig Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopIMC Institute
 
บทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่าย
บทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่ายบทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่าย
บทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่ายWanphen Wirojcharoenwong
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataIMC Institute
 
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัลCloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัลIMC Institute
 
Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public CloudIMC Institute
 
การบริหารจัดการระบบ Cloud Computing สำหรับองค์กรธุรกิจ SME
การบริหารจัดการระบบ  Cloud Computing  สำหรับองค์กรธุรกิจ SMEการบริหารจัดการระบบ  Cloud Computing  สำหรับองค์กรธุรกิจ SME
การบริหารจัดการระบบ Cloud Computing สำหรับองค์กรธุรกิจ SMEIMC Institute
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformIMC Institute
 

En vedette (19)

Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data Science
 
Big Data Maturity Model and Governance
Big Data Maturity Model and GovernanceBig Data Maturity Model and Governance
Big Data Maturity Model and Governance
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data as a Service
Big Data as a ServiceBig Data as a Service
Big Data as a Service
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in China
 
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
 
Big data project management
Big data project managementBig data project management
Big data project management
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlib
 
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษา
เทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษาเทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษา
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษา
 
ITSS Overview
ITSS OverviewITSS Overview
ITSS Overview
 
บทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-Magazineบทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-Magazine
 
Big Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopBig Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop Workshop
 
บทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่าย
บทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่ายบทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่าย
บทที่ 6 ความปลอดภัยบนระบบคอมพิวเตอร์และเครือข่าย
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัลCloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
 
Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public Cloud
 
การบริหารจัดการระบบ Cloud Computing สำหรับองค์กรธุรกิจ SME
การบริหารจัดการระบบ  Cloud Computing  สำหรับองค์กรธุรกิจ SMEการบริหารจัดการระบบ  Cloud Computing  สำหรับองค์กรธุรกิจ SME
การบริหารจัดการระบบ Cloud Computing สำหรับองค์กรธุรกิจ SME
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud Platform
 

Similaire à Analyse Tweets using Flume 1.4, Hadoop 2.7 and Hive

Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)IMC Institute
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSDataWorks Summit
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with ExamplesJoe McTee
 
Open Source - NOVALUG January 2019
Open Source  - NOVALUG January 2019Open Source  - NOVALUG January 2019
Open Source - NOVALUG January 2019plarsen67
 
Ubuntu And Parental Controls
Ubuntu And Parental ControlsUbuntu And Parental Controls
Ubuntu And Parental Controlsjasonholtzapple
 
Terraform: Tales from the Trenches
Terraform: Tales from the TrenchesTerraform: Tales from the Trenches
Terraform: Tales from the TrenchesRobert Fox
 
Massively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonMassively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonPyData
 
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache ZeppelinMoon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache ZeppelinFlink Forward
 
Sentiment Analysis using Big Data
Sentiment Analysis using Big Data Sentiment Analysis using Big Data
Sentiment Analysis using Big Data Rajat Mittal
 
Velocity London - Chaos Engineering Bootcamp
Velocity London - Chaos Engineering Bootcamp Velocity London - Chaos Engineering Bootcamp
Velocity London - Chaos Engineering Bootcamp Ana Medina
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big AnalyticsAjay Ohri
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Anne Nicolas
 
Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013olberger
 
10 cosas que un firewall debería hacer
10 cosas que un firewall debería hacer10 cosas que un firewall debería hacer
10 cosas que un firewall debería haceraloscocco
 
HDFS & MapReduce
HDFS & MapReduceHDFS & MapReduce
HDFS & MapReduceSkillspeed
 
Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12N Masahiro
 

Similaire à Analyse Tweets using Flume 1.4, Hadoop 2.7 and Hive (20)

Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with Examples
 
Open Source - NOVALUG January 2019
Open Source  - NOVALUG January 2019Open Source  - NOVALUG January 2019
Open Source - NOVALUG January 2019
 
Ubuntu And Parental Controls
Ubuntu And Parental ControlsUbuntu And Parental Controls
Ubuntu And Parental Controls
 
Terraform: Tales from the Trenches
Terraform: Tales from the TrenchesTerraform: Tales from the Trenches
Terraform: Tales from the Trenches
 
Upgrading hadoop
Upgrading hadoopUpgrading hadoop
Upgrading hadoop
 
Massively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonMassively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian Huston
 
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache ZeppelinMoon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
 
Sentiment Analysis using Big Data
Sentiment Analysis using Big Data Sentiment Analysis using Big Data
Sentiment Analysis using Big Data
 
Django firebird project
Django firebird projectDjango firebird project
Django firebird project
 
Velocity London - Chaos Engineering Bootcamp
Velocity London - Chaos Engineering Bootcamp Velocity London - Chaos Engineering Bootcamp
Velocity London - Chaos Engineering Bootcamp
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big Analytics
 
Django Deployment-in-AWS
Django Deployment-in-AWSDjango Deployment-in-AWS
Django Deployment-in-AWS
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
 
Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013
 
10 cosas que un firewall debería hacer
10 cosas que un firewall debería hacer10 cosas que un firewall debería hacer
10 cosas que un firewall debería hacer
 
HDFS & MapReduce
HDFS & MapReduceHDFS & MapReduce
HDFS & MapReduce
 
Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12
 

Plus de IMC Institute

นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14IMC Institute
 
Digital trends Vol 4 No. 13 Sep-Dec 2019
Digital trends Vol 4 No. 13  Sep-Dec 2019Digital trends Vol 4 No. 13  Sep-Dec 2019
Digital trends Vol 4 No. 13 Sep-Dec 2019IMC Institute
 
บทความ The evolution of AI
บทความ The evolution of AIบทความ The evolution of AI
บทความ The evolution of AIIMC Institute
 
IT Trends eMagazine Vol 4. No.12
IT Trends eMagazine  Vol 4. No.12IT Trends eMagazine  Vol 4. No.12
IT Trends eMagazine Vol 4. No.12IMC Institute
 
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformationเพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital TransformationIMC Institute
 
IT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIMC Institute
 
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมมูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมIMC Institute
 
IT Trends eMagazine Vol 4. No.11
IT Trends eMagazine  Vol 4. No.11IT Trends eMagazine  Vol 4. No.11
IT Trends eMagazine Vol 4. No.11IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
บทความ The New Silicon Valley
บทความ The New Silicon Valleyบทความ The New Silicon Valley
บทความ The New Silicon ValleyIMC Institute
 
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10
นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)IMC Institute
 
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง IMC Institute
 
IT Trends eMagazine Vol 3. No.9
IT Trends eMagazine  Vol 3. No.9 IT Trends eMagazine  Vol 3. No.9
IT Trends eMagazine Vol 3. No.9 IMC Institute
 
Thailand software & software market survey 2016
Thailand software & software market survey 2016Thailand software & software market survey 2016
Thailand software & software market survey 2016IMC Institute
 
Developing Business Blockchain Applications on Hyperledger
Developing Business  Blockchain Applications on Hyperledger Developing Business  Blockchain Applications on Hyperledger
Developing Business Blockchain Applications on Hyperledger IMC Institute
 
Digital transformation @thanachart.org
Digital transformation @thanachart.orgDigital transformation @thanachart.org
Digital transformation @thanachart.orgIMC Institute
 
บทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgบทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgIMC Institute
 
กลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital Transformationกลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital TransformationIMC Institute
 

Analyse Tweets using Flume 1.4, Hadoop 2.7 and Hive

  • 1. Danairat T., 2013, danairat@gmail.com. Big Data Hadoop – Hands On Workshop 1. Analyse Tweets using Flume 1.4, Hadoop 2.7 and Hive. May 2015. Dr. Thanachart Numnonda, Certified Java Programmer, thanachart@imcinstitute.com. Danairat T., Certified Java Programmer, TOGAF – Silver, danairat@gmail.com
  • 2. Lecture: Understanding Flume (Danairat T., danairat@gmail.com; Thanachart Numnonda, thanachart@imcinstitute.com; Apr 2015, Big Data using Hadoop workshop)
  • 3. Introduction
      Apache Flume is:
      ● A distributed data transport and aggregation system for event- or log-structured data
      ● Principally designed for continuous data ingestion into Hadoop, but more flexible than that
  • 4. Architecture Overview (source: Odiago)
  • 5. Flume terminology (source: Odiago)
      ● Every machine in Flume is a node
      ● Each node has a source and a sink
      ● Some sinks send data to collector nodes, which aggregate data from many agents before writing to HDFS
      ● All Flume nodes heartbeat to, and receive their configuration from, the master
      ● Events enter Flume within seconds of generation
  • 6. Flume isn't an analytic system (source: Odiago)
      ● No ability to inspect message bodies
      ● No notion of aggregates, rolling counters, etc.
  • 7. Hands-On: Loading Twitter Data to Hadoop HDFS
  • 8. Exercise Overview (source: hive.apache.org)
  • 9. 1. Installing Flume
      Install the Flume binary distribution:
      $ wget http://apache.mirrors.hoobly.com/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
      $ tar -xvzf apache-flume-1.4.0-bin.tar.gz
      $ sudo mv apache-flume-1.4.0-bin flume
      $ sudo mv flume /usr/local
      $ rm apache-flume-1.4.0-bin.tar.gz
  • 10. 1. Installing Flume (cont.)
      Edit $HOME/.bashrc, then reload the shell:
      $ sudo vi $HOME/.bashrc
      $ exec bash
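The slide does not show which lines are added to .bashrc. A minimal sketch, assuming the goal is to make the flume-ng launcher available on the PATH; the variable name FLUME_HOME is an assumption, while /usr/local/flume is the install location used in the previous step:

```shell
# Hypothetical additions to $HOME/.bashrc (not shown in the original slide).
# /usr/local/flume is where the previous step moved the Flume distribution.
export FLUME_HOME=/usr/local/flume
# Append Flume's bin directory so that flume-ng can be run from any directory.
export PATH="$PATH:$FLUME_HOME/bin"
```

After `exec bash`, running `which flume-ng` is one way to check that the launcher is found.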
  • 11. 2. Installing a jar file
      Copy a jar file and edit the conf file:
      $ wget http://files.cloudera.com/samples/flume-sources-1.0-SNAPSHOT.jar
      $ sudo mv flume-sources-1.0-SNAPSHOT.jar /usr/local/flume/lib/
      $ cd /usr/local/flume/conf/
      $ sudo cp flume-env.sh.template flume-env.sh
      $ sudo vi flume-env.sh
  • 12. 3. Create a new Twitter App: Log in to your Twitter account at twitter.com
  • 13. 3. Create a new Twitter App (cont.): Create a new Twitter App at apps.twitter.com
  • 14. 3. Create a new Twitter App (cont.): Enter all the details in the application:
  • 15. 3. Create a new Twitter App (cont.): Your application will be created:
  • 16. 3. Create a new Twitter App (cont.): Click on Keys and Access Tokens:
  • 17. 3. Create a new Twitter App (cont.): Click on Keys and Access Tokens:
  • 18. 3. Create a new Twitter App (cont.): Your access token has been created:
  • 19. 4. Configuring the Flume Agent
      Copy the flume.conf file from the following URL:
      https://github.com/cloudera/cdh-twitter-example/blob/master/flume-sources/flume.conf
      Then edit the flume.conf file:
      $ vi /usr/local/flume/conf/flume.conf
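The contents of flume.conf are not reproduced in the slides. The following is a rough sketch of what such a file might look like for an agent named TwitterAgent with one Twitter source, one memory channel and one HDFS sink. The credential placeholders, the keyword list and the NameNode address hdfs://localhost:9000 are assumptions for illustration; the source class name is the one shipped in flume-sources-1.0-SNAPSHOT.jar, and the output directory /user/flume/tweets matches the path used in the later steps.

```shell
# Sketch of a flume.conf for the agent named TwitterAgent (the name passed to flume-ng -n).
# Written to a scratch file here; the workshop edits /usr/local/flume/conf/flume.conf.
# Assumptions: placeholder credentials, keywords and NameNode address must be replaced.
CONF_SKETCH="${TMPDIR:-/tmp}/flume.conf.sketch"
cat > "$CONF_SKETCH" <<'EOF'
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source class provided by flume-sources-1.0-SNAPSHOT.jar
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
TwitterAgent.sources.Twitter.keywords = hadoop, big data, hive

# HDFS sink writing plain-text events
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

# In-memory channel between the source and the sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
EOF
```

The four credential values come from the Keys and Access Tokens page of the Twitter app created in step 3.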
  • 20. Fixing a bug when running Flume 1.4 on Hadoop 2.x
      Remove the files guava-10.0.1.jar and protobuf-java-2.4.1.jar from Flume's lib directory:
      $ cd /usr/local/flume/lib
      $ rm guava-10.0.1.jar
      $ rm protobuf-java-2.4.1.jar
  • 21. 5. Fetching the data from Twitter
      $ flume-ng agent -n TwitterAgent -c conf -f /usr/local/flume/conf/flume.conf
      Wait 60-90 seconds to let Flume stream the data to HDFS, then press Ctrl-C to interrupt the command and stop the streaming. (Ignore the exceptions.)
  • 22. 6. View the streaming data
      $ hdfs dfs -ls /user/flume/tweets
      $ hdfs dfs -cat /user/flume/tweets/FlumeData.1431058050787
      $ hdfs dfs -rm /user/flume/tweets/*.tmp
  • 23. 7. Analyse data using Hive
      Get a SerDe jar file for parsing JSON files:
      $ wget http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
      $ mv hive-serdes-1.0-SNAPSHOT.jar /usr/local/apache-hive-1.1.0-bin/lib/
      Register the jar file:
      $ hive
      hive> ADD JAR /usr/local/apache-hive-1.1.0-bin/lib/hive-serdes-1.0-SNAPSHOT.jar;
  • 24. 7. Analyse data using Hive (cont.)
      Run the following Hive command (see http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html)
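The Hive command itself is not captured in this transcript. As a hedged sketch, a table definition along the following lines would expose the tweet JSON under /user/flume/tweets to Hive through the SerDe class in hive-serdes-1.0-SNAPSHOT.jar. The column list is an assumption, trimmed to the fields used by the query in the next step, and the scratch file name is hypothetical; the linked page carries the full definition. Hive versions later than the 1.1.0 used here may require quoting the column name `user` with backticks.

```shell
# Sketch only: the original slide shows the command as an image.
# Writes a reduced table definition to a scratch file; run it with: hive -f "$HQL_SKETCH"
# Assumption: the column list is trimmed to the fields queried in the next step.
HQL_SKETCH="${TMPDIR:-/tmp}/create_tweets.hql"
cat > "$HQL_SKETCH" <<'EOF'
-- Register the JSON SerDe jar downloaded in the previous step.
ADD JAR /usr/local/apache-hive-1.1.0-bin/lib/hive-serdes-1.0-SNAPSHOT.jar;

-- External table over the files written by the Flume HDFS sink.
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  text STRING,
  user STRUCT<screen_name:STRING, followers_count:INT>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';
EOF
```

Because the table is external, dropping it later leaves the tweet files in HDFS untouched.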
  • 25. 7. Analyse data using Hive (cont.)
      Find the users with the most followers:
      hive> select user.screen_name, user.followers_count c from tweets order by c desc;
  • 26. Thank you. www.imcinstitute.com, www.facebook.com/imcinstitute