Analysis of historical movie data by BHADRA

COMPUTER SCIENCE AND ENGINEERING
ANALYSIS OF HISTORICAL MOVIE DATA BY
USING HADOOP SYSTEM
INTERNAL GUIDE: T. CHANDRA SHEKAR REDDY
G. VEERABHADRA (13R21A05C8)

• Abstract
• Requirements
• Dataflow Diagram
• Methodology
• Screenshots
• Future Extension
• Conclusion
• References

A recommendation system learns a person's taste and automatically finds new,
desirable content for them, based on patterns in their likes and their ratings
of different items. In this paper, we propose a recommendation system, built on
the Hadoop framework, for the large amount of data available on the web in the
form of ratings, reviews, opinions, complaints, remarks, feedback, and comments
about any item (product, event, individual, or service).

• Hadoop 2.x
• MySQL
• HDFS
• Hive
• Pig
• Hue
• JDK 1.6

Dataflow Diagram
MS Excel (datasets in CSV format)
→ Import into Cloudera home
→ Create database in MySQL
→ Load the data into MySQL
→ Load the data into Hive using Sqoop
→ Load the data into Hue

Hadoop Distributed File System (HDFS):
• The Hadoop Distributed File System (HDFS) is designed to store very large data
sets reliably and to stream those data sets at high bandwidth to user applications. In
a large cluster, thousands of servers both host directly attached storage and execute
user application tasks.
• An important characteristic of Hadoop is the partitioning of data and computation
across many (often thousands of) hosts, and the execution of application computations
in parallel, close to their data.
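
As a rough illustration of this partitioning, the sketch below splits a file into fixed-size blocks and assigns each block's replicas to hosts. The 128 MB block size and the replication factor of 3 are the Hadoop 2.x defaults. The function name, the host names, and the round-robin placement are assumptions made for illustration only; real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default block size (128 MB)
REPLICATION = 3                 # HDFS default replication factor


def plan_blocks(file_size, hosts, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, block_length, replica_hosts).

    Hypothetical helper: placement is simple round-robin, not the
    rack-aware policy that real HDFS uses.
    """
    if replication > len(hosts):
        raise ValueError("need at least as many hosts as replicas")
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(n_blocks):
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - i * block_size)
        replicas = [hosts[(i + r) % len(hosts)] for r in range(replication)]
        plan.append((i, length, replicas))
    return plan


# Example: a 300 MB file on a four-host cluster (host names are made up).
hosts = ["host1", "host2", "host3", "host4"]
plan = plan_blocks(300 * 1024 * 1024, hosts)
for index, length, replicas in plan:
    print(index, length // (1024 * 1024), "MB", replicas)
```

Each block can then be processed by a task running on one of the hosts that stores a replica, which is what "close to their data" refers to.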

• Hive is a data warehousing framework in Hadoop in which data is stored in the form
of tables (structured format). Hive runs on top of HDFS and MapReduce.
• The back-end storage for Hive is HDFS, and its execution model is MapReduce.
• Hive provides an SQL-like language called HiveQL (HQL). HQL is very similar to
SQL.
• Hive is designed for scalability and ease of use.

• TINYINT (1 byte)
• SMALLINT (2 bytes)
• INT (4 bytes)
• BIGINT (8 bytes)
• FLOAT (4 bytes)
• DOUBLE (8 bytes)
• STRING (max size 2 GB)
• VARCHAR (Hive 0.12.0 supports 1 to 65535 characters)
• BOOLEAN (true/false)
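
The byte sizes above determine the value range of each integer type. A minimal sketch of that arithmetic, assuming the usual signed two's-complement representation (the dictionary and function names are illustrative and not part of Hive):

```python
# Byte widths of the integer types, as listed above.
INT_TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(n_bytes):
    """Return (min, max) for a signed two's-complement integer of n_bytes."""
    bits = 8 * n_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, n_bytes in INT_TYPE_BYTES.items():
    low, high = signed_range(n_bytes)
    print(f"{name:<8} {low} .. {high}")
```

Choosing the smallest type that covers a column's values (for example, a release year fits in SMALLINT) keeps the stored tables compact.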

• Sqoop is a tool designed to transfer data between Hadoop and relational databases.
You can use Sqoop to import data from a relational database management system
(RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS),
and then export the data back into an RDBMS.
• Sqoop automates most of this process, relying on the database to describe the
schema of the data to be imported. Sqoop uses MapReduce to import and export
the data, which provides parallel operation as well as fault tolerance.

Copy the file from Windows to Cloudera.
• To create the database: mysql> CREATE DATABASE name;
• To use the database: mysql> USE name;

To create a table: mysql> CREATE TABLE tablename (…);

To import the data sets into MySQL, the following command is used:
mysql> LOAD DATA LOCAL INFILE 'path of the file' INTO TABLE tablename FIELDS
TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
exit;
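
To make the field, enclosure, and line-terminator clauses of the load command concrete, the sketch below parses a small CSV sample with the same conventions using Python's csv module. The sample rows and column layout are invented for illustration and are not taken from the project's dataset.

```python
import csv
import io

# Invented sample: comma-separated fields, double-quote enclosure, and
# Windows-style \r\n line endings, as in a CSV file exported from MS Excel.
sample = 'Movie One,2009,8.1\r\n"Movie Two, Part II",2012,7.4\r\n'

# newline="" leaves the line endings untouched for the csv module to handle.
reader = csv.reader(
    io.StringIO(sample, newline=""),
    delimiter=",",   # fields terminated by ','
    quotechar='"',   # fields enclosed by '"'
)
rows = list(reader)

for row in rows:
    print(row)
```

The second row shows why the enclosing character matters: the comma inside the quoted title is kept as part of the field instead of starting a new column.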

To import the data from MySQL into Hive, the following command is used:
sqoop import --connect jdbc:mysql://localhost/databasename --username root
--password cloudera --table tablename --fields-terminated-by ',' --hive-import -m 1
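
When the import has to be repeated for several tables, the command can be assembled programmatically. The sketch below only builds and prints the argument list; it does not run Sqoop. The helper name, database name, and table name are placeholders, and the options are the Sqoop import options used in the command above.

```python
import shlex


def build_sqoop_import(database, table, username="root", password="cloudera",
                       host="localhost", mappers=1):
    """Build a `sqoop import` argument list (hypothetical helper)."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", username,
        "--password", password,
        "--table", table,
        "--fields-terminated-by", ",",
        "--hive-import",      # load the imported data into a Hive table
        "-m", str(mappers),   # number of parallel map tasks
    ]


# Placeholder names; substitute the real database and table.
args = build_sqoop_import("databasename", "tablename")
print(" ".join(shlex.quote(arg) for arg in args))
# To execute on a machine with Sqoop installed:
#   import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the ',' field terminator.
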
To log in to Hue:
username: Cloudera
password: Cloudera
Go to the Hive editor.
On the left side, select the database; on the right side, run analytical queries
on the tables created. Once the result is displayed, select a chart type, and
repeat the same process for each of the respective years.
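
The kind of analytical query meant here can be sketched as follows. SQLite (bundled with Python) stands in for Hive so that the example runs on its own; the table name, column names, and rows are invented for illustration and do not come from the project's dataset. The SELECT ... WHERE ... ORDER BY ... LIMIT form used here is also available in HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (title TEXT, year INT, rating REAL,"
    " budget INT, gross_collection INT)"
)
# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?, ?)",
    [
        ("Movie A", 2010, 7.9, 50, 120),
        ("Movie B", 2010, 8.4, 30, 90),
        ("Movie C", 2011, 6.5, 80, 200),
        ("Movie D", 2011, 8.8, 20, 60),
    ],
)


def top_movie(column, year):
    """Return (title, value) of the movie with the highest `column` in `year`."""
    # `column` is interpolated into the query, so only known names are allowed.
    if column not in {"rating", "budget", "gross_collection"}:
        raise ValueError(f"unknown column: {column}")
    query = (
        f"SELECT title, {column} FROM movies"
        f" WHERE year = ? ORDER BY {column} DESC LIMIT 1"
    )
    return conn.execute(query, (year,)).fetchone()


# Repeat the same three queries for each year.
for year in (2010, 2011):
    print(year, top_movie("rating", year), top_movie("budget", year),
          top_movie("gross_collection", year))
```

In Hue, the equivalent statement would be typed into the Hive editor once per year, and the result charted from there.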

Clearly, Big Data is in its early stages, and there is much more to be discovered.
The technology itself brings business benefits when leveraged across domains such as
Big Data, Business Intelligence, and Analytics.
These business benefits are:
• Speed and accelerated performance
Good query performance for improved decision making, faster data load processes
for low data latency, and accelerated memory planning capabilities.
• New business insights
Self-service BI and more flexible modeling capabilities.
• Faster business processes

• The availability of Big Data, low-cost commodity hardware, and new information
management and analytic software has produced a unique moment in the history of
data analysis. The convergence of these trends means that we have the capabilities
required to analyze astonishing data sets quickly and cost-effectively for the first
time in history. These capabilities are neither theoretical nor trivial. They represent
a genuine leap forward and a clear opportunity to realize enormous gains in terms
of efficiency, productivity, revenue, and profitability. The Age of Big Data is here,
and these are truly revolutionary times if both business and technology
professionals continue to work together and deliver on the promise. Promises of
Big Data include innovation, growth and long term sustainability.
• From the results, we can analyze the movies and produce reports such as the best
rated, highest budget, and highest collection within a click.

• https://www.tutorialspoint.com/
• http://hadooptutorials.co.in/tutorials/hadoop/internals-of-hdfs-file-read-operations.html
• http://www.hadooptpoint.com/hadoop-hive-architecture/
• http://downloads.vmware.com/d/info/desktop_downloads/vmware_workstation/7_0
• http://www.cloudera.com/
• Hadoop: The Definitive Guide, Tom White
• Big Data Analytics, Wiley

Gantt Chart (definition):
A Gantt chart is a chart in which a series of horizontal lines shows the amount of work
done or production completed in certain periods of time in relation to the amount
planned for those periods.

Future Work:
In future work, we will analyze the datasets loaded into Hive using Hue or the
R tool.

Conclusion:
In this project, we loaded large datasets into HDFS using Sqoop and Hive.
The movie data can then be easily analyzed using Hue.
