This document discusses using MapReduce and Apache Hadoop for large-scale data mining and analytics. It describes several Apache Hadoop projects like HDFS, MapReduce, HBase and Mahout. It discusses using Mahout for tasks like clustering, classification and recommendation. The document reviews literature on parallel K-means clustering with MapReduce and using clouds for scalable big data analytics. It outlines a plan to study parallel K-means clustering and implement a solution to handle large datasets.
M.Tech Student Research on Apache Hadoop Projects and Application of Mahout for Data Clustering
1. Abhijit Kumar Behera
M.Tech (CSE)
Roll No. 1350001
School of Computer Engineering
Guided By : Dr. Laxman Sahoo
2. Contents
Introduction
Apache Hadoop related projects
Application of Mahout
Literature Survey
Plan of Action
Conclusion
References
3. Introduction
•The K-means algorithm is one of the most well-known clustering
algorithms and has been frequently applied to a variety of problems.
•MapReduce, the most popular parallel framework for cloud computing,
is effective at handling massive data, so research on K-means
clustering algorithms based on MapReduce
has become a focus for scholars.
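To make the idea of K-means on MapReduce concrete, the following is a minimal single-machine sketch in plain Python, not Hadoop or Mahout code. It assumes the usual formulation: the map step assigns each point to its nearest centroid, and the reduce step averages the points in each group. The function names (map_phase, reduce_phase, kmeans) and the max_iters parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:  # stands in for the shuffle / group-by-key step
        groups[idx].append(p)
    new_centroids = list(centroids)  # a centroid with no points keeps its position
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iters=20):
    """Repeat map + reduce until the centroids stop changing (or max_iters)."""
    for _ in range(max_iters):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

In a real cluster the map calls would run in parallel over partitions of the data, and only the per-cluster sums would need to travel to the reducers, which is where the scalability would come from.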
10. Literature Survey
Clouds for Scalable Big Data Analytics
Authors Name: Domenico Talia
Journal: IEEE Computer Society
Year of Publication:2013
In this paper, the author describes how cloud computing enhances the development and
functionality of big data analytics when analytics is deployed on the cloud.
Cloud Service Model | Features | Users
Data analytics software as a service | A single and complete data mining application or task (including data sources) offered as a service | End users, analytics managers, data analysts
Data analytics platform as a service | A data analysis suite or framework for programming or developing high-level applications, hiding the cloud infrastructure and data storage | Data mining application developers, data scientists
Data analytics infrastructure as a service | A set of virtualized resources provided to a programmer or data mining researcher for developing, configuring, and running data analysis frameworks or applications | Data mining programmers, data management developers, data mining researchers
11. Plan of Action
August - October 2014: Literature survey is done.
November 2014: Problem definition is formulated; the problem-solving outline is yet to be done.
December 2014 - January 2015: An appropriate solution to the problem is yet to be formulated.
February - May 2015: Final implementation of the solution, with results, is yet to be done.
12. Conclusion
Large-scale data mining has become a new challenge in recent years.
Big data analytics can be accomplished using the MapReduce
framework. The K-means algorithm is one of the most well-known
clustering algorithms. However, its processing performance
usually hits a bottleneck when it is applied to
massive data. A parallel K-means algorithm with MapReduce, which
shows an obvious advantage, is implemented to handle massive data.
13. References
[1] Walisa Romsaiyud, Wichian Premchaiswadi, "An Adaptive Machine Learning on Map-Reduce Framework for Improving Performance of Large-Scale Data Analysis on EC", Eleventh IEEE Int'l Conf. on ICT and Knowledge Engineering, 2014
[2] Domenico Talia, "Clouds for Scalable Big Data Analytics", IEEE Computer Society, 2013
[3] Feng Ye, Zhijan Wang, "Cloud-based Big Data Mining & Analyzing Services Platform integrating R", IEEE International Conference on Advance Cloud and Big Data, 2013
[4] "Apache Hadoop", http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F