Abhijit Kumar Behera 
M.Tech (CSE) 
Roll No. 1350001 
School of Computer Engineering 
Guided By : Dr. Laxman Sahoo
Contents 
• Introduction 
• Apache Hadoop related projects 
• Application of Mahout 
• Literature Survey 
• Plan of Action 
• Conclusion 
• References
Introduction 
•The K-means algorithm is one of the best-known clustering 
algorithms and has been applied to a wide variety of problems. 
•MapReduce, one of the most popular parallel frameworks in cloud 
computing, handles massive data effectively, so K-means clustering 
based on MapReduce has become a focus of research.
Components of Hadoop 
HDFS 
•NameNode 
•DataNode 
•Secondary NameNode 
MapReduce 
•Map() 
•Combine() 
•Reduce() 
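YARN 
•ResourceManager (with per-application ApplicationMasters, replaces the Hadoop 1.x JobTracker) 
•NodeManager (replaces the Hadoop 1.x TaskTracker) 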
HBase
MapReduce Word count process
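The word count flow named above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, assuming the usual Map(), Combine(), shuffle and Reduce() stages; it does not use the Hadoop API, and the function names and sample lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map(): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine(): pre-aggregate counts on the mapper side to reduce shuffle traffic."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(pairs):
    """Group all counts by word, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce(): sum the partial counts for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the stages in sequence; each line stands in for one mapper's input."""
    combined = [combine_phase(map_phase(line)) for line in lines]
    grouped = shuffle(chain.from_iterable(combined))
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
```

In a real Hadoop job the shuffle is performed by the framework, and the combiner is optional; it is safe here because summing counts is associative and commutative.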
Apache Hadoop Projects 
[Figure: Hadoop ecosystem diagram showing Hadoop (HDFS and MapReduce), HBase, Mahout, Spark, Hive, ZooKeeper, Sqoop and Pig]
Application of Mahout 
• Collaborative Filtering 
   • Matrix factorization based recommenders 
   • A user based Recommender 
• Clustering 
   • Canopy Clustering 
   • K-Means Clustering 
   • Fuzzy K-Means 
   • Affinity Propagation Clustering 
• Classification 
   • Naive Bayes 
   • Random forest classifier
Literature Survey 
An Improved Parallel K-means Clustering Algorithm with MapReduce 
Authors: Qing Liao, Fan Yang, Jingming Zhao 
Published in: Communication Technology (ICCT), IEEE 
Year of Publication: 2014 
Parallel K-means Algorithm 
1) Initial 
2) Mapper 
3) Reducer
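The three steps above can be sketched as follows. This is a minimal single-process illustration of the generic MapReduce formulation of K-means, not the improved algorithm of the surveyed paper: the initial step supplies starting centroids, the mapper assigns each point to its nearest centroid, and the reducer averages the points of each cluster. All function names, the convergence tolerance and the sample data are assumptions made for the example.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mapper(split, centroids):
    """Mapper: emit (cluster_id, (point, 1)) for every point in one input split."""
    for point in split:
        yield nearest(point, centroids), (point, 1)


def reducer(cluster_id, values):
    """Reducer: average all points assigned to one cluster to get its new centroid."""
    total, n = None, 0
    for point, count in values:
        total = list(point) if total is None else [a + b for a, b in zip(total, point)]
        n += count
    return cluster_id, tuple(x / n for x in total)


def kmeans(splits, initial_centroids, max_iter=20, tol=1e-9):
    """Initial step: start from the given centroids, then iterate map and reduce."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        grouped = defaultdict(list)  # stands in for the framework's shuffle
        for split in splits:
            for cluster_id, value in mapper(split, centroids):
                grouped[cluster_id].append(value)
        updated = list(centroids)  # clusters with no points keep their centroid
        for cluster_id, values in grouped.items():
            updated[cluster_id] = reducer(cluster_id, values)[1]
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

Each iteration corresponds to one MapReduce job; on a cluster the current centroids would have to be distributed to every mapper before each pass.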
Literature Survey 
Clouds for Scalable Big Data Analytics 
Author: Domenico Talia 
Journal: IEEE Computer Society 
Year of Publication: 2013 
In this paper, the author describes how cloud computing enhances the development and 
functionality of Big Data analytics when the analytics is deployed in the cloud. 
Cloud service models for data analytics (service model, features, users): 
• Data analytics software as a service 
   Features: A single and complete data mining application or task (including data sources) offered as a service 
   Users: End users, analytics managers, data analysts 
• Data analytics platform as a service 
   Features: A data analysis suite or framework for programming or developing high-level applications, hiding the cloud infrastructure and data storage 
   Users: Data mining application developers, data scientists 
• Data analytics infrastructure as a service 
   Features: A set of virtualized resources provided to a programmer or data mining researcher for developing, configuring, and running data analysis frameworks or applications 
   Users: Data mining programmers, data management developers, data mining researchers
Plan of Action 
• August - October 2014: Literature survey is done. 
• November 2014: Problem definition is formulated; the problem-solving outline is yet to be done. 
• December 2014 - January 2015: An appropriate solution to the problem is yet to be formulated. 
• February - May 2015: Final implementation of the solution, with results, is yet to be done.
Conclusion 
Large-scale data mining has emerged as a new challenge in recent years. 
Big data analytics can be accomplished using the MapReduce framework. 
The K-means algorithm is one of the best-known clustering algorithms; 
however, its processing performance usually hits a bottleneck when it 
is applied to massive data. A parallel K-means algorithm implemented 
with MapReduce shows an obvious advantage when handling massive data.
References 
[1] Walisa Romsaiyud, Wichian Premchaiswadi, "An Adaptive Machine Learning on Map-Reduce Framework for Improving Performance of Large-Scale Data Analysis on EC", Eleventh IEEE Int'l Conf. on ICT and Knowledge Engineering, 2014 
[2] Domenico Talia, "Clouds for Scalable Big Data Analytics", IEEE Computer Society, 2013 
[3] Feng Ye, Zhijan Wang, "Cloud-based Big Data Mining & Analyzing Services Platform integrating R", IEEE International Conference on Advance Cloud and Big Data, 2013 
[4] "Apache Hadoop", http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F
M.Tech Student Research on Apache Hadoop Projects and Application of Mahout for Data Clustering

Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 

Dernier (20)

VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 

M.Tech Student Research on Apache Hadoop Projects and Application of Mahout for Data Clustering

  • 1. Abhijit Kumar Behera, M.Tech (CSE), Roll No. 1350001, School of Computer Engineering. Guided by: Dr. Laxman Sahoo
  • 2. Contents: Introduction; Apache Hadoop related projects; Application of Mahout; Literature Survey; Plan of Action; Conclusion; References
  • 3. Introduction: The K-means algorithm is one of the best-known clustering algorithms and has been applied to a wide variety of problems. MapReduce, the most popular parallel framework for cloud computing, handles massive data effectively, so K-means clustering algorithms based on MapReduce have become a focus of research.
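  • 4. Components of Hadoop: HDFS (NameNode, DataNode, Secondary NameNode); MapReduce (Map(), Combine(), Reduce()); YARN (the slide lists JobTracker and TaskTracker here; these are the MapReduce 1 daemons, whose roles YARN assigns to the ResourceManager and NodeManager); HBase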
  • 6. Apache Hadoop Projects: Hadoop (HDFS and MapReduce), HBase, Mahout, Spark, Hive, ZooKeeper, Sqoop, Pig
  • 7. Application of Mahout: Collaborative Filtering (matrix factorization based recommenders, a user based recommender); Clustering (Canopy Clustering, K-Means Clustering, Fuzzy K-Means, Affinity Propagation Clustering); Classification (Naive Bayes, Random Forest classifier)
  • 8. Literature Survey: "An Improved Parallel K-means Clustering Algorithm with MapReduce". Authors: Qing Liao, Fan Yang, Jingming Zhao. Journal: Communication Technology (ICCT), IEEE. Year of publication: 2014. Parallel K-means algorithm: 1) Initial, 2) Mapper, 3) Reducer
  • 10. Literature Survey: "Clouds for Scalable Big Data Analytics". Author: Domenico Talia. Journal: IEEE Computer Society. Year of publication: 2013. In this paper, the author describes how cloud computing enhances the development and functionality of Big Data analytics when the analytics is deployed in the cloud. Cloud service models, with their features and users:
      - Data analytics software as a service. Features: a single and complete data mining application or task (including data sources) offered as a service. Users: end users, analytics managers, data analysts.
      - Data analytics platform as a service. Features: a data analysis suite or framework for programming or developing high-level applications, hiding the cloud infrastructure and data storage. Users: data mining application developers, data scientists.
      - Data analytics infrastructure as a service. Features: a set of virtualized resources provided to a programmer or data mining researcher for developing, configuring, and running data analysis frameworks or applications. Users: data mining programmers, data management developers, data mining researchers.
  • 11. Plan of Action:
      - August to October 2014: literature survey (done).
      - November 2014: problem definition formulated; the problem-solving outline is yet to be done.
      - December 2014 to January 2015: find an appropriate solution to the problem (yet to be formulated).
      - February to May 2015: final implementation of the solution, with results (yet to be done).
  • 12. Conclusion: Large-scale data mining has become a new challenge in recent years, and Big Data analytics can be accomplished with the MapReduce framework. The K-means algorithm is one of the best-known clustering algorithms, but its processing performance usually hits a bottleneck when it is applied to massive data. A parallel K-means algorithm implemented with MapReduce shows a clear advantage in handling massive data.
  • 13. References
      [1] Walisa Romsaiyud, Wichian Premchaiswadi, "An Adaptive Machine Learning on Map-Reduce Framework for Improving Performance of Large-Scale Data Analysis on EC", Eleventh IEEE Int'l Conf. on ICT and Knowledge Engineering, 2014.
      [2] Domenico Talia, "Clouds for Scalable Big Data Analytics", IEEE Computer Society, 2013.
      [3] Feng Ye, Zhijan Wang, "Cloud-based Big Data Mining & Analyzing Services Platform integrating R", IEEE International Conference on Advanced Cloud and Big Data, 2013.
      [4] "Apache Hadoop", http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F
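
The three steps named on slide 8 (Initial, Mapper, Reducer), together with the Map(), Combine() and Reduce() phases listed on slide 4, can be illustrated with a small in-process sketch. This is only a simulation in plain Python under stated assumptions: it does not use the Hadoop or Mahout APIs, it is not the improved algorithm of Liao et al., and every function name (`mapper`, `combiner`, `reducer`, `kmeans`) is hypothetical. The initial centroids are assumed to be supplied by the caller.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def mapper(split, centroids):
    """Map(): emit (cluster index, (point, 1)) for every point in one input split."""
    for point in split:
        yield nearest(point, centroids), (point, 1)

def combiner(pairs):
    """Combine(): sum points and counts per cluster before the shuffle."""
    partial = {}
    for key, (vec, count) in pairs:
        if key in partial:
            acc, n = partial[key]
            partial[key] = (tuple(a + b for a, b in zip(acc, vec)), n + count)
        else:
            partial[key] = (vec, count)
    return list(partial.items())

def reducer(key, values):
    """Reduce(): merge the partial sums of one cluster into its new centroid."""
    total, n = None, 0
    for vec, count in values:
        total = vec if total is None else tuple(a + b for a, b in zip(total, vec))
        n += count
    return key, tuple(x / n for x in total)

def kmeans(splits, centroids, max_iter=20):
    """Repeat map, combine, shuffle, reduce until the centroids stop changing."""
    centroids = list(centroids)  # Initial step: caller-supplied starting centroids
    for _ in range(max_iter):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for split in splits:
            for key, value in combiner(mapper(split, centroids)):
                grouped[key].append(value)
        updated = list(centroids)  # a cluster with no points keeps its old centroid
        for key, values in grouped.items():
            _, updated[key] = reducer(key, values)
        if updated == centroids:
            break
        centroids = updated
    return centroids

if __name__ == "__main__":
    # Two splits play the role of two HDFS blocks handled by separate map tasks.
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

The combiner is the design choice worth noting: because it forwards one (sum, count) pair per cluster from each split instead of every point, the amount of data crossing the shuffle no longer grows with the number of points, which is where a naive MapReduce K-means tends to spend its time.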