SlideShare une entreprise Scribd logo
1  sur  3
Télécharger pour lire hors ligne
ISSN 2348-1196 (print)
International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online)
Vol. 3, Issue 4, pp: (159-161), Month: October - December 2015, Available at: www.researchpublish.com
Page | 159
Research Publish Journals
A Data Aware Caching (Dache) for Big-Data
Applications Using the MapReduce Framework
Utkarsh Honey1
, Yogesh More2
, Prasad Wandhekar3
, Santosh Wayal4
,
Prof. Jayashree Chaudhari5
1, 2, 3, 4, 5
Dr. D Y Patil School of Engineering, Pune, India 411047
Abstract: The big-data refers to the large-scale distributed data processing applications which works on
exceptionally large amounts of data. Google’s MapReduce and Apache’s Hadoop, its open-source implementation,
are the software systems for big-data applications. An observation of the MapReduce framework is that the
framework generates a large amount of intermediate data. MapReduce is unable to utilize such data so they are
thrown after used. We propose Dache, a data-aware cache framework used for big-data applications. In Dache,
tasks submit their intermediate results to the cache manager and queries the cache manager before executing the
actual computing work. A novel cache description system and a cache request and reply protocol aredesigned
Keyword: Big-data, MapReduce, Hadoop, caching.
I. INTRODUCTION
Google MapReduce is a programming model and also a software framework for Big -scale distributed Computing on
large amounts of data. Figure illustrates the high level work flow of a MapReduce Task. Application developers specify
the computation in terms of reduce function and a map and the underlying MapReduce Task scheduling system
automatically parallelizes the computation across the cluster of machines. While MapReduce obtain popularity for its
simple programming interface and excellent Performance when implement a large spectrum of applications. Since most
such applications take a huge amount of input data, they are called as “Big-dataapplications”.
Input data is first divided and then given to workers in the map stage. Every Individual data items are called records. The
MapReduce system parses the input splits to each worker and produces records. After the phase of map, intermediate
results generated in the map phase are shuffled and sorted by the MapReduce system which are then given into the
workers in the reduce phase. Final results are computed by multiple reducers and after it written back to the disk. Hadoop
is an open-source implementation of the Google MapReduce programming model. Hadoop includes of the Hadoop
Common, which provides access to the file systems supported by Hadoop. HDFS (Hadoop Distributed File System)
provides distributed file storage and is optimized for large unchangeable blocks of data. A small Hadoop cluster will
include a single master and multiple worker nodes known as slave. Whereas master node runs multiple processes,
including a Task tracker and a Name Node. The Task tracker is having control for the management of running jobs in the
Hadoop cluster. While the name node manages the HDFS, The Task tracker and the Name Node are primarily collocated
on the same physical machine. The other servers in the cluster run a Task tracker for Data Node processes. A MapReduce
job is divided into number of tasks. Tasks are managed by the Task tracker. The Task trackers and the Data Node are
collated on the same servers to provide data locality. MapReduce provides a standardized framework for implementing
the large-scale distributed computation, known as the big-data applications. Still, there is restriction of the system. i.e. The
inefficiency in incremental processing which refers to the applications that incrementally grow up the input data and
continuously apply computations on this input and produce output. There are potential duplicate computations and
operations are performed in this process. As the MapReduce does not have the any technique which is to identify such
duplicate computations and accelerate the job execution. While Motivating by this observation, in same paper we propose,
a data-aware cache scheme for big data applications using MapReduce framework, which goals at extending the
MapReduce framework and provisioning a cache layer which is used for efficiently identifying and accessing cache items
in a MapReduce job.
ISSN 2348-1196 (print)
International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online)
Vol. 3, Issue 4, pp: (159-161), Month: October - December 2015, Available at: www.researchpublish.com
Page | 160
Research Publish Journals
II. OBJECTIVE(S) AND SCOPE
The Scope Of The Work Can Be Extended To Following:
 Requires minimum change to the original MapReduce programmingmodel.
 Application code only requires slight changes in order to utilize Dache.
 Implement Dache in Hadoop by extending the relevant components.
 Tested experiments show that it can easily eliminate all the duplicate tasks in incremental MapReducejobs.
 Minimum execution time and CPU utilization
Problem Statement:
In current Hadoop MapReduce framework is that the framework generates a large flow of intermediate data. MapReduce
is unable to save that such data so they are deleted after used. But in our system we introducing the cache memory that
holds the intermediate results in it, because of that the data processing, means job executing processing is faster than old
system, So that system is a time consuming, repetition of data processing arereduced.
Map Reduce Architecture:
Fig 1: Map Reduce Architecture
ISSN 2348-1196 (print)
International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online)
Vol. 3, Issue 4, pp: (159-161), Month: October - December 2015, Available at: www.researchpublish.com
Page | 161
Research Publish Journals
2.3.1 Operations:
 Clients submit jobs to the Job Tracker.
 Job Tracker talks to the name node.
 Job Tracker creates execution plan.
 Job Tracker submits works to Task tracker.
 Task Trackers report progress via heart beats.
 Job Tracker manages the phases.
 Job Tracker updates the status.
Software & Hardware Requirement:
Software Requirements:
Operating System : WINDOWS XP & ABOVE
Database : My SQL,Hadoop
Language : java
Hardware Requirements:
System : Pentium Iv 2.4 GHz
Hard Disk : 40 GB
Monitor : 15 VGA Color
RAM : 512 Mb
III. CONCLUSIONS
This paper present the design and evaluation of a data aware cache framework that requires minimum changes to the
original MapReduce programming model for provisioning the incremental processing for Big data applications using
MapReduce model. In this paper we propose, a data-aware cache description scheme, architecture and protocol. In this
Paper Presented method only requires a slight modification in the input format processing and task management of
MapReduce framework. As a result, application code requires only slight changes in order to use Data in data aware
caching. This paper implements Hadoop by extending relevant components. In the future, we propose to adapt our
framework is to more general application scenarios and also implement the scheme in the Hadoopproject.
REFERENCES
[1] J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communication of ACM, vol.
51, no. 1, pp. 107-113.
[2] Hadoop, http://www.Hadoop.apache.org.
[3] Cache algorithms, http://en.wikipedia.org/wiki/Cache algorithms.
[4] Java programming language, http://www.java.com/.
[5] Google compute engine, http://cloud.google.com/products/computeengine.html
[6] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters
http://labs.google.com/papers/mapreduce.html
[7] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System http://labs.google.com/
papers/gfs.html

Contenu connexe

Tendances

Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics iosrjce
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analyticsAvinash Pandu
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analyticsAvinash Pandu
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Cognizant
 
Big Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationBig Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationUT, San Antonio
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...CitiusTech
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on webcsandit
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questionsbarbie0909
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 

Tendances (17)

Unit 1
Unit 1Unit 1
Unit 1
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
 
Eg4301808811
Eg4301808811Eg4301808811
Eg4301808811
 
Cppt
CpptCppt
Cppt
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analytics
 
IJET-V2I6P25
IJET-V2I6P25IJET-V2I6P25
IJET-V2I6P25
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
Big Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationBig Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory Computation
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on web
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 

Similaire à A data aware caching 2415

Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System cscpconf
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTERLOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTERijdpsjournal
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...cscpconf
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111NavNeet KuMar
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopIOSR Journals
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache HadoopChristopher Pezza
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Modeinventionjournals
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combinersijcsit
 
Ijircce publish this paper
Ijircce publish this paperIjircce publish this paper
Ijircce publish this paperSANTOSH WAYAL
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar ReportAtul Kushwaha
 

Similaire à A data aware caching 2415 (20)

B017320612
B017320612B017320612
B017320612
 
Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System Design Issues and Challenges of Peer-to-Peer Video on Demand System
Design Issues and Challenges of Peer-to-Peer Video on Demand System
 
Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTERLOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
 
G017143640
G017143640G017143640
G017143640
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
IJET-V3I2P24
IJET-V3I2P24IJET-V3I2P24
IJET-V3I2P24
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Hadoop ppt2
Hadoop ppt2Hadoop ppt2
Hadoop ppt2
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
 
Ijircce publish this paper
Ijircce publish this paperIjircce publish this paper
Ijircce publish this paper
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 

Dernier

S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersMairaAshraf6
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...soginsider
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
Bridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptxBridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptxnuruddin69
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086anil_gaur
 

Dernier (20)

S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Bridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptxBridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 

A data aware caching 2415

  • 1. ISSN 2348-1196 (print) International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online) Vol. 3, Issue 4, pp: (159-161), Month: October - December 2015, Available at: www.researchpublish.com Page | 159 Research Publish Journals A Data Aware Caching (Dache) for Big-Data Applications Using the MapReduce Framework Utkarsh Honey1 , Yogesh More2 , Prasad Wandhekar3 , Santosh Wayal4 , Prof. Jayashree Chaudhari5 1, 2, 3, 4, 5 Dr. D Y Patil School of Engineering, Pune, India 411047 Abstract: The big-data refers to the large-scale distributed data processing applications which works on exceptionally large amounts of data. Google’s MapReduce and Apache’s Hadoop, its open-source implementation, are the software systems for big-data applications. An observation of the MapReduce framework is that the framework generates a large amount of intermediate data. MapReduce is unable to utilize such data so they are thrown after used. We propose Dache, a data-aware cache framework used for big-data applications. In Dache, tasks submit their intermediate results to the cache manager and queries the cache manager before executing the actual computing work. A novel cache description system and a cache request and reply protocol aredesigned Keyword: Big-data, MapReduce, Hadoop, caching. I. INTRODUCTION Google MapReduce is a programming model and also a software framework for Big -scale distributed Computing on large amounts of data. Figure illustrates the high level work flow of a MapReduce Task. Application developers specify the computation in terms of reduce function and a map and the underlying MapReduce Task scheduling system automatically parallelizes the computation across the cluster of machines. While MapReduce obtain popularity for its simple programming interface and excellent Performance when implement a large spectrum of applications. Since most such applications take a huge amount of input data, they are called as “Big-dataapplications”. Input data is first divided and then given to workers in the map stage. Every Individual data items are called records. The MapReduce system parses the input splits to each worker and produces records. After the phase of map, intermediate results generated in the map phase are shuffled and sorted by the MapReduce system which are then given into the workers in the reduce phase. Final results are computed by multiple reducers and after it written back to the disk. Hadoop is an open-source implementation of the Google MapReduce programming model. Hadoop includes of the Hadoop Common, which provides access to the file systems supported by Hadoop. HDFS (Hadoop Distributed File System) provides distributed file storage and is optimized for large unchangeable blocks of data. A small Hadoop cluster will include a single master and multiple worker nodes known as slave. Whereas master node runs multiple processes, including a Task tracker and a Name Node. The Task tracker is having control for the management of running jobs in the Hadoop cluster. While the name node manages the HDFS, The Task tracker and the Name Node are primarily collocated on the same physical machine. The other servers in the cluster run a Task tracker for Data Node processes. A MapReduce job is divided into number of tasks. Tasks are managed by the Task tracker. The Task trackers and the Data Node are collated on the same servers to provide data locality. MapReduce provides a standardized framework for implementing the large-scale distributed computation, known as the big-data applications. Still, there is restriction of the system. i.e. The inefficiency in incremental processing which refers to the applications that incrementally grow up the input data and continuously apply computations on this input and produce output. There are potential duplicate computations and operations are performed in this process. As the MapReduce does not have the any technique which is to identify such duplicate computations and accelerate the job execution. While Motivating by this observation, in same paper we propose, a data-aware cache scheme for big data applications using MapReduce framework, which goals at extending the MapReduce framework and provisioning a cache layer which is used for efficiently identifying and accessing cache items in a MapReduce job.
  • 2. ISSN 2348-1196 (print) International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online) Vol. 3, Issue 4, pp: (159-161), Month: October - December 2015, Available at: www.researchpublish.com Page | 160 Research Publish Journals II. OBJECTIVE(S) AND SCOPE The Scope Of The Work Can Be Extended To Following:  Requires minimum change to the original MapReduce programmingmodel.  Application code only requires slight changes in order to utilize Dache.  Implement Dache in Hadoop by extending the relevant components.  Tested experiments show that it can easily eliminate all the duplicate tasks in incremental MapReducejobs.  Minimum execution time and CPU utilization Problem Statement: In current Hadoop MapReduce framework is that the framework generates a large flow of intermediate data. MapReduce is unable to save that such data so they are deleted after used. But in our system we introducing the cache memory that holds the intermediate results in it, because of that the data processing, means job executing processing is faster than old system, So that system is a time consuming, repetition of data processing arereduced. Map Reduce Architecture: Fig 1: Map Reduce Architecture
  • 3. ISSN 2348-1196 (print) International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online) Vol. 3, Issue 4, pp: (159-161), Month: October - December 2015, Available at: www.researchpublish.com Page | 161 Research Publish Journals 2.3.1 Operations:  Clients submit jobs to the Job Tracker.  Job Tracker talks to the name node.  Job Tracker creates execution plan.  Job Tracker submits works to Task tracker.  Task Trackers report progress via heart beats.  Job Tracker manages the phases.  Job Tracker updates the status. Software & Hardware Requirement: Software Requirements: Operating System : WINDOWS XP & ABOVE Database : My SQL,Hadoop Language : java Hardware Requirements: System : Pentium Iv 2.4 GHz Hard Disk : 40 GB Monitor : 15 VGA Color RAM : 512 Mb III. CONCLUSIONS This paper present the design and evaluation of a data aware cache framework that requires minimum changes to the original MapReduce programming model for provisioning the incremental processing for Big data applications using MapReduce model. In this paper we propose, a data-aware cache description scheme, architecture and protocol. In this Paper Presented method only requires a slight modification in the input format processing and task management of MapReduce framework. As a result, application code requires only slight changes in order to use Data in data aware caching. This paper implements Hadoop by extending relevant components. In the future, we propose to adapt our framework is to more general application scenarios and also implement the scheme in the Hadoopproject. REFERENCES [1] J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communication of ACM, vol. 51, no. 1, pp. 107-113. [2] Hadoop, http://www.Hadoop.apache.org. [3] Cache algorithms, http://en.wikipedia.org/wiki/Cache algorithms. [4] Java programming language, http://www.java.com/. [5] Google compute engine, http://cloud.google.com/products/computeengine.html [6] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters http://labs.google.com/papers/mapreduce.html [7] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System http://labs.google.com/ papers/gfs.html