INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976 – 6367 (Print)
ISSN 0976 – 6375 (Online)
Volume 4, Issue 5, September – October (2013), pp. 109-114
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com

DYNAMIC DATA REPLICATION AND JOB SCHEDULING BASED ON
POPULARITY AND CATEGORY
Priya Deshpande1, Brijesh Khundhawala2, Prasanna Joeg3
1 Assistant Professor, MITCOE, Pune
2 ME Student, MITCOE, Pune
3 Professor, MITCOE, Pune

ABSTRACT
Dealing with a huge amount of data makes the requirement for efficient data access more critical in data grids. Improving data access time is one way of reducing job execution time, i.e. improving performance. To speed up data access and reduce bandwidth consumption, data grids replicate data in multiple locations. This paper studies a new data replication strategy in data grids which takes into account two important issues concerning replication: the storage capability of different nodes and the bandwidth consumption between nodes. It also considers the popularity of a file for replacement: less popular files get lower priority than more popular files. The limitation on storage also needs to be considered. Performance can be optimized by placing a file as close to the client as possible. Our algorithm optimizes replication by taking into consideration the popularity of the file, the limited storage and the category of the file.
Keywords: Data Replication, Job Scheduling, Replica Strategy
I. INTRODUCTION
Large-scale, geographically distributed systems are becoming very popular for data-intensive applications, most importantly scientific applications. The life sciences, astrophysics and bioinformatics research communities are deploying grid systems to process large amounts of data stored at geographically dispersed locations. Millions of files are generated regularly, amounting to many terabytes. The volume of interesting data is measured in terabytes and will reach petabytes in a short time, because technology and research capabilities are growing fast [13]. There is therefore a great need to ensure efficient access to such huge and widely dispersed data in a data grid. In a data grid, performance is strongly influenced

by data locality [1]. Data replication is a widely known method used to improve the performance of data access in distributed systems. By creating replicas we can efficiently reduce bandwidth consumption and access latency. In particular, increasing the data read performance from the perspective of clients is the motive of data replication algorithms. Replication is a mechanism for creating and managing multiple copies of files. A replica management service can be viewed as composed of the following activities: creating new replicas, registering these new replicas in a replica catalogue, and querying the catalogue to find the locations of the respective replicas. The replication mechanism involves three main questions: which file should be replicated, when to replicate, and where to replicate. To improve job execution time we have also tried to consider job scheduling. First, we place data in the grid category-wise, i.e. data of the same category is placed as close together as possible. Then, during replica replacement, the least popular file, decided based on its access frequency, is evicted to make room for the required data. We also try to place a job as close as possible to the data it requires, so that overall performance can be improved.
II. RELATED WORKS
In the grid computing environment, data replication and scheduling are primary concerns for performance optimization. Replica selection, replica placement and replica replacement have always been crucial for performance. Replica placement should be done in such a way that the file transfer time for job execution is minimized. Replica replacement has well-known strategies such as LRU and LFU. Much research is ongoing in these areas.
EDGSim [2], a simulator implemented by the European Data Grid project, was designed to simulate the performance of the European Data Grid, but it focused on optimizing the scheduling algorithm; data location is considered important there, but replication is not considered. GridNet [3], in contrast, aims to address data replication: it proposed a dynamic replication algorithm and memory middleware that was evaluated to improve data access time.
The importance of data locality was first described by K. Ranganathan [4], who suggested replication strategies to reduce network bandwidth and access delay; our system architecture is similar to the one proposed there, with a few changes. H. Sato et al. [5] proposed a file replication algorithm that improved simple replication methods by taking network capacity and file access patterns into consideration. Similarly, R.S. Chang et al. [6] proposed the Latest Access Largest Weight (LALW) method, which uses data access history, applying a greater weight to more recent accesses in data replication.
In [9], a decentralized architecture for adaptive media dissemination was proposed. The authors assumed that the popularity of the datasets follows the Zipf distribution and defined the replica weight based on popularity.
In [7], a Dynamic Optimal Replication Strategy is proposed which is based on a file's access history, the network status and the file's size; its results show that it works better than LRU and LFU. In [8], a dynamic strategy is proposed which tracks changes in the data access patterns and then applies whichever traditional replication strategy, such as LRU or LFU, best fits the current access pattern.
In this paper we propose a strategy that takes into account the file access history to determine popularity, the category of the data and the location of the job to be executed, which gives hope for better performance. The rest of the paper is structured as follows: Section 3 proposes the system architecture for the strategy, Section 4 specifies the steps for dynamic replication, and Section 5 defines the scheduling strategy. Section 6 gives the conclusion and Section 7 lists the references.


III. SYSTEM ARCHITECTURE
The system architecture for our grid model is shown in Fig. 1.
The components of our architecture are as follows (a small illustrative sketch of these components follows Figure 1):
• LS: Local Scheduler of a grid site.
• DS: Dataset/Data Scheduler of a grid site.
• RC: Replica Catalogue, which stores the list of all replicas on the grid.
• CE: Computing Element. Each grid site contains zero or more CEs providing its computing capability.
• SE: Storage Element. Each site contains zero or more SEs representing its storage capacity.
• JB: Job Broker, which receives jobs from users and submits them to an appropriate grid site.
• Replication Manager (RM): a centralized server that stores the replication information of the system. It contains an Active Replicator that performs replication for the system. It would be better to have a decentralized RM.

Figure 1 System Architecture [12]
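To make the roles of these components concrete, a minimal sketch of how they could be modelled as data structures is given below. The class names, fields and the dictionary-based replica catalogue are illustrative assumptions of ours; they are not taken from the paper or from any particular grid middleware.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class StorageElement:
        """SE: the storage capacity of a site (sizes in MB, an arbitrary choice)."""
        capacity_mb: float
        files: Dict[str, float] = field(default_factory=dict)   # FID -> file size in MB

        def free_space(self) -> float:
            return self.capacity_mb - sum(self.files.values())

    @dataclass
    class GridSite:
        """A grid site with its computing and storage resources."""
        name: str
        computing_elements: int          # number of CEs on the site
        storage: StorageElement          # the site's SE
        category: str                    # research category this site mainly serves

    # RC: replica catalogue mapping each file identifier to the sites holding a replica.
    ReplicaCatalogue = Dict[str, List[str]]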
Let us briefly describe how the process works. After every predefined interval, the replication manager collects data usage information from the environment. The interval should not be too large, because the information collected should be fresh; equally, it should not be too small, because collecting too frequently increases bandwidth usage and processing load. The manager then decides which files to replicate, based on the collected data and on the strategy defined in Section 4; in doing so it considers both the distance between sites and the relationships of the data to each other.
Jobs from various clients are submitted to the Job Broker. The Job Broker can be thought of as analogous to the NameNode of Hadoop [11]: it decides to which machine a job should be assigned. In our model the LS and DS work in parallel. While the LS executes a job, the DS looks for the data required, on the local machine and on other machines, one step ahead, so time is saved and system utilization is increased. This is shown in detail in Section 5.
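The periodic collection step described above can be sketched as a simple loop. The helper names collect_usage and decide_replication are placeholders introduced here for illustration, and the default interval value is arbitrary.

    import time

    def replica_manager_loop(collect_usage, decide_replication, interval_s=600):
        # collect_usage() is assumed to return {FID: number of accesses} for the last interval;
        # decide_replication(stats) is assumed to create and place replicas as in Section 4.
        while True:
            stats = collect_usage()        # fresh usage information from the environment
            decide_replication(stats)      # choose which files to replicate and where
            time.sleep(interval_s)         # not too short (overhead), not too long (stale data)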
IV. DYNAMIC REPLICATION
Here it is assumed that the data in data grids belongs to a field of research, e.g. biology, chemistry, meteorology, medicine, etc. [12]; these fields form the first level of a hierarchical tree. Splitting them further down, we can divide biology into cell biology, molecular biology, cell technology, proteomics, etc., and these categories can be split further still. The reason behind this assumption is that

data in one category is rarely or never used in another category. By doing so, we can form a hierarchical tree of relationships between data of different categories. Each data entry belongs to one category and is more closely related to data entries of the same category than to those of other categories. Because replication takes place before job execution, it is better to put a replica near the site that frequently uses it. If we gather together data that has a high probability of being used, performance will definitely increase. Our idea is to gather data that is highly related into small regions, so that jobs which use that data will be scheduled to run in those regions.
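As a purely illustrative aid, the fragment below shows one way such a category hierarchy could be represented; the category names and the helper function are our own examples, not part of the proposed system.

    # First level: research fields; deeper levels: their sub-fields.
    category_tree = {
        "Biology": {
            "Cell Biology": {},
            "Molecular Biology": {},
            "Proteomics": {},
        },
        "Meteorology": {},
        "Medical": {},
    }

    def category_path(tree, target, path=()):
        """Return the root-to-category path, e.g. ('Biology', 'Proteomics')."""
        for name, children in tree.items():
            if name == target:
                return path + (name,)
            found = category_path(children, target, path + (name,))
            if found:
                return found
        return None

    print(category_path(category_tree, "Proteomics"))   # ('Biology', 'Proteomics')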
As discussed previously, the main issues of replication are:
• Which data should be replicated?
• Where should the new replica be placed?
The following sections answer these questions.
A. REPLICA DECISION
In order to decide which file needs to be copied, we find the popularity of the files and choose a file based on that. In actual usage, data access patterns change over time, so any dynamic replication strategy must keep track of file access histories to decide when, what and where to replicate. The “popularity” of a file is determined by finding its access rate by various clients/users. Finding the popular files is therefore the key and first step of our strategy. Here, it is assumed that a recently popular file will be accessed more frequently in the near future. This popularity record is maintained by every replication server. The data category is also important for the replica decision: the decision is made according to the category of the data, and related data is placed together. Each unique file is assigned a unique identifier (FID). At regular intervals our algorithm is invoked to find the popularity of the files. Access history logs are cleared at the beginning of each replication interval to capture the current access pattern dynamics. The interval is chosen based on the arrival rate of data requests: a short interval is chosen when the data request rate is high, and vice versa; the interval is adapted dynamically. The data accesses of each unique file are aggregated and summarized, and the Number of Accesses NOA(f) is stored on the server. The average number of accesses is then calculated, and any file that has more accesses than the average needs to be replicated. The chosen file is replicated only if its number of replicas is less than a threshold value. The threshold value can be decided by the following equation:
R = q / w
where R is the relative capacity of the whole system, q is the sum of all nodes' capacities and w is the total size of all the files in the data grid.
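A minimal sketch of this decision step follows, assuming the access counts NOA(f), the current replica counts, the node capacities and the file sizes are available as plain dictionaries and lists; the function name and its signature are ours, not the paper's.

    def files_to_replicate(noa, replica_count, node_capacities, file_sizes):
        # noa            : {FID: number of accesses NOA(f) in the last interval}
        # replica_count  : {FID: current number of replicas}
        # node_capacities: capacities of all nodes (q = their sum)
        # file_sizes     : {FID: size} (w = total size of all files)
        avg_access = sum(noa.values()) / max(len(noa), 1)
        R = sum(node_capacities) / max(sum(file_sizes.values()), 1)   # threshold R = q / w
        popular = [f for f, n in noa.items() if n > avg_access]
        return [f for f in popular if replica_count.get(f, 1) < R]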
B. REPLICA PLACEMENT
As stated above, our strategy tries to put a replica as close as possible to the category it belongs to, so that jobs belonging to that category are executed nearby, which in turn reduces the time spent transferring files at job execution time. For example, in an organization, if we place the data related to the HR department as close together as possible, it will be faster for a job to fetch all the data it requires. We can even place a job by taking its category into account, and in the same manner we can place the data of different departments by considering their categories. To put the files closer together we compute the distance, defined as the time required to transfer the file from one node to another. The distance should therefore be as low as possible; but for replicas of the same file the distance should be as great as possible, so that two replicas of the same file do not end up in the same region. To choose a site for the newly created replica, we evaluate the distance to every site for the selected file, and the site which offers the lowest distance is chosen to store the new replica. If the available storage is less than required, we will


again use the popularity of the files on the target site to find the least popular files, which are deleted, and then the new replica is stored.
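The placement and replacement rule can be sketched as follows. The site interface (a files dictionary and a free_space() method), the distance and popularity callbacks, and the assumption that candidate_sites already excludes regions holding another replica of the file are all our own illustrative choices.

    def place_replica(fid, file_size, candidate_sites, distance, noa_at):
        # distance(site)  : estimated time to transfer the file to that site
        # noa_at(site, f) : access count of file f at that site (its local popularity)
        target = min(candidate_sites, key=distance)        # lowest transfer-time distance wins
        while target.free_space() < file_size and target.files:
            victim = min(target.files, key=lambda f: noa_at(target, f))
            del target.files[victim]                       # evict the least popular file first
        target.files[fid] = file_size
        return target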
V. SCHEDULING STRATEGY
As discussed above, we try to place a job as close as possible to the category to which it belongs. The Job Broker makes use of both the DS (Data Scheduler) and the LS (Local Scheduler) to optimize job execution. The Job Broker calculates the estimated time taken by every site and then chooses the site with the minimum estimated time:
Estimated required time = DT(j) + QT(j) + ET
DT: time required to transfer the data from other nodes to the site where the job is being executed.
QT: queuing time.
ET: time required to execute the job. ...... [12]
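A minimal sketch of this selection, treating the three time components as placeholder estimator functions supplied by the broker (the function names are ours), could look as follows.

    def choose_site(job, sites, data_transfer_time, queueing_time, execution_time):
        def estimated_time(site):
            return (data_transfer_time(job, site)    # DT: move missing files to the site
                    + queueing_time(job, site)       # QT: wait in the site's queue
                    + execution_time(job, site))     # ET: run the job itself
        return min(sites, key=estimated_time)        # site with the minimum estimated time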
After this process, the job is assigned to the site with the minimum estimated time. In traditional systems, all the data needed is gathered first and only then does the job's execution start. In our strategy, data fetching and job execution are done in parallel. The Local Scheduler fetches the files required for job execution that are already on the local site and puts them in the queue in the order in which they will be used. Files which are not available locally are brought to the current site by the Data Scheduler: while the job is executing, the DS fetches and brings the files to the local site. For example, while the site is executing the first task, the DS will try to bring the files needed for the second task of the job. If a required file has not yet arrived when it is needed, the CE waits, and as soon as the file arrives it resumes execution. This strategy therefore minimizes the time required to execute a job. The procedure is sketched in the following pseudocode:
Job Execution:
    Receive Job(J);
    CreateThread(LS)
    {
        ReceiveData(d);
        ExecuteJob();
    }
    CreateThread(DS)
    {
        Data d = FetchData();
        SendData(d);
    }
    Return Result;
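A hedged sketch of this parallel behaviour using Python threads is given below; the job and task attributes (job.tasks, task.input_file) and the fetch_remote and run_task callbacks are assumptions made for illustration only.

    import queue
    import threading

    def execute_job(job, local_files, fetch_remote, run_task):
        ready = queue.Queue()

        def data_scheduler():                       # DS: bring files to the local site
            for task in job.tasks:
                name = task.input_file
                path = local_files.get(name) or fetch_remote(name)
                ready.put((task, path))             # hand over each file as soon as it is available

        threading.Thread(target=data_scheduler, daemon=True).start()

        results = []                                # LS: execute tasks in order of arrival,
        for _ in job.tasks:                         # blocking only if a file has not arrived yet
            task, path = ready.get()
            results.append(run_task(task, path))
        return results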

VI. CONCLUSION
In this paper, we proposed a dynamic optimal strategy which first calculates the popularity of the files based on the data access history and then takes the most popular files into consideration. If the number of replicas of a chosen file is less than the threshold value, a replica is placed on the most appropriate node based on the file's category. A job scheduling strategy is also suggested to improve performance. Traditional replication strategies do not react to the current status, so they are not as effective as dynamic replication strategies. Still, there are many areas that need to be considered for improving performance in the data grid environment; more parameters need to be considered in future work, as grid sizes are increasing drastically and complexity is growing.
VII. REFERENCES
[1] Foster, I., "The Grid: A New Infrastructure for 21st Century Science", Physics Today, Vol. 55, pp. 42-47, 2002, John Wiley & Sons.
[2] P. Crosby, EDGSim, http://www.hep.ucl.ac.uk/~pac/EDGSim/
[3] H. Lamehamedi, et al., "Simulation of Dynamic Data Replication Strategies in Data Grids", In Proc. of the 12th Heterogeneous Computing Workshop (HCW2003), Nice, France, April 2003, IEEE-CS Press.
[4] K. Ranganathan, I. Foster, "Design and Evaluation of Dynamic Replication Strategies for a High-Performance Data Grid", International Conference on Computing in High Energy and Nuclear Physics, 2001.
[5] H. Sato, et al., "Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment", International Conference on Grid Computing, pp. 250-257, 2008.
[6] R.S. Chang, H.P. Chang, "A Dynamic Data Replication Strategy Using Access-Weights in Data Grids", Journal of Supercomputing, Vol. 45, No. 3, pp. 277-295, 2008.
[7] Wuqing Zhao, Xianbin Xu, Zhuowei Wang, Yuping Zhang, Shuibing He, "A Dynamic Optimal Replication Strategy in Data Grid Environment", IEEE, 2010.
[8] Myunghoon Jeon, Kwang-Ho Lim, Hyun Ahn, Byoung-Dai Lee, "Dynamic Data Replication Scheme in Cloud Computing Environment", IEEE, 2012.
[9] Philippe Cudre-Mauroux, Karl Aberer, "A Decentralized Architecture for Adaptive Media Dissemination", ICME 2002 Proceedings, pp. 533-536, 2002.
[10] Mohammad Shorfuzzaman, Peter Graham, Rasit Eskicioglu, "Popularity Driven Dynamic Replica Placement in Hierarchical Data Grids", Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2008.
[11] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, "The Hadoop Distributed File System", Yahoo!, Sunnyvale, California, USA.
[12] Nhan Nguyen Dang, Soonwook Hwang, Sang Boem Lim, "Improvement of Data Grid's Performance by Combining Job Scheduling with Dynamic Replication Strategy", The Sixth International Conference on Grid and Cooperative Computing, 2007.
[13] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", Journal of Network and Computer Applications, Vol. 23, pp. 187-200, 2000.
[14] M. Pushpalatha, T. Ramarao, Revathi Venkataraman, Sorna Lakshmi, "Mobility Aware Data Replication using Minimum Dominating Set in Mobile Ad Hoc Networks", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, pp. 645-658, 2012, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
