SlideShare une entreprise Scribd logo
1  sur  8
Télécharger pour lire hors ligne
REPLICATION STRATEGY BASED ON DATA
RELATIONSHIP IN GRID COMPUTING
Yuhanis Yusof
School of Computing, Universiti Utara Malaysia,Malaysia
yuhanis@uum.edu.my

ABSTRACT
This study discusses the utilization of three types of relationships in performing data replication. As grid
computing offers the ability of sharing huge amount of resources, resource availability is an important issue
to be addressed. The undertaken approach combines the viewpoint of user, system and the grid itself in
ensuring resource availability. The realization of the proposed strategy is demonstrated via OptorSim and
evaluation is made based on execution time, storage usage, network bandwidth and computing element
usage. Results suggested that the proposed strategy produces a better outcome than an existing method even
though various job workload is introduced.

KEYWORDS
Data Grid, Data Replication, Grid Computing

1. INTRODUCTION
Over a number of recent years, the grid has become progressive information technology trend that
enables high performance computing for scientific applications. Such a technology offers researchers the availability of powerful resources which allows them to broaden their simulations
and experiments. As the grid infrastructure progress, issues are shifted towards resource management. This is realized in the Data grid where huge amount of data enables grid applications to
share data files in a coordinated manner. Such an approach is seen to provide fast, reliable and
transparent data access. Nevertheless, Data Grid creates a challenging problem in a grid environment because the volume of data to be shared is large despite the limited storage space and network bandwidth [1, 2]. Furthermore, resources involved are heterogeneous as they belong to
different administrative domains in a distributed environment. Operationally, it is infeasible for
various users to access the same data (e.g. a data file) from one single organization (e.g. site).
Such situation would lead to the increase of data access latency.
Motivated by these considerations, a commonly strategy used in distributed system is also employed in Data Grid, that is replication. Experience from distributed system design shows that
replication promotes high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability. In grid environment, replication is one of the major factors affecting performance of Data Grids [3]. With this, it is suggested that well-defined replication strategies will smooth data access, and reduce job execution cost [4]. However, replication isbounded
by two factors: the size of storage available at different sites within the Data Grid and the bandwidth between these sites [5]. Furthermore, the files in a Data Grid are mostly large [6, 7]; so,
replication to every site and hosting unlimited number of replicas would be unfeasible. Hence, we
need to carefully decide which file that requires replication. We propose a relationship based replication that integrates the viewpoint of three parties; user, system and grid environment. Existing
study either identify the required resource based solely on users’ perspective , i.e. number of
Sundarapandian et al. (Eds) : ICAITA, SAI, SEAS, CDKP, CMCA-2013
pp. 379–386, 2013. © CS & IT-CSCP 2013

DOI : 10.5121/csit.2013.3831
Computer Science & Information Technology (CS & IT)

380

access on the file [8, 9], or based on system’s perspective, i.e. storage cost and read cost of a file
[10-12]. As a result, there will be an insufficient utilizing of storage resource space, which in turn
will lead to less storage availability. According to [13], less storage availability would lead to
longer job execution time and larger network usage because only fewer replicas can be accommodated in the Data Grid, and most files will be read remotely.
The rest of this paper is structured as follows. Section 2 provides a brief description on existing
work in data replication in the data grid. We include the details of our proposed replication strategy in Section 3 and the performance evaluation is presented in Section 4. Finally, we summarize
the study in Section 5.

2. BACKGROUND
The replication algorithm proposed in [4] determines popularity of a file by analyzing data access
history. The researcher believes that the popular data in the past will remain popular in the near
future. Having analyzed data access history, the average number of access, NOA, is computed.
Files with NOA’s value that is greater than the computer average NOA will be replicated. Hence,
the order of which files to be replicated depends on the NOA. The larger the NOA, the more
popular the file is and will be given a higher priority during the replication process.
Nevertheless, such an approach did not consider time period of when the files were accessed. If a file was accessed for a number of times in the past, while none was made recently,
the file would still be considered popular and hence will be replicated. The algorithm proposed in [8] called Last Access Largest Weight (LALW) tries to solve this problem. The key
point of LALW is to give different weights to files having different age. The LALW algorithm is similar to other algorithms [4] by means of using information on access history to determine popularity of a file. But the innovation is included by adding a tag to each access history
record of a file.
The work in [14] suggested a model that helps to determine number of replicas needed to maintain the desired availability in P2P communities. With this, each site within the Data Grid is authorized to create replicas for the files. The availability of a file depends on the failure rate of
peers in the network. However such a model has its own disadvantage: the exact number of replicas is not determined; rather it depends on the location service accuracy which depends on the
existing number of replicas. The accuracy of the replica location service determines the percentage of accessible files, and thus if the location service is ineffective, more replicas are created to
ensure data availability.On the other hand, the work discussed in [9] proposed a replication strategy that makes replication decisions whether to increase number of replicas to face the high volume of requests, or to reduce the number of replicas to save more storage space. Evidently, increasing the number of replicas will decrease the response time, but the storage cost will be increased accordingly [9].

3. METHOD
In a data grid, when a resource (e.g a data file) is required by a job and is not available on a local
storage, it may either be replicated or read remotely. If a file has been replicated, in the future,
when it is requested, any job can accessed it quickly and the job execution time can be reduced.
Due to the limited storage capacity, replication decision should be made to conform users’ needs
so that high demanded files (popular replicas) are efficiently maintain and files that are rarely
utilized are removed. Our strategy (known as Relationship-based Replication, RBR) is designed
by utilizing three types of relationships:
381

Computer Science & Information Technology (CS & IT)

1) File-to-user (F2U) [15] - behavior of a file being requested by users, and notes the change
to this request(whether is a growth or decay change). The relationship is represented using the exponential model. The F2U provides us with the ‫( ݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬FL),
2) File-to-file relationship (F2F) [15] - behavior of a file requesting other files and is noted
by ‫ܹ݈݃݅݁݁݅ܨ‬ℎ‫( ݐ‬FW)
3) File-to-grid (F2G) - lifetime of a file in the grid system and is represented by ‫݁݃ܣ ݈݁݅ܨ‬
(FA) .
Hence, the work presented in this study determines the importance of a resource (i.edata file) by
combining information from users (F2U), file (F2F) and the grid (F2G) itself. The File Value is
computed as in Eq. 1.

‫݁ݑ݈ܸ݈ܽ݁݅ܨ‬ሺ‫݂ ,ݐ‬ሻ =

‫݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬ሺ‫݂ ,ݐ‬ሻ + ‫ܹ݈݃݅݁݁݅ܨ‬ℎ‫ݐ‬ሺ‫݂ ,ݐ‬ሻ
‫݁݃ܽ ݈݁݅ܨ‬ሺ‫݂ ,ݐ‬ሻ

The ‫ܹ݈݃݅݁݁݅ܨ ,݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬ℎ‫ ݐ‬and ‫ ݁݃ܣ ݈݁݅ܨ‬are used to compute the ‫( ݁ݑ݈ܸܽ ݈݁݅ܨ‬FV) that is
used as an indicator for the volume of demand for a data file. The larger the value of FV, the
more important the file is to the grid system. Hence, it will then be replicated.
The operation of ‫ ݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬and ‫ܹ݈݃݅݁݁݅ܨ‬ℎ‫ ݐ‬are provided in the work reported by Madi[15].
௧
௧ାଵ
If we use ܰ௙ to represent the number of accesses for file݂ at time ‫ ,ݐ‬and ܰ௙ to represent the
number of accesses at time t + 1, the exponential growth/decay model would be given by:
௧ାଵ
௧
ܰ௙ = ܰ௙ × ሺ1 + ‫ݎ‬ሻ

where r is the growth or decay rate in number of accesses of a file in one time interval. Therefore,
the value of r using the following formula can be calculated:
௧ାଵ
௧
‫ = ݎ‬൫ܰ௙ ൗܰ௙ ൯ − 1

௧
Assume ‫ ݐ‬is the number of passed intervals, and ܰ௙ indicates the number of access for the file ݂
at time interval ‫ ,ݐ‬then we get the sequence of access numbers:
଴ ଵ ଶ ଷ
௧ିଵ ௧
ܰ௙ ܰ௙ ܰ௙ ܰ௙ . … ܰ௙ ܰ௙

Therefore, there are ‫ 1 − ݐ‬time intervals, and each time interval has a growth or decay rate in
number of accesses of a file. So according to the exponential growth/decay model, the equation
can be written as in the following:
ଵ
଴
‫ݎ‬଴ = ൫ܰ௙ ൗܰ௙ ൯ − 1,

ଶ
ଵ
‫ݎ‬ଵ = ൫ܰ௙ ൗܰ௙ ൯ − 1,

ଷ
ଶ
‫ݎ‬ଶ = ൫ܰ௙ ൗܰ௙ ൯ − 1,

௧
௧ିଵ
‫ݎ‬௧ିଵ = ൫ܰ௙ ൗܰ௙ ൯ − 1

Therefore the average rate for all intervals is:
௧ିଵ

‫ = ݎ‬෍ ‫ݎ‬௜ ൘‫1 − ݐ‬
଴
Computer Science & Information Technology (CS & IT)

382

Having known the average accessed rate (growth or decay) for a file during the past intervals, the
number of access for the upcoming time interval can be estimated, which is termed as the File
Lifetime [15] :
௧
‫ܰ = ݁݉݅ݐ݂݁݅ܮ ݈݁݅ܨ‬௙ × ሺ1 + ‫ݎ‬ሻ

On the other hand, the File Weight as described in [15] is as follows:
௡

‫ܹ݃݅݁ ݈݁݅ܨ‬ℎ‫ = ݐ‬෍ ‫ܮܨ‬௜ × ‫ܮܦ‬௜
௜ୀଵ

where, ݊: total number of files in a grid system, ‫ :ܮܨ‬File Lifetime, and ‫ :ܮܦ‬dependency level of
other files on the underlying file, and if there is no dependency, DL is assumed to be zero.
Finally, the age of a file can be calculated as the time a filebeing included in the grid until the
current time. Hence, it is as follows:
‫݁݉݅ܶ = ݁݃ܣ ݈݁݅ܨ‬௖௨௥௥௘௡௧ − ܶ݅݉݁௔௧௧௔௖௛

4. EXPERIMENTS
In this research, the OptorSim[16-18] simulator was utilized to simulate the proposed replication
strategy. The main idea of OptorSim is when given a grid topology, resources, and a set of jobs
and optimization strategy, it can simulate data movement around these job runs and supply information on various factors that could be used to evaluate the performance of the optimization
strategy.
System scalability can be tested by the number of jobs running during the simulation. In this paper, to simulate different number of jobs, number of jobs that is considered in our evaluation varies between 200 and 4000 jobs. The evaluation is based on the following metrics:
Mean Job Execution Time
This is defined as the average time required executing a job starting from the time it is scheduled
to the Computing Element until it complete processing all of the required files. It is calculated by
accumulating the time taken by each job and divided by the number of jobs [8, 16, 17, 19-21], as
shown in the following formula:
MJET =

∑ ୘ీ౛౦౗౨౪౫౨౛ ି୘ఽ౨౨౟౬౛
୬

where,
ܶ୅୰୰୧୴ୣ : start time of job execution,
ܶ஽௘௣௔௥௧௨௥௘ : completion time of job execution, and
݊: total number of processed jobs in the simulation.
Efficient Network Usage (ENU)
ENU is a measure of how well the replication strategy uses the network [17]. It is computed as:
ܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ + ܰ௥௘௣௟௜௖௔௧௜௢௡௦
‫= ܷܰܧ‬
ܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ + ܰ௟௢௖௔௟ ௙௜௟௘ ௔௖௖௘௦௦
where ܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ is the number of accesses that Computing Element reads a file from
a remote site, ܰ௥௘௣௟௜௖௔௧௜௢௡௦ is the total number of file replication that occurs, and
383

Computer Science & Information Technology (CS & IT)

ሺܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ + ܰ௟௢௖௔௟௙௜௟௘௔௖௖௘௦௦ ሻ is the number of times that Computing Element reads a file
from a remote site or reads a file locally. A lower value indicates that the utilization of network
bandwidth is more efficient.
Storage Element Usage
The average of all storage reserve capacity in a data grid can reflect the total system storage cost
[16, 17]. The Average Storage Usage [22] metric is computed by the following equation [17]:
ASU =

౑
ి

∑౤ ሺୱ୧୲ୣ౟ ሻ
౟సభ
୒

where,

× 100%

ܷ: storage usage space that is reserved by the data files,
ܰ: number of sites in the data grid, and
‫ :ܥ‬total capacity of the storage medium.
Computing Element Usage (CE Usage)
This is defined as the percentage of time that a CE is active (transferring or processing data) during the simulation. The CE usage of the whole grid is computed by aggregating the CE usage of
each individual CE[17].

5. RESULTS
The results depicted in Table 1 show a linear increase in the MJET as the number of jobs on the
grid increases. This is because, as more jobs are submitted, the queue at the sites increases. If the
job submission rate is higher than the grid’s job processing rate, this build-up of queues is inevitable. Hence, a preferred replication strategy is a strategy that has less MJET. As shown in Table
1, for MJET, the RBR is the best among existing algorithms. Utilizing the RBR, the mean job
execution time is reduced and is noted to better by 24.25% over LALW. Referring to the Average
Storage Usage, by using RBR, the storage usage is reduced by outperforming LALW, 19.55%.
Table 1. : Simulation Results
Number of Jobs

RBR

MJET

3931

3545

37.87

31.92

ASU

34.13

27.73

CEU
500

LALW

ENU

200

Metrics

22.15

23.41

MJET

7839

7566

ENU

36.88

30.06

ASU

35.71

28.78

CEU

25.91

27.15
Computer Science & Information Technology (CS & IT)
1000

384
12311

34.25

26.88

ASU

37.12

29.97

CEU

30.25

34.27

MJET

54133

50361

ENU

32.45

24.36

ASU

38.63

30.54

CEU

25.74

33.83

MJET

104129

103396

ENU

30.89

22.73

ASU

40.11

31.54

CEU

4000

16241

ENU

2000

MJET

28.11

33.27

On the other hand, results of Efficient Network Usage (ENU) show a slight linear decrease as
number of jobs on the grid increases. This is because at the start of the simulation the queues are
short, but they build up while the files are replicated in the grid. Once the replication process has
established, the execution time are reduced and the queue is shorten. The ENU gradually decreases with the increment in number of jobs because the amount of replication decreases over time.
The RBR uses the lowest amount of network resources for the tested number of jobs because it is
able to make better decision in deciding file that requires replication.
Looking at the Computing Element Usage (CEU) metric, it can be seen that the CEU generally
grows as the number of jobs increases, reflecting the heavy workload. However, there is an obvious drop between 1000 and 2000, this is because with a higher number of jobs, the scheduling
algorithm is sending most of the extra jobs to a few sites from where the data are easily accessible, leading to more uneven distribution of jobs around the grid. The same trend, although less
marked, there is a slight drop seen with RBR. This indicates that RBR made a good balance in the
grid as it is able to determine the appropriate data files that require replication. Having the required data file (replicas) at the closest site reduces the computing element usage.

6. CONCLUSION
In this paper, we presented a replication strategy that incorporates three relationships in determining the importance of a file. In the simulation experiments, different scenarios of job workload
were considered in order to evaluate the proposed relationship based replication, RBR. Results
showed that by integrating information on F2U, F2F and F2G, a better result was obtained as
compared to an existing approach of LALW. This may suggest that the proposed strategy could
be a possible approach in data replication.
385

Computer Science & Information Technology (CS & IT)

ACKNOWLEDGEMENT
The author would like to thank Mohammed Madi for his assistant in completing the experiment.
Also, author expresses the gratitude to Universiti Utara Malaysia for providing the financial and
management support under the High Impact Individual research grant (s/o 12155).

REFERENCES
[1]
[2]

[3]
[4]
[5]
[6]
[7]
[8]

[9]
[10]

[11]

[12]

[13]
[14]

[15]
[16]

[17]

[18]

[19]

B. Wilkinson, Grid computing: techniques and applications: Chapman & Hall/CRC, 2009.
C. Nicholson, D. G. Cameron, A. T. Doyle, A. P. Millar, and K. Stockinger, "Dynamic data
replication in lcg 2008," Concurrency and Computation: Practice and Experience, vol. 20, pp. 12591271, 2008.
X. You, G. Chang, X. Chen, C. Tian, and C. Zhu, "Utility-Based Replication Strategies in Data
Grids," in Fifth International Conference on Grid and Cooperative Computing, 2006, pp. 500-507.
M. Tang, B. S. Lee, X. Tang, and C. K. Yeo, "The impact of data replication on job scheduling
performance in the Data Grid," Future Generation Computer Systems, vol. 22, pp. 254-268, 2006.
S. Venugopal, R. Buyya, and K. Ramamohanarao, "A taxonomy of data grids for distributed data
sharing, management, and processing," ACM Computing Surveys (CSUR), vol. 38, p. 3, 2006.
R. M. Rahman, K. Barker, and R. Alhajj, "Replica placement strategies in data grid," Journal of Grid
Computing, vol. 6, pp. 103-123, 2008.
R. M. Rahman, K. Barker, and R. Alhajj, "Performance evaluation of different replica placement
algorithms," International Journal of Grid and Utility Computing, vol. 1, pp. 121-133, 2009.
C. Ruay-Shiung, C. Hui-Ping, and W. Yun-Ting, "A dynamic weighted data replication strategy in
data grids," in AICCSA 2008: Proceedings of IEEE/ACS International Conference on computer
systems and applications, 2008, pp. 414-421.
H. H. E. Al Mistarihi and C. H. Yong, "Replica management in data grid," International Journal of
Computer Science and Network Security IJCSNS, vol. 8, p. 22, 2008.
L. Yi-Fang, L. Pangfeng, and W. Jan-Jan, "Optimal placement of replicas in data grid environments
with locality assurance," in Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International
Conference on, 2006, p. 8.
L. Pangfeng and W. Jan-Jan, "Optimal replica placement strategy for hierarchical data grid systems,"
in Cluster Computing and the Grid, 2006. CCGRID 06. Sixth IEEE International Symposium on,
2006, p. 4 pp.
Y. Mansouri, M. Garmehi, M. Sargolzaei, and M. Shadi, "Optimal Number of Replicas in Data Grid
Environment," in First International Conference on Distributed Framework and Applications, 2008.
DFmA 2008. , 2008, pp. 96-101.
David G. Cameron, "Replica management and optimisation for data grids," PhD. Thesis, University
of Glasgow, 2005.
K. Ranganathan, A. Iamnitchi, and I. Foster, "Improving data availability through dynamic modeldriven replication in large peer-to-peer communities," in Global and Peer-to-Peer Computing on
Large Scale Distributed Systems Workshop, 2002, pp. 376–381.
M. Madi, "Replica Creation Algorithm for Data Grid," PhD, School of Computing, UUM College of
Arts and Sciences, Universiti Utara Malaysia, Sintok, 2012.
D. G. Cameron, A. P. Millar, C. Nicholson, R. Carvajal-Schiaffino, K. Stockinger, and F. Zini,
"Analysis of scheduling and replica optimisation strategies for data grids using OptorSim," Journal of
Grid Computing, vol. 2, pp. 57-69, 2004.
W. H. Bell, D. G. Cameron, A. P. Millar, L. Capozza, K. Stockinger, and F. Zini, "Optorsim: A grid
simulator for studying dynamic data replication strategies," International Journal of High
Performance Computing Applications, vol. 17, pp. 403-416, 2003.
D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, C. Nicholson, K. Stockinger, and F. Zini,
"Evaluating scheduling and replica optimisation strategies in OptorSim," Journal of Grid Computing,
pp. 57-69, March 2004.
W. H. Bell, D. G. Cameron, L. Capozza, P. Millar, K. Stockinger, and F. Zini, "Simulation of
Dynamic Grid Replication Strategies in OptorSim," Journal of High Performance Computing
Applications, vol. 17, 2003.
Computer Science & Information Technology (CS & IT)

386

[20] F. Ben Charrada, H. Ounelli, and H. Chettaoui, "An Efficient Replication Strategy for Dynamic Data
Grids," in Proceedings of International Conference on P2P, Parallel, Grid, Cloud and Internet
Computing (3PGCIC),, 2010, pp. 50-54.
,
50
[21] D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, C. Nicholson, K. Stockinger, and F. Zini, "UK
Schiaffino,
Millar,
grid simulation with OptorSim," in Proceedings of UK e-Science All Hands Meeting Nottingham,
Science
Meeting,
UK, 2003.
[22] S. Sivasubramanian, M. Szymaniak, G. Pierre, and M. Van Steen, "Replication for web hosting
systems," ACM Computing Surveys (CSUR), vol. 36, pp. 291-334, 2004.

AUTHOR
Yuhanis Yusof is a senior lecturer in School of Computing, Universiti Utara Malaysia
(UUM), Malaysia. She obtained her PhD in 2007 in the area of Computer Science from
Cardiff University, UK. She also holds MSc degree in Computer Science from UniversitiUniversit
Sains Malaysia and Bachelor of Information Technology from UUM.
Her research interest is broadly in data analysis and management for large scale compucomp
ting. This includes computational intelligence, data mining (discovering patterns of interest from data), data
warehousing, informationretrieval and grid computing. Her current focus is in the area of nature
ing,
nature-inspired
computing, in particularly the Artificial Bee Colony and Firefly algorithm. Such algorithms are later reare
lized in various domains such financial analysis and grid computing.

Contenu connexe

Tendances

Distributeddatabasesforchallengednet
DistributeddatabasesforchallengednetDistributeddatabasesforchallengednet
Distributeddatabasesforchallengednet
Vinoth Chandar
 
A general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithmA general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithm
TA Minh Thuy
 
A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...
ijdms
 
Efficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsEfficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in Clouds
IJERA Editor
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
raj_vij
 
Dynamic Metadata Management in Semantic File Systems
Dynamic Metadata Management in Semantic File SystemsDynamic Metadata Management in Semantic File Systems
Dynamic Metadata Management in Semantic File Systems
IJERA Editor
 

Tendances (17)

Distributeddatabasesforchallengednet
DistributeddatabasesforchallengednetDistributeddatabasesforchallengednet
Distributeddatabasesforchallengednet
 
Analyse the performance of mobile peer to Peer network using ant colony optim...
Analyse the performance of mobile peer to Peer network using ant colony optim...Analyse the performance of mobile peer to Peer network using ant colony optim...
Analyse the performance of mobile peer to Peer network using ant colony optim...
 
A general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithmA general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithm
 
A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...
 
ANALYSE THE PERFORMANCE OF MOBILE PEER TO PEER NETWORK USING ANT COLONY OPTIM...
ANALYSE THE PERFORMANCE OF MOBILE PEER TO PEER NETWORK USING ANT COLONY OPTIM...ANALYSE THE PERFORMANCE OF MOBILE PEER TO PEER NETWORK USING ANT COLONY OPTIM...
ANALYSE THE PERFORMANCE OF MOBILE PEER TO PEER NETWORK USING ANT COLONY OPTIM...
 
Exploration of Call Transcripts with MapReduce and Zipf’s Law
Exploration of Call Transcripts with MapReduce and Zipf’s LawExploration of Call Transcripts with MapReduce and Zipf’s Law
Exploration of Call Transcripts with MapReduce and Zipf’s Law
 
A PERMISSION BASED TREE-STRUCTURED APPROACH FOR REPLICATED DATABASES
A PERMISSION BASED TREE-STRUCTURED APPROACH FOR REPLICATED DATABASESA PERMISSION BASED TREE-STRUCTURED APPROACH FOR REPLICATED DATABASES
A PERMISSION BASED TREE-STRUCTURED APPROACH FOR REPLICATED DATABASES
 
Architecture and Evaluation on Cooperative Caching In Wireless P2P
Architecture and Evaluation on Cooperative Caching In Wireless  P2PArchitecture and Evaluation on Cooperative Caching In Wireless  P2P
Architecture and Evaluation on Cooperative Caching In Wireless P2P
 
Efficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsEfficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in Clouds
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...
 
Data Mining for XML Query-Answering Support
Data Mining for XML Query-Answering SupportData Mining for XML Query-Answering Support
Data Mining for XML Query-Answering Support
 
Advance DBMS
Advance DBMSAdvance DBMS
Advance DBMS
 
Dynamic Metadata Management in Semantic File Systems
Dynamic Metadata Management in Semantic File SystemsDynamic Metadata Management in Semantic File Systems
Dynamic Metadata Management in Semantic File Systems
 
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENTDYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce Technique
 
A novel cache resolution technique for cooperative caching in wireless mobile...
A novel cache resolution technique for cooperative caching in wireless mobile...A novel cache resolution technique for cooperative caching in wireless mobile...
A novel cache resolution technique for cooperative caching in wireless mobile...
 

En vedette

Jim Alvino Interview of Dr Amit Goswami Part 2 Transcript
Jim Alvino Interview of Dr Amit Goswami Part 2 TranscriptJim Alvino Interview of Dr Amit Goswami Part 2 Transcript
Jim Alvino Interview of Dr Amit Goswami Part 2 Transcript
Dr Jim Alvino
 
Wartefolie webinar agep
Wartefolie webinar agepWartefolie webinar agep
Wartefolie webinar agep
bfnd
 
la hiperactividad por Raquel Galindo
la hiperactividad por Raquel Galindola hiperactividad por Raquel Galindo
la hiperactividad por Raquel Galindo
GalindoKaren
 
Posturas filosoficas
Posturas filosoficasPosturas filosoficas
Posturas filosoficas
pirulino000
 
The best vacations2
The best vacations2The best vacations2
The best vacations2
Coconolizo
 
Presentación1 014
Presentación1 014Presentación1 014
Presentación1 014
nidacriollo
 
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
csandit
 
PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...
PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...
PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...
csandit
 
Tipos de memoria ram 1
Tipos de memoria ram 1Tipos de memoria ram 1
Tipos de memoria ram 1
Carito2205
 

En vedette (20)

REAL TIME DROWSY DRIVER DETECTION USING HAARCASCADE SAMPLES
REAL TIME DROWSY DRIVER DETECTION USING HAARCASCADE SAMPLESREAL TIME DROWSY DRIVER DETECTION USING HAARCASCADE SAMPLES
REAL TIME DROWSY DRIVER DETECTION USING HAARCASCADE SAMPLES
 
Film Magazine Analysis
Film Magazine AnalysisFilm Magazine Analysis
Film Magazine Analysis
 
Revision 4
Revision 4Revision 4
Revision 4
 
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLSSBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
 
Jim Alvino Interview of Dr Amit Goswami Part 2 Transcript
Jim Alvino Interview of Dr Amit Goswami Part 2 TranscriptJim Alvino Interview of Dr Amit Goswami Part 2 Transcript
Jim Alvino Interview of Dr Amit Goswami Part 2 Transcript
 
Fotografía Digital
Fotografía DigitalFotografía Digital
Fotografía Digital
 
Wartefolie webinar agep
Wartefolie webinar agepWartefolie webinar agep
Wartefolie webinar agep
 
Stage 9 uploaded
Stage 9   uploadedStage 9   uploaded
Stage 9 uploaded
 
Resolución 3681 de 2013- Requerimientos Censo de pacientes con Enfermedades H...
Resolución 3681 de 2013- Requerimientos Censo de pacientes con Enfermedades H...Resolución 3681 de 2013- Requerimientos Censo de pacientes con Enfermedades H...
Resolución 3681 de 2013- Requerimientos Censo de pacientes con Enfermedades H...
 
la hiperactividad por Raquel Galindo
la hiperactividad por Raquel Galindola hiperactividad por Raquel Galindo
la hiperactividad por Raquel Galindo
 
Caratula inf.
Caratula inf.Caratula inf.
Caratula inf.
 
Posturas filosoficas
Posturas filosoficasPosturas filosoficas
Posturas filosoficas
 
The best vacations2
The best vacations2The best vacations2
The best vacations2
 
Presentación1 014
Presentación1 014Presentación1 014
Presentación1 014
 
Staff seminar sxsw
Staff seminar   sxswStaff seminar   sxsw
Staff seminar sxsw
 
VISUALIZATION OF A SYNTHETIC REPRESENTATION OF ASSOCIATION RULES TO ASSIST EX...
VISUALIZATION OF A SYNTHETIC REPRESENTATION OF ASSOCIATION RULES TO ASSIST EX...VISUALIZATION OF A SYNTHETIC REPRESENTATION OF ASSOCIATION RULES TO ASSIST EX...
VISUALIZATION OF A SYNTHETIC REPRESENTATION OF ASSOCIATION RULES TO ASSIST EX...
 
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
THE RELATIONSHIP BETWEEN THE USE OF BLACKBERRY WITH THE STUDENTS’ DEMAND FULF...
 
PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...
PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...
PERFORMANCE COMPARISON BETWEEN THE ORIGINAL FORMS OF BIOGEOGRAPHY-BASED OPTIM...
 
Tipos de memoria ram 1
Tipos de memoria ram 1Tipos de memoria ram 1
Tipos de memoria ram 1
 
Guia de-precios-compugreiff-09-de-diciembre-de-2015
Guia de-precios-compugreiff-09-de-diciembre-de-2015Guia de-precios-compugreiff-09-de-diciembre-de-2015
Guia de-precios-compugreiff-09-de-diciembre-de-2015
 

Similaire à REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING

Fota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsFota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity Algorithms
Shivansh Gaur
 
An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...
Alexander Decker
 

Similaire à REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING (20)

REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTINGREPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Ijcatr04071003
Ijcatr04071003Ijcatr04071003
Ijcatr04071003
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
A New Architecture for Group Replication in Data Grid
A New Architecture for Group Replication in Data GridA New Architecture for Group Replication in Data Grid
A New Architecture for Group Replication in Data Grid
 
50120130406035
5012013040603550120130406035
50120130406035
 
SiDe Enabled Reliable Replica Optimization
SiDe Enabled Reliable Replica OptimizationSiDe Enabled Reliable Replica Optimization
SiDe Enabled Reliable Replica Optimization
 
Ax34298305
Ax34298305Ax34298305
Ax34298305
 
Efficient File Sharing Scheme in Mobile Adhoc Network
Efficient File Sharing Scheme in Mobile Adhoc NetworkEfficient File Sharing Scheme in Mobile Adhoc Network
Efficient File Sharing Scheme in Mobile Adhoc Network
 
An Efficient Approach to Manage Small Files in Distributed File Systems
An Efficient Approach to Manage Small Files in Distributed File SystemsAn Efficient Approach to Manage Small Files in Distributed File Systems
An Efficient Approach to Manage Small Files in Distributed File Systems
 
Towards a low cost etl system
Towards a low cost etl systemTowards a low cost etl system
Towards a low cost etl system
 
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
 
T AXONOMY OF O PTIMIZATION A PPROACHES OF R ESOURCE B ROKERS IN D ATA G RIDS
T AXONOMY OF  O PTIMIZATION  A PPROACHES OF R ESOURCE B ROKERS IN  D ATA  G RIDST AXONOMY OF  O PTIMIZATION  A PPROACHES OF R ESOURCE B ROKERS IN  D ATA  G RIDS
T AXONOMY OF O PTIMIZATION A PPROACHES OF R ESOURCE B ROKERS IN D ATA G RIDS
 
Hm2413291336
Hm2413291336Hm2413291336
Hm2413291336
 
Fota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsFota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity Algorithms
 
An Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageAn Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud Storage
 
An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...
 
Scalable and adaptive data replica placement for geo distributed cloud storages
Scalable and adaptive data replica placement for geo distributed cloud storagesScalable and adaptive data replica placement for geo distributed cloud storages
Scalable and adaptive data replica placement for geo distributed cloud storages
 
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
 
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING

  • 1. REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING Yuhanis Yusof School of Computing, Universiti Utara Malaysia,Malaysia yuhanis@uum.edu.my ABSTRACT This study discusses the utilization of three types of relationships in performing data replication. As grid computing offers the ability of sharing huge amount of resources, resource availability is an important issue to be addressed. The undertaken approach combines the viewpoint of user, system and the grid itself in ensuring resource availability. The realization of the proposed strategy is demonstrated via OptorSim and evaluation is made based on execution time, storage usage, network bandwidth and computing element usage. Results suggested that the proposed strategy produces a better outcome than an existing method even though various job workload is introduced. KEYWORDS Data Grid, Data Replication, Grid Computing 1. INTRODUCTION Over a number of recent years, the grid has become progressive information technology trend that enables high performance computing for scientific applications. Such a technology offers researchers the availability of powerful resources which allows them to broaden their simulations and experiments. As the grid infrastructure progress, issues are shifted towards resource management. This is realized in the Data grid where huge amount of data enables grid applications to share data files in a coordinated manner. Such an approach is seen to provide fast, reliable and transparent data access. Nevertheless, Data Grid creates a challenging problem in a grid environment because the volume of data to be shared is large despite the limited storage space and network bandwidth [1, 2]. Furthermore, resources involved are heterogeneous as they belong to different administrative domains in a distributed environment. Operationally, it is infeasible for various users to access the same data (e.g. a data file) from one single organization (e.g. site). Such situation would lead to the increase of data access latency. Motivated by these considerations, a commonly strategy used in distributed system is also employed in Data Grid, that is replication. Experience from distributed system design shows that replication promotes high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability. In grid environment, replication is one of the major factors affecting performance of Data Grids [3]. With this, it is suggested that well-defined replication strategies will smooth data access, and reduce job execution cost [4]. However, replication isbounded by two factors: the size of storage available at different sites within the Data Grid and the bandwidth between these sites [5]. Furthermore, the files in a Data Grid are mostly large [6, 7]; so, replication to every site and hosting unlimited number of replicas would be unfeasible. Hence, we need to carefully decide which file that requires replication. We propose a relationship based replication that integrates the viewpoint of three parties; user, system and grid environment. Existing study either identify the required resource based solely on users’ perspective , i.e. number of Sundarapandian et al. (Eds) : ICAITA, SAI, SEAS, CDKP, CMCA-2013 pp. 379–386, 2013. © CS & IT-CSCP 2013 DOI : 10.5121/csit.2013.3831
  • 2. Computer Science & Information Technology (CS & IT) 380 access on the file [8, 9], or based on system’s perspective, i.e. storage cost and read cost of a file [10-12]. As a result, there will be an insufficient utilizing of storage resource space, which in turn will lead to less storage availability. According to [13], less storage availability would lead to longer job execution time and larger network usage because only fewer replicas can be accommodated in the Data Grid, and most files will be read remotely. The rest of this paper is structured as follows. Section 2 provides a brief description on existing work in data replication in the data grid. We include the details of our proposed replication strategy in Section 3 and the performance evaluation is presented in Section 4. Finally, we summarize the study in Section 5. 2. BACKGROUND The replication algorithm proposed in [4] determines popularity of a file by analyzing data access history. The researcher believes that the popular data in the past will remain popular in the near future. Having analyzed data access history, the average number of access, NOA, is computed. Files with NOA’s value that is greater than the computer average NOA will be replicated. Hence, the order of which files to be replicated depends on the NOA. The larger the NOA, the more popular the file is and will be given a higher priority during the replication process. Nevertheless, such an approach did not consider time period of when the files were accessed. If a file was accessed for a number of times in the past, while none was made recently, the file would still be considered popular and hence will be replicated. The algorithm proposed in [8] called Last Access Largest Weight (LALW) tries to solve this problem. The key point of LALW is to give different weights to files having different age. The LALW algorithm is similar to other algorithms [4] by means of using information on access history to determine popularity of a file. But the innovation is included by adding a tag to each access history record of a file. The work in [14] suggested a model that helps to determine number of replicas needed to maintain the desired availability in P2P communities. With this, each site within the Data Grid is authorized to create replicas for the files. The availability of a file depends on the failure rate of peers in the network. However such a model has its own disadvantage: the exact number of replicas is not determined; rather it depends on the location service accuracy which depends on the existing number of replicas. The accuracy of the replica location service determines the percentage of accessible files, and thus if the location service is ineffective, more replicas are created to ensure data availability.On the other hand, the work discussed in [9] proposed a replication strategy that makes replication decisions whether to increase number of replicas to face the high volume of requests, or to reduce the number of replicas to save more storage space. Evidently, increasing the number of replicas will decrease the response time, but the storage cost will be increased accordingly [9]. 3. METHOD In a data grid, when a resource (e.g a data file) is required by a job and is not available on a local storage, it may either be replicated or read remotely. If a file has been replicated, in the future, when it is requested, any job can accessed it quickly and the job execution time can be reduced. Due to the limited storage capacity, replication decision should be made to conform users’ needs so that high demanded files (popular replicas) are efficiently maintain and files that are rarely utilized are removed. Our strategy (known as Relationship-based Replication, RBR) is designed by utilizing three types of relationships:
  • 3. 381 Computer Science & Information Technology (CS & IT) 1) File-to-user (F2U) [15] - behavior of a file being requested by users, and notes the change to this request(whether is a growth or decay change). The relationship is represented using the exponential model. The F2U provides us with the ‫( ݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬FL), 2) File-to-file relationship (F2F) [15] - behavior of a file requesting other files and is noted by ‫ܹ݈݃݅݁݁݅ܨ‬ℎ‫( ݐ‬FW) 3) File-to-grid (F2G) - lifetime of a file in the grid system and is represented by ‫݁݃ܣ ݈݁݅ܨ‬ (FA) . Hence, the work presented in this study determines the importance of a resource (i.edata file) by combining information from users (F2U), file (F2F) and the grid (F2G) itself. The File Value is computed as in Eq. 1. ‫݁ݑ݈ܸ݈ܽ݁݅ܨ‬ሺ‫݂ ,ݐ‬ሻ = ‫݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬ሺ‫݂ ,ݐ‬ሻ + ‫ܹ݈݃݅݁݁݅ܨ‬ℎ‫ݐ‬ሺ‫݂ ,ݐ‬ሻ ‫݁݃ܽ ݈݁݅ܨ‬ሺ‫݂ ,ݐ‬ሻ The ‫ܹ݈݃݅݁݁݅ܨ ,݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬ℎ‫ ݐ‬and ‫ ݁݃ܣ ݈݁݅ܨ‬are used to compute the ‫( ݁ݑ݈ܸܽ ݈݁݅ܨ‬FV) that is used as an indicator for the volume of demand for a data file. The larger the value of FV, the more important the file is to the grid system. Hence, it will then be replicated. The operation of ‫ ݁݉݅ݐ݂݁݅ܮ݈݁݅ܨ‬and ‫ܹ݈݃݅݁݁݅ܨ‬ℎ‫ ݐ‬are provided in the work reported by Madi[15]. ௧ ௧ାଵ If we use ܰ௙ to represent the number of accesses for file݂ at time ‫ ,ݐ‬and ܰ௙ to represent the number of accesses at time t + 1, the exponential growth/decay model would be given by: ௧ାଵ ௧ ܰ௙ = ܰ௙ × ሺ1 + ‫ݎ‬ሻ where r is the growth or decay rate in number of accesses of a file in one time interval. Therefore, the value of r using the following formula can be calculated: ௧ାଵ ௧ ‫ = ݎ‬൫ܰ௙ ൗܰ௙ ൯ − 1 ௧ Assume ‫ ݐ‬is the number of passed intervals, and ܰ௙ indicates the number of access for the file ݂ at time interval ‫ ,ݐ‬then we get the sequence of access numbers: ଴ ଵ ଶ ଷ ௧ିଵ ௧ ܰ௙ ܰ௙ ܰ௙ ܰ௙ . … ܰ௙ ܰ௙ Therefore, there are ‫ 1 − ݐ‬time intervals, and each time interval has a growth or decay rate in number of accesses of a file. So according to the exponential growth/decay model, the equation can be written as in the following: ଵ ଴ ‫ݎ‬଴ = ൫ܰ௙ ൗܰ௙ ൯ − 1, ଶ ଵ ‫ݎ‬ଵ = ൫ܰ௙ ൗܰ௙ ൯ − 1, ଷ ଶ ‫ݎ‬ଶ = ൫ܰ௙ ൗܰ௙ ൯ − 1, ௧ ௧ିଵ ‫ݎ‬௧ିଵ = ൫ܰ௙ ൗܰ௙ ൯ − 1 Therefore the average rate for all intervals is: ௧ିଵ ‫ = ݎ‬෍ ‫ݎ‬௜ ൘‫1 − ݐ‬ ଴
  • 4. Computer Science & Information Technology (CS & IT) 382 Having known the average accessed rate (growth or decay) for a file during the past intervals, the number of access for the upcoming time interval can be estimated, which is termed as the File Lifetime [15] : ௧ ‫ܰ = ݁݉݅ݐ݂݁݅ܮ ݈݁݅ܨ‬௙ × ሺ1 + ‫ݎ‬ሻ On the other hand, the File Weight as described in [15] is as follows: ௡ ‫ܹ݃݅݁ ݈݁݅ܨ‬ℎ‫ = ݐ‬෍ ‫ܮܨ‬௜ × ‫ܮܦ‬௜ ௜ୀଵ where, ݊: total number of files in a grid system, ‫ :ܮܨ‬File Lifetime, and ‫ :ܮܦ‬dependency level of other files on the underlying file, and if there is no dependency, DL is assumed to be zero. Finally, the age of a file can be calculated as the time a filebeing included in the grid until the current time. Hence, it is as follows: ‫݁݉݅ܶ = ݁݃ܣ ݈݁݅ܨ‬௖௨௥௥௘௡௧ − ܶ݅݉݁௔௧௧௔௖௛ 4. EXPERIMENTS In this research, the OptorSim[16-18] simulator was utilized to simulate the proposed replication strategy. The main idea of OptorSim is when given a grid topology, resources, and a set of jobs and optimization strategy, it can simulate data movement around these job runs and supply information on various factors that could be used to evaluate the performance of the optimization strategy. System scalability can be tested by the number of jobs running during the simulation. In this paper, to simulate different number of jobs, number of jobs that is considered in our evaluation varies between 200 and 4000 jobs. The evaluation is based on the following metrics: Mean Job Execution Time This is defined as the average time required executing a job starting from the time it is scheduled to the Computing Element until it complete processing all of the required files. It is calculated by accumulating the time taken by each job and divided by the number of jobs [8, 16, 17, 19-21], as shown in the following formula: MJET = ∑ ୘ీ౛౦౗౨౪౫౨౛ ି୘ఽ౨౨౟౬౛ ୬ where, ܶ୅୰୰୧୴ୣ : start time of job execution, ܶ஽௘௣௔௥௧௨௥௘ : completion time of job execution, and ݊: total number of processed jobs in the simulation. Efficient Network Usage (ENU) ENU is a measure of how well the replication strategy uses the network [17]. It is computed as: ܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ + ܰ௥௘௣௟௜௖௔௧௜௢௡௦ ‫= ܷܰܧ‬ ܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ + ܰ௟௢௖௔௟ ௙௜௟௘ ௔௖௖௘௦௦ where ܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ is the number of accesses that Computing Element reads a file from a remote site, ܰ௥௘௣௟௜௖௔௧௜௢௡௦ is the total number of file replication that occurs, and
  • 5. 383 Computer Science & Information Technology (CS & IT) ሺܰ୰ୣ୫୭୲ୣ ϐ୧୪ୣ ୟୡୡୣୱୱ + ܰ௟௢௖௔௟௙௜௟௘௔௖௖௘௦௦ ሻ is the number of times that Computing Element reads a file from a remote site or reads a file locally. A lower value indicates that the utilization of network bandwidth is more efficient. Storage Element Usage The average of all storage reserve capacity in a data grid can reflect the total system storage cost [16, 17]. The Average Storage Usage [22] metric is computed by the following equation [17]: ASU = ౑ ి ∑౤ ሺୱ୧୲ୣ౟ ሻ ౟సభ ୒ where, × 100% ܷ: storage usage space that is reserved by the data files, ܰ: number of sites in the data grid, and ‫ :ܥ‬total capacity of the storage medium. Computing Element Usage (CE Usage) This is defined as the percentage of time that a CE is active (transferring or processing data) during the simulation. The CE usage of the whole grid is computed by aggregating the CE usage of each individual CE[17]. 5. RESULTS The results depicted in Table 1 show a linear increase in the MJET as the number of jobs on the grid increases. This is because, as more jobs are submitted, the queue at the sites increases. If the job submission rate is higher than the grid’s job processing rate, this build-up of queues is inevitable. Hence, a preferred replication strategy is a strategy that has less MJET. As shown in Table 1, for MJET, the RBR is the best among existing algorithms. Utilizing the RBR, the mean job execution time is reduced and is noted to better by 24.25% over LALW. Referring to the Average Storage Usage, by using RBR, the storage usage is reduced by outperforming LALW, 19.55%. Table 1. : Simulation Results Number of Jobs RBR MJET 3931 3545 37.87 31.92 ASU 34.13 27.73 CEU 500 LALW ENU 200 Metrics 22.15 23.41 MJET 7839 7566 ENU 36.88 30.06 ASU 35.71 28.78 CEU 25.91 27.15
  • 6. Computer Science & Information Technology (CS & IT) 1000 384 12311 34.25 26.88 ASU 37.12 29.97 CEU 30.25 34.27 MJET 54133 50361 ENU 32.45 24.36 ASU 38.63 30.54 CEU 25.74 33.83 MJET 104129 103396 ENU 30.89 22.73 ASU 40.11 31.54 CEU 4000 16241 ENU 2000 MJET 28.11 33.27 On the other hand, results of Efficient Network Usage (ENU) show a slight linear decrease as number of jobs on the grid increases. This is because at the start of the simulation the queues are short, but they build up while the files are replicated in the grid. Once the replication process has established, the execution time are reduced and the queue is shorten. The ENU gradually decreases with the increment in number of jobs because the amount of replication decreases over time. The RBR uses the lowest amount of network resources for the tested number of jobs because it is able to make better decision in deciding file that requires replication. Looking at the Computing Element Usage (CEU) metric, it can be seen that the CEU generally grows as the number of jobs increases, reflecting the heavy workload. However, there is an obvious drop between 1000 and 2000, this is because with a higher number of jobs, the scheduling algorithm is sending most of the extra jobs to a few sites from where the data are easily accessible, leading to more uneven distribution of jobs around the grid. The same trend, although less marked, there is a slight drop seen with RBR. This indicates that RBR made a good balance in the grid as it is able to determine the appropriate data files that require replication. Having the required data file (replicas) at the closest site reduces the computing element usage. 6. CONCLUSION In this paper, we presented a replication strategy that incorporates three relationships in determining the importance of a file. In the simulation experiments, different scenarios of job workload were considered in order to evaluate the proposed relationship based replication, RBR. Results showed that by integrating information on F2U, F2F and F2G, a better result was obtained as compared to an existing approach of LALW. This may suggest that the proposed strategy could be a possible approach in data replication.
  • 7. 385 Computer Science & Information Technology (CS & IT) ACKNOWLEDGEMENT The author would like to thank Mohammed Madi for his assistant in completing the experiment. Also, author expresses the gratitude to Universiti Utara Malaysia for providing the financial and management support under the High Impact Individual research grant (s/o 12155). REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] B. Wilkinson, Grid computing: techniques and applications: Chapman & Hall/CRC, 2009. C. Nicholson, D. G. Cameron, A. T. Doyle, A. P. Millar, and K. Stockinger, "Dynamic data replication in lcg 2008," Concurrency and Computation: Practice and Experience, vol. 20, pp. 12591271, 2008. X. You, G. Chang, X. Chen, C. Tian, and C. Zhu, "Utility-Based Replication Strategies in Data Grids," in Fifth International Conference on Grid and Cooperative Computing, 2006, pp. 500-507. M. Tang, B. S. Lee, X. Tang, and C. K. Yeo, "The impact of data replication on job scheduling performance in the Data Grid," Future Generation Computer Systems, vol. 22, pp. 254-268, 2006. S. Venugopal, R. Buyya, and K. Ramamohanarao, "A taxonomy of data grids for distributed data sharing, management, and processing," ACM Computing Surveys (CSUR), vol. 38, p. 3, 2006. R. M. Rahman, K. Barker, and R. Alhajj, "Replica placement strategies in data grid," Journal of Grid Computing, vol. 6, pp. 103-123, 2008. R. M. Rahman, K. Barker, and R. Alhajj, "Performance evaluation of different replica placement algorithms," International Journal of Grid and Utility Computing, vol. 1, pp. 121-133, 2009. C. Ruay-Shiung, C. Hui-Ping, and W. Yun-Ting, "A dynamic weighted data replication strategy in data grids," in AICCSA 2008: Proceedings of IEEE/ACS International Conference on computer systems and applications, 2008, pp. 414-421. H. H. E. Al Mistarihi and C. H. Yong, "Replica management in data grid," International Journal of Computer Science and Network Security IJCSNS, vol. 8, p. 22, 2008. L. Yi-Fang, L. Pangfeng, and W. Jan-Jan, "Optimal placement of replicas in data grid environments with locality assurance," in Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on, 2006, p. 8. L. Pangfeng and W. Jan-Jan, "Optimal replica placement strategy for hierarchical data grid systems," in Cluster Computing and the Grid, 2006. CCGRID 06. Sixth IEEE International Symposium on, 2006, p. 4 pp. Y. Mansouri, M. Garmehi, M. Sargolzaei, and M. Shadi, "Optimal Number of Replicas in Data Grid Environment," in First International Conference on Distributed Framework and Applications, 2008. DFmA 2008. , 2008, pp. 96-101. David G. Cameron, "Replica management and optimisation for data grids," PhD. Thesis, University of Glasgow, 2005. K. Ranganathan, A. Iamnitchi, and I. Foster, "Improving data availability through dynamic modeldriven replication in large peer-to-peer communities," in Global and Peer-to-Peer Computing on Large Scale Distributed Systems Workshop, 2002, pp. 376–381. M. Madi, "Replica Creation Algorithm for Data Grid," PhD, School of Computing, UUM College of Arts and Sciences, Universiti Utara Malaysia, Sintok, 2012. D. G. Cameron, A. P. Millar, C. Nicholson, R. Carvajal-Schiaffino, K. Stockinger, and F. Zini, "Analysis of scheduling and replica optimisation strategies for data grids using OptorSim," Journal of Grid Computing, vol. 2, pp. 57-69, 2004. W. H. Bell, D. G. Cameron, A. P. Millar, L. Capozza, K. Stockinger, and F. Zini, "Optorsim: A grid simulator for studying dynamic data replication strategies," International Journal of High Performance Computing Applications, vol. 17, pp. 403-416, 2003. D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, C. Nicholson, K. Stockinger, and F. Zini, "Evaluating scheduling and replica optimisation strategies in OptorSim," Journal of Grid Computing, pp. 57-69, March 2004. W. H. Bell, D. G. Cameron, L. Capozza, P. Millar, K. Stockinger, and F. Zini, "Simulation of Dynamic Grid Replication Strategies in OptorSim," Journal of High Performance Computing Applications, vol. 17, 2003.
  • 8. Computer Science & Information Technology (CS & IT) 386 [20] F. Ben Charrada, H. Ounelli, and H. Chettaoui, "An Efficient Replication Strategy for Dynamic Data Grids," in Proceedings of International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC),, 2010, pp. 50-54. , 50 [21] D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, C. Nicholson, K. Stockinger, and F. Zini, "UK Schiaffino, Millar, grid simulation with OptorSim," in Proceedings of UK e-Science All Hands Meeting Nottingham, Science Meeting, UK, 2003. [22] S. Sivasubramanian, M. Szymaniak, G. Pierre, and M. Van Steen, "Replication for web hosting systems," ACM Computing Surveys (CSUR), vol. 36, pp. 291-334, 2004. AUTHOR Yuhanis Yusof is a senior lecturer in School of Computing, Universiti Utara Malaysia (UUM), Malaysia. She obtained her PhD in 2007 in the area of Computer Science from Cardiff University, UK. She also holds MSc degree in Computer Science from UniversitiUniversit Sains Malaysia and Bachelor of Information Technology from UUM. Her research interest is broadly in data analysis and management for large scale compucomp ting. This includes computational intelligence, data mining (discovering patterns of interest from data), data warehousing, informationretrieval and grid computing. Her current focus is in the area of nature ing, nature-inspired computing, in particularly the Artificial Bee Colony and Firefly algorithm. Such algorithms are later reare lized in various domains such financial analysis and grid computing.