PREVENTING HACKING OF DATA

                         IN DATA GRID



                      A PROJECT REPORT

                            Submitted by

         AARTHI.S.M                      (070105618001)
         BALGANI.S                       (070105618006)
         JEEVITHA.Y                      (070105618015)
         MANGALAPRIYA.Y                  (070105618021)
          In partial fulfillment for the award of the degree

                                 of

             BACHELOR OF ENGINEERING

                                 IN

         COMPUTER SCIENCE AND ENGINEERING

 VIDYAA VIKAS COLLEGE OF ENGINEERING AND TECHNOLOGY

                   TIRUCHENGODE – 637 214

ANNA UNIVERSITY OF TECHNOLOGY COIMBATORE 641047

                            APRIL 2011
ANNA UNIVERSITY OF TECHNOLOGY

                           COIMBATORE -641047

                           BONAFIDE CERTIFICATE

      Certified that this project report “PREVENTING HACKING OF DATA
IN DATA GRID” is the bonafide work of

                     AARTHI.S.M                   (070105618001)

                     BALGANI.S                    (070105618006)

                     JEEVITHA.Y                   (070105618015)

                     MANGALAPRIYA.Y               (070105618021)

              who carried out the project work under my supervision.




MR. S. T. LENIN,
SUPERVISOR,
Lecturer,
Department of CSE,
Vidyaa Vikas College of Engineering & Technology,
Tiruchengode - 637 214.

PROF. N. K. KUPPUCHAMY,
HEAD OF THE DEPARTMENT,
Department of CSE,
Vidyaa Vikas College of Engineering & Technology,
Tiruchengode - 637 214.



                 Submitted for the university examination held on 06.04.11




INTERNAL EXAMINER                                     EXTERNAL EXAMINER

ACKNOWLEDGEMENT

       First and foremost, we would like to express our deep sense of gratitude to our
honorable secretary Dr. S. Gunasekaran, M.Sc., M.Ed., M.Phil., Ph.D., our beloved
correspondent Dr. T. O. Singaravel, M.Sc., M.Ed., M.Phil., Ph.D., and all the Trustees
of Vidyaa Vikas Educational Institution for providing us with the necessary facilities
during the course of study.

       We feel immensely pleased to express our sincere thanks to our Principal
Dr. S. Sundaram, Ph.D., for the encouragement and support extended by him.

       We extend our solemn gratitude to Prof. N. K. Kuppuchamy, M.E., (Ph.D.),
Head of the Department of Computer Science and Engineering, for his timely support
to all our activities.

       We sincerely thank our Project Coordinator, Mr. M. Manikandan, M.E., who
has been the source of constant guidance and inspiration throughout our project work.

       We sincerely thank our project guide Mr. M. Manikandan, M.E., for his
valuable suggestions that paved the way for us to carry out our project work successfully.

       We express our sincere thanks to the faculty members, non-teaching staff
members, and our dear friends for their moral support, help, and encouragement
towards the successful completion of the project.

       We are most indebted to our parents, whose support made our dream of
becoming successful graduates a reality.

       We are quite confident that our project work stands testimony to the fact that
hard work bears enjoyable fruit, not only for the individuals concerned but for the
community as a whole, just as we have witnessed many inventions of scientists that
have made the lives of our brethren more comfortable.
ABSTRACT

        Secret sharing and erasure coding-based approaches have been used in distributed storage

systems to ensure the confidentiality, integrity, and availability of critical information. To achieve

performance goals in data accesses, these data fragmentation approaches can be combined with dynamic

replication. In this paper, we consider data partitioning (both secret sharing and erasure coding) and

dynamic replication in data grids, in which security and data access performance are critical issues. More

specifically, we investigate the problem of optimal allocation of sensitive data objects that are partitioned

by using secret sharing scheme or erasure coding scheme and/or replicated. The grid topology we

consider consists of two layers. In the upper layer, multiple clusters form a network topology that can be

represented by a general graph. The topology within each cluster is represented by a tree graph. We

decompose the share replica allocation problem into two subproblems: the Optimal Intercluster Resident

Set Problem (OIRSP) that determines which clusters need share replicas and the Optimal Intracluster

Share Allocation Problem (OISAP) that determines the number of share replicas needed in a cluster and

their placements. We develop two heuristic algorithms for the two subproblems. Experimental studies

show that the heuristic algorithms achieve good performance in reducing communication cost and are

close to optimal solutions. Success of this project can help to achieve significant advances in business,

medical treatment, disaster relief, research, and military and can result in dramatic benefits to the society.




CONTENTS

CHAPTER NO      TITLE                                          PAGE NO

             List of Figures                                             i

             List of Abbreviations                                       ii

1.           Introduction                                                1

                    1.1 Objective                                        4

2            System Analysis                                             5

                    2.1 Existing System                                  5

                               2.1.1Drawbacks                            5

                    2.2 Proposed System                                  5

                    2.3 Feasibility Study                                6

                               2.3.1 Economical Feasibility              6

                               2.3.2 Operational Feasibility             6

                               2.3.3 Technical Feasibility               6

3            System Specification                                        7

                    3.1 Hardware Requirements                            7

                    3.2 Software Requirements                            7

4            Software Description                                        8

                    4.1 Front End                                        8

                    4.2 Features                                         8

5            Project Description                                         12

                    5.1 Problem Definition                               12
5.2 Overview of the Project   15

            5.3 Module Description        15

                      5.3.1 Modules       15

            5.4 Data Flow Diagram         28

            5.5 Input Design              28

            5.6 Output Design             28

6    System Testing                       29

            6.1 Unit Testing              30

            6.2 Acceptance Testing        31

            6.3 Test Cases                32

7    System Implementation                33

8    Conclusion & Future Enhancements     34

            8.1 Conclusion                34

            8.2 Future Enhancements       34

9    Appendix                             35

            1 Source Code                 35

            2 Screen Shots                51

10   References                           57




LIST OF FIGURES



FIGURE NO                     NAME                       PAGE NO

   5.1      Master Server Cluster                           14

   5.2      The original GC with RC={H1,H2,H3}.             19

   5.3      Super node S and SPT(GC,RC) constructed by
            Build_SPT                                       19
   5.4      The performance impact of graph size.           22

   5.5      The performance impact of graph degree.         22

   5.6      The impact of update/read ratio.                23

   5.7      The impact of l with read/update=3.             25

   5.8      The impact of l with read/update= 30.           25

   5.9      The impact of m with read/update=3.             26

   5.10     The impact of m with read/update=30.            26
   5.11     Data Flow Diagram                               28




ABBREVIATIONS



GIG     Global Information Grid

GMESS   Grid for Medical Services

RLS     Replica Location Service

OIRSP   Optimal Intercluster Resident Set Problem

OISAP   Optimal Intracluster Share Allocation Problem

MSC     Master Server Cluster

MST     Minimum Spanning Tree

SRS     Software Requirements Specification

UML     Unified Modeling Language




CHAPTER 1
                                      INTRODUCTION


        Data grid is a distributed computing architecture that integrates a large number of data
and computing resources into a single virtual data management system. It enables the sharing
and coordinated use of data from various resources and provides various services to fit the needs
of high-performance distributed and data-intensive computing. Many data grid applications are
being developed or proposed, such as DoD’s Global Information Grid (GIG) for both business
and military domains, NASA’s Information Power Grid, the GMESS Health-Grid for medical
services, data grids for federal disaster relief, etc. These data grid applications are designed to
support global collaborations that may involve large amounts of information, intensive
computation, and real-time or non-real-time communication. Success of these projects can help to
achieve significant advances in business, medical treatment, disaster relief, research, and military
and can result in dramatic benefits to the society. There are several important requirements for
data grids, including information survivability, security, and access performance. For example,
consider a first responder team responding to a fire in a building with explosive chemicals. The
data grid that hosts building safety information, such as the building layout and locations of
dangerous chemicals and hazard containment devices, can help draw relatively safe and effective
rescue plans. Delayed accesses to these data can endanger the responders as well as increase the
risk to the victims or cause severe damages to the property. At the same time, the information
such as the location of hazardous chemicals is highly sensitive and, if it falls into the hands of terrorists,
could cause severe consequences. Thus, confidentiality of the critical information should be
carefully protected. The above example indicates the importance of data grids and their
availability, reliability, accuracy, and responsiveness.
       Replication is frequently used to achieve access efficiency, availability, and information
survivability. The underlying infrastructure for data grids can generally be classified into two
types: cluster based and peer-to-peer systems. In pure peer-to-peer storage systems, there is no
dedicated node for grid applications (in some systems, some servers are dedicated). Replication
can bring data objects to the peers that are close to the accessing clients and, hence, improve
access efficiency. Having multiple replicas directly implies higher information survivability. In
cluster-based systems, dedicated servers are clustered together to offer storage and services.
However, the number of clusters is generally limited and, thus, they may be far from most
clients. To improve both access performance and availability, it is necessary to replicate data and
place them close to the clients, such as peer-to-peer data caching. As can be seen, replication is
an effective technique for all types of data grids.
  Existing research works on replication in data grids investigate replica access protocols,
resource management and discovery techniques, replica location and discovery algorithms, and
replica placement issues. Though replication can greatly help with information survivability and
access efficiency, it does not address security requirements. Having more replicas implies a
higher potential of compromising one of the replicas. One solution is to encrypt the sensitive data
and their replicas. However, it pushes the responsibility of protecting data to protecting
encryption keys and brings a nontrivial key management problem. If a simple centralized key
server system is used, then it is vulnerable to single point of failure and denial of service attacks.
Also, the centralized key server may be compromised and, hence, reveal all keys. Replication of
keys can increase access efficiency as well as avoid the single-point-of-failure problem and
reduce the risk of denial-of-service attacks, but would increase the risk of having some
compromised key servers. If one of the key servers is compromised, all the critical data are
essentially compromised. Besides key management issues, information leakage is another
problem with the replica encryption approach. Generally, a key is used to access many data
objects. When a client leaves the system or its privilege for some accesses is revoked, those data
objects have to be re-encrypted using a new key and the new key has to be distributed to other
clients. If one of the data storage servers is compromised, the storage server could retain a copy
of the data encrypted using the old key. Thus, the content of long-lived data may leak over time.
Therefore, additional security mechanisms are needed for sensitive data protection. There are
other methods proposed for providing survivable and secure storage in untrustworthy
environments. The most effective approach is to provide intrusion tolerance. Most intrusion-
tolerant systems partition the sensitive data and distribute the shares across the storage sites to
achieve both confidentiality and survivability. By doing so, a data object can remain secure even
if a partial set of its shares (below a threshold) is compromised by the adversaries.
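As an illustration of the secret sharing idea described above, the following Java sketch (our own illustrative code, not part of the project's implementation; the small prime modulus is chosen for readability) splits a value into n shares such that any k of them reconstruct it, while fewer than k reveal nothing about it:

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.*;

// Illustrative (k, n) Shamir secret sharing over a prime field.
public final class ShamirSketch {
    static final BigInteger P = BigInteger.valueOf(2_147_483_647L); // prime modulus (2^31 - 1)
    static final SecureRandom RNG = new SecureRandom();

    // Split `secret` into n shares (x, f(x)) of a random degree-(k-1) polynomial.
    static Map<Integer, BigInteger> split(BigInteger secret, int k, int n) {
        BigInteger[] coeff = new BigInteger[k];
        coeff[0] = secret;                       // the constant term is the secret
        for (int i = 1; i < k; i++)              // remaining coefficients are random
            coeff[i] = new BigInteger(P.bitLength() - 1, RNG).mod(P);
        Map<Integer, BigInteger> shares = new HashMap<>();
        for (int x = 1; x <= n; x++) {
            BigInteger y = BigInteger.ZERO, xPow = BigInteger.ONE;
            for (int i = 0; i < k; i++) {        // evaluate f(x) by Horner-like sum
                y = y.add(coeff[i].multiply(xPow)).mod(P);
                xPow = xPow.multiply(BigInteger.valueOf(x)).mod(P);
            }
            shares.put(x, y);
        }
        return shares;
    }

    // Lagrange interpolation at x = 0 recovers the secret from any k shares.
    static BigInteger combine(Map<Integer, BigInteger> shares) {
        BigInteger secret = BigInteger.ZERO;
        for (Map.Entry<Integer, BigInteger> e : shares.entrySet()) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (Integer other : shares.keySet()) {
                if (other.equals(e.getKey())) continue;
                num = num.multiply(BigInteger.valueOf(-other)).mod(P);
                den = den.multiply(BigInteger.valueOf(e.getKey() - other)).mod(P);
            }
            secret = secret.add(e.getValue().multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        return secret.mod(P);
    }
}
```

For example, splitting a value with k = 3 and n = 5 allows reconstruction from any three shares, so the loss or compromise of up to two storage sites neither destroys nor discloses the data.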



This scheme can be used to protect critical data directly or to protect the keys of the
encrypted data. The intrusion tolerance concept and data partitioning techniques can be used to
achieve data survivability as well as security. The most commonly used schemes for data
partitioning include secret sharing and erasure coding. Both schemes partition data into shares
and distribute them to different processors to achieve availability and integrity. Secret sharing
schemes assure confidentiality even if some shares (less than a threshold) are compromised. In
erasure coding, data shares can be encrypted and the encryption key can be secret shared and
distributed with the data shares to assure confidentiality. However, changing the number of
shares in a data partitioning scheme is generally costly. When it is necessary to add additional
shares close to a group of clients to reduce the communication cost and access latency, it is easier
to add share replicas. Thus, it is most effective to combine the data partitioning and replication
techniques for high-performance secure storage design. To actually achieve improved
performance, it is essential to place the replicated data partitions in strategic locations to
maximize the gain. There is extensive work on replica placement.
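In the same spirit, a minimal erasure-coding sketch (again our own illustration, not the project's code) uses a single XOR parity share: k data shares plus one parity share tolerate the loss of any one share. Production data grids would use stronger codes such as Reed-Solomon, which tolerate multiple losses:

```java
// Illustrative erasure coding via a single XOR parity share.
public final class XorParitySketch {
    // Split `data` into k equal-length shares (last one zero-padded)
    // and append one parity share equal to the XOR of all data shares.
    static byte[][] encode(byte[] data, int k) {
        int len = (data.length + k - 1) / k;
        byte[][] shares = new byte[k + 1][len];
        for (int i = 0; i < data.length; i++)
            shares[i / len][i % len] = data[i];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < len; j++)
                shares[k][j] ^= shares[i][j];   // accumulate parity
        return shares;
    }

    // Recover the original bytes when at most one share (index `lost`,
    // or -1 for none) is missing: XOR of the surviving shares rebuilds it.
    static byte[] decode(byte[][] shares, int lost, int originalLength) {
        int k = shares.length - 1;
        int len = shares[(lost == 0) ? 1 : 0].length;
        if (lost >= 0) {
            byte[] rebuilt = new byte[len];
            for (int i = 0; i <= k; i++)
                if (i != lost)
                    for (int j = 0; j < len; j++)
                        rebuilt[j] ^= shares[i][j];
            shares[lost] = rebuilt;
        }
        byte[] out = new byte[originalLength];
        for (int i = 0; i < originalLength; i++)
            out[i] = shares[i / len][i % len];
        return out;
    }
}
```

Unlike secret sharing, the individual shares here are not confidential, which is why the text pairs erasure coding with encryption and secret-shares only the key.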
    However, existing placement algorithms focus on the placement of independent data objects
(generally, only a single data or a single data set is considered). The placement problem for
partitioned data is more complex since the replicas of the data partitions need to be considered
together. Moreover, client access patterns for partitioned data are more complicated. Thus, it is
necessary to investigate the schemes for allocating partitioned data. The research on replica
placement of partitioned data is limited. In one prior work, the authors attempt to measure security
assurance probabilities and use them to guide allocation. They consider data objects that are secret
shared, but no replication is considered. Also, the share allocation algorithm they propose does not
consider performance issues such as communication cost and response latency. Another work considers a
secure data storage system that survives even if some nodes in the system are compromised. It
assumes that data are secret shared and the full set of shares are replicated and statically
distributed over the network. The major focus of this work is to guarantee confidentiality and
integrity requirements of the storage system. The communication cost and response latency are
not considered. Also, it does not address how to allocate the share replicas. In this paper, we
consider combining data partitioning and replication to support secure, survivable, and high



performance storage systems. Our goal is to develop placement algorithms to allocate share
replicas such that the communication cost and access latency are minimized.
   The remainder of this paper is organized as follows: we first describe the data grid system
model and the problem definitions, then introduce a heuristic algorithm for determining the
clusters that should host shares, followed by a heuristic algorithm for share allocation within a
cluster. We then discuss the results of the experimental studies and some research works that are
related to this research.
       The last decade has seen a substantial increase in commodity computer and network
performance, mainly as a result of faster hardware and more sophisticated software.
Nevertheless, there are still problems, in the fields of science, engineering, and business, which
cannot be effectively dealt with using the current generation of supercomputers. In fact, due to
their size and complexity, these problems are often very numerically and/or data intensive and
consequently require a variety of heterogeneous resources that are not available on a single
machine. The early efforts in Grid computing started as a project to link supercomputing sites,
but have now grown far beyond their original intent. We have combined data partitioning
schemes (secret sharing scheme or erasure coding scheme) with dynamic replication to achieve
data survivability, security, and access performance in data grids. The replicas of the partitioned
data need to be properly allocated to achieve the actual performance gains. Experimental studies
show that the heuristic algorithms achieve good performance in reducing communication cost
and are close to optimal solutions.


1.1 OBJECTIVE

       Secret sharing and erasure coding-based approaches have been used in distributed storage
       systems to ensure the confidentiality, integrity, and availability of critical information.
       To achieve performance goals in data accesses, these data fragmentation approaches can

       be combined with dynamic replication.




CHAPTER 2
                                SYSTEM ANALYSIS


2.1 EXISTING SYSTEM
    • In the existing system, only a general encryption algorithm is used.
    • Here, the severe problem is the optimal allocation of sensitive data objects.
    • The existing system does not achieve data survivability, security, and access performance.


2.1.1. DRAWBACKS
    • Security and data access performance are critical issues in the existing system.
    • The severe problem in the existing system is the optimal allocation of sensitive data objects.
    • The existing system does not achieve data survivability, security, and access performance.
    • The communication cost is higher.


2.2 PROPOSED SYSTEM
    • In this paper, we consider data partitioning (both secret sharing and erasure coding) and
      dynamic replication in data grids, in which security and data access performance are
      critical issues.
    • We investigate the problem of optimal allocation of sensitive data objects that are
      partitioned by using the secret sharing scheme or erasure coding scheme and/or replicated.
    • We develop two heuristic algorithms for the two subproblems.
    • The OIRSP determines which clusters need to maintain share replicas.
    • The OISAP determines the number of share replicas needed in a cluster and their
      placements.
    • Experimental studies show that the heuristic algorithms achieve good performance in
      reducing communication cost and are close to optimal solutions.




2.3 FEASIBILITY STUDY
       The feasibility of the project is analyzed in this phase and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is to be carried out. For feasibility analysis, some
understanding of the major requirements for the system is essential.
       The three key considerations involved in the feasibility analysis are:
               •   ECONOMICAL FEASIBILITY
               •   SOCIAL FEASIBILITY
               •   TECHNICAL FEASIBILITY


2.3.1. ECONOMICAL FEASIBILITY
       This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development
of the system is limited. The expenditures must be justified. Thus, the developed system is well
within the budget, and this was achieved because most of the technologies used are freely available.


2.3.2. SOCIAL FEASIBILITY
       This aspect of the study is to check the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, instead must accept it as a necessity. The level of acceptance by the
users solely depends on the methods that are employed to educate the user about the system and
to make him familiar with it. His level of confidence must be raised so that he is also able to
make some constructive criticism.


2.3.3. TECHNICAL FEASIBILITY
       This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the available
technical resources; otherwise, high demands would be placed on the client. The developed
system must have modest requirements, as only minimal or no changes are required for
implementing this system.
CHAPTER 3
                             SYSTEM SPECIFICATION


3.1 HARDWARE REQUIREMENTS
    The hardware used for the development of the project is


               •   Hard disk                    :       40 GB
               •   RAM                          :       512 MB
               •   Processor                    :       Pentium IV
               •   Monitor                      :       17" Color Monitor


3.2 SOFTWARE REQUIREMENTS
    The software used for the development of the project is


                   Front End                    :       Java
                    Version                      :       JDK 1.6
                   Operating System             :       Windows XP.




CHAPTER 4
                               SOFTWARE DESCRIPTION


4.1 FRONT END
JAVA
       Java is a computer programming language. It enables programmers to write computer
instructions using English-based commands, instead of having to write in numeric codes. It’s
known as a “high-level” language because it can be read and written easily by humans. Like
English, Java has a set of rules that determine how the instructions are written. These rules are
known as its “syntax”. Once a program has been written, the high-level instructions are
translated into numeric codes that computers can understand and execute.
       Java was designed to meet all the real world requirements with its key features, which are
explained in the following paragraph.


4.2 FEATURES
SIMPLE AND POWERFUL
       Java was designed to be easy for the professional programmer to learn and use
efficiently. Java makes itself simple by not having surprising features. Since it hides the inner
workings of the machine, the programmer can perform the desired actions without fear. Unlike
other programming systems that provide dozens of complicated ways to perform a simple task,
Java provides a small number of clear ways to achieve a given task.


SECURE
       Today everyone is worried about safety and security. People feel that conducting
commerce over the Internet is as safe as printing the credit card number on the front page of a
newspaper. Threats from viruses and system hackers also exist. To overcome all these fears,
Java has safety and security as its key design principles.
       Using a Java-compatible browser, anyone can safely download Java applets without the
fear of viral infection or malicious intent. Java achieves this protection by confining a Java

program to the Java execution environment and by making it inaccessible to other parts of the
computer. We can download applets with confidence that no harm will be done and no security
will be breached.


PORTABLE
        In java, the same mechanism that gives security also helps in portability. Many types of
computers and operating systems are in use throughout the world and are connected to the
internet. For downloading programs through different platforms connected to the internet, some
portable, executable code is needed. Java’s answer to these problems is its well designed
architecture.


OBJECT-ORIENTED
        Java was not designed to be source-code compatible with any other language. This gave
the Java team the freedom to take a clean, usable, realistic approach to objects. The object model
in Java is simple and easy to extend, while simple types, such as integers, are kept as
high-performance non-objects.



DYNAMIC
        Java programs carry with them extensive amounts of run-time information that is used to
verify and resolve accesses to objects at run-time. Using this concept it is possible to
dynamically link code. Dynamic property of java adds strength to the applet environment, in
which small fragments of byte code may be dynamically updated on a running system.


RELIABILITY
         Java needed to reduce the likelihood of fatal errors from programmer mistakes. With this
in mind, object-oriented programming was introduced. Once data and its manipulation were
packaged together in one place, it increased Java’s robustness.




PLATFORM INDEPENDENT
        Programs needed to work regardless of the machine they were being executed on. Java
was written to be a portable language that doesn't care about the operating system or the hardware
of the computer.


NEWLY ADDED FEATURES IN JAVA 2
    •   SWING is a set of user interface components that is entirely implemented in Java. The
        user can use a look and feel that is either specific to a particular operating system or
        uniform across operating systems.
    •   Collections are a group of objects. Java provides several types of collection, such as
        linked lists, dynamic arrays, and hash tables, for our use. Collections offer a new way to
        solve several common programming problems.
    •   Various tools such as javac, java and javadoc have been enhanced. Debugger and profiler
        interfaces for the JVM are available.
    •   Performance improvements have been made in several areas. A JUST-IN-TIME (JIT)
        compiler is included in the JDK.
    •   Digital certificates provide a mechanism to establish the identity of a user, which can be
        referred as electronic passports.
    •   Various security tools are available that enable the user to create and store cryptographic
        keys and digital certificates, sign Java Archive (JAR) files, and check the signature of a
        JAR file.
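The collections mentioned above can be exercised in a few lines (the server names here are illustrative):

```java
import java.util.*;

// A small illustration of the collections mentioned above: a dynamic array
// (ArrayList), a linked list (LinkedList), and a hash table (HashMap).
public final class CollectionsDemo {
    // Count replicas per server using the three collection types.
    static Map<String, Integer> replicaCounts() {
        List<String> servers = new ArrayList<>(List.of("H1", "H2", "H3"));
        servers.add("H4");                                 // dynamic array grows on demand

        Deque<String> pending = new LinkedList<>(servers); // linked list used as a queue
        Map<String, Integer> replicas = new HashMap<>();   // hash-table lookup
        while (!pending.isEmpty())
            replicas.merge(pending.pollFirst(), 1, Integer::sum);
        replicas.merge("H2", 1, Integer::sum);             // H2 gains a second replica
        return replicas;
    }

    public static void main(String[] args) {
        System.out.println(replicaCounts());
    }
}
```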


SWING
    Swing components facilitate efficient graphical user interface (GUI) development. These
components are a collection of lightweight visual components. Swing components provide a
replacement for the heavyweight AWT components as well as complex user interface
components such as Trees and Tables.
    Swing components contain a pluggable look and feel (PL & F). This allows all applications
to run with the native look and feel on different platforms. PL & F allows applications to have
the same behaviour on various platforms. JFC contains operating system neutral look and feel.
Swing components do not contain peers. Swing components allow mixing AWT heavyweight
and Swing lightweight components in an application. The major difference between lightweight
and heavyweight components is that lightweight components can have transparent pixels while
heavyweight components are always opaque. Lightweight components can be non-rectangular
while heavyweight components are always rectangular.
   Swing components are JavaBean compliant. This allows components to be used easily in a
Bean aware application building program. The root of the majority of the Swing hierarchy is the
JComponent class. This class is an extension of the AWT Container class.
   Swing components comprise a large percentage of the JFC release. The Swing component
toolkit consists of over 250 pure Java classes and 75 Interfaces contained in about 10 Packages.
They are used to build lightweight user interfaces. Swing consists of User Interface (UI) classes
and non- User Interface classes. The non-User Interface classes provide services and other
operations for the UI classes.


   Swing offers a number of advantages, which include


                           •     Wide variety of Components
                           •     Pluggable Look and Feel
                           •     MVC Architecture
                           •     Keystroke Handling
                           •     Action Objects
                           •     Nested Containers
                           •     Virtual Desktops
                           •     Compound Borders
                           •     Customized Dialogues
                           •     Standard Dialog Classes
                           •     Structured Table and Tree Components
                           •     Powerful Text Manipulation
                           •     Generic Undo Capabilities
                           •     Accessibility Support
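A minimal Swing sketch in the spirit of the above (the frame title and component labels are illustrative, not taken from the project's actual screens):

```java
import java.awt.GraphicsEnvironment;
import javax.swing.*;

// A minimal Swing example: lightweight components assembled into a JPanel.
public final class SwingSketch {
    static JPanel buildPanel() {
        JPanel panel = new JPanel();
        panel.add(new JLabel("File to upload:"));
        panel.add(new JTextField(20));
        panel.add(new JButton("Send Shares"));
        return panel;
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) return;  // skip the UI on headless systems
        SwingUtilities.invokeLater(() -> {             // build the UI on the event thread
            JFrame frame = new JFrame("Data Grid Client");
            frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
            frame.add(buildPanel());
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

buildPanel() creates only lightweight components, so it works even without a display; the frame itself is shown only when one is available.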
CHAPTER 5
                                PROJECT DESCRIPTION


5.1 PROBLEM DEFINITION
       In this paper, we consider achieving secure, survivable, and high-performance data
storage in data grids. To facilitate scalability, we model the peer-to-peer data grid as a two-level
topology. Studies show that the Internet can be decomposed into interconnected autonomous
systems. One or several such autonomous systems that are geographically close to each other
can be considered as a cluster. Each edge represents a logical link, which may span multiple hops
of physical links. The clusters are linked through the backbone, and hence, the intercluster
topology is modeled by a general graph.
       Within each cluster, there may be many subnets from the same or multiple institutions.
Among all the physical nodes in the cluster, some nodes, such as servers, proxies, and other
individual nodes, may be committed to contributing their storage and/or computation resources for
some data grid applications. These nodes are connected via logical links. Studies also show that
Internet message routing is relatively stable over days or even weeks and that the multiple routing
paths generally form a tree. Thus, for simplicity, we model the topology inside a cluster as a tree.
Consider a cluster Hx. Let Gx = (Vx, Ex) represent the topology graph within the cluster,
where Vx = {Px,1, Px,2, ..., Px,Nx} denotes the set of Nx (N if only considering cluster Hx)
nodes in cluster Hx, and Ex is the set of edges connecting nodes in Hx. Also, let Proot_x denote
the root node in Hx (i.e., Proot_x ∈ Vx). We assume that all traffic in Hx goes through the
network where Proot_x resides. Let σ(Px,i, Px,j) denote the shortest path between Px,i and Px,j
in Hx, and |σ(Px,i, Px,j)| denote the distance of σ(Px,i, Px,j). Also, let σ(Hx, Hy) denote the
shortest path between Hx and Hy (actually, between Proot_x and Proot_y), and |σ(Hx, Hy)|
denote the distance of σ(Hx, Hy). We assume that |σ(Px,i, Px,j)| for any i, j, and x is much less
than |σ(Hy, Hz)| for any y and z, where y ≠ z (i.e., the distance between any two nodes within
a cluster is much less than the distance between any two clusters). The data grid (represented by
the set of clusters HC) hosts a set of data objects D (D can contain the application data or keys).
One of the clusters is selected as the Master Server Cluster (MSC) for some data objects in D,
denoted as HMSC (different data objects may have different HMSC).
HMSC hosts these data objects permanently (it may be the original data source). These
data objects may be partially replicated in other clusters in HC to achieve better access
performance. Due to the increasing attacks on the Internet, a node hosting some data objects in D has
a significant chance of being compromised. If a node is compromised, all the plaintext data
objects stored on it are compromised. If a storage node storing some encrypted data is
compromised and the nodes maintaining the corresponding encryption keys are also
compromised (note that they may be the same nodes), then the data are compromised.
       We assume that the probabilities of compromising two different storage nodes are not
correlated. This is true for many new attacks, such as viruses that are spread through emails and
the major buffer overflow attacks. To cope with potential threats, the data partitioning technique
is used. Each data object d (d ∈ D) is partitioned into m shares. Major data partitioning schemes
include secret sharing and erasure coding. In an (m, k) secret sharing scheme, m shares are
computed from a data object d using a polynomial with k − 1 randomly chosen coefficients and
distributed over m servers. d can be determined uniquely with any k shares, and no information
about d can be inferred with fewer than k shares. Secret sharing schemes incur an m/(m − k)-fold
storage waste (the rest is for improving availability and performance). If the storage
space is a concern, then erasure coding schemes can be used. An erasure coding scheme uses the
same mathematics except that the k − 1 coefficients of the polynomial are dependent on d. Thus,
partial information may be inferred with fewer than k shares, and hence, encryption is needed for
confidentiality assurance. Generally, the encryption keys are secret shared and distributed with
the data. Erasure coding schemes achieve the best storage efficiency, even when compared with
replication. The access performance of secret sharing, erasure coding, and replication (with
secret-shared keys) schemes is approximately the same. Here, we do not limit the data
partitioning scheme (as long as it is secure). To ensure that the secret data can be reconstructed
even when k − 1 nodes are compromised, we require m ≥ 2k − 1. Let l denote the number of
distinct shares to be accessed for each read request (it is fixed for all read requests). We have
k ≤ l ≤ m. If l > k, the original data can be reconstructed and the validity of the shares can be
checked. The parameter l can be determined for each specific application system depending on its needs.
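To make the (m, k) scheme above concrete, the following is a minimal, illustrative Java sketch of Shamir-style secret sharing over a prime field; it is not the project's code, and the class and method names are ours. m shares are computed by evaluating a random polynomial of degree k − 1 whose constant term is the secret, and any k shares reconstruct the secret by Lagrange interpolation at zero.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Illustrative sketch of (m, k) secret sharing (Shamir's scheme) over a prime field.
public class SecretSharing {
    static final BigInteger P = BigInteger.probablePrime(128, new SecureRandom());

    // Returns m shares as (x, f(x)) pairs from a polynomial with k-1 random coefficients.
    static BigInteger[][] split(BigInteger d, int m, int k) {
        SecureRandom rnd = new SecureRandom();
        BigInteger[] coeff = new BigInteger[k];
        coeff[0] = d.mod(P);                              // constant term is the secret
        for (int i = 1; i < k; i++) coeff[i] = new BigInteger(127, rnd);
        BigInteger[][] shares = new BigInteger[m][2];
        for (int x = 1; x <= m; x++) {
            BigInteger y = BigInteger.ZERO, xb = BigInteger.valueOf(x);
            for (int i = k - 1; i >= 0; i--)              // Horner evaluation of f(x)
                y = y.multiply(xb).add(coeff[i]).mod(P);
            shares[x - 1][0] = xb;
            shares[x - 1][1] = y;
        }
        return shares;
    }

    // Lagrange interpolation at x = 0 using any k distinct shares.
    static BigInteger reconstruct(BigInteger[][] shares) {
        BigInteger secret = BigInteger.ZERO;
        for (int i = 0; i < shares.length; i++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int j = 0; j < shares.length; j++) {
                if (i == j) continue;
                num = num.multiply(shares[j][0].negate()).mod(P);
                den = den.multiply(shares[i][0].subtract(shares[j][0])).mod(P);
            }
            secret = secret.add(shares[i][1].multiply(num)
                     .multiply(den.modInverse(P))).mod(P);
        }
        return secret;
    }

    public static void main(String[] args) {
        BigInteger d = new BigInteger("123456789");
        BigInteger[][] shares = split(d, 5, 3);           // m = 5, k = 3
        BigInteger[][] any3 = {shares[0], shares[2], shares[4]};
        System.out.println(reconstruct(any3).equals(d));  // prints "true"
    }
}
```

With (m, k) = (5, 3), any three of the five shares recover d, while any two shares alone reveal nothing about it.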
In many applications, data could be read as well as updated by clients from geographically
distributed areas. For example, in a major rescue mission or a disaster relief act, the problem
areas and resources need to be updated in real time to facilitate dynamic and effective planning.
Net-centric command and control systems rely on the GIG (Global Information Grid) for their dynamic information flow, and
frequently, critical data need to be updated on-the-fly to support agility. Also, updates in storage
systems for encryption keys can be quite frequent due to the changes in membership and access
privileges of the individuals. In our model, for each update request, all shares and share replicas
need to be updated using a primary lazy update protocol. Generally, eager update is not feasible
in widely distributed systems since it takes too long to finish the updates. Also, the large-scale
network may be partitioned and some clusters may be temporarily unreachable. Thus, a lazy
update is more suitable.
        Furthermore, a primary copy is frequently used to avoid system delusion when the
system size is large or the update ratio is high. Based on the primary lazy update protocol, all
update requests are first forwarded to HMSC for execution and the updates are then propagated
to other clusters along a minimum spanning tree (MST). Consistency can also be
maintained periodically using a distributed vector clock without concerning node failures or
network partitioning. More details of the read and update protocols and their costs will be
discussed in the next section. In this paper, we assume secure communication channels for the
delivery of data shares. Standard secure channel protocols, such as SSL/TLS, can be used to achieve this.
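The primary lazy update protocol described above can be sketched as follows. This is an illustrative Java fragment, not the report's implementation; class, field, and method names are our assumptions. All updates are applied first at the MSC (the primary) and queued, then propagated down a spanning tree of replica clusters, so the writer never blocks on remote clusters.

```java
import java.util.*;

// Minimal sketch of a primary lazy update protocol: the MSC applies an update
// locally, queues it, and a later flush pushes it down the MST to the replicas.
public class LazyUpdate {
    final Map<String, String> store = new HashMap<>();   // local replica of D
    final List<LazyUpdate> children = new ArrayList<>(); // MST children clusters
    final Deque<String[]> pending = new ArrayDeque<>();  // queued propagations

    void clientUpdate(String key, String value) {        // called only on the MSC
        store.put(key, value);                           // apply at the primary
        pending.add(new String[]{key, value});           // propagate later (lazy)
    }

    void flush() {                                       // e.g., run periodically
        while (!pending.isEmpty()) {
            String[] u = pending.poll();
            for (LazyUpdate c : children) c.receive(u[0], u[1]);
        }
    }

    void receive(String key, String value) {
        store.put(key, value);
        for (LazyUpdate c : children) c.receive(key, value); // push down the tree
    }
}
```

Between an update and the next flush, remote clusters may serve stale data; this is exactly the trade-off the text makes in favoring lazy over eager updates in a wide-area, partition-prone network.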




                                  Fig. 5.1 Master Server Cluster



5.2 OVERVIEW OF THE PROJECT
       Secret sharing and erasure coding-based approaches have been used in distributed storage
systems to ensure the confidentiality, integrity, and availability of critical information. To
achieve performance goals in data accesses, these data fragmentation approaches can be
combined with dynamic replication. In this paper, we consider data partitioning (both secret
sharing and erasure coding) and dynamic replication in data grids, in which security and data
access performance are critical issues. More specifically, we investigate the problem of optimal
allocation of sensitive data objects that are partitioned by using secret sharing scheme or erasure
coding scheme and/or replicated. The grid topology we consider consists of two layers. In the
upper layer, multiple clusters form a network topology that can be represented by a general
graph. The topology within each cluster is represented by a tree graph. We decompose the share
replica allocation problem into two subproblems: the Optimal Intercluster Resident Set Problem
(OIRSP) that determines which clusters need share replicas and the Optimal Intracluster Share
Allocation Problem (OISAP) that determines the number of share replicas needed in a cluster
and their placements. We develop two heuristic algorithms for the two subproblems.
Experimental studies show that the heuristic algorithms achieve good performance in reducing
communication cost and are close to optimal solutions.


5.3 MODULE DESCRIPTION
5.3.1 MODULES
        1. Source
                    OIRSP Specification
                    OISAP Specification
        2. Routers
        3. Destination




SOURCE
        Selecting the data and fragmenting it.
        The OISAP determines the number of share replicas needed in a cluster.
        OIRSP that determines which clusters need share replicas.
        Sharing the data in secured manner in distributed system.
        Using AES (Advanced Encryption Standard), we secure our data.


OIRSP Specification:
        We define the first problem, OIRSP, as the optimal resident set problem in a general
graph (intercluster-level graph) with an MSC HMSC. Our goal is to determine the optimal RC
that yields the minimum access cost at the cluster level. For a cluster Hx ∈ RC with |Rx| ≥ l, all
read requests from Hx are served locally and the cost is 0 at the cluster level. For a cluster Hx
with |Rx| < l, it always forwards all read access requests in Hx to the closest cluster Hy ∈ RC
with |Ry| ≥ l to access l distinct shares.




        Thus, the total access cost in GC, denoted as CostC(GC, RC), is defined as follows:
CostC(GC, RC) = UpdateCostC(GC, RC) + ReadCostC(GC, RC). The problem here is to
decide the share replica resident set RC in GC such that the communication cost CostC(GC, RC)
is minimized. The optimal resident set problem in a general graph is proven to be NP-complete.
We propose a heuristic algorithm to compute a near-optimal resident set for a general graph. We
will show that the heuristic algorithm has an O(M³) complexity, where M is the number of
clusters in the system. We will also prove that our heuristic algorithm is optimal in a tree
network, with time complexity O(M).
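The cluster-level cost just defined (update cost plus read cost) can be evaluated directly for a candidate resident set. The Java fragment below is an illustrative sketch, not the report's code: names such as `readFreq` (for Ar(Hx)) and `updateRate` (for wC) are ours, edges are unit cost, and the update-propagation tree is approximated by shortest paths from the MSC rather than a true MST. The graph is assumed connected.

```java
import java.util.*;

// Sketch: evaluates CostC = ReadCostC + UpdateCostC for a candidate resident set rc.
// Reads from a cluster outside rc are forwarded to the nearest cluster in rc;
// updates are charged along shortest paths from the MSC to each rc member.
public class ClusterCost {
    static int cost(List<List<Integer>> adj, Set<Integer> rc, int msc,
                    int[] readFreq, int updateRate) {
        int n = adj.size();
        int[] dist = new int[n];                 // distance of each cluster to rc
        Arrays.fill(dist, -1);
        Deque<Integer> q = new ArrayDeque<>();
        for (int c : rc) { dist[c] = 0; q.add(c); }
        while (!q.isEmpty()) {                   // multi-source BFS from rc
            int u = q.poll();
            for (int v : adj.get(u)) if (dist[v] < 0) { dist[v] = dist[u] + 1; q.add(v); }
        }
        int readCost = 0;
        for (int c = 0; c < n; c++) readCost += readFreq[c] * dist[c];

        int[] dmsc = new int[n];                 // distances from the MSC
        Arrays.fill(dmsc, -1);
        dmsc[msc] = 0; q.add(msc);
        while (!q.isEmpty()) {                   // single-source BFS from the MSC
            int u = q.poll();
            for (int v : adj.get(u)) if (dmsc[v] < 0) { dmsc[v] = dmsc[u] + 1; q.add(v); }
        }
        int updateCost = 0;
        for (int c : rc) updateCost += updateRate * dmsc[c];
        return readCost + updateCost;
    }
}
```

Allocating replicas to a high-read cluster drives its read distance to zero but adds an update term, which is exactly the Ar′(Hy) versus wC trade-off the heuristic algorithm tests.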


OISAP Specification:
        When we consider the allocation problem within a cluster Hx, we can isolate the cluster and
consider the problem independently. As discussed earlier, all read requests from remote clusters
can be viewed as read requests from the root node. Also, the wC updates in the entire system can
be considered as updates done at the root node of the cluster. Thus, we can simplify the notation
when discussing allocation within Hx by referring to everything in the cluster without the cluster
subscript. For example, we use G = (P, E) to represent the topology graph of Hx, Proot to denote
the root node of Hx, σ(Pi, Pj) to represent the shortest path between two nodes inside Hx, and R
to represent the resident set of Hx. Note that this simplification will be used below; where
multiple clusters are considered, the original notation is used. Note that we only need to
consider clusters with l or more share replicas for this subproblem OISAP. Let ReadCost(R)
denote the total read cost from all the nodes in cluster Hx: ReadCost(R) = Σ_{Pi∈Hx}
|σ(Pi, R, l)| · Ar(Pi), where σ(Pi, R, l) is the path along which Pi accesses its l nearest shares in
R. For each update in the system, the root node Proot needs to propagate the update to all other
share holders inside Hx. Let WriteCost(R) denote the total update cost in Hx, and let Cost(R)
denote the total cost of all nodes in Hx; then Cost(R) = WriteCost(R) + ReadCost(R). Our goal
is to determine an optimal resident set R to allocate the shares in Hx such that Cost(R) is
minimized. Note that m ≥ |R| ≥ l (we will prove this in the next section). We propose a heuristic
algorithm with a complexity of O(N³) to find the near-optimal solution for this problem, where
N is the number of nodes in the cluster.
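A brute-force evaluator for Cost(R) = WriteCost(R) + ReadCost(R) on a small tree makes the definition above concrete. This is an illustrative sketch with assumed names, not the report's code: edges are one hop, each read visits its l nearest share holders (assuming l ≤ |R|), and the write cost is approximated by summing root-to-holder distances, which over-counts edges shared by the true propagation tree.

```java
import java.util.*;

// Sketch: evaluates the intracluster cost of a resident set r on a tree topology.
public class TreeCost {
    static int cost(List<List<Integer>> tree, Set<Integer> r, int root,
                    int[] readFreq, int w, int l) {
        int n = tree.size();
        int[][] d = new int[n][];
        for (int s = 0; s < n; s++) d[s] = bfs(tree, s);   // all-pairs distances (small N)
        int readCost = 0;
        for (int p = 0; p < n; p++) {
            final int pi = p;
            int[] toShares = r.stream().mapToInt(s -> d[pi][s]).sorted().toArray();
            for (int i = 0; i < l; i++)                    // read the l nearest shares
                readCost += readFreq[p] * toShares[i];
        }
        int writeCost = 0;
        for (int s : r) writeCost += w * d[root][s];       // root pushes each update
        return readCost + writeCost;
    }

    static int[] bfs(List<List<Integer>> g, int src) {     // unit-cost shortest paths
        int[] dist = new int[g.size()];
        Arrays.fill(dist, -1);
        dist[src] = 0;
        Deque<Integer> q = new ArrayDeque<>(List.of(src));
        while (!q.isEmpty()) {
            int u = q.poll();
            for (int v : g.get(u)) if (dist[v] < 0) { dist[v] = dist[u] + 1; q.add(v); }
        }
        return dist;
    }
}
```

For tiny clusters such an evaluator can enumerate all resident sets to find the optimum; the O(N³) heuristic mentioned above exists precisely because this enumeration is prohibitive at realistic cluster sizes.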


OIRSP SOLUTIONS:
        In this section, we present a heuristic algorithm for the OIRSP. First, we discuss some
properties that are very useful for the design of the heuristic algorithm. Then, we present the
heuristic algorithm that decides which clusters should hold share replicas to minimize access cost.




Some Useful Properties:
        We first show that if a cluster Hx is in RC (an optimal resident set), then Hx should hold
at least l share replicas (l is the number of shares to be accessed by a read request). If Hx is in RC
and Hx has fewer than l shares, then read accesses from Hx will still need to go to another
cluster to get the remaining shares. If Hx holds no share replicas, then read accesses from Hx
may need to get the l shares from multiple clusters. These may result in unnecessary
communication overhead. The formal proof is given in the theorem below. Based on this property,
the computation of the update and read costs can be simplified. Essentially, for a cluster that is in
RC, all read requests can be served locally. For a cluster that is not in RC, all read requests can
be forwarded to one single cluster in RC and all l shares can be obtained from that cluster.
Assume that there exists one cluster Hx in RC such that |Rx| < l. When the resident set is RC, a
read request from Hx cannot be served locally and the remaining shares have to be obtained from
at least one other cluster in GC that holds those shares.
        Thus, |σ(Hx, RC)| > 0. Let us construct another resident set RC′. RC′ is the same as RC
except that in RC′, Hx holds l distinct shares. Thus, in RC′, |σ(Hx, RC′)| = 0, so the read cost
for read requests from Hx becomes zero. Also, in GC, there may be clusters that read from Hx.
Assume that Hx is the closest cluster in RC to Hy (Hy is not in RC). If the resident set is
RC, then Hy needs to read from Hx and some other clusters, since Hx has fewer than l shares. Thus,
we can conclude that ReadCostC(GC, RC) ≥ ReadCostC(GC, RC′) + Ar(Hx) · |σ(Hx, RC)| and,
hence, ReadCostC(GC, RC′) < ReadCostC(GC, RC). Now let us consider the update cost. Note
that we have UpdateCostC(GC, RC) = wC · |ΔC(RC)|, where ΔC(RC) is the spanning tree along
which updates are propagated. Because RC′ and RC are composed of the same set of clusters,
|ΔC(RC′)| = |ΔC(RC)|. Also, wC is independent of the resident set. So, we have
UpdateCostC(GC, RC′) = UpdateCostC(GC, RC). Since RC′ has the same update cost as, but a
lower read cost than, RC, CostC(GC, RC′) < CostC(GC, RC). RC, thus, cannot be an optimal
resident set. It follows that for each Hx in RC, |Rx| ≥ l. □ We also
observe that the clusters in the resident set RC form a connected graph (which is a subgraph of
GC). This property is formally proven in Theorem 3.2. From this property, we can see that for
resident set expansion (considering allocating share replicas to new clusters), only neighboring
clusters of the current resident set need to be considered. Thus, we can use a greedy approach to
obtain a solution. Note that, in Theorem 3.2, we assume that for each cluster Hx in GC, Ar(Hx) > 0.

Fig. 5.2 The original GC with RC={H1,H2,H3}




                Fig. 5.3 Super node S and SPT(GC,RC) constructed by Build_SPT


A Heuristic Algorithm for the OIRSP:
       The goal of the OIRSP is to determine the optimal resident set RC in GC. GC is a general
graph, and each edge in GC is considered as one hop. The optimal resident set problem in a general
graph is an instance of the problem discussed above and has been shown to be NP-complete.
Thus, we develop a heuristic algorithm to find a near-optimal solution. Our approach
is to first build a minimal spanning tree in GC with RC being the root and then identify the
cluster to be added to RC based on the tree structure. The clusters in GC access data hosted in
RC along the shortest paths, and these paths and the clusters form a set of shortest path trees.
       Since all the clusters in RC are connected, we view them as one virtual node S. Then S, all
clusters that are not in RC, and all the shortest access paths form a tree rooted at S, which is
denoted as SPT(GC, RC). We develop an efficient algorithm, Build_SPT, to construct
SPT(GC, RC) based on the current resident set RC. To facilitate the identification of a new
resident cluster, we also define VC(GC, RC) as the vicinity set of S, where for every
Hx ∈ VC(GC, RC), we have Hx ∉ RC and Hx is a neighboring cluster of S.


       Note that, from the theorem above, we know that the clusters in RC are connected. Thus, we only
need to consider clusters in VC(GC, RC) when looking for a potential cluster to be added to RC.
Build_SPT(GC, RC) first constructs VC(GC, RC) by visiting all neighboring clusters of RC. If
a cluster Hx in VC(GC, RC) has more than one neighbor in RC, then one of them is chosen to
be the parent cluster. Next, Build_SPT(GC, RC) traverses GC starting from the clusters in
VC(GC, RC). From a cluster Hx, it visits all of Hx's neighboring clusters. Assume that Hy is a
neighboring cluster of Hx.
        When Build_SPT visits Hy from Hx, it assigns Hx as Hy's parent if Hy does not yet have a
parent. In this case, Hy is in the same tree as Hx, and Hy's tree root is set to Hx's (which is a
cluster in RC). Since all read requests from Hy go through the root, say Hz, Ar(Hy) is added to
Ar′(Hz) for later use (for new resident cluster identification). In case Hy already has a parent,
the distances to S via the original parent and via Hx are compared. If Hx offers a shorter path to
S, then Hy's parent is reset to Hx and the corresponding adjustments are made.
        To achieve faster convergence for new RC identification, Hy's parent is also changed to
Hx if Hx's tree root Hz has a higher value of Ar′(Hz) when the distances to S via Hy's original
parent and via Hx are equal. The detailed algorithm for Build_SPT is given in the following
(assume that VC(GC, RC) is already identified). In the algorithm, each node Hx has several
fields: Hx.root and Hx.parent are the root and parent clusters of Hx, respectively, and Hx.dist is
the distance from Hx to Hx's root (at the end of the algorithm, it is the shortest distance). We also
use Next(Hx) to denote the set of Hx's neighbors.
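A simplified sketch of what Build_SPT computes is given below in Java. It is illustrative only (names are ours) and omits the parent re-assignment and Ar′-based tie-breaking: it contracts the connected resident set RC into the super node S, runs a BFS outward, records for every outside cluster the vicinity cluster through which its shortest path reaches S, and accumulates Ar′ per vicinity cluster.

```java
import java.util.*;

// Sketch of the core of Build_SPT: per-cluster entry points into S and the
// aggregated read traffic Ar' of each vicinity cluster (candidate for RC).
public class BuildSpt {
    static Map<Integer, Integer> arPrime(List<List<Integer>> adj,
                                         Set<Integer> rc, int[] readFreq) {
        int n = adj.size();
        int[] dist = new int[n], entry = new int[n];
        Arrays.fill(dist, -1);
        Arrays.fill(entry, -1);
        Deque<Integer> q = new ArrayDeque<>();
        for (int c : rc) dist[c] = 0;            // the contracted super node S
        // Vicinity set: neighbors of RC outside RC; each is its own entry point.
        for (int c : rc)
            for (int v : adj.get(c))
                if (!rc.contains(v) && dist[v] < 0) { dist[v] = 1; entry[v] = v; q.add(v); }
        while (!q.isEmpty()) {                   // BFS outward from the vicinity set
            int u = q.poll();
            for (int v : adj.get(u))
                if (dist[v] < 0) { dist[v] = dist[u] + 1; entry[v] = entry[u]; q.add(v); }
        }
        Map<Integer, Integer> ar = new HashMap<>();
        for (int c = 0; c < n; c++)
            if (entry[c] >= 0)                   // cluster outside RC
                ar.merge(entry[c], readFreq[c], Integer::sum);
        return ar;                               // max Ar' candidate is compared to wC
    }
}
```

The heuristic's expansion step then reduces to: pick the vicinity cluster with the largest Ar′ and add it to RC if that value exceeds wC.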




       Actually, the check for Hy.dist > Hx.dist + 1 in the algorithm is not necessary, since a
queue is used (a node is always visited from a neighbor with the shortest distance to S). A
sample general graph GC with current resident set RC = {H1, H2, H3} is shown in Fig. 5.2. The
corresponding SPT(GC, RC) is shown in Fig. 5.3, where RC is represented by the super node
labeled S. When constructing SPT(GC, RC), S's immediate neighbors, including H4, H5, H6,
H7, H8, and H9, are visited first. H4 is visited twice, but H1 is selected as the parent since H4 is
visited from H1 first, and no adjustment is needed when it is visited the second time. From
the clusters nearest to S, the clusters that are two hops away from S, including H10, H11, H12,
H13, H14, and H15, are visited. Finally, the nodes that are farther away from S are visited.
       We develop a heuristic algorithm to find the new resident set for GC in a greedy manner.
We try to find a new resident cluster in VC(GC, RC) and, once found, update RC accordingly.
RC is initialized to {HMSC}. The algorithm first constructs SPT(GC, RC) and identifies
VC(GC, RC). Then, a cluster Hy with the highest Ar′(Hy) is selected. If Ar′(Hy) > wC, then Hy
is added to RC. If Ar′(Hy) ≤ wC, then the algorithm terminates, since no other cluster can be
added to RC while reducing the access communication cost. Note that, in each step, only one
cluster can be added to RC, because SPT(GC, RC) and Ar′(Hx) change when RC changes. The
algorithm is shown below:

        RC ← {HMSC};
        Repeat {
                Build_SPT(GC, RC);
                Select the cluster Hy with the maximum Ar′(Hy) among all clusters in VC(GC, RC);
                If Ar′(Hy) > wC then RC ← RC ∪ {Hy};
        } Until (Ar′(Hy) ≤ wC)

       Now, we analyze the complexity of the heuristic resident set algorithm. Build_SPT has a
time complexity of O(P · deg), where P is the number of clusters in GC and deg is the maximal
degree of the vertices (clusters) in GC. Finding the cluster Hy in VC(GC, RC) with the highest
Ar′(Hy) can be done while building SPT(GC, RC). Thus, the time complexity of the heuristic
resident set algorithm is O(|RC| · P · deg). Note that the final resident set RC computed by
Build_Tree is not always optimal [32]. The heuristic resident set algorithm works by adding a
candidate cluster Hx in GC into RC at each step, with Ar′(Hx) > wC. By adding cluster Hx, the
read cost is reduced by at least Ar′(Hx), and the update cost is increased by wC. Thus, the total
cost of GC with resident set RC is less than that of GC with the initial resident set {HMSC} if
|RC| > 1.




                             Fig. 5.4 The performance impact of graph size.




                           Fig. 5.5 The performance impact of graph degree.



Fig. 5.6 The impact of update/read ratio.


OISAP SOLUTIONS:
       Now, we only consider the cost inside a single cluster Hx. As discussed earlier, the
topology of Hx is a tree, denoted as T. For simplicity, we define the distance of each edge in T
uniformly as one hop. In the following, we first show two important properties of the OISAP
problem with a tree topology. Then, we give a heuristic algorithm to decide the number of
shares needed in Hx and where to place them. If Hx's resident set R is not connected, then R
consists of multiple disconnected subresident sets R1, R2, ..., Rn, where n > 1, and each
subresident set is connected. We say R is j+ connected in Hx if and only if min(|Ri|) ≥ j, where
j > 0, Ri, for all i ≤ n, are subgraphs in Hx, and |Ri| is the number of server nodes in Ri. We
define Ri <pos Rj as follows: if Ri <pos Rj, then there exist Py, Pz ∈ Hx, where Py ∈ Ri and
Pz ∈ Rj, such that Pz is an ancestor of Py in T. Informally, nodes in Rj are closer to the root
than nodes in Ri. Otherwise, Ri ≥pos Rj.

Performance of the OIRSP Heuristic Algorithm:
       In this section, we compare the performance of the OIRSP heuristic algorithm with the
randomized K-replication, no-replication allocation, and complete replication strategies. We
study the impacts of three factors: 1) the graph size, which is the number of clusters in the
system; 2) the graph degree, which is the average number of neighbors of a cluster; and 3) the
update/read ratio, which is the ratio of the total number of update requests in the entire system to
the average number of read requests issued from a single cluster (these are the requests each
cluster needs to process). The results are shown in the figures, in which HEU, RKR, NR, and FR
denote the OIRSP heuristic algorithm, the randomized K-replication, the no-replication
allocation, and the complete replication algorithms, respectively.



       Fig. 5.4 shows the impact of graph size on the performance of the four algorithms. The
parameters are set as follows: cluster size = 100, which means that there are 100 nodes in each
cluster; graph degree = 5; and update/read = 2. The results show that the OIRSP heuristic
algorithm incurs much lower communication cost than the other replication strategies. Also, it can
be seen that with a larger graph size, the OIRSP heuristic algorithm achieves better performance
compared to the no-replication allocation strategy. The reason is obvious: with a larger graph size,
the number of clusters that need replicas increases. Allocating share replicas to these clusters
reduces communication cost, and hence, the heuristic algorithm shows better performance than
the no-replication allocation strategy. The effect of graph degree is shown in Fig. 5.5. The other
parameters are set as follows: cluster size = 100, graph size = 80, and update/read = 2. From
the results, we can see that the performance gain of the heuristic algorithm becomes less
significant with increasing graph degree. This is because the graph becomes more complete,
and the distances between nodes become smaller, when the graph degree increases. When the
graph degree becomes large, the communication cost for the complete replication strategy drops
more significantly than that of the other algorithms. When graph degree ≥ 20, the communication
cost for complete replication becomes stable and stays at twice that of the heuristic
algorithm. With update/read = 2, most clusters will not be allocated share replicas; thus, the
result of the heuristic algorithm is closer to (but still better than) the other two strategies.
Compared to the complete replication strategy, the performance of our heuristic algorithm is
much better when the graph degree is small. This is because more clusters get unneeded replicas
in the complete replication strategy, and the update cost increases significantly.
The effect of update/read ratio is shown in Fig. 5.6. The parameters are set as follows: cluster size
= 100, graph size = 80, and graph degree = 5. With increasing update/read ratio, fewer clusters
should get replicas. So, the communication cost of the complete replication strategy increases
rapidly and becomes far worse than that of the other two replication strategies.




ROUTERS
       It automatically loads the values and splits the content into an unstructured tree format.
       The data are sent through three different routers, chosen randomly.
       Here, the data are generated randomly by using the topology generator.


The Efficiency of the OISAP SDP-Tree Algorithm:
       The performance of the SDP-tree algorithm is compared with the optimal allocation
algorithm and the randomized M-replication algorithm. In the experiments, the trees are generated
randomly by using the topology generator with varying N, D, and read/update ratio, where N is
the total number of nodes in the cluster, D is the maximum node degree, and read/update is the
ratio of the average number of read requests in the cluster to the total number of update requests
in the system. Two configurations are considered:




                              Fig.5.7 The impact of l with read/update=3.




                             Fig. 5.8 The impact of l with read/update= 30.




Fig. 5.9 The impact of m with read/update=3.




                            Fig. 5.10 The impact of m with read/update=30.


        1) N = 30, D = 5, read/update = 3; and 2) N = 30, D = 5, read/update = 30. We vary m
and l to evaluate their impact on the performance of the algorithms. A larger m value results in
higher availability if some of the share holders are unavailable or compromised, while a larger
l value achieves better data confidentiality. Thus, different m and l values could be chosen based
on the requirements of the data. Note that we only show the results with N = 30, because the
computation cost for obtaining the optimal solutions is prohibitive. The results are shown in
the figures, in which HEU denotes the heuristic algorithm, OPT denotes the optimal solution, and
RMR denotes the randomized M-replication algorithm. From the figures, we can see that the
communication costs using RMR are always the highest. With RMR, the share replicas are
randomly allocated, so the shares may not be close to the clients with the most frequent accesses. For
all configurations, the heuristic algorithm obtains near-optimal or optimal solutions. In fact, for
the worst individual case we have observed, the cost obtained by the heuristic algorithm is only
about 10 percent higher than that of the optimal algorithm. In most cases (about 75 percent), the
heuristic algorithm can obtain the optimal solutions. From the figures, it can be seen that with
increasing l, the communication costs of all solutions increase sharply. With higher l, a read or
update request needs to access a larger number of nodes that host shares and, hence, incurs a
higher access cost. As can be seen, in general, m has little impact on the access performance.
Only in the extreme case, with read/update = 30 (most requests are read requests) and m
increased from 3 to 30, can we see the effect of m on the access performance. This is because
the total read access frequency of each subtree (i.e., the total number of read requests issued by
the nodes inside the subtree) is higher than the total update frequency in the entire cluster. Thus,
on average, a certain number of shares hosted in the cluster is sufficient to minimize the
communication cost for a specific read/update ratio. With a reasonable read/update ratio, the
number of shares required to minimize access cost inside a cluster is usually small (note that a
small number of shares can partition the cluster into a much larger number of subtrees rooted at
the neighboring nodes of the resident set, i.e., the nodes hosting shares). Thus, a small value of m
is enough to provide sufficient shares to minimize the communication cost inside a cluster. In
other words, the value of m has little impact on the access cost inside a cluster.


DESTINATION
        Performing the decryption function.
        It is used to decrypt the ciphertext back to plaintext without any loss of data.
        It is the inverse of the AES encryption.
        Receiving the original content.
        Displaying the data.
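The AES encryption/decryption pair used by the Source and Destination modules can be illustrated with the standard javax.crypto API. This is a minimal sketch, not the project's code; the GCM mode, 128-bit key size, and sample text are our assumptions.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Illustrative AES round trip: the Source encrypts a data share, the
// Destination applies the inverse operation with the same key and IV.
public class AesRoundTrip {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);                                // 128-bit AES key
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];                    // 96-bit nonce for GCM
        new SecureRandom().nextBytes(iv);
        GCMParameterSpec spec = new GCMParameterSpec(128, iv);

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, spec);
        byte[] ct = enc.doFinal("grid data share".getBytes(StandardCharsets.UTF_8));

        // Destination side: decryption recovers the plaintext without loss.
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, spec);
        byte[] pt = dec.doFinal(ct);
        System.out.println(new String(pt, StandardCharsets.UTF_8)); // prints "grid data share"
    }
}
```

In the system described above, the AES key itself would not sit with the ciphertext; it would be secret shared across nodes, so compromising a single storage node reveals neither the data nor the key.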


5.4 DATAFLOW DIAGRAM
       A data flow diagram (DFD) is a graphical representation of the "flow" of data through
an information system. DFDs can also be used for the visualization of data processing (structured
design). Context-level diagrams are decomposed to determine the functional requirements. A DFD
provides no information about the timing of processes, or about whether processes will operate
in sequence or in parallel. It is therefore quite different from a flowchart, which shows the flow
of control through an algorithm, allowing a reader to determine what operations will be
performed, in what order, and under what circumstances, but not what kinds of data will be input
to and output from the system, nor where the data will come from and go to, nor where the data
will be stored (all of which are shown on a DFD).


Fig. 5.11 Data flow diagram

5.5 Input Design
               Input design is the process of converting user-originated inputs into a computer-
based format for the application forms. Input design is one of the most expensive phases of the
operation of a computerized system and is often a major problem area of a system.


5.6 Output Design
       Output design generally refers to the results and information that are generated by the
system for many end-users; output is the main reason for developing the system and the basis on
which they evaluate the usefulness of the application.
       As the outputs are the most important sources of information to the users, better output
design should improve the system’s relationship with its users and will also help in decision-making. Form
design elaborates the way output is presented and the layout available for capturing information.




CHAPTER 6
                                     SYSTEM TESTING


       System Testing tests all components and modules that are new, changed, affected by a
change, or needed to form the complete application. The system test may require involvement of
other systems but this should be minimized as much as possible to reduce the risk of externally-
induced problems. Testing the interaction with other parts of the complete system comes in
Integration Testing. The emphasis in system testing is validating and verifying the functional
design specification and seeing how all the modules work together.
       The first system test is often a smoke test. This is an informal quick-and-dirty run through
of the application’s major functions without bothering with details. The term comes from the
hardware testing practice of turning on a new piece of equipment for the first time and
considering it a success if it doesn’t start smoking or burst into flame.
       System testing requires many test runs because it entails feature by feature validation of
behavior using a wide range of both normal and erroneous test inputs and data. The Test Plan is
critical here because it contains descriptions of the test cases, the sequence in which the tests
must be executed, and the documentation needed to be collected in each run.
       When an error or defect is discovered, previously executed system tests must be rerun
after the repair is made to make sure that the modifications didn’t cause other problems. This
will be covered in more detail in the section on regression testing.


Sample Entry and Exit Criteria for System Testing
Entry Criteria
   •   Unit Testing for each module has been completed and approved; each module is under
       version control
   •   An incident tracking plan has been approved
   •   A system testing environment has been established
   •   The system testing schedule is approved and in place



                                                 37
Exit Criteria
   •   Application meets all documented business and functional requirements
   •   No known critical defects prevent moving to Integration Testing
   •   All appropriate parties have approved the completed tests
   •   A testing transition meeting has been held and the developers have signed off


6.1 UNIT TESTING
       A series of stand-alone tests are conducted during Unit Testing. Each test examines an
individual component that is new or has been modified. A unit test is also called a module test
because it tests the individual units of code that comprise the application.
       Each test validates a single module that, based on the technical design documents, was
built to perform a certain task with the expectation that it will behave in a specific way or
produce specific results. Unit tests focus on functionality and reliability, and the entry and exit
criteria can be the same for each module or specific to a particular module. Unit testing is done
in a test environment prior to system integration. If a defect is discovered during a unit test, the
severity of the defect will dictate whether or not it will be fixed before the module is approved.
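A unit test of this kind needs no framework; a plain Java sketch is shown below. It is illustrative only: the `Checksum` module and its expected values are hypothetical, not part of this project.

```java
// Hypothetical module under test: a byte-wise checksum utility.
class Checksum
{
    // Sum of the bytes, taken modulo 256.
    static int of(byte[] data)
    {
        int sum = 0;
        for(byte b : data)
            sum = (sum + (b & 0xFF)) % 256;
        return sum;
    }
}

public class ChecksumTest
{
    public static void main(String[] args)
    {
        // Normal input: 1 + 2 + 3 = 6.
        if(Checksum.of(new byte[] {1, 2, 3}) != 6)
            throw new AssertionError("normal input failed");
        // Boundary input: an empty array sums to 0.
        if(Checksum.of(new byte[0]) != 0)
            throw new AssertionError("empty input failed");
        // Wrap-around: 200 + 100 = 300, which is 44 modulo 256.
        if(Checksum.of(new byte[] {(byte)200, 100}) != 44)
            throw new AssertionError("wrap-around failed");
        System.out.println("All unit tests passed");
    }
}
```

Each check states the expected result for one input, so a failing module is caught before it moves on to System Testing.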


Sample Entry and Exit Criteria for Unit Testing
Entry Criteria
   •   Business Requirements are at least 80% complete and have been approved to date
   •   Technical Design has been finalized and approved
   •   Development environment has been established and is stable
   •   Code development for the module is complete


Exit Criteria
   •   Code has version control in place
   •   No known major or critical defects prevent any module from moving to System Testing
   •   A testing transition meeting has been held and the developers have signed off
                                                 38
6.2 ACCEPTANCE TESTING
       User Acceptance Testing is also called Beta testing, application testing, and end-user
testing. Whatever you choose to call it, it’s where testing moves from the hands of the IT
department into those of the business users. Software vendors often make extensive use of Beta
testing, some more formally than others, because they can get users to do it for free.
       By the time UAT is ready to start, the IT staff has resolved in one way or another all the
defects they identified. Regardless of their best efforts, though, they probably don’t find all the
flaws in the application. A general rule of thumb is that no matter how bulletproof an application
seems when it goes into UAT, a user somewhere can still find a sequence of commands that will
produce an error.
       To be of real use, UAT cannot be random users playing with the application. A mix of
business users with varying degrees of experience and subject matter expertise need to actively
participate in a controlled environment. Representatives from the group work with Testing
Coordinators to design and conduct tests that reflect activities and conditions seen in normal
business usage. Business users also participate in evaluating the results. This ensures that the
application is tested in real-world situations and that the tests cover the full range of business
usage. The goal of UAT is to simulate realistic business activity and processes in the test
environment.
       A phase of UAT called “Unstructured Testing” will be conducted whether or not it’s in
the Test Plan. Also known as guerilla testing, this is when business users bash away at the
keyboard to find the weakest parts of the application. In effect, they try to break it. Although it’s
a free-form test, it’s important that users who participate understand that they have to be able to
reproduce the steps that led to any errors they find. Otherwise it’s of no use.
       A common occurrence in UAT is that once the business users start working with the
application they find that it doesn’t do exactly what they want it to do or that it does something
that, although correct, is not quite optimal. Investigation finds that the root cause is in the
Business Requirements, so the users will ask for a change. During UAT is when change control
must be most seriously enforced, but change control is beyond the scope of this paper. Suffice to
say that scope creep is especially dangerous in this late phase and must be avoided.



                                                 39
Sample Entry and Exit Criteria for User Acceptance Testing
Entry Criteria
    •    Integration testing signoff was obtained
    •    Business requirements have been met or renegotiated with the Business Sponsor or
         representative
    •    UAT test scripts are ready for execution
    •    The testing environment is established
    •    Security requirements have been documented and necessary user access obtained


Exit Criteria
    •    UAT has been completed and approved by the user community in a transition meeting
    •    Change control is managing requested modifications and enhancements
    •    Business sponsor agrees that known defects do not impact a production release; no
         remaining defects are rated 3, 2, or 1


6.3 TEST CASES
         A test case is usually a single step, or occasionally a sequence of steps, to test the
correct behaviour, functionality, or features of an application. An expected result or expected
outcome is usually given.
        A test case in software engineering is a set of conditions or variables under which a tester
will determine whether an application or software system is working correctly or not. The
mechanism for determining whether a software program or system has passed or failed such a
test is known as a test oracle. In some settings, an oracle could be a requirement or use case,
while in others it could be a heuristic. It may take many test cases to determine that a software
program or system is considered sufficiently scrutinized to be released. Test cases are often
referred to as test scripts, particularly when written. Written test cases are usually collected
into test suites.
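These ideas can be sketched in Java: each test case pairs an input with an expected outcome, and the test oracle is a simple equality check. The `toUpper` method below is a hypothetical system under test, not part of this project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TestCaseDemo
{
    // Hypothetical system under test.
    static String toUpper(String s)
    {
        return s.toUpperCase();
    }

    public static void main(String[] args)
    {
        // Each entry is one test case: an input mapped to its expected outcome.
        Map<String, String> cases = new LinkedHashMap<String, String>();
        cases.put("abc", "ABC");  // normal input
        cases.put("A1b", "A1B");  // mixed case with a digit
        cases.put("", "");        // boundary case: empty string

        int failed = 0;
        for(Map.Entry<String, String> tc : cases.entrySet())
        {
            String actual = toUpper(tc.getKey());
            // The test oracle: compare actual output with the expected outcome.
            boolean pass = actual.equals(tc.getValue());
            System.out.println((pass ? "PASS" : "FAIL") + " input=\"" + tc.getKey() + "\"");
            if(!pass)
                failed++;
        }
        if(failed > 0)
            throw new AssertionError(failed + " test case(s) failed");
    }
}
```

Collected together, such (input, expected outcome) pairs form a test suite that can be rerun unchanged after every fix.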



                                                    40
CHAPTER 7
                               SYSTEM IMPLEMENTATION


                Implementation is the most crucial stage in achieving a successful system and in
giving users confidence that the new system is workable and effective. Here it involves
deploying a modified application to replace an existing one. This type of conversion is relatively
easy to handle, provided there are no major changes in the system. Each program was tested
individually at development time using test data, and it was verified that the programs link
together in the way specified in the program specifications; the computer system and its
environment are tested to the satisfaction of the user. The system that has been developed has
been accepted and proved satisfactory for the user, and so it will be implemented very soon. A
simple operating procedure is included so that the user can understand its different functions
clearly and quickly.
           As a first step, the executable form of the application is created and loaded on the
common server machine, which is accessible to all users, and the server is connected to the
network. The final stage is to document the entire system, covering its components and the
operating procedures of the system.
       System implementation involves making the new system available to a prepared set of
users (the deployment) and positioning ongoing support and maintenance of the system within
the Performing Organization (the transition). At a finer level of detail, deploying the system
consists of executing all steps
necessary to educate the Consumers on the use of the new system, placing the newly developed
system into production, confirming that all data required at the start of operations is available and
accurate, and validating that business functions that interact with the system are functioning
properly. Transitioning the system support responsibilities involves changing from a system
development to a system support and maintenance mode of operation, with ownership of the
new system moving from the Project Team to the Performing Organization. A key difference
between System Implementation and all other phases of the lifecycle is that all project activities
up to this point have been performed in safe, protected, and secure environments, where project
issues that arise have little or no impact on day-to-day business operations.


                                                   41
CHAPTER 8
                  CONCLUSION & FUTURE ENHANCEMENTS


8.1 CONCLUSION
       We have combined data partitioning schemes (secret sharing or erasure coding) with
dynamic replication to achieve data survivability and security in data grids. The replicas of the
partitioned data need to be properly allocated to achieve the actual performance gains. We have
developed algorithms to allocate correlated data shares in large-scale peer-to-peer data grids. To
support scalability, we represent the data grid as a two-level cluster-based topology and
decompose the allocation problem into two subproblems: the OIRSP and the OISAP.
       The OIRSP determines which clusters need to maintain share replicas, and the OISAP
determines the number of share replicas needed in a cluster and their placements. Heuristic
algorithms are developed for the two subproblems. Experimental studies show that the heuristic
algorithms achieve good performance in reducing communication cost and are close to optimal
solutions.
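The data-partitioning idea can be sketched with an (n, n) XOR-based secret sharing; this is an illustration only, not the project's actual secret sharing or erasure coding scheme. The first n − 1 shares are random and the last is the XOR of the data with all of them, so any n − 1 shares alone reveal nothing and all n are needed to reconstruct:

```java
import java.util.Random;

public class XorSharing
{
    // Split data into n shares; fewer than n shares reveal nothing.
    static byte[][] split(byte[] data, int n, Random rnd)
    {
        byte[][] shares = new byte[n][data.length];
        for(int i = 0; i < n - 1; i++)
            rnd.nextBytes(shares[i]);          // n-1 purely random shares
        for(int j = 0; j < data.length; j++)
        {
            byte x = data[j];
            for(int i = 0; i < n - 1; i++)
                x ^= shares[i][j];
            shares[n - 1][j] = x;              // last share completes the XOR
        }
        return shares;
    }

    // Reconstruct by XOR-ing all shares together.
    static byte[] combine(byte[][] shares)
    {
        byte[] data = new byte[shares[0].length];
        for(byte[] share : shares)
            for(int j = 0; j < data.length; j++)
                data[j] ^= share[j];
        return data;
    }

    public static void main(String[] args)
    {
        byte[] secret = "grid data".getBytes();
        byte[][] shares = split(secret, 3, new Random());
        System.out.println(new String(combine(shares))); // prints "grid data"
    }
}
```

In the grid setting, each such share would then be replicated and placed by the OIRSP/OISAP heuristics rather than stored on a single node.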


8.2 FUTURE ENHANCEMENT
        Several future research directions can be investigated. First, the secure storage
mechanisms developed in this paper can also be used for key storage. In this alternate scheme,
critical data objects are encrypted and replicated. The encryption keys are partitioned and the key
shares are replicated and distributed. To minimize the access cost, allocation of the replicas of a
data object and the replicas of its key shares should be considered together. We plan to construct
the cost model for this approach and extend our algorithm to find the best placement solutions.
Also, the two approaches (partitioning data or partitioning keys) have pros and cons in terms of
storage and access cost and have different security and availability implications. Moreover, it
may be desirable to consider multiple factors for the allocation of secret shares and their replicas.
Replicating data shares improves access performance but degrades security. Having more share
replicas may increase the chance of shares being compromised. Thus, it is desirable to determine
the placement solutions based on multiple objectives, including performance, availability, and
security.
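The alternate key-storage scheme described above can be sketched as follows. This is a simplified illustration: a toy XOR stream cipher stands in for the AES encryption actually used in the project, and the key is split with a (2, 2) XOR sharing.

```java
import java.util.Random;

public class KeySharingSketch
{
    // Toy XOR "encryption" keyed by a repeating byte key (illustrative only;
    // the project itself uses AES). Applying it twice restores the input.
    static byte[] xorCrypt(byte[] data, byte[] key)
    {
        byte[] out = new byte[data.length];
        for(int i = 0; i < data.length; i++)
            out[i] = (byte)(data[i] ^ key[i % key.length]);
        return out;
    }

    public static void main(String[] args)
    {
        Random rnd = new Random();
        byte[] key = new byte[16];
        rnd.nextBytes(key);

        // Encrypt and replicate the ciphertext; only the key is partitioned.
        byte[] cipher = xorCrypt("replicated object".getBytes(), key);

        // (2, 2) XOR split of the key: both key shares are needed.
        byte[] share1 = new byte[key.length];
        rnd.nextBytes(share1);
        byte[] share2 = new byte[key.length];
        for(int i = 0; i < key.length; i++)
            share2[i] = (byte)(key[i] ^ share1[i]);

        // Recombine the key shares and decrypt.
        byte[] rebuilt = new byte[key.length];
        for(int i = 0; i < key.length; i++)
            rebuilt[i] = (byte)(share1[i] ^ share2[i]);
        System.out.println(new String(xorCrypt(cipher, rebuilt)));
    }
}
```

Because only the short key is partitioned, the bulky ciphertext can be replicated freely, trading share-storage cost against the extra lookup needed for the key shares.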
                                                 42
CHAPTER 9
                                                    APPENDIX


1. SOURCE CODE


// Source File Name: test.java


public class test
{
    public test()
    {
    }

    public static void main(String args[])
    {
        String pltxt = "Hello";
        // replace() returns a new string; the decompiled listing discarded
        // the result, and text extraction had dropped the backslash in '\r'.
        pltxt = pltxt.replace('\r', '_');
        System.out.println((new StringBuilder("PLTXT:")).append(pltxt)
                .append("Length:").append(pltxt.length()).toString());
        int bits = 192;
        // Pad the key string out to the required number of bits.
        String kk = "1";
        while(kk.length() < bits)
            kk = kk + "0";
        AES Ins = new AES(bits / 32);
        String cptxt = Ins.encrypt(pltxt, kk, bits);
        System.out.println("Encrypted: " + cptxt);
        Ins = new AES(bits / 32);
        pltxt = Ins.decrypt(cptxt, kk, bits);
        System.out.println("Decrypted: " + pltxt);
    }
}



// Source File Name: Routers.java


import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.Random;
import javax.swing.*;


public class Routers implements ActionListener
{
    class PortListener implements Runnable
    {


        public void run()
        {
            // Restructured from raw decompiler output: the original listing
            // contained MISSING_BLOCK_LABEL markers, goto statements, and
            // bare exception-table fragments that are not valid Java.
            if(port != 1001)
                return;
            try
            {
                server = new ServerSocket(port);
            }
            catch(IOException e)
            {
                e.printStackTrace();
                return;
            }
            while(true)
            {
                Socket client = null;
                try
                {
                    // Accept one sender and read the whole message.
                    connection = server.accept();
                    br = new BufferedReader(new InputStreamReader(
                            new BufferedInputStream(connection.getInputStream())));
                    StringBuffer buffer = new StringBuffer();
                    System.out.println("hi");
                    String strLine;
                    while((strLine = br.readLine()) != null)
                    {
                        System.out.println(strLine);
                        buffer.append(strLine).append("\n");
                    }
                    br.close();
                    connection.close();
                    tf.setText(buffer.toString());
                    String bytes = Integer.toString(buffer.length());

                    // Random values in 1..5 for the three node clusters.
                    Random r = new Random();
                    int p1 = Math.abs(r.nextInt()) % 5 + 1;
                    int p2 = Math.abs(r.nextInt()) % 5 + 1;
                    int p3 = Math.abs(r.nextInt()) % 5 + 1;
                    int p11 = Math.abs(r.nextInt()) % 5 + 1;
                    int p21 = Math.abs(r.nextInt()) % 5 + 1;
                    int p31 = Math.abs(r.nextInt()) % 5 + 1;
                    int q1 = Math.abs(r.nextInt()) % 5 + 1;
                    int q2 = Math.abs(r.nextInt()) % 5 + 1;
                    int q3 = Math.abs(r.nextInt()) % 5 + 1;
                    int r1 = Math.abs(r.nextInt()) % 5 + 1;
                    int r2 = Math.abs(r.nextInt()) % 5 + 1;
                    int r3 = Math.abs(r.nextInt()) % 5 + 1;
                    // The decompiled source also set labels Tl3..Tl7 from
                    // undeclared strings; only the declared labels remain.
                    Tl1.setText(Integer.toString(p1));
                    Tl2.setText(Integer.toString(p2));
                    System.out.println(" 1t:" + p1);
                    System.out.println(" 2t:" + p2);
                    System.out.println(" 3t:" + p3);

                    ImageIcon green = new ImageIcon(getClass().getResource("green.png"));
                    // First cluster: highlight the node with the lowest value.
                    if(p1 < p2 && p1 < p3)
                    {
                        if(p1 == p11 || p1 == p21 || p1 == p31)
                            System.out.println("hello1");
                        JOptionPane.showMessageDialog(jf, "2C accessing Same Node");
                        System.out.println("First Integer" + p1);
                        imageLl.setIcon(green);
                        imageLl.setBounds(80, 120, 120, 90);
                        imageLl.setToolTipText(bytes);
                        c.add(imageLl, 1);
                    }
                    else if(p2 < p1 && p2 < p3)
                    {
                        System.out.println("Second Integer" + p2);
                        imageLl.setIcon(green);
                        imageLl.setBounds(80, 300, 120, 90);
                        imageLl.setToolTipText(bytes);
                        c.add(imageLl, 1);
                    }
                    else if(p3 < p1 && p3 < p2)
                    {
                        System.out.println("Third Integer" + p3);
                        imageLl.setIcon(green);
                        imageLl.setBounds(80, 480, 120, 90);
                        imageLl.setToolTipText(bytes);
                        c.add(imageLl, 1);
                    }
                    // Second cluster.
                    if(q1 < q2 && q1 < q3)
                    {
                        System.out.println("First Integer" + q1);
                        ill.setIcon(green);
                        ill.setBounds(340, 120, 120, 90);
                        ill.setToolTipText(bytes);
                        c.add(ill, 1);
                    }
                    else if(q2 < q3 && q2 < q1)
                    {
                        System.out.println("Second Integer" + q2);
                        ill.setIcon(green);
                        ill.setBounds(340, 300, 120, 90);
                        ill.setToolTipText(bytes);
                        c.add(ill, 1);
                    }
                    else if(q3 < q2 && q3 < q1)
                    {
                        System.out.println("Third Integer" + q3);
                        ill.setIcon(green);
                        ill.setBounds(340, 480, 120, 90);
                        ill.setToolTipText(bytes);
                        c.add(ill, 1);
                    }
                    // Third cluster.
                    if(r1 < r2 && r1 < r3)
                    {
                        System.out.println("First Integer" + r1);
                        imageLabel.setIcon(green);
                        imageLabel.setBounds(600, 120, 120, 90);
                        imageLabel.setToolTipText(bytes);
                        c.add(imageLabel, 1);
                    }
                    else if(r2 < r3 && r2 < r1)
                    {
                        System.out.println("Second Integer" + r2);
                        imageLabel.setIcon(green);
                        imageLabel.setBounds(600, 300, 120, 90);
                        imageLabel.setToolTipText(bytes);
                        c.add(imageLabel, 1);
                    }
                    else if(r3 < r1 && r3 < r2)
                    {
                        System.out.println("Third Integer" + r3);
                        imageLabel.setIcon(green);
                        imageLabel.setBounds(600, 480, 120, 90);
                        imageLabel.setToolTipText(bytes);
                        c.add(imageLabel, 1);
                    }

                    // Encrypt the received text with AES and decrypt it back,
                    // exactly as in test.java.
                    String pltxt = buffer.toString().replace('\r', '_');
                    System.out.println("PLTXT:" + pltxt + "Length:" + pltxt.length());
                    int bits = 192;
                    String kk = "1";
                    while(kk.length() < bits)
                        kk = kk + "0";
                    AES Ins = new AES(bits / 32);
                    String cptxt = Ins.encrypt(pltxt, kk, bits);
                    System.out.println("Encrypted: " + cptxt);
                    tf.setText(cptxt);
                    Ins = new AES(bits / 32);
                    pltxt = Ins.decrypt(tf.getText(), kk, bits);
                    System.out.println("Decrypted: " + pltxt);

                    // Forward the received data to the next node.
                    client = new Socket("127.0.0.1", 8889);
                    bos = new BufferedOutputStream(client.getOutputStream());
                    byte byteArray[] = buffer.toString().getBytes();
                    bos.write(byteArray, 0, byteArray.length);
                    bos.flush();
                }
                catch(UnknownHostException e1)
                {
                    e1.printStackTrace();
                }
                catch(IOException e1)
                {
                    e1.printStackTrace();
                }
                finally
                {
                    try
                    {
                        if(bos != null)
                            bos.close();
                        if(client != null)
                            client.close();
                    }
                    catch(IOException e1)
                    {
                        e1.printStackTrace();
                    }
                }
            }
        }


       Routers ip1;
       BufferedOutputStream bos;
       ServerSocket server;
       Socket connection;
       BufferedReader br;
       int port;

        public PortListener(int port)
        {
            // The decompiler emitted a synthetic this$0 field and a misplaced
            // super() call here; a plain constructor suffices.
            bos = null;
            br = null;
            this.port = port;
        }
}




Routers()
{
    f0 = new Font("Verdana", 1, 35);
    f = new Font("Times New roman", 1, 18);
    f1 = new Font("Calibrie", 3, 25);
    l = new JLabel("Received File");
    c1 = new JLabel("UnStructed P2P");
    l1 = new JLabel("Energy :");
    l2 = new JLabel("Packet Size:");
    l3 = new JLabel("Number Of Packet :");
    Tl1 = new JLabel("");
    Tl2 = new JLabel("");
    T1 = new JTextField("");
    pane = new JScrollPane();
    tf = new JTextArea();
    graph = new JButton("Graphical");
    Sub = new JButton("Submit");

Exit = new JButton("Exit");
imageLabel = new JLabel();
imageLl = new JLabel();
ill = new JLabel();
image1 = new JLabel();
image2 = new JLabel();
ImageIcon b1 = new ImageIcon(getClass().getResource("cement.png"));
image1.setIcon(b1);
image1.setBounds(80, 120, 120, 100);
image2.setBounds(340, 120, 120, 100);
jf = new JFrame("Router");
c = jf.getContentPane();
c.setLayout(null);
jf.setSize(800, 670);
c.setBackground(new Color(250, 215, 150));
l.setBounds(650, 100, 200, 50);
l1.setBounds(30, 170, 250, 50);
l2.setBounds(30, 270, 250, 50);
l3.setBounds(30, 370, 250, 50);
c1.setFont(f0);
c1.setBounds(370, 30, 450, 50);
l1.setFont(f);
l2.setFont(f);
l3.setFont(f);
l.setForeground(Color.GREEN);
l1.setForeground(Color.CYAN);
l2.setForeground(Color.CYAN);
l3.setForeground(Color.CYAN);
Tl1.setBounds(110, 210, 50, 25);
Tl1.setBackground(new Color(250, 215, 150));

pane.setBounds(700, 170, 300, 360);
tf.setColumns(20);
tf.setRows(10);
tf.setForeground(new Color(120, 0, 0));
tf.setFont(f);
tf.setBackground(new Color(246, 233, 191));
tf.setName("tf");
pane.setName("pane");
pane.setViewportView(tf);
l.setFont(f);
T1.setFont(f);
Sub.setFont(f);
Exit.setFont(f);
graph.setFont(f);
T1.setBounds(200, 100, 350, 50);
Sub.setBounds(700, 80, 120, 35);
Exit.setBounds(410, 640, 200, 40);
Exit.setBackground(new Color(151, 232, 158));
graph.setBounds(140, 520, 200, 40);
graph.setBackground(new Color(151, 232, 158));
T1.setBackground(Color.white);
T1.setForeground(Color.white);
Exit.setForeground(Color.BLACK);
c.add(pane, "Center");
c.add(Tl1);
Sub.addActionListener(this);
c1.setForeground(Color.RED);
Sub.setBackground(new Color(151, 232, 158));
jf.show();
c.add(c1);

         Exit.addActionListener(this);
         jf.addWindowListener(new WindowAdapter()
         {
             public void windowClosing(WindowEvent win)
             {
                 System.exit(0);
             }
         });
         int ports[] = {
              1001
         };
         for(int i = 0; i < 1; i++)
         {
              Thread t = new Thread(new PortListener(ports[i]));
              t.setName((new StringBuilder("Listener-")).append(ports[i]).toString());
              t.start();
         }


         c.add(image1);
         c.add(image2);



}
public static void main(String args[])
    {
        new Routers();
    }


    public void actionPerformed(ActionEvent e)
    {
        if(e.getSource() == Exit)
        {
            jf.setVisible(false);
            System.exit(0);
        }
    }


    // A decompiled actionPerformed(Object) overload has been removed here:
    // it referenced an undeclared field and cast the event to Image, which
    // cannot compile. Tooltips are set where the icons are placed instead.


    public void setInvisible(boolean flag)
    {
    }
    public Font f0;
    public Font f;
    public Font f1;
}




                                                 58
2. SCREEN SHOTS


                  Source to get the input data




                              59
Source with the input data




           60
Router with unstructured peer to peer




                 61
Router with unstructured peer to peer randomly generating encrypted data




                                   62
Destination node before receiving the data




                    63
Destination node after receiving the original data




                       64
CHAPTER 10


REFERENCES


 •   S. Arora, P. Raghavan, and S. Rao, “Approximation Schemes for Euclidean k-Medians
     and Related Problems,” Proc. 30th ACM Symp. Theory of Computing (STOC), 1998.
 •   M. Baker, R. Buyya, and D. Laforenza, “Grids and Grid Technology for Wide-Area
     Distributed Computing,” Software - Practice and Experience, 2002.
 •   A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, C. Kesselman, P. Kunszt,
     M. Ripeanu, B. Schwartzkopf, H. Stockinger, and B. Tierney, “Giggle: A Framework for
     Constructing Scalable Replica Location Services,” Proc. ACM/IEEE Conf.
     Supercomputing (SC), 2002.
 •   Y. Deswarte, L. Blain, and J.C. Fabre, “Intrusion Tolerance in Distributed Computing
     Systems,” Proc. IEEE Symp. Research in Security and Privacy, 1991.
     http://csepi.utdallas.edu/epc_center.htm, 2008.
 •   I. Foster and A. Iamnitchi, “On Death, Taxes, and the Convergence of Peer-to-Peer and
     Grid Computing,” Proc. Second Int’l Workshop Peer-to-Peer Systems (IPTPS), 2003.




                                             65

Contenu connexe

Similaire à 070105618001 070105618006-070105618015-070105618021

msword
mswordmsword
mswordbutest
 
Crap shit head
Crap shit headCrap shit head
Crap shit headShash
 
Full report roman numbering combined
Full report roman numbering   combinedFull report roman numbering   combined
Full report roman numbering combinedLai Yen
 
Erp implementation methodologies_ph_d_book
Erp implementation methodologies_ph_d_bookErp implementation methodologies_ph_d_book
Erp implementation methodologies_ph_d_booknages waran
 
Mpkk 2012 ICT SPM
Mpkk 2012 ICT SPMMpkk 2012 ICT SPM
Mpkk 2012 ICT SPMaimarashid
 
Voice wiki on mobile project report
Voice wiki on mobile project reportVoice wiki on mobile project report
Voice wiki on mobile project reportRahul E
 
Voice wiki on mobile project report
Voice wiki on mobile project reportVoice wiki on mobile project report
Voice wiki on mobile project reportRahul E
 
PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0pathsproject
 
Technology Acceptance As a Trigger for Successful Virtual Project Management:...
Technology Acceptance As a Trigger for Successful Virtual Project Management:...Technology Acceptance As a Trigger for Successful Virtual Project Management:...
Technology Acceptance As a Trigger for Successful Virtual Project Management:...Bernhard Hofer
 
Bachelorthesis
BachelorthesisBachelorthesis
BachelorthesisDhara Shah
 
automatic database schema generation
automatic database schema generationautomatic database schema generation
automatic database schema generationsoma Dileep kumar
 
It competencies a study on malaysian university oke
It competencies a study on malaysian university okeIt competencies a study on malaysian university oke
It competencies a study on malaysian university okeBambang Purwantho
 
Research design for Evaluation of Strongly Sustainability Business Model Onto...
Research design for Evaluation of Strongly Sustainability Business Model Onto...Research design for Evaluation of Strongly Sustainability Business Model Onto...
  • 3. ACKNOWLEDGEMENT First and foremost, we would like to express our deep sense of gratitude to our honorable secretary Dr. S. Gunasekaran, M.Sc., M.Ed., M.Phil., Ph.D., our beloved correspondent Dr. T. O. Singaravel, M.Sc., M.Ed., M.Phil., Ph.D., and all the Trustees of the Vidyaa Vikas Educational Institutions for providing us with the necessary facilities during our course of study. We are immensely pleased to express our sincere thanks to our Principal, Dr. S. Sundaram, Ph.D., for the encouragement and support he extended. We extend our solemn gratitude to Prof. N. K. Kuppuchamy, M.E., (Ph.D.), Head of the Department of Computer Science and Engineering, for his timely support in all our activities. We sincerely thank our Project Coordinator and project guide, Mr. M. Manikandan, M.E., who has been a source of constant guidance and inspiration throughout our project work and whose valuable suggestions paved the way for us to carry out the project successfully. We express our sincere thanks to the faculty members, the non-teaching staff, and our dear friends for their moral support, help, and encouragement toward the successful completion of the project. We are most indebted to our parents, whose support made our dream of becoming successful graduates a reality. We are confident that our project work stands testimony to the fact that hard work bears enjoyable fruit, not only for the individuals concerned but for the community as a whole.
  • 4. ABSTRACT Secret sharing and erasure coding-based approaches have been used in distributed storage systems to ensure the confidentiality, integrity, and availability of critical information. To achieve performance goals in data accesses, these data fragmentation approaches can be combined with dynamic replication. In this paper, we consider data partitioning (both secret sharing and erasure coding) and dynamic replication in data grids, in which security and data access performance are critical issues. More specifically, we investigate the problem of optimally allocating sensitive data objects that are partitioned using a secret sharing scheme or an erasure coding scheme and/or replicated. The grid topology we consider consists of two layers. In the upper layer, multiple clusters form a network topology that can be represented by a general graph. The topology within each cluster is represented by a tree graph. We decompose the share replica allocation problem into two subproblems: the Optimal Intercluster Resident Set Problem (OIRSP), which determines which clusters need share replicas, and the Optimal Intracluster Share Allocation Problem (OISAP), which determines the number of share replicas needed in a cluster and their placements. We develop a heuristic algorithm for each of the two subproblems. Experimental studies show that the heuristic algorithms perform well in reducing communication cost and are close to the optimal solutions. Success of this project can help achieve significant advances in business, medical treatment, disaster relief, research, and the military, and can yield dramatic benefits to society.
  • 5. CONTENTS
     CHAPTER NO   TITLE                                  PAGE NO
                  List of Figures                        i
                  List of Abbreviations                  ii
     1            Introduction                           1
                  1.1 Objective                          4
     2            System Analysis                        5
                  2.1 Existing System                    5
                      2.1.1 Drawbacks                    5
                  2.2 Proposed System                    5
                  2.3 Feasibility Study                  6
                      2.3.1 Economical Feasibility       6
                      2.3.2 Operational Feasibility      6
                      2.3.3 Technical Feasibility        6
     3            System Specification                   7
                  3.1 Hardware Requirements              7
                  3.2 Software Requirements              7
     4            Software Description                   8
                  4.1 Front End                          8
                  4.2 Features                           8
     5            Project Description                    12
                  5.1 Problem Definition                 12
  • 6.            5.2 Overview of the Project            15
                  5.3 Module Description                 15
                      5.3.1 Modules                      15
                  5.4 Data Flow Diagram                  28
                  5.5 Input Design                       28
                  5.6 Output Design                      28
     6            System Testing                         29
                  6.1 Unit Testing                       30
                  6.2 Acceptance Testing                 31
                  6.3 Test Cases                         32
     7            System Implementation                  33
     8            Conclusion & Future Enhancements       34
                  8.1 Conclusion                         34
                  8.2 Future Enhancements                34
     9            Appendix                               35
                  1 Source Code                          35
                  2 Screen Shots                         51
     10           References                             57
  • 7. LIST OF FIGURES
     FIGURE NO   NAME                                              PAGE NO
     5.1         Master Server Cluster                             14
     5.2         The original GC with RC = {H1, H2, H3}            19
     5.3         Super node S and SPT(GC, RC) constructed          19
                 by Build_SPT
     5.4         The performance impact of graph size              22
     5.5         The performance impact of graph degree            22
     5.6         The impact of update/read ratio                   23
     5.7         The impact of l with read/update = 3              25
     5.8         The impact of l with read/update = 30             25
     5.9         The impact of m with read/update = 3              26
     5.10        The impact of m with read/update = 30             26
     5.11        Data Flow Diagram                                 28
  • 8. ABBREVIATIONS
     GIG     Global Information Grid
     GMESS   Grid for Medical Services
     RLS     Replica Location Service
     OIRSP   Optimal Intercluster Resident Set Problem
     OISAP   Optimal Intracluster Share Allocation Problem
     MSC     Master Server Cluster
     MST     Minimum Spanning Tree
     SRS     Software Requirements Specification
     UML     Unified Modeling Language
  • 9. CHAPTER 1 INTRODUCTION A data grid is a distributed computing architecture that integrates a large number of data and computing resources into a single virtual data management system. It enables the sharing and coordinated use of data from various resources and provides various services to fit the needs of high-performance distributed and data-intensive computing. Many data grid applications are being developed or proposed, such as the DoD's Global Information Grid (GIG) for both business and military domains, NASA's Information Power Grid, the GMESS Health-Grid for medical services, and data grids for federal disaster relief. These data grid applications are designed to support global collaborations that may involve large amounts of information, intensive computation, and real-time or non-real-time communication. Success of these projects can help achieve significant advances in business, medical treatment, disaster relief, research, and the military, and can yield dramatic benefits to society. There are several important requirements for data grids, including information survivability, security, and access performance. For example, consider a first-responder team responding to a fire in a building with explosive chemicals. The data grid that hosts building safety information, such as the building layout and the locations of dangerous chemicals and hazard containment devices, can help draw relatively safe and effective rescue plans. Delayed access to these data can endanger the responders as well as increase the risk to the victims or cause severe damage to the property. At the same time, information such as the location of hazardous chemicals is highly sensitive and, if it falls into the hands of terrorists, could cause severe consequences. Thus, the confidentiality of the critical information should be carefully protected. The above example indicates the importance of data grids and of their availability, reliability, accuracy, and responsiveness.
Replication is frequently used to achieve access efficiency, availability, and information survivability. The underlying infrastructure for data grids can generally be classified into two types: cluster-based and peer-to-peer systems. In pure peer-to-peer storage systems, there is no dedicated node for grid applications (in some systems, some servers are dedicated). Replication can bring data objects to the peers that are close to the accessing clients and, hence, improve access efficiency. Having multiple replicas directly implies higher information survivability. In
  • 10. cluster-based systems, dedicated servers are clustered together to offer storage and services. However, the number of clusters is generally limited and, thus, they may be far from most clients. To improve both access performance and availability, it is necessary to replicate data and place them close to the clients, as in peer-to-peer data caching. As can be seen, replication is an effective technique for all types of data grids. Existing research on replication in data grids investigates replica access protocols, resource management and discovery techniques, replica location and discovery algorithms, and replica placement issues. Though replication can greatly help with information survivability and access efficiency, it does not address security requirements. Having more replicas implies a higher potential of one of the replicas being compromised. One solution is to encrypt the sensitive data and their replicas. However, this shifts the responsibility from protecting the data to protecting the encryption keys and brings a nontrivial key management problem. If a simple centralized key server is used, it is vulnerable to single points of failure and denial-of-service attacks. Also, the centralized key server may be compromised and, hence, reveal all keys. Replication of keys can increase access efficiency as well as avoid the single-point-of-failure problem and reduce the risk of denial-of-service attacks, but it would increase the risk of having some compromised key servers. If one of the key servers is compromised, all the critical data are essentially compromised. Besides key management issues, information leakage is another problem with the replica encryption approach. Generally, a key is used to access many data objects. When a client leaves the system or its privilege for some accesses is revoked, those data objects have to be re-encrypted using a new key and the new key has to be distributed to the other clients.
If one of the data storage servers is compromised, the storage server could retain a copy of the data encrypted using the old key. Thus, the content of long-lived data may leak over time. Therefore, additional security mechanisms are needed for sensitive data protection. There are other methods proposed for providing survivable and secure storage in untrustworthy environments. The most effective approach is to provide intrusion tolerance. Most intrusion-tolerant systems partition the sensitive data and distribute the shares across the storage sites to achieve both confidentiality and survivability. By doing so, a data object can remain secure even if a partial set of its shares (below a threshold) is compromised by adversaries.
  • 11. This scheme can be used to protect critical data directly or to protect the keys of the encrypted data. The intrusion tolerance concept and data partitioning techniques can be used to achieve data survivability as well as security. The most commonly used schemes for data partitioning are secret sharing and erasure coding. Both schemes partition data into shares and distribute them to different processors to achieve availability and integrity. Secret sharing schemes assure confidentiality even if some shares (fewer than a threshold) are compromised. In erasure coding, the data shares can be encrypted and the encryption key can be secret-shared and distributed with the data shares to assure confidentiality. However, changing the number of shares in a data partitioning scheme is generally costly. When it is necessary to add additional shares close to a group of clients to reduce communication cost and access latency, it is easier to add share replicas. Thus, it is most effective to combine the data partitioning and replication techniques for high-performance secure storage design. To actually achieve improved performance, it is essential to place the replicated data partitions in strategic locations to maximize the gain. There is extensive work on replica placement. However, existing placement algorithms focus on the placement of independent data objects (generally, only a single data object or a single data set is considered). The placement problem for partitioned data is more complex, since the replicas of the data partitions need to be considered together. Moreover, client access patterns for partitioned data are more complicated. Thus, it is necessary to investigate schemes for allocating partitioned data. Research on replica placement of partitioned data is limited. In one prior work, the authors attempt to measure security assurance probabilities and use them to guide allocation. They consider data objects that are secret-shared, but no replication is considered.
Also, the share allocation algorithm they propose does not consider performance issues such as communication cost and response latency. Another work considers a secure data storage system that survives even if some nodes in the system are compromised. It assumes that data are secret-shared and that the full set of shares is replicated and statically distributed over the network. The major focus of that work is to guarantee the confidentiality and integrity requirements of the storage system. Communication cost and response latency are not considered, nor does it address how to allocate the share replicas. In this paper, we consider combining data partitioning and replication to support secure, survivable, and high
  • 12. performance storage systems. Our goal is to develop placement algorithms that allocate share replicas such that communication cost and access latency are minimized. The remainder of this paper is organized as follows: we first describe the data grid system model and the problem definitions; we then introduce a heuristic algorithm for determining the clusters that should host shares, followed by a heuristic algorithm for share allocation within a cluster; after that, the results of the experimental studies are discussed, along with related research. The last decade has seen a substantial increase in commodity computer and network performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems, in the fields of science, engineering, and business, that cannot be dealt with effectively using the current generation of supercomputers. In fact, due to their size and complexity, these problems are often very numerically and/or data intensive and consequently require a variety of heterogeneous resources that are not available on a single machine. The early efforts in grid computing started as a project to link supercomputing sites, but have now grown far beyond their original intent. We have combined data partitioning schemes (a secret sharing scheme or an erasure coding scheme) with dynamic replication to achieve data survivability, security, and access performance in data grids. The replicas of the partitioned data need to be properly allocated to achieve the actual performance gains. Experimental studies show that the heuristic algorithms perform well in reducing communication cost and are close to the optimal solutions.
1.1 OBJECTIVE Secret sharing and erasure coding-based approaches have been used in distributed storage systems to ensure the confidentiality, integrity, and availability of critical information. To achieve performance goals in data accesses, these data fragmentation approaches can be combined with dynamic replication.
  • 13. CHAPTER 2 SYSTEM ANALYSIS 2.1 EXISTING SYSTEM The existing system uses only a general encryption algorithm. Its severe problem is the optimal allocation of sensitive data objects, and it does not achieve data survivability, security, and access performance. 2.1.1 DRAWBACKS Security and data access performance are critical issues in the existing system. The severe problem in the existing system is the optimal allocation of sensitive data objects. The existing system does not achieve data survivability, security, and access performance. The communication cost is higher. 2.2 PROPOSED SYSTEM In this paper, we consider data partitioning (both secret sharing and erasure coding) and dynamic replication in data grids, in which security and data access performance are critical issues. We investigate the problem of optimally allocating sensitive data objects that are partitioned using a secret sharing scheme or an erasure coding scheme and/or replicated. We develop two heuristic algorithms for the two subproblems: the OIRSP determines which clusters need to maintain share replicas, and the OISAP determines the number of share replicas needed in a cluster and their placements. Experimental studies show that the heuristic algorithms perform well in reducing communication cost and are close to the optimal solutions.
  • 14. 2.3 FEASIBILITY STUDY The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. For feasibility analysis, some understanding of the major requirements for the system is essential. The three key considerations involved in the feasibility analysis are economical, social, and technical feasibility. 2.3.1 ECONOMICAL FEASIBILITY This study is carried out to check the economic impact the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, and the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available. 2.3.2 SOCIAL FEASIBILITY This study checks the level of acceptance of the system by the users. It includes the process of training the users to use the system efficiently. The users must not feel threatened by the system; instead, they must accept it as a necessity. The level of acceptance by the users depends solely on the methods employed to educate them about the system and to make them familiar with it. Their level of confidence must be raised so that they are also able to offer constructive criticism. 2.3.3 TECHNICAL FEASIBILITY This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system has modest requirements, as only minimal or no changes are required for implementing it.
  • 15. CHAPTER 3 SYSTEM SPECIFICATION
3.1 HARDWARE REQUIREMENTS The hardware used for the development of the project is:
     • Hard disk : 40 GB
     • RAM : 512 MB
     • Processor : Pentium IV
     • Monitor : 17" color monitor
3.2 SOFTWARE REQUIREMENTS The software used for the development of the project is:
     • Front end : Java
     • Version : JDK 1.6
     • Operating system : Windows XP
  • 16. CHAPTER 4 SOFTWARE DESCRIPTION 4.1 FRONT END JAVA Java is a computer programming language. It enables programmers to write computer instructions using English-based commands, instead of having to write in numeric codes. It is known as a "high-level" language because it can be read and written easily by humans. Like English, Java has a set of rules that determine how the instructions are written; these rules are known as its "syntax". Once a program has been written, the high-level instructions are translated into numeric codes that computers can understand and execute. Java was designed to meet real-world requirements with its key features, which are explained in the following paragraphs. 4.2 FEATURES SIMPLE AND POWERFUL Java was designed to be easy for the professional programmer to learn and use efficiently. Java keeps itself simple by not having surprising features. Since it exposes the inner workings of the machine, the programmer can perform the desired actions without fear. Unlike other programming systems that provide dozens of complicated ways to perform a simple task, Java provides a small number of clear ways to achieve a given task. SECURE Today everyone is worried about safety and security. People feel that conducting commerce over the Internet is as safe as printing their credit card number on the first page of a newspaper. Threats from viruses and system hackers also exist. To overcome these fears, Java has safety and security as a key design principle. Using a Java-compatible browser, anyone can safely download Java applets without fear of viral infection or malicious intent. Java achieves this protection by confining a Java
  • 17. program to the Java execution environment and by making it inaccessible to other parts of the computer. We can download applets with confidence that no harm will be done and no security will be breached. PORTABLE In Java, the same mechanism that provides security also helps with portability. Many types of computers and operating systems are in use throughout the world and are connected to the Internet. For downloading programs across the different platforms connected to the Internet, some portable, executable code is needed. Java's answer to these problems is its well-designed architecture. OBJECT-ORIENTED Java was not designed to be source-code compatible with any other language. The Java team took a clean, usable, realistic approach to objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are kept as high-performance non-objects. DYNAMIC Java programs carry with them extensive amounts of run-time information that is used to verify and resolve accesses to objects at run time. Using this concept, it is possible to link code dynamically. The dynamic property of Java adds strength to the applet environment, in which small fragments of bytecode may be dynamically updated on a running system. RELIABILITY Java needed to reduce the likelihood of fatal errors from programmer mistakes. With this in mind, object-oriented programming was introduced. Once data and its manipulation were packaged together in one place, Java's robustness increased.
  • 18. PLATFORM INDEPENDENT Programs need to work regardless of the machine on which they are executed. Java was written to be a portable language that does not care about the operating system or the hardware of the computer. NEWLY ADDED FEATURES IN JAVA 2 • SWING is a set of user interface components implemented entirely in Java; the user can use a look and feel that is either specific to a particular operating system or uniform across operating systems. • Collections are groups of objects. Java provides several types of collections, such as linked lists, dynamic arrays, and hash tables, for our use. Collections offer a new way to solve several common programming problems. • Various tools such as javac, java, and javadoc have been enhanced. Debugger and profiler interfaces for the JVM are available. • Performance improvements have been made in several areas. A just-in-time (JIT) compiler is included in the JDK. • Digital certificates provide a mechanism to establish the identity of a user; they can be viewed as electronic passports. • Various security tools are available that enable the user to create and store cryptographic keys and digital certificates, sign Java Archive (JAR) files, and check the signature of a JAR file. SWING Swing components facilitate efficient graphical user interface (GUI) development. These components are a collection of lightweight visual components. Swing components contain a replacement for the heavyweight AWT components as well as complex user interface components such as trees and tables. Swing components support a pluggable look and feel (PL&F). This allows all applications to run with the native look and feel on different platforms. PL&F allows applications to have the same behaviour on various platforms. JFC contains an operating-system-neutral look and feel.
  • 19. Swing components do not contain peers. Swing allows mixing AWT heavyweight and Swing lightweight components in an application. The major difference between lightweight and heavyweight components is that lightweight components can have transparent pixels while heavyweight components are always opaque. Lightweight components can be non-rectangular while heavyweight components are always rectangular. Swing components are JavaBean compliant, which allows them to be used easily in a Bean-aware application-building program. The root of the majority of the Swing hierarchy is the JComponent class, an extension of the AWT Container class. Swing components comprise a large percentage of the JFC release. The Swing component toolkit consists of over 250 pure Java classes and 75 interfaces contained in about 10 packages. They are used to build lightweight user interfaces. Swing consists of user interface (UI) classes and non-UI classes; the non-UI classes provide services and other operations for the UI classes. Swing offers a number of advantages, which include:
     • Wide variety of components
     • Pluggable look and feel
     • MVC architecture
     • Keystroke handling
     • Action objects
     • Nested containers
     • Virtual desktops
     • Compound borders
     • Customized dialogs
     • Standard dialog classes
     • Structured table and tree components
     • Powerful text manipulation
     • Generic undo capabilities
     • Accessibility support
  • 20. CHAPTER 5 PROJECT DESCRIPTION 5.1 PROBLEM DEFINITION In this paper, we consider achieving secure, survivable, and high-performance data storage in data grids. To facilitate scalability, we model the peer-to-peer data grid as a two-level topology. Studies show that the Internet can be decomposed into interconnected autonomous systems. One or several such autonomous systems that are geographically close to each other can be considered a cluster. Each edge represents a logical link, which may span multiple hops of the physical links. The clusters are likely linked to the backbone and should be modeled by a general graph. Within each cluster, there may be many subnets from the same or multiple institutions. Among all the physical nodes in a cluster, some nodes, such as servers, proxies, and other individual nodes, may be committed to contributing their storage and/or computation resources to some data grid applications. These nodes are connected via logical links. Internet message routing is relatively stable over days or even weeks, and the multiple routing paths generally form a tree. Thus, for simplicity, we model the topology inside a cluster as a tree. Consider a cluster Hx. Let Gx = (Vx, Ex) represent the topology graph within the cluster, where Vx = {Px,1, Px,2, ..., Px,Nx} denotes the set of Nx (N if only considering cluster Hx) nodes in cluster Hx, and Ex is the set of edges connecting nodes in Hx. Also, let Px^root denote the root node in Hx (Px^root ∈ Vx). We assume that all traffic in Hx goes through the network where Px^root resides. Let σ(Px,i, Px,j) denote the shortest path between Px,i and Px,j in Hx, and |σ(Px,i, Px,j)| denote the distance of σ(Px,i, Px,j). Also, let σ(Hx, Hy) denote the shortest path between Hx and Hy (actually, between Px^root and Py^root), and |σ(Hx, Hy)| denote the distance of σ(Hx, Hy).
We assume that |σ(Px,i, Px,j)| for any i, j, and x is much less than |σ(Hy, Hz)| for any y and z, where y ≠ z (i.e., the distance between any two nodes within a cluster is less than the distance between any two clusters). The data grid (represented by the set of clusters HC) hosts a set of data objects D (D can contain the application data or keys). One of the clusters is selected as the Master Server Cluster (MSC) for some data objects in D, denoted HMSC (different data objects may have different HMSC).
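The distances |σ(·, ·)| above drive the communication-cost model. They can be computed with a standard shortest-path algorithm; the sketch below is a minimal, hypothetical illustration using Dijkstra's algorithm (the class name, method name, and the small example graph are ours, not taken from the report's source code).

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical sketch: computing the distances |σ(src, v)| used by the cost model.
public class DistanceSketch {
    // adj[u] lists edges {v, weight}; returns the shortest distance from src to every node.
    static long[] dijkstra(int[][][] adj, int src) {
        long[] dist = new long[adj.length];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[src] = 0;
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[1], b[1]));
        pq.add(new long[] { src, 0 });
        while (!pq.isEmpty()) {
            long[] cur = pq.poll();
            int u = (int) cur[0];
            if (cur[1] > dist[u]) continue;       // stale queue entry, already settled
            for (int[] e : adj[u]) {
                long nd = dist[u] + e[1];
                if (nd < dist[e[0]]) {            // relax edge u -> e[0]
                    dist[e[0]] = nd;
                    pq.add(new long[] { e[0], nd });
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // A made-up three-node graph: 0-1 (2), 1-2 (2), 0-2 (5).
        int[][][] adj = {
            { { 1, 2 }, { 2, 5 } },
            { { 0, 2 }, { 2, 2 } },
            { { 0, 5 }, { 1, 2 } },
        };
        System.out.println(Arrays.toString(dijkstra(adj, 0))); // prints "[0, 2, 4]"
    }
}
```

On a tree-shaped cluster topology the shortest path is unique, so a plain traversal from the root would suffice; Dijkstra also covers the general intercluster graph of the upper layer.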
  • 21. HMSC hosts these data objects permanently (it may be the original data source). These data objects may be partially replicated in other clusters in HC to achieve better access performance. Due to the increasing number of attacks on the Internet, a node hosting some data objects in D has a significant chance of being compromised. If a node is compromised, all the plaintext data objects stored on it are compromised. If a storage node storing some encrypted data is compromised and the nodes maintaining the corresponding encryption keys are also compromised (note that they may be the same nodes), then the data are compromised. We assume that the probabilities of compromising two different storage nodes are not correlated. This is true for many new attacks, such as viruses that are spread through emails and the major buffer overflow attacks. To cope with potential threats, the data partitioning technique is used. Each data object d (d ∈ D) is partitioned into m shares. Major data partitioning schemes include secret sharing and erasure coding. In an (m, k) secret sharing scheme, m shares are computed from a data object d using a polynomial with k − 1 randomly chosen coefficients and distributed over m servers. d can be determined uniquely with any k shares, and no information about d can be inferred with fewer than k shares. Secret sharing schemes incur an m/(m − k)-fold storage waste (the rest is for improving availability and performance). If storage space is a concern, then erasure coding schemes can be used. An erasure coding scheme uses the same mathematics except that the k − 1 coefficients of the polynomial are dependent on d. Thus, partial information may be inferred with fewer than k shares, and hence, encryption is needed for confidentiality assurance. Generally, the encryption keys are secret-shared and distributed with the data. Erasure coding schemes achieve the best storage efficiency, even when compared with replication.
The access performance of the secret sharing, erasure coding, and replication (with secret-shared keys) schemes is approximately the same. Here, we do not limit the data partitioning scheme (as long as it is secure). To ensure that the secret data can be reconstructed even when k − 1 nodes are compromised, we require m ≥ 2k − 1. Let l denote the number of distinct shares to be accessed for each read request (it is fixed for all read requests). We have k ≤ l ≤ m. If l > k, the original data can be reconstructed and the validity of the shares can be checked. The parameter l can be determined for each specific application system depending on its needs. In many applications, data could be read as well as updated by clients from geographically distributed areas. For example, in a major rescue mission or a disaster relief act, the problem
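As a concrete illustration of the (m, k) scheme described above, the following is a minimal, hypothetical Shamir-style secret sharing sketch over a small prime field. The class and method names are ours, not from the report's appendix; a production system would use a much larger field and authenticated shares.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical sketch of (m, k) secret sharing: m shares, any k reconstruct.
public class ShamirSketch {
    // A prime larger than any secret we share; all arithmetic is mod P.
    static final BigInteger P = BigInteger.valueOf(2147483647L); // 2^31 - 1, prime

    // Split `secret` into m shares of the form (x, f(x)), with f(0) = secret.
    static BigInteger[][] split(BigInteger secret, int m, int k, Random rnd) {
        BigInteger[] coeff = new BigInteger[k];
        coeff[0] = secret;                          // constant term is the secret
        for (int i = 1; i < k; i++)                 // k - 1 random coefficients
            coeff[i] = new BigInteger(P.bitLength() - 1, rnd).mod(P);
        BigInteger[][] shares = new BigInteger[m][2];
        for (int x = 1; x <= m; x++) {
            BigInteger bx = BigInteger.valueOf(x), y = BigInteger.ZERO;
            for (int i = k - 1; i >= 0; i--)        // Horner evaluation of f(x) mod P
                y = y.multiply(bx).add(coeff[i]).mod(P);
            shares[x - 1][0] = bx;
            shares[x - 1][1] = y;
        }
        return shares;
    }

    // Lagrange interpolation at x = 0 using any k distinct shares.
    static BigInteger reconstruct(BigInteger[][] shares) {
        BigInteger secret = BigInteger.ZERO;
        for (int i = 0; i < shares.length; i++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int j = 0; j < shares.length; j++) {
                if (i == j) continue;
                num = num.multiply(shares[j][0].negate()).mod(P);
                den = den.multiply(shares[i][0].subtract(shares[j][0])).mod(P);
            }
            secret = secret.add(shares[i][1].multiply(num)
                     .multiply(den.modInverse(P))).mod(P);
        }
        return secret;
    }

    public static void main(String[] args) {
        BigInteger secret = BigInteger.valueOf(123456789L);
        BigInteger[][] shares = split(secret, 5, 3, new SecureRandom()); // (m, k) = (5, 3)
        // Any 3 of the 5 shares recover the secret; fewer than 3 reveal nothing.
        BigInteger[][] subset = { shares[0], shares[2], shares[4] };
        System.out.println(reconstruct(subset).equals(secret)); // prints "true"
    }
}
```

A read request in the model above would fetch l such shares (k ≤ l ≤ m) and, when l > k, cross-check the extra shares for validity.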
areas and resources need to be updated in real time to facilitate dynamic and effective planning. Net-centric command and control systems rely on GIG for their dynamic information flow, and frequently, critical data need to be updated on the fly to support agility. Also, updates in storage systems for encryption keys can be quite frequent due to changes in the membership and access privileges of individuals. In our model, for each update request, all shares and share replicas need to be updated using a primary lazy update protocol. Generally, eager update is not feasible in widely distributed systems since it takes too long to finish the updates. Also, the large-scale network may be partitioned and some clusters may be temporarily unreachable. Thus, a lazy update is more suitable. Furthermore, a primary copy is frequently used to avoid system delusion when the system size is large or the update ratio is high. Based on the primary lazy update protocol, all update requests are first forwarded to HMSC for execution, and the updates are then propagated to other clusters along a minimum spanning tree (MST). Consistency can also be maintained periodically using a distributed vector clock without concern for node failures or network partitioning. More details of the read and update protocols and their costs will be discussed in the next section. In this paper, we assume secure communication channels for the delivery of data shares. Standard secure transport protocols, such as SSL/TLS, can be used to achieve this. Fig. 5.1 Master Server Cluster
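The primary lazy update path described above (execute at HMSC first, then propagate along an MST) can be sketched as follows. The cluster names, edge weights, and topology here are invented for illustration; the report's actual propagation protocol is only summarized, not reproduced.

```python
# Sketch of primary lazy update propagation: build an MST over the cluster
# graph, then push the update from the primary (HMSC) down the tree.
import heapq
from collections import defaultdict, deque

def minimum_spanning_tree(edges, root):
    """Prim's algorithm; edges = {(u, v): weight}. Returns a parent map."""
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((w, v))
        adj[v].append((w, u))
    parent, seen = {root: None}, {root}
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        parent[v] = u
        for w2, nxt in adj[v]:
            if nxt not in seen:
                heapq.heappush(heap, (w2, v, nxt))
    return parent

def propagate_update(parent, root):
    """Lazy propagation order: each cluster forwards to its MST children."""
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)
    order, queue = [], deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        queue.extend(children[u])
    return order

edges = {("HMSC", "H1"): 1, ("HMSC", "H2"): 2, ("H1", "H2"): 1, ("H2", "H3"): 1}
parent = minimum_spanning_tree(edges, "HMSC")
print(propagate_update(parent, "HMSC"))  # HMSC first, then downstream clusters
```

Because the update is acknowledged once the primary has applied it, remote clusters may briefly serve stale shares, which is exactly the trade-off the text accepts in exchange for tolerating slow or partitioned clusters.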
5.2 OVERVIEW OF THE PROJECT
Secret sharing and erasure coding-based approaches have been used in distributed storage systems to ensure the confidentiality, integrity, and availability of critical information. To achieve performance goals in data accesses, these data fragmentation approaches can be combined with dynamic replication. In this paper, we consider data partitioning (both secret sharing and erasure coding) and dynamic replication in data grids, in which security and data access performance are critical issues. More specifically, we investigate the problem of optimal allocation of sensitive data objects that are partitioned by using a secret sharing scheme or an erasure coding scheme and/or replicated. The grid topology we consider consists of two layers. In the upper layer, multiple clusters form a network topology that can be represented by a general graph. The topology within each cluster is represented by a tree graph. We decompose the share replica allocation problem into two subproblems: the Optimal Intercluster Resident Set Problem (OIRSP), which determines which clusters need share replicas, and the Optimal Intracluster Share Allocation Problem (OISAP), which determines the number of share replicas needed in a cluster and their placements. We develop two heuristic algorithms for the two subproblems. Experimental studies show that the heuristic algorithms achieve good performance in reducing communication cost and are close to optimal solutions.
5.3 MODULE DESCRIPTION
5.3.1 MODULES
1. Source (OIRSP Specification, OISAP Specification)
2. Routers
3. Destination
SOURCE
The source module selects the data and fragments it. The OISAP determines the number of share replicas needed in a cluster; the OIRSP determines which clusters need share replicas. The data are shared in a secured manner in the distributed system. Using AES (Advanced Encryption Standard), we secure our data.
OIRSP Specification:
We define the first problem, OIRSP, as the optimal resident set problem in a general graph (the intercluster-level graph) with a master server cluster HMSC. Our goal is to determine the optimal RC that yields the minimum access cost at the cluster level. For a cluster Hx ∈ RC with |Rx| ≥ l, all read requests from Hx are served locally and the cost is 0 at the cluster level. For a cluster Hx with |Rx| < l, it always transmits all read access requests in Hx to the closest cluster Hy ∈ RC, with |Ry| ≥ l, to access l distinct shares. Thus, the total access cost in GC, denoted as CostC(GC, RC), is defined as follows:
CostC(GC, RC) = UpdateCostC(GC, RC) + ReadCostC(GC, RC).
The problem here is to decide the share replica resident set RC in GC such that the communication cost CostC(GC, RC) is minimized. The optimal resident set problem in a general graph is proven to be NP-complete. We propose a heuristic algorithm to compute a near-optimal resident set for a general graph. We will show that the heuristic algorithm has an O(M³) complexity, where M is the number of
clusters in the system. We will also prove that our heuristic algorithm is optimal in a tree network, with time complexity O(M).
OISAP Specification:
When we consider the allocation problem within a cluster Hx, we can isolate the cluster and consider the problem independently. As discussed earlier, all read requests from remote clusters can be viewed as read requests from the root node. Also, the wC updates in the entire system can be considered as updates done at the root node of the cluster. Thus, we can simplify the notation when discussing allocation within Hx by referring to everything in the cluster without the cluster subscript. For example, we use G = (P, E) to represent the topology graph of Hx, δ(Pi, Pj) to represent the shortest path between two nodes inside Hx, and R to represent the resident set of Hx. This simplification is used below; where multiple clusters are considered, the original notation is used. Note that we only need to consider clusters with l or more share replicas for this subproblem, OISAP. Let ReadCost(R) denote the total read cost from all the nodes in cluster Hx:
ReadCost(R) = Σ_{Pi ∈ Hx} |δ(Pi, R, l)| × Ar(Pi).
For each update in the system, the root node Proot needs to propagate the update to all other share holders inside Hx. Let WriteCost(R) denote the total update cost in Hx, and let Cost(R) denote the total cost of all nodes in Hx; then
Cost(R) = WriteCost(R) + ReadCost(R).
Our goal is to determine an optimal resident set R to allocate the shares in Hx such that Cost(R) is minimized. Note that m ≥ |R| ≥ l (we will prove this in the next section). We propose a heuristic algorithm with a complexity of O(N³) to find the near-optimal solution for this problem, where N is the number of nodes in the cluster.
OIRSP SOLUTIONS:
In this section, we present a heuristic algorithm for OIRSP.
First, we discuss some properties that are very useful for the design of the heuristic algorithm. We then present the heuristic algorithm, which decides which clusters should hold share replicas to minimize access cost.
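For concreteness, the cluster-level cost model defined earlier, CostC(GC, RC) = UpdateCostC(GC, RC) + ReadCostC(GC, RC), can be evaluated as sketched below. The topology, read rates, and update rate are hypothetical values, and edges are treated as unit hops as the text assumes.

```python
# Minimal sketch of the cluster-level cost: updates pay wC per edge of the
# tree connecting the resident clusters, and each non-resident cluster pays
# its read rate Ar(Hx) times the hop distance to the nearest resident cluster.
from collections import deque

def read_cost(adj, resident, read_rate):
    """Multi-source BFS from the resident set; sums Ar(Hx) * dist(Hx, RC)."""
    dist = {h: 0 for h in resident}
    queue = deque(resident)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(read_rate[h] * d for h, d in dist.items())

def update_cost(w_c, propagation_tree_edges):
    """wC messages per edge of the tree spanning the resident set."""
    return w_c * len(propagation_tree_edges)

adj = {"HMSC": ["H1"], "H1": ["HMSC", "H2"], "H2": ["H1", "H3"], "H3": ["H2"]}
rates = {"HMSC": 0, "H1": 4, "H2": 6, "H3": 2}

# Resident set {HMSC}: every read travels to the master server cluster.
print(read_cost(adj, {"HMSC"}, rates))  # 4*1 + 6*2 + 2*3 = 22

# Adding H2 trades read cost for update propagation cost along the path
# HMSC - H1 - H2 (two hops), with wC = 3:
total = read_cost(adj, {"HMSC", "H2"}, rates) \
        + update_cost(3, [("HMSC", "H1"), ("H1", "H2")])
print(total)  # (4*1 + 2*1) + 3*2 = 12
```

This is the trade-off the OIRSP heuristic navigates: a cluster joins the resident set only when the read traffic it absorbs outweighs the extra update propagation it causes.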
Some Useful Properties:
We first show that if a cluster Hx is in RC (an optimal resident set), then Hx should hold at least l share replicas (l is the number of shares to be accessed by a read request). If Hx is in RC and Hx has fewer than l shares, then read accesses from Hx will still need to go to another cluster to get the remaining shares. If Hx holds no share replicas, then read accesses from Hx may need to get the l shares from multiple clusters. These may result in unnecessary communication overhead. The formal proof is given in the theorem below. Based on this property, the computation of the update and read costs can be simplified. Essentially, for a cluster that is in RC, all read requests can be served locally. For a cluster that is not in RC, all read requests can be forwarded to one single cluster in RC and all l shares can be obtained from that cluster. Assume that there exists one cluster Hx in RC such that |Rx| < l. When the resident set is RC, a read request from Hx cannot be served locally and the remaining shares have to be obtained from at least one other cluster in GC that holds those shares. Thus, |δ(Hx, RC)| > 0. Let us construct another resident set RC′. RC′ is the same as RC except that in RC′, Hx holds l distinct shares. Thus, in RC′, |δ(Hx, RC′)| = 0, so the read cost for read requests from Hx becomes zero. Also, in GC, there may be clusters that read from Hx. Assume that Hx is the closest cluster in RC to Hy (Hy is not in RC). If the resident set is RC, then Hy needs to read from Hx and some other clusters, since Hx has fewer than l shares. Thus, we can conclude
ReadCostC(GC, RC) − ReadCostC(GC, RC′) ≥ Ar(Hx) × |δ(Hx, RC)| > 0, and hence, ReadCostC(GC, RC′) < ReadCostC(GC, RC).
Now let us consider the update cost. Note that we have UpdateCostC(GC, RC) = wC × |TC(RC)|, where TC(RC) denotes the propagation tree spanning RC. Because RC′ and RC are composed of the same set of clusters, |TC(RC′)| = |TC(RC)|. Also, wC is independent of the resident set.
So, we have UpdateCostC(GC, RC′) = UpdateCostC(GC, RC). Since RC′ has the same update cost as, but a lower read cost than, RC, we have CostC(GC, RC′) < CostC(GC, RC). RC, thus, cannot be an optimal resident set. It follows that for each Hx in RC, |Rx| ≥ l, which completes the proof. We also observe that the clusters in the resident set RC form a connected graph (a subgraph of GC). This property is formally proven in Theorem 3.2. From this property, we can see that for resident set expansion (considering allocating share replicas to new clusters), only neighboring clusters of the current resident set need to be considered. Thus, we can use a greedy approach to obtain a solution. Note that, in Theorem 3.2, we assume that for each cluster Hx in GC, Ar(Hx) > 0.
Fig. 5.2 The original GC with RC = {H1, H2, H3}
Fig. 5.3 Super node S and SPT(GC, RC) constructed by Build_SPT
A Heuristic Algorithm for the OIRSP:
The goal of OIRSP is to determine the optimal resident set RC in GC. GC is a general graph, and each edge in GC is considered as one hop. The optimal resident set problem in a general graph is an instance of the problem discussed above, which has been shown to be NP-complete. Thus, we develop a heuristic algorithm to find a near-optimal solution. Our approach is to first build a minimal spanning tree in GC with RC as the root and then identify the cluster to be added to RC based on the tree structure. The clusters in GC access data hosted in RC along the shortest paths, and these paths and the clusters form a set of shortest path trees. Since all the nodes in RC are connected, we view them as one virtual node S. Then, S, all clusters that are not in RC, and all the shortest access paths form a tree rooted at S, which is denoted as SPT(GC, RC). We develop an efficient algorithm, Build_SPT, to construct SPT(GC, RC) based on the current resident set RC. To facilitate the identification of a new resident cluster, we also define VC(GC, RC) as the vicinity set of S, where for each Hx ∈ VC(GC, RC), we have Hx ∉ RC and Hx is a neighboring cluster of S.
Note that, from the theorem above, we know that the clusters in RC are connected. Thus, we only need to consider clusters in VC(GC, RC) when looking for a potential cluster to be added to RC. Build_SPT(GC, RC) first constructs VC(GC, RC) by visiting all neighboring clusters of RC. If a cluster Hx in VC(GC, RC) has more than one neighbor in RC, then one of them is chosen to be the parent cluster. Next, Build_SPT(GC, RC) traverses GC starting from the clusters in VC(GC, RC). From a cluster Hx, it visits all of Hx's neighboring clusters. Assume that Hy is a neighboring cluster of Hx. When Build_SPT visits Hy from Hx, it assigns Hx as Hy's parent if Hy does not have a parent. In this case, Hy is in the same tree as Hx, and Hy's tree root is set to Hx's (which is a cluster in RC). Since all read requests from Hy go through the root, say Hz, Ar(Hy) is added to Ar′(Hz) for later use (for new resident cluster identification). In case Hy already has a parent, the distances to S via the original parent and via Hx are compared. If Hx offers a shorter path to S, then Hy's parent is reset to Hx and the corresponding adjustments are made. To achieve faster convergence for new RC identification, Hy's parent is also changed to Hx if Hx's tree root Hz has a higher value of Ar′(Hz) when the distances to S via Hy's original parent and via Hx are equal. The detailed algorithm for Build_SPT is given in the following (assume that VC(GC, RC) is already identified). In the algorithm, each node Hx has several fields: Hx.root and Hx.parent are the root and parent clusters of Hx, respectively, and Hx.dist is the distance from Hx to Hx's root (at the end of the algorithm, it is the shortest distance). We also use Next(Hx) to denote the set of Hx's neighbors.
Actually, the check for Hy.dist > Hx.dist + 1 in the algorithm is not necessary since a queue is used (a node is always visited from a neighbor with the shortest distance to S). A sample general graph GC with current resident set RC = {H1, H2, H3} is shown in Fig. 5.2. The corresponding SPT(GC, RC) is shown in Fig. 5.3, where RC is represented by the super node labeled S. When constructing SPT(GC, RC), S's immediate neighbors, including H4, H5, H6, H7, H8, and H9, are visited first. H4 is visited twice, but H1 is selected as the parent since H4 is visited from H1 first and there is no need for adjustment when it is visited the second time. From the clusters nearest to S, the clusters that are two hops away from S, including H10, H11, H12, H13, H14, and H15, are visited. Finally, the nodes that are further away from S are visited. We develop a heuristic algorithm to find the new resident set for GC in a greedy manner. We try to find a new resident cluster in VC(GC, RC) and, once found, update RC accordingly. The algorithm is shown below. RC is initialized to {HMSC}. The algorithm first constructs SPT(GC, RC) and identifies VC(GC, RC). Then, a cluster Hy with the highest Ar′(Hy) is selected. If Ar′(Hy) > wC, then Hy is added to RC. If Ar′(Hy) ≤ wC, then the algorithm terminates, since no other nodes can be added to RC while reducing the access communication cost. Note that, in each step, only one cluster can be added into RC because SPT(GC, RC) and
Ar′(Hx) change when RC changes.
RC ← {HMSC};
Repeat {
    Build_SPT(GC, RC);
    Select the cluster Hy with the maximum Ar′(Hy) among all clusters in VC(GC, RC);
    If Ar′(Hy) > wC then RC ← RC ∪ {Hy};
} Until (Ar′(Hy) ≤ wC)
Now, we analyze the complexity of the heuristic resident set algorithm. Build_SPT has a time complexity of O(P × deg), where P is the number of clusters in GC and deg is the maximal degree of the vertices (clusters) in GC. Finding a cluster Hy in VC(GC, RC) with the highest Ar′(Hy) can be done while building SPT(GC, RC). Thus, the time complexity of the heuristic resident set algorithm is O(|RC| × P × deg). Note that the final resident set RC computed by Build_Tree is not always optimal [32]. The heuristic resident set algorithm works by adding a candidate cluster Hx in GC into RC at each step, with Ar′(Hx) > wC. By adding cluster Hx, the read cost is reduced by at least Ar′(Hx), and the update cost is increased by wC. Thus, the total cost of GC with resident set RC is less than that of GC with the initial resident set {HMSC} if |RC| > 1.
Fig. 5.4 The performance impact of graph size.
Fig. 5.5 The performance impact of graph degree.
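The greedy loop above can be rendered as a simplified sketch. This is a hypothetical rendering, not the report's code: a multi-source BFS plays the role of Build_SPT, aggregating the read rate Ar′ of every downstream cluster at the vicinity cluster through which its requests enter the resident set S, and the topology and rates are invented.

```python
# Simplified OIRSP greedy heuristic: grow the resident set from {HMSC},
# adding the vicinity cluster with the largest aggregated read rate Ar'
# while that rate still exceeds the system update rate wC.
from collections import deque

def build_spt(adj, resident, rates):
    """Returns Ar' totals aggregated at the vicinity clusters of S."""
    entry, dist = {}, {}
    queue = deque(resident)
    for h in resident:
        dist[h] = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                # the first hop out of S identifies the vicinity cluster
                entry[v] = v if u in resident else entry[u]
                queue.append(v)
    agg = {}
    for v, e in entry.items():
        agg[e] = agg.get(e, 0) + rates[v]
    return agg

def oirsp_heuristic(adj, rates, w_c, hmsc="HMSC"):
    resident = {hmsc}
    while True:
        agg = build_spt(adj, resident, rates)
        if not agg:
            return resident
        best = max(agg, key=agg.get)
        if agg[best] <= w_c:      # replication no longer pays off
            return resident
        resident.add(best)

adj = {"HMSC": ["H1", "H2"], "H1": ["HMSC", "H3"], "H2": ["HMSC"],
       "H3": ["H1"]}
rates = {"HMSC": 0, "H1": 5, "H2": 1, "H3": 4}
print(oirsp_heuristic(adj, rates, w_c=3))  # H1 absorbs Ar' = 9 > 3; H2 stays out
```

Each iteration rebuilds the shortest path tree because, as the text notes, the aggregated Ar′ values change whenever RC changes.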
Fig. 5.6 The impact of update/read ratio.
OISAP SOLUTIONS:
Now, we only consider the cost inside a single cluster Hx. As discussed earlier, the topology of Hx is a tree, denoted as T. For simplicity, we define the distance of each edge in T uniformly as one hop. In the following, we first show two important properties of the OISAP problem with a tree topology. Then, we give a heuristic algorithm to decide the number of shares needed in Hx and where to place them. If Hx's resident set R is not connected, then R consists of multiple disconnected subresident sets R1, R2, ..., Rn, where n > 1, and each subresident set is connected. We say R is j+-connected in Hx if and only if min(|Ri|) ≥ j, where j > 0, Ri, for all i ≤ n, are subgraphs in Hx, and |Ri| is the number of server nodes in Ri. We define Ri <pos Rj as follows: if Ri <pos Rj, then there exist Py, Pz ∈ Hx, where Py ∈ Ri and Pz ∈ Rj, such that Pz is an ancestor of Py in T. Informally, nodes in Rj are closer to the root than nodes in Ri. Otherwise, Ri ≥pos Rj.
5.1 Performance of the OIRSP Heuristic Algorithm
In this section, we compare the performance of the OIRSP heuristic algorithm with the randomized K-replication, no-replication allocation, and complete replication strategies. We study the impacts of three factors: 1) the graph size, which is the number of clusters in the system; 2) the graph degree, which is the average number of neighbors of a cluster; and 3) the update/read ratio, which is the ratio of the total number of update requests in the entire system to the average number of read requests issued from a single cluster (these are the requests each cluster needs to process). The results are shown in the figures, in which HEU, RKR, NR, and FR denote the OIRSP heuristic algorithm, the randomized K-replication, the no-replication allocation, and the complete replication algorithms, respectively.
The first figure shows the impact of graph size on the performance of the four algorithms. The parameters are set as follows: cluster size = 100 (that is, there are 100 nodes in each cluster), graph degree = 5, and update/read = 2. The results show that the OIRSP heuristic algorithm incurs much lower communication cost than the other replication strategies. Also, it can be seen that with a larger graph size, the OIRSP heuristic algorithm achieves better performance compared to the no-replication allocation strategy. The reason is obvious: with a larger graph size, the number of clusters that need replicas increases. Allocating share replicas to these clusters reduces communication cost, and hence, the heuristic algorithm shows better performance than the no-replication allocation strategy. The effect of graph degree is shown in the next figure. The other parameters are set as follows: cluster size = 100, graph size = 80, and update/read = 2. From the results, we can see that the performance gain of the heuristic algorithm becomes less significant with increasing graph degree. This is because the graph becomes more complete, and the distances between nodes become smaller, as the graph degree increases. When the graph degree becomes large, the communication cost of the complete replication strategy drops more significantly than that of the other algorithms. When graph degree ≥ 20, the communication cost of complete replication becomes stable and stays at about twice that of the heuristic algorithm. With update/read = 2, most clusters will not be allocated share replicas; thus, the result of the heuristic algorithm is closer to (but still better than) the other two strategies. Compared to the complete replication strategy, the performance of our heuristic algorithm is much better when the graph degree is small. This is because more clusters get unneeded replicas in the complete replication strategy and the update cost increases significantly.
The effect of the update/read ratio is shown in Fig. 5.6. The parameters are set as follows: cluster size = 100, graph size = 80, and graph degree = 5. With an increasing update/read ratio, fewer clusters should get replicas. So, the communication cost of the complete replication strategy increases rapidly and becomes far worse than that of the other two replication strategies.
ROUTERS
The router module automatically loads the values and splits the content into an unstructured tree. The data are sent through three different routers chosen at random. Here, the data are generated randomly by using the topology generator.
The Efficiency of the OISAP SDP-Tree Algorithm:
The performance of the SDP-tree algorithm is compared with the optimal allocation algorithm and the randomized M-replication algorithm. In the experiments, the trees are generated randomly by using the topology generator with varying N, D, and read/update ratio, where N is the total number of nodes in the cluster, D is the maximum node degree, and read/update is the ratio of the average number of read requests in the cluster to the total number of update requests in the system. Two configurations are considered:
Fig. 5.7 The impact of l with read/update = 3.
Fig. 5.8 The impact of l with read/update = 30.
Fig. 5.9 The impact of m with read/update = 3.
Fig. 5.10 The impact of m with read/update = 30.
1) N = 30, D = 5, read/update = 3; and 2) N = 30, D = 5, read/update = 10. We vary m and l to evaluate their impact on the performance of the algorithms. A larger m value results in higher availability if some of the share holders are unavailable or compromised, while a larger l value achieves better data confidentiality. Thus, different m and l values can be chosen based on the requirements of the data. Note that we only show the results with N = 30, because the computation cost for obtaining the optimal solutions is prohibitive for larger N. The results are shown in the figures, in which HEU denotes the heuristic algorithm, OPT denotes the optimal solution, and RMR denotes the randomized M-replication algorithm. From the figures, we can see that the communication costs using RMR are always the highest. With RMR, the share replicas are randomly allocated, so the shares may not be close to the clients with the most frequent accesses. For all configurations, the heuristic algorithm obtains near-optimal or optimal solutions. In fact, for the worst individual case we have observed, the cost obtained by the heuristic algorithm is only about 10 percent higher than that of the optimal algorithm. In most cases (about 75 percent), the heuristic algorithm obtains the optimal solution. It can also be seen that with increasing l, the communication costs of all solutions increase sharply. With a higher l, a read or update request needs to access a larger number of nodes that host shares, and hence incurs a higher access cost. As can be seen, m generally has little impact on the access performance. Only in the extreme case, with read/update = 30 (most requests are read
requests) and m changing from 3 to 30, can we see the effect of increasing m on the access performance. This is because the total read access frequency of each subtree (i.e., the total number of read requests issued by the nodes inside the subtree) is higher than the total update frequency in the entire cluster. Thus, on average, a certain number of shares hosted in the cluster is sufficient to minimize the communication cost for a specific read/update ratio. With a reasonable read/update ratio, the number of shares required to minimize access cost inside a cluster is usually small (note that a small number of shares can partition the cluster into a much larger number of subtrees such that all of them are rooted at the neighboring nodes of the resident set, i.e., the nodes hosting shares). Thus, a small value of m is big enough to provide sufficient shares to minimize the communication cost inside a cluster. In other words, the value of m has little impact on the access cost inside a cluster.
DESTINATION
The destination performs the decryption function: it decrypts the ciphertext back to plaintext, without any loss of data, using the inverse of the AES transformation. It then receives the original content and displays the data.
5.4 DATAFLOW DIAGRAM
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. DFDs can also be used for the visualization of data processing (structured design). The context-level diagrams are decomposed to determine the functional requirements. A DFD provides no information about the timing of processes, or about whether processes will operate in sequence or in parallel.
It is therefore quite different from a flowchart, which shows the flow of control through an algorithm, allowing a reader to determine what operations will be performed, in what order, and under what circumstances, but not what kinds of data will be input to and output from the system, where the data will come from and go to, or where the data will be stored (all of which are shown on a DFD).
Fig. 5.11 Data flow diagram
5.5 Input Design
Input design is the process of converting user-originated inputs to a computer-based format for the application forms. Input design is one of the most expensive phases of the operation of a computerized system and is often a major problem area of a system.
5.6 Output Design
Output design generally refers to the results and information that are generated by the system. For many end users, output is the main reason for developing the system and the basis on which they evaluate the usefulness of the application. As the outputs are the most important source of information for the users, better design improves the system's relationship with them and helps in decision making. Form design elaborates the way output is presented and the layout available for capturing information.
CHAPTER 6
SYSTEM TESTING
System testing tests all components and modules that are new, changed, affected by a change, or needed to form the complete application. The system test may require involvement of other systems, but this should be minimized as much as possible to reduce the risk of externally induced problems. Testing the interaction with other parts of the complete system comes in Integration Testing. The emphasis in system testing is validating and verifying the functional design specification and seeing how all the modules work together. The first system test is often a smoke test. This is an informal, quick-and-dirty run-through of the application's major functions without bothering with details. The term comes from the hardware testing practice of turning on a new piece of equipment for the first time and considering it a success if it doesn't start smoking or burst into flame. System testing requires many test runs because it entails feature-by-feature validation of behavior using a wide range of both normal and erroneous test inputs and data. The Test Plan is critical here because it contains descriptions of the test cases, the sequence in which the tests must be executed, and the documentation needed to be collected in each run. When an error or defect is discovered, previously executed system tests must be rerun after the repair is made to make sure that the modifications didn't cause other problems. This is covered in more detail in the section on regression testing.
Sample Entry and Exit Criteria for System Testing
Entry Criteria
• Unit testing for each module has been completed and approved; each module is under version control
• An incident tracking plan has been approved
• A system testing environment has been established
• The system testing schedule is approved and in place
Exit Criteria
• The application meets all documented business and functional requirements
• No known critical defects prevent moving to Integration Testing
• All appropriate parties have approved the completed tests
• A testing transition meeting has been held and the developers have signed off
6.1 UNIT TESTING
A series of stand-alone tests is conducted during unit testing. Each test examines an individual component that is new or has been modified. A unit test is also called a module test because it tests the individual units of code that comprise the application. Each test validates a single module that, based on the technical design documents, was built to perform a certain task with the expectation that it will behave in a specific way or produce specific results. Unit tests focus on functionality and reliability, and the entry and exit criteria can be the same for each module or specific to a particular module. Unit testing is done in a test environment prior to system integration. If a defect is discovered during a unit test, the severity of the defect will dictate whether or not it will be fixed before the module is approved.
Sample Entry and Exit Criteria for Unit Testing
Entry Criteria
• Business requirements are at least 80% complete and have been approved to date
• The technical design has been finalized and approved
• The development environment has been established and is stable
• Code development for the module is complete
Exit Criteria
• Code has version control in place
• No known major or critical defects prevent any module from moving to System Testing
• A testing transition meeting has been held and the developers have signed off
6.2 ACCEPTANCE TESTING
User Acceptance Testing (UAT) is also called beta testing, application testing, and end-user testing. Whatever you choose to call it, it's where testing moves from the hands of the IT department into those of the business users. Software vendors often make extensive use of beta testing, some more formally than others, because they can get users to do it for free. By the time UAT is ready to start, the IT staff has resolved, in one way or another, all the defects they identified. Regardless of their best efforts, though, they probably haven't found all the flaws in the application. A general rule of thumb is that no matter how bulletproof an application seems when it goes into UAT, a user somewhere can still find a sequence of commands that will produce an error. To be of real use, UAT cannot be random users playing with the application. A mix of business users with varying degrees of experience and subject matter expertise needs to actively participate in a controlled environment. Representatives from the group work with Testing Coordinators to design and conduct tests that reflect activities and conditions seen in normal business usage. Business users also participate in evaluating the results. This ensures that the application is tested in real-world situations and that the tests cover the full range of business usage. The goal of UAT is to simulate realistic business activity and processes in the test environment. A phase of UAT called "unstructured testing" will be conducted whether or not it's in the Test Plan. Also known as guerrilla testing, this is when business users bash away at the keyboard to find the weakest parts of the application. In effect, they try to break it. Although it's a free-form test, it's important that users who participate understand that they have to be able to reproduce the steps that led to any errors they find. Otherwise it's of no use.
A common occurrence in UAT is that once the business users start working with the application, they find that it doesn't do exactly what they want it to do, or that it does something that, although correct, is not quite optimal. Investigation finds that the root cause is in the business requirements, so the users will ask for a change. UAT is when change control must be most seriously enforced, but change control is beyond the scope of this paper. Suffice it to say that scope creep is especially dangerous in this late phase and must be avoided.
Sample Entry and Exit Criteria for User Acceptance Testing
Entry Criteria
• Integration testing sign-off was obtained
• Business requirements have been met or renegotiated with the business sponsor or representative
• UAT test scripts are ready for execution
• The testing environment is established
• Security requirements have been documented and the necessary user access obtained
Exit Criteria
• UAT has been completed and approved by the user community in a transition meeting
• Change control is managing requested modifications and enhancements
• The business sponsor agrees that known defects do not impact a production release (no remaining defects are rated 3, 2, or 1)
6.3 TEST CASES
A test case is usually a single step, or occasionally a sequence of steps, to test the correct behavior, functionality, and features of an application. An expected result or expected outcome is usually given. A test case in software engineering is a set of conditions or variables under which a tester will determine whether an application or software system is working correctly or not. The mechanism for determining whether a software program or system has passed or failed such a test is known as a test oracle. In some settings, an oracle could be a requirement or use case, while in others it could be a heuristic. It may take many test cases to determine that a software program or system has been sufficiently scrutinized to be released. Test cases are often referred to as test scripts, particularly when written. Written test cases are usually collected into test suites.
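As an illustration of the test-case structure described above, a unit test for this project might look like the following sketch. The reconstruction routine here is a hypothetical stand-in (the project's real module is not shown in this report); each test case checks one step against an explicit expected outcome, and the cases are collected into a suite by the unittest loader.

```python
# Hypothetical test cases for a share-reconstruction module, written with
# Python's unittest framework: one expected-success case and one expected-
# failure case, gathered into a test suite.
import unittest

def reconstruct_data(shares, k):
    """Placeholder for the project's reconstruction routine (assumed):
    succeeds only when at least k distinct shares are supplied."""
    if len(set(shares)) < k:
        raise ValueError("need at least k distinct shares")
    return "secret"

class ShareReconstructionTest(unittest.TestCase):
    def test_enough_shares(self):
        # expected result: the original data is recovered from k shares
        self.assertEqual(reconstruct_data(["s1", "s2", "s3"], k=3), "secret")

    def test_too_few_shares(self):
        # expected result: reconstruction is refused below the threshold
        with self.assertRaises(ValueError):
            reconstruct_data(["s1", "s2"], k=3)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShareReconstructionTest)
unittest.TextTestRunner(verbosity=2).run(suite)
```

Each method is one test case with a single oracle (an assertion), matching the definition in the text; the loader plays the role of collecting written test cases into a suite.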
  • 41. CHAPTER 7 SYSTEM IMPLEMENTATION

Implementation is the most crucial stage in achieving a successful system and in giving users confidence that the new system is workable and effective. This project implements a modified application to replace an existing one; this type of conversion is relatively easy to handle, provided there are no major changes in the system. Each program is tested individually at the time of development using sample data, and after verifying that the programs link together in the way specified in the program specifications, the computer system and its environment are tested to the satisfaction of the user. The system that has been developed has been accepted and proved satisfactory for the user, and so it will be implemented soon. A simple operating procedure is included so that the user can understand the different functions clearly and quickly. Initially, as a first step, the executable form of the application is created and loaded onto a common server machine accessible to all users, and the server is connected to a network. The final stage is to document the entire system, covering its components and operating procedures.

Implementation consists of making the new system available to a prepared set of users (the deployment) and positioning ongoing support and maintenance of the system within the Performing Organization (the transition). At a finer level of detail, deploying the system consists of executing all the steps necessary to educate the Consumers on the use of the new system, placing the newly developed system into production, confirming that all data required at the start of operations is available and accurate, and validating that the business functions that interact with the system are functioning properly.

Transitioning the system support responsibilities involves changing from a system development mode to a system support and maintenance mode of operation, with ownership of the new system moving from the Project Team to the Performing Organization. A key difference between System Implementation and all other phases of the lifecycle is that all project activities up to this point have been performed in safe, protected, and secure environments, where project issues that arise have little or no impact on day-to-day business operations. 41
  • 42. CHAPTER 8 CONCLUSION & FUTURE ENHANCEMENTS

8.1 CONCLUSION

We have combined data-partitioning schemes (a secret sharing scheme or an erasure coding scheme) with dynamic replication to achieve data survivability and security in data grids. The replicas of the partitioned data need to be properly allocated to achieve the actual performance gains. We have developed algorithms to allocate correlated data shares in large-scale peer-to-peer data grids. To support scalability, we represent the data grid as a two-level cluster-based topology and decompose the allocation problem into two subproblems: the OIRSP and the OISAP. The OIRSP determines which clusters need to maintain share replicas, and the OISAP determines the number of share replicas needed in a cluster and their placements. Heuristic algorithms are developed for the two subproblems. Experimental studies show that the heuristic algorithms achieve good performance in reducing communication cost and are close to optimal solutions.

8.2 FUTURE ENHANCEMENTS

Several future research directions can be investigated. First, the secure storage mechanisms developed in this project can also be used for key storage. In this alternate scheme, critical data objects are encrypted and replicated, while the encryption keys are partitioned and the key shares are replicated and distributed. To minimize the access cost, the allocation of the replicas of a data object and the replicas of its key shares should be considered together. We plan to construct the cost model for this approach and extend our algorithm to find the best placement solutions. The two approaches (partitioning data or partitioning keys) have different pros and cons in terms of storage and access cost, and different security and availability implications. Moreover, it may be desirable to consider multiple factors for the allocation of secret shares and their replicas. Replicating data shares improves access performance but degrades security.
Having more share replicas may increase the chance of shares being compromised. Thus, it is desirable to determine the placement solutions based on multiple objectives, including performance, availability, and security. 42
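To make the data-partitioning idea concrete, the following is a minimal (n, n) XOR secret-sharing sketch: n-1 shares are filled with random bytes and the last share is the XOR of the data with all of them, so the original bytes can be recovered only when every share is combined. This is a simplified stand-in for the threshold secret sharing and erasure coding schemes discussed above, not the project's actual implementation.

```java
import java.security.SecureRandom;
import java.util.Arrays;

// Minimal (n, n) XOR secret sharing: all n shares are required to
// reconstruct the secret; any n-1 shares alone reveal nothing.
public class XorSharing {

    static byte[][] split(byte[] secret, int n) {
        SecureRandom rnd = new SecureRandom();
        byte[][] shares = new byte[n][secret.length];
        // Shares 0..n-2 are random; the last is secret XOR all others.
        byte[] last = secret.clone();
        for (int i = 0; i < n - 1; i++) {
            rnd.nextBytes(shares[i]);
            for (int j = 0; j < secret.length; j++) {
                last[j] ^= shares[i][j];
            }
        }
        shares[n - 1] = last;
        return shares;
    }

    static byte[] combine(byte[][] shares) {
        // XOR-ing every share cancels the random masks, leaving the secret.
        byte[] out = new byte[shares[0].length];
        for (byte[] s : shares) {
            for (int j = 0; j < out.length; j++) {
                out[j] ^= s[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] secret = "Hello".getBytes();
        byte[][] shares = split(secret, 3);
        System.out.println(Arrays.equals(secret, combine(shares))); // prints "true"
    }
}
```

Replicating any individual share (as the allocation algorithms above do) does not weaken this property: an adversary still needs one copy of every distinct share to reconstruct the data.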
  • 43. CHAPTER 9 APPENDIX

1. SOURCE CODE

// Source File Name: test.java

import java.io.PrintStream;

public class test
{
    public static void main(String args[])
    {
        String pltxt = "Hello";
        // String.replace returns a new string; the result must be kept.
        pltxt = pltxt.replace('\r', '_');
        System.out.println("PLTXT: " + pltxt + " Length: " + pltxt.length());

        // Pad the key with '0' characters up to the key length in bits.
        int bits = 192;
        StringBuilder key = new StringBuilder("1");
        while (key.length() < bits)
            key.append('0');
        String kk = key.toString();

        AES ins = new AES(bits / 32);
        String cptxt = ins.encrypt(pltxt, kk, bits);
        System.out.println("Encryptor: " + cptxt);

        pltxt = ins.decrypt(cptxt, kk, bits);
        System.out.println("Decryptor: " + pltxt);
    }
}

// Source File Name: Routers.java

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.Random;
import javax.swing.*;

public class Routers implements ActionListener
{
    class PortListener implements Runnable
    {
        public void run()
        {
            if (port != 1001)
                return;
            try
            {
                server = new ServerSocket(port);
                while (true)
                {
                    // Accept an incoming transfer and read it line by line.
                    connection = server.accept();
                    br = new BufferedReader(new InputStreamReader(
                            new BufferedInputStream(connection.getInputStream())));
                    StringBuffer buffer = new StringBuffer();
                    String strLine;
                    while ((strLine = br.readLine()) != null)
                    {
                        System.out.println(strLine);
                        buffer.append(strLine).append("\n");
                    }
                    br.close();
                    connection.close();
                    tf.setText(buffer.toString());
                    String bytes = Integer.toString(buffer.length());

                    // Choose a random node (1..5) for each of the three routers.
                    Random r = new Random();
                    int p1 = r.nextInt(5) + 1, p2 = r.nextInt(5) + 1, p3 = r.nextInt(5) + 1;
                    int p11 = r.nextInt(5) + 1, p21 = r.nextInt(5) + 1, p31 = r.nextInt(5) + 1;
                    int q1 = r.nextInt(5) + 1, q2 = r.nextInt(5) + 1, q3 = r.nextInt(5) + 1;
                    int r1 = r.nextInt(5) + 1, r2 = r.nextInt(5) + 1, r3 = r.nextInt(5) + 1;

                    // The decompiled source also updated labels Tl3..Tl7,
                    // which are not declared in this class.
                    Tl1.setText(Integer.toString(p1));
                    Tl2.setText(Integer.toString(p2));
                    System.out.println(" 1t:" + p1 + " 2t:" + p2 + " 3t:" + p3);

                    // Warn when two transfers pick the same node (the
                    // decompiled source showed this dialog unconditionally).
                    if (p1 == p11 || p1 == p21 || p1 == p31)
                    {
                        System.out.println("hello1");
                        JOptionPane.showMessageDialog(jf, "2C accessing Same Node");
                    }

                    // Move each router's green marker to the chosen node
                    // (the three near-identical blocks are consolidated below).
                    placeIcon(imageLl, p1, p2, p3, 80, bytes);
                    placeIcon(ill, q1, q2, q3, 340, bytes);
                    placeIcon(imageLabel, r1, r2, r3, 600, bytes);

                    // Encrypt the received text with AES (key padded with
                    // '0' to 192 characters), display it, then decrypt.
                    String pltxt = buffer.toString().replace('\r', '_');
                    System.out.println("PLTXT: " + pltxt + " Length: " + pltxt.length());
                    int bits = 192;
                    StringBuilder key = new StringBuilder("1");
                    while (key.length() < bits)
                        key.append('0');
                    String kk = key.toString();
                    AES ins = new AES(bits / 32);
                    String cptxt = ins.encrypt(pltxt, kk, bits);
                    System.out.println("Encryptor: " + cptxt);
                    tf.setText(cptxt);
                    pltxt = ins.decrypt(cptxt, kk, bits);
                    System.out.println("Decryptor: " + pltxt);

                    // Forward the original text to the destination node.
                    Socket client = null;
                    try
                    {
                        client = new Socket("127.0.0.1", 8889);
                        bos = new BufferedOutputStream(client.getOutputStream());
                        byte byteArray[] = buffer.toString().getBytes();
                        bos.write(byteArray, 0, byteArray.length);
                        bos.flush();
                    }
                    finally
                    {
                        if (bos != null)
                            try { bos.close(); } catch (IOException e1) { e1.printStackTrace(); }
                        if (client != null)
                            try { client.close(); } catch (IOException e1) { e1.printStackTrace(); }
                    }
                }
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        }

        // Place the marker at row 120, 300, or 480 depending on which of
        // the three random values is smallest (ties fall to the last row).
        private void placeIcon(JLabel label, int a, int b, int d, int x, String tip)
        {
            int y = (a < b && a < d) ? 120 : (b < a && b < d) ? 300 : 480;
            label.setIcon(new ImageIcon(getClass().getResource("green.png")));
            label.setBounds(x, y, 120, 90);
            c.add(label, 1);
            label.setToolTipText(tip);
        }

        BufferedOutputStream bos;
        ServerSocket server;
        Socket connection;
        BufferedReader br;
        int port;

        public PortListener(int port)
        {
            this.port = port;
        }
    }

    Routers()
    {
        f0 = new Font("Verdana", Font.BOLD, 35);
        f = new Font("Times New Roman", Font.BOLD, 18);
        f1 = new Font("Calibri", Font.BOLD | Font.ITALIC, 25);
        l = new JLabel("Received File");
        c1 = new JLabel("Unstructured P2P");
        l1 = new JLabel("Energy :");
        l2 = new JLabel("Packet Size:");
        l3 = new JLabel("Number Of Packet :");
        Tl1 = new JLabel("");
        Tl2 = new JLabel("");
        T1 = new JTextField("");
        pane = new JScrollPane();
        tf = new JTextArea();
        graph = new JButton("Graphical");
        Sub = new JButton("Submit");
  • 55. Exit = new JButton("Exit"); imageLabel = new JLabel(); imageLl = new JLabel(); ill = new JLabel(); image1 = new JLabel(); image2 = new JLabel(); ImageIcon b1 = new ImageIcon(getClass().getResource("cement.png")); image1.setIcon(b1); image1.setBounds(80, 120, 120, 100); image2.setBounds(340, 120, 120, 100); jf = new JFrame("Router"); c = jf.getContentPane(); c.setLayout(null); jf.setSize(800, 670); c.setBackground(new Color(250, 215, 150)); l.setBounds(650, 100, 200, 50); l1.setBounds(30, 170, 250, 50); l2.setBounds(30, 270, 250, 50); l3.setBounds(30, 370, 250, 50); c1.setFont(f0); c1.setBounds(370, 30, 450, 50); l1.setFont(f); l2.setFont(f); l3.setFont(f); l.setForeground(Color.GREEN); l1.setForeground(Color.CYAN); l2.setForeground(Color.CYAN); l3.setForeground(Color.CYAN); Tl1.setBounds(110, 210, 50, 25); Tl1.setBackground(new Color(250, 215, 150)); 55
  • 56. pane.setBounds(700, 170, 300, 360); tf.setColumns(20); tf.setRows(10); tf.setForeground(new Color(120, 0, 0)); tf.setFont(f); tf.setBackground(new Color(246, 233, 191)); tf.setName("tf"); pane.setName("pane"); pane.setViewportView(tf); l.setFont(f); T1.setFont(f); Sub.setFont(f); Exit.setFont(f); graph.setFont(f); T1.setBounds(200, 100, 350, 50); Sub.setBounds(700, 80, 120, 35); Exit.setBounds(410, 640, 200, 40); Exit.setBackground(new Color(151, 232, 158)); graph.setBounds(140, 520, 200, 40); graph.setBackground(new Color(151, 232, 158)); T1.setBackground(Color.white); T1.setForeground(Color.white); Exit.setForeground(Color.BLACK); c.add(pane, "Center"); c.add(Tl1); Sub.addActionListener(this); c1.setForeground(Color.RED); Sub.setBackground(new Color(151, 232, 158)); jf.show(); c.add(c1); 56
        Exit.addActionListener(this);

        // Close the application when the window is closed.
        jf.addWindowListener(new WindowAdapter()
        {
            public void windowClosing(WindowEvent win)
            {
                System.exit(0);
            }
        });

        // Start one listener thread per port.
        int ports[] = { 1001 };
        for (int i = 0; i < ports.length; i++)
        {
            Thread t = new Thread(new PortListener(ports[i]));
            t.setName("Listener-" + ports[i]);
            t.start();
        }
        c.add(image1);
        c.add(image2);
    }

    public static void main(String args[])
    {
        new Routers();
    }

    public void actionPerformed(ActionEvent e)
    {
        if (e.getSource() == Exit)
        {
            jf.setVisible(false);
            System.exit(0);
        }
    }

    public void setInvisible(boolean flag)
    {
    }

    // UI component fields (restored from the constructor; the decompiled
    // source had lost these declarations).
    public Font f0;
    public Font f;
    public Font f1;
    JFrame jf;
    Container c;
    JLabel l, c1, l1, l2, l3, Tl1, Tl2, imageLabel, imageLl, ill, image1, image2;
    JTextField T1;
    JScrollPane pane;
    JTextArea tf;
    JButton graph, Sub, Exit;
}
  • 59. 2. SCREEN SHOTS Source to get the input data 59
  • 60. Source with the input data 60
  • 61. Router with unstructured peer to peer 61
  • 62. Router with unstructured peer to peer randomly generating encrypted data 62
  • 63. Destination node before receiving the data 63
  • 64. Destination node after receiving the original data 64
  • 65. CHAPTER 10 REFERENCES

• S. Arora, P. Raghavan, and S. Rao, “Approximation Schemes for Euclidean k-Medians and Related Problems,” Proc. 30th ACM Symp. Theory of Computing (STOC), 1998.
• M. Baker, R. Buyya, and D. Laforenza, “Grids and Grid Technology for Wide-Area Distributed Computing,” Software: Practice and Experience, 2002.
• A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, C. Kesselman, P. Kunszt, M. Ripeanu, B. Schwartzkopf, H. Stockinger, and B. Tierney, “Giggle: A Framework for Constructing Scalable Replica Location Services,” Proc. ACM/IEEE Conf. Supercomputing (SC), 2002.
• Y. Deswarte, L. Blain, and J.C. Fabre, “Intrusion Tolerance in Distributed Computing Systems,” Proc. IEEE Symp. Research in Security and Privacy, 1991.
• http://csepi.utdallas.edu/epc_center.htm, 2008.
• I. Foster and A. Iamnitchi, “On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing,” Proc. Second Int’l Workshop Peer-to-Peer Systems (IPTPS), 2003. 65