The
Super Computer
PARALLEX – THE SUPER COMPUTER
A PROJECT REPORT
Submitted by
Mr. AMIT KUMAR
Mr. ANKIT SINGH
Mr. SUSHANT BHADKAMKAR
in partial fulfillment for the award of the degree
Of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE
GUIDE: MR. ANIL KADAM
AISSMS’S COLLEGE OF ENGINEERING, PUNE
UNIVERSITY OF PUNE
2007 - 2008
CERTIFICATE
Certified that this project report “Parallex - The Super Computer” is
the bonafide work of
Mr. AMIT KUMAR (Seat No. :: B3*****7)
Mr. ANKIT SINGH (Seat No. :: B3*****8)
Mr. SUSHANT BHADKAMKAR (Seat No. :: B3*****2)
who carried out the project work under my supervision.
Prof. M. A. Pradhan Prof. Anil Kadam
HEAD OF DEPARTMENT GUIDE
Acknowledgment
The success of any project is never limited to the individual undertaking it; it is the collective effort of the people around that individual that spells success. Some key people have played a vital role in paving the way for the success of this project, and we take this opportunity to express our sincere thanks and gratitude to them.
We would like to thank all the faculty members (teaching and non-teaching) of the Computer Engineering Department of AISSMS College of Engineering, Pune. Our project guide, Prof. Anil Kadam, was very generous with his time and knowledge. We are grateful to Mr. Shasikant Athavale, who was a source of constant motivation and inspiration for us. We are very thankful for the valuable suggestions constantly given by Prof. Nitin Talhar and Ms. Sonali Nalamwar, which proved to be very helpful for the success of our project. Our deepest gratitude goes to Prof. M. A. Pradhan for her thoughtful comments and gentle support throughout our academic work.
We would like to thank the college authorities for providing us with full
support regarding lab, network and related software.
Abstract
Parallex is a parallel processing cluster consisting of control nodes and execution nodes. Our implementation removes all requirements for kernel-level modifications and kernel patches to run a Beowulf cluster system. There can be many control nodes in a typical Parallex cluster, and they no longer just monitor: they also take part in execution if resources permit. We have removed the restrictions of kernel, architecture and platform dependencies, making our cluster system work with completely different sets of CPU powers, operating systems and architectures, and doing so without the use of any existing parallel libraries such as MPI and PVM.
With a radically new perspective on how a parallel system is supposed to be built, we have implemented our own distribution and parallel algorithms aimed at ease of administration and simplicity of usage, without compromising efficiency. With a fully modular 7-step design we attack the traditional complications and deficiencies of existing parallel systems, such as redundancy, scheduling, cluster accounting and parallel monitoring.
A typical Parallex cluster may consist of a few old 386s running NetBSD, some ultra-modern Intel dual-core machines running Linux, and some server-class MIPS processors running IRIX, all working in parallel with full homogeneity.
Table of Contents
Chapter No. Title Page No.
LIST OF FIGURES I
LIST OF TABLES II
1. A General Introduction
1.1 Basic concepts 1
1.2 Promises and Challenges 5
1.2.1 Processing technology 6
1.2.2 Networking technology 6
1.2.3 Software tools and technology 7
1.3 Current scenario 8
1.3.1 End user perspectives 8
1.3.2 Industrial perspective 8
1.3.3 Developers, researchers & scientists perspective 9
1.4 Obstacles and Why we don’t have 10 GHz today 9
1.5 Myths and Realities: 2 x 3 GHz < 6GHz 10
1.6 The problem statement 11
1.7 About PARALLEX 11
1.8 Motivation 12
1.9 Features of PARALLEX 13
1.10 Why our design is “alternative” to parallel system 13
1.11 Innovation 14
2. REQUIREMENT ANALYSIS 16
2.1 Determining the overall mission of Parallex 16
2.2 Functional requirement for Parallex system 16
2.3 Non-functional requirement for system 17
3. PROJECT PLAN 19
4. SYSTEM DESIGN 21
5. IMPLEMENTATION DETAIL 24
5.1 Hardware architecture 24
5.2 Software architecture 26
5.3 Description for software behavior 28
5.3.1 Events 32
5.3.2 States 32
6. TECHNOLOGIES USED 33
6.1 General terms 33
7. TESTING 35
8. COST ESTIMATION 44
9. USER MANUAL 45
9.1 Dedicated cluster setup 45
9.1.1 BProc Configuration 45
9.1.2 Bringing up BProc 47
9.1.3 Build phase 2 image 48
9.1.4 Loading phase 2 image 48
9.1.5 Using the cluster 49
9.1.6 Managing the cluster 50
9.1.7 Troubleshooting techniques 51
9.2 Shared cluster setup 52
9.2.1 DHCP 52
9.2.2 NFS 54
9.2.2.1 Running NFS 55
9.2.3 SSH 57
9.2.3.1 Using SSH 60
9.2.4 Host file and name service 65
9.3 Working with PARALLEX 65
10. CONCLUSION 67
11. FUTURE ENHANCEMENT 68
12. REFERENCE 69
APPENDIX A 70 – 77
APPENDIX B 78 – 88
GLOSSARY 89 – 92
MEMORABLE JOURNEY (PHOTOS) 93 – 95
PARALLEX ACHIEVEMENTS 96 - 97
I. LIST OF FIGURES:
1.1 High-performance distributed system.
1.2 Transistor vs. Clock Speed
4.1 Design Framework
4.2 Parallex Design
5.1 Parallel System H/W Architecture
5.2 Parallel System S/W Architecture
7.1 Cyclomatic Diagram for the system
7.2 System Usage pattern
7.3 Histogram
7.4 One frame from Complex Rendering on Parallex: Simulation of an
explosion
II. LIST OF TABLES:
1.1 Project Plan
7.1 Logic/Coverage/Decision Testing
7.2 Functional Test
7.3 Console Test cases
7.4 Black box Testing
7.5 Benchmark Results
Chapter 1. A General Introduction
1.1 BASIC CONCEPTS
The last two decades spawned a revolution in the world of computing; a move away
from central mainframe-based computing to network-based computing. Today,
servers are fast achieving the levels of CPU performance, memory capacity, and I/O
bandwidth once available only in mainframes, at cost orders of magnitude below that
of a mainframe. Servers are being used to solve computationally intensive problems
in science and engineering that once belonged exclusively to the domain of
supercomputers. A distributed computing system is the system architecture that makes
a collection of heterogeneous computers, workstations, or servers act and behave as a
single computing system. In such a computing environment, users can uniformly
access and name local or remote resources, and run processes from anywhere in the
system, without being aware of which computers their processes are running on.
Distributed computing systems have been studied extensively by researchers, and a
great many claims and benefits have been made for using such systems. In fact, it is
hard to rule out any desirable feature of a computing system that has not been claimed
to be offered by a distributed system [24]. However, the current advances in
processing and networking technology and software tools make it feasible to achieve
the following advantages:
• Increased performance. The existence of multiple computers in a distributed system
allows applications to be processed in parallel and thus improves application and
system performance. For example, the performance of a file system can be improved
by replicating its functions over several computers; the file replication allows several
applications to access that file system in parallel. Furthermore, file replication
distributes network traffic associated with file access across the various sites and thus
reduces network contention and queuing delays.
• Sharing of resources. Distributed systems are cost-effective and enable efficient
access to all system resources. Users can share special purpose and sometimes
expensive hardware and software resources such as database servers, compute servers,
virtual reality servers, multimedia information servers, and printer servers, to name
just a few.
• Increased extendibility. Distributed systems can be designed to be modular and
adaptive so that for certain computations, the system will configure itself to include a
large number of computers and resources, while in other instances, it will just consist
of a few resources. Furthermore, limitations in file system capacity and computing
power can be overcome by adding more computers and file servers to the system
incrementally.
• Increased reliability, availability, and fault tolerance. The existence of multiple
computing and storage resources in a system makes it attractive and cost-effective to
introduce fault tolerance to distributed systems. The system can tolerate the failure in
one computer by allocating its tasks to another available computer. Furthermore, by
replicating system functions and/or resources, the system can tolerate one or more
component failures.
• Cost-effectiveness. The performance of computers has been approximately doubling
every two years, while their cost has decreased by half every year during the last
decade. Furthermore, the emerging high speed network technology [e.g., wave-
division multiplexing, asynchronous transfer mode (ATM)] will make the
development of distributed systems attractive in terms of the price/performance ratio
compared to that of parallel computers. These advantages cannot be achieved easily
because designing a general purpose distributed computing system is several orders of
magnitude more difficult than designing centralized computing systems—designing a
reliable general-purpose distributed system involves a large number of options and
decisions, such as the physical system configuration, communication network and
computing platform characteristics, task scheduling and resource allocation policies
and mechanisms, consistency control, concurrency control, and security, to name just
a few. The difficulties can be attributed to many factors related to the lack of maturity
in the distributed computing field, the asynchronous and independent behavior of the
systems, and the geographic dispersion of the system resources. These are
summarized in the following points:
• There is a lack of a proper understanding of distributed computing theory—the field
is relatively new and we need to design and experiment with a large number of
general-purpose reliable distributed systems with different architectures before we can
master the theory of designing such computing systems. One interesting explanation
for the lack of understanding of the design process of distributed systems was given
by Mullender. Mullender compared the design of a distributed system to the design of
a reliable national railway system that took a century and a half to be fully understood
and mature. Similarly, distributed systems (which have been around for
approximately two decades) need to evolve into several generations of different
design architectures before their designs, structures, and programming techniques can
be fully understood and mature.
• The asynchronous and independent behavior of the system resources and/or
(hardware and software) components complicates the control software that aims at
making them operate as one centralized computing system. If the computers are
structured in a master–slave relationship, the control software is easier to develop and
system behavior is more predictable. However, this structure is in conflict with the
distributed system property that requires computers to operate independently and
asynchronously.
• The use of a communication network to interconnect the computers introduces
another level of complexity. Distributed system designers not only have to master the
design of the computing systems and system software and services, but also have to
master the design of reliable communication networks, how to achieve
synchronization and consistency, and how to handle faults in a system composed of
geographically dispersed heterogeneous computers. The number of resources
involved in a system can vary from a few to hundreds, thousands, or even hundreds of
thousands of computing and storage resources.
Despite these difficulties, there has been limited success in designing special-purpose
distributed systems such as banking systems, online transaction systems, and point-of-
sale systems. However, the design of a general purpose reliable distributed system
that has the advantages of both centralized systems (accessibility, management, and
coherence) and networked systems (sharing, growth, cost, and autonomy) is still a
challenging task. Kleinrock makes an interesting analogy between the human-made
computing systems and the brain. He points out that the brain is organized and
structured very differently from our present computing machines. Nature has been
extremely successful in implementing distributed systems that are far more intelligent
and impressive than any computing machines humans have yet devised. We have
succeeded in manufacturing highly complex devices capable of high speed
computation and massive accurate memory, but we have not gained sufficient
understanding of distributed systems; our systems are still highly constrained and
rigid in their construction and behavior. The gap between natural and man-made
systems is huge, and more research is required to bridge this gap and to design better
distributed systems. In the next section we present a design framework to better
understand the architectural design issues involved in developing and implementing
high performance distributed computing systems. A high-performance distributed
system (HPDS) (Figure 1.1) includes a wide range of computing resources, such as
workstations, PCs, minicomputers, mainframes, supercomputers, and other special-
purpose hardware units. The underlying network interconnecting the system resources
can span LANs, MANs, and even WANs, can have different topologies (e.g., bus,
ring, full connectivity, random interconnect), and can support a wide range of
communication protocols.
Fig. 1.1 High-performance distributed system.
1.2 PROMISES AND CHALLENGES OF PARALLEL AND
DISTRIBUTED SYSTEMS
The proliferation of high-performance systems and the emergence of high speed
networks (terabit networks) have attracted a lot of interest in parallel and distributed
computing. The driving forces toward this end will be
(1) The advances in processing technology,
(2) The availability of high-speed network, and
(3) The increasing research efforts directed toward the development of software
support and programming environments for distributed computing.
Further, with the increasing requirements for computing power and the diversity in
the computing requirements, it is apparent that no single computing platform will
meet all these requirements. Consequently, future computing environments need to
capitalize on and effectively utilize the existing heterogeneous computing resources.
Only parallel and distributed systems provide the potential of achieving such an
integration of resources and technologies in a feasible manner while retaining desired
usability and flexibility. Realization of this potential, however, requires advances on a
number of fronts: processing technology, network technology, and software tools and
environments.
1.2.1 Processing Technology
Distributed computing relies to a large extent on the processing power of the
individual nodes of the network. Microprocessor performance has been growing at a
rate of 35 to 70 percent during the last decade, and this trend shows no indication of
slowing down in the current decade. The enormous power of the future generations of
microprocessors, however, cannot be utilized without corresponding improvements in
memory and I/O systems. Research in main-memory technologies, high-performance disk arrays, and high-speed I/O channels is, therefore, critical to efficiently utilizing the advances in processing technology and to the development of cost-effective high
performance distributed computing.
1.2.2 Networking Technology
The performance of distributed algorithms depends to a large extent on the bandwidth
and latency of communication among work nodes. Achieving high bandwidth and
low latency involves not only fast hardware, but also efficient communication
protocols that minimize the software overhead. Developments in high-speed networks
provide gigabit bandwidths over local area networks as well as wide area networks at
moderate cost, thus increasing the geographical scope of high-performance distributed
systems.
The problem of providing the required communication bandwidth for distributed
computational algorithms is now relatively easy to solve given the mature state of
fiber-optic and optoelectronic device technologies. Achieving the low latencies
necessary, however, remains a challenge. Reducing latency requires progress on a
number of fronts. First, current communication protocols do not scale well to a high-
speed environment. To keep latencies low, it is desirable to execute the entire protocol
stack, up to the transport layer, in hardware. Second, the communication interface of
the operating system must be streamlined to allow direct transfer of data from the
network interface to the memory space of the application program. Finally, the speed of light poses the ultimate limit to latency: signal propagation alone takes approximately 5 microseconds per kilometer in optical fiber. In general, achieving low latency requires a two-pronged approach:
1. Latency reduction. Minimize protocol-processing overhead by using streamlined
protocols executed in hardware and by improving the network interface of the
operating system.
2. Latency hiding. Modify the computational algorithm to hide latency by pipelining communication and computation (a small sketch of this idea follows below). These problems are now perhaps the most fundamental to the success of parallel and distributed computing, a fact that is increasingly being recognized by the research community.
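As an illustration of latency hiding (this sketch is ours, not part of any existing Parallex code; compute_chunk() and send_chunk() are hypothetical placeholders), the following C program overlaps the communication of one chunk with the computation of the next by handing the send to a helper thread:

#include <pthread.h>
#include <stdio.h>

#define CHUNKS 8

static void compute_chunk(int i) { (void)i; /* numerical work on chunk i */ }
static void send_chunk(int i)    { (void)i; /* push results of chunk i over the network */ }

static void *sender(void *arg)
{
    send_chunk(*(int *)arg);          /* communication proceeds in the background */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int pending = -1;                 /* chunk currently being sent, -1 = none */

    for (int i = 0; i < CHUNKS; i++) {
        compute_chunk(i);             /* compute chunk i while chunk i-1 is still in flight */
        if (pending >= 0)
            pthread_join(tid, NULL);  /* wait for the previous send to finish */
        pending = i;
        pthread_create(&tid, NULL, sender, &pending);
    }
    if (pending >= 0)
        pthread_join(tid, NULL);
    puts("all chunks computed and sent");
    return 0;
}

Without the helper thread, every chunk costs compute time plus send time; with this simple pipeline the per-chunk cost approaches the larger of the two once the pipeline is full.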
1.2.3 Software Tools and Environments
The development of parallel and distributed applications is a nontrivial process and
requires a thorough understanding of the application and the architecture. Although a
parallel and distributed system provides the user with enormous computing power and
a great deal of flexibility, this flexibility implies increased degrees of freedom which
have to be optimized in order to fully exploit the benefits of the distributed system.
For example, during software development, the developer is required to select the
optimal hardware configuration for the particular application, the best decomposition
of the problem on the hardware configuration selected, and the best communication
and synchronization strategy to be used, and so on. The set of reasonable alternatives
that have to be evaluated in such an environment is very large, and selecting the best
alternative among these is a nontrivial task. Consequently, there is a need for a set of
simple and portable software development tools that can assist the developer in
appropriately distributing the application computations to make efficient use of the
underlying computing resources. Such a set of tools should span the software life
cycle and must support the developer during each stage of application development,
starting from the specification and design formulation stages, through the
programming, mapping, distribution, scheduling phases, tuning, and debugging
stages, up to the evaluation and maintenance stages.
1.3 Current Scenario
The current scenario of parallel systems can be viewed from three perspectives. A common concept that applies to all of them is the idea of Total Ownership Cost (TOC). TOC is by far the most common scale on which the level of computer processing is assessed worldwide; it is defined as the ratio of the total cost of implementation and maintenance to the net throughput the parallel cluster delivers.
TOC = (total cost of implementation and maintenance) / (net system throughput, in floating-point operations per second)
1.3.1 End user perspectives
Various activities such as rendering, Adobe Photoshop work and other everyday processes come under this category. As the need for processing power increases day by day, hardware costs increase as well. From the end-user perspective, the parallel system aims to reduce these expenses and avoid the associated complexities. At this stage we are trying to implement a parallel system which is more cost effective and user friendly. However, for the end user TOC is less important in most cases, because parallel clusters are rarely owned by a single user; in that case the net throughput of the parallel system becomes the most crucial factor.
1.3.2 Industrial Perspective
In the corporate sector, parallel systems are extensively implemented. Such parallel systems consist of machines that may, theoretically if not practically, have to handle millions of nodes. From the industrial point of view, the parallel system aims at resource isolation and at replacing large-scale dedicated commodity hardware and mainframes. Corporate sectors often place TOC as the primary criterion by which a parallel cluster is judged. As scalability increases, the cost of owning parallel clusters shoots up to unmanageable heights, and our primary aim in this area is to bring down the TOC as much as possible.
1.3.3 Developers, Researchers & Scientists Perspective
Scientific applications such as 3D simulations, large-scale scientific rendering, intense numerical calculations, complex programming logic, and large-scale implementations of algorithms (BLAS and FFT libraries) require levels of processing and calculation that no modern-day dedicated vector CPU could possibly meet. Consequently, parallel systems have proven to be the only and most efficient alternative for keeping pace with modern-day scientific advancement and research. TOC is rarely a matter of concern here.
1.4 Obstacles and Why we don’t have 10 GHz today…
Fig 1.2 Transistor vs. Clock Speed
CPU performance growth as we have known it has hit a wall.
Figure 1.2 graphs the history of Intel chip introductions by clock speed and number of transistors. The number of transistors continues to climb, at least for now. Clock speed, however, is a different story.
Around the beginning of 2003, you’ll note a disturbing sharp turn in the previous
trend toward ever-faster CPU clock speeds. We have added lines to show the limit
trends in maximum clock speed; instead of continuing on the previous path, as
indicated by the thin dotted line, there is a sharp flattening. It has become harder and
harder to exploit higher clock speeds due to not just one but several physical issues,
notably heat (too much of it and too hard to dissipate), power consumption (too high),
and current leakage problems.
Sure, Intel has samples of their chips running at even higher speeds in the
lab—but only by heroic efforts, such as attaching hideously impractical quantities of
cooling equipment. You won’t have that kind of cooling hardware in your office any
day soon, let alone on your lap while computing on the plane.
1.5 Myths and Realities: 2 x 3GHz < 6 GHz
So a dual-core CPU that combines two 3GHz cores practically offers 6GHz of
processing power. Right?
Wrong. Even having two threads running on two physical processors doesn’t
mean getting two times the performance. Similarly, most multi-threaded applications
won’t run twice as fast on a dual-core box. They should run faster than on a single-
core CPU; the performance gain just isn’t linear, that’s all.
Why not? First, there is coordination overhead between the cores to ensure
cache coherency (a consistent view of cache, and of main memory) and to perform
other handshaking. Today, a two- or four-processor machine isn’t really two or four
times as fast as a single CPU even for multi-threaded applications. The problem
remains essentially the same even when the CPUs in question sit on the same die.
Second, unless the two cores are running different processes, or different
threads of a single process that are well-written to run independently and almost never
wait for each other, they won’t be well utilized. (Despite this, we will speculate that
today’s single-threaded applications as actually used in the field could actually see a
performance boost for most users by going to a dual-core chip, not because the extra
core is actually doing anything useful, but because it is running the adware and spyware that infest many users’ systems and are otherwise slowing down the single CPU
that user has today. We leave it up to you to decide whether adding a CPU to run your spyware is the best solution to that problem.)
If you’re running a single-threaded application, then the application can only
make use of one core. There should be some speedup as the operating system and the
application can run on separate cores, but typically the OS isn’t going to be maxing
out the CPU anyway, so one of the cores will be mostly idle. (Again, the spyware can share the OS’s core most of the time.)
1.6 The problem statement
So now let us summarize and define the problem statement:
• Since the growth in processing requirements is far greater than the growth in CPU power, and since the silicon chip is fast approaching its full capacity, the implementation of parallel processing at every level of computing becomes inevitable.
• There is a need for a single and complete clustering solution which requires minimum user intervention but at the same time supports editing/modification to suit the user’s requirements.
• There should be no need to modify the existing applications.
• The parallel system must be able to support different platforms.
• The system should be able to fully utilize all the available hardware resources
without the need of buying any extra/special kind of hardware.
1.7 About PARALLEX
While the term parallel is often used to describe clusters, they are more
correctly described as a type of distributed computing. Typically, the term parallel
computing refers to tightly coupled sets of computation. Distributed computing is
usually used to describe computing that spans multiple machines or multiple
locations. When several pieces of data are being processed simultaneously in the same
CPU, this might be called a parallel computation, but would never be described as a
distributed computation. Multiple CPUs within a single enclosure might be used for
parallel computing, but would not be an example of distributed computing. When
talking about systems of computers, the term parallel usually implies a homogenous
collection of computers, while distributed computing typically implies a more
heterogeneous collection. Computations that are done asynchronously are more likely
to be called distributed than parallel. Clearly, the terms parallel and distributed lie at
either end of a continuum of possible meanings. In any given instance, the exact
meanings depend upon the context. The distinction is more one of connotations than
of clearly established usage.
Parallex is both a parallel and a distributed cluster because it supports both multiple CPUs within a single enclosure and a heterogeneous collection of computers.
1.8 Motivation
The motivation behind this project is to provide a cheap and easy to use
solution to cater to the high performance computing requirements of organizations
without the need to install any expensive hardware.
In many organizations, including our college, we have observed that when old systems are replaced by newer ones, the older machines are generally dumped or sold at throwaway prices. We also wanted to find a solution that makes effective use of this “silicon waste”. These wasted resources can easily be added to our system as the processing need increases, because the parallel system is linearly scalable and hardware independent. Thus the intent is to have an environment-friendly and effective solution that utilizes all the available CPU power to execute applications faster.
1.9 Features of Parallex
• Parallex simplifies the cluster setup, configuration and management process.
• It supports machines with hard disks as well as diskless machines running at
the same time.
• It is flexible in design and easily adaptable.
• Parallex does not require any special kind of hardware.
• It is multi platform compatible.
• It ensures efficient utilization of silicon waste (old unused hardware).
• Parallex is scalable.
How these features are achieved and details of design will be discussed in subsequent
chapters.
1.10 Why our design is “Alternative” to parallel system?
Every established technology needs to evolve over time, as each new generation addresses the shortcomings of the technology that came before it. What we have achieved is a bare-bones expression of the essential semantics of a parallel system.
While studying parallel and distributed systems, we had the advantage of working with the latest technology. The parallel systems designed by scientists before us were, no doubt, far more sophisticated than ours. Our system is unique because we actually split up the task according to the processing power of the nodes instead of just load balancing. Hence a slow node gets a smaller task than a faster one, and all nodes deliver their output at the same calculated time on the master node.
One difficulty we found was deciding how much of the task should be given to each machine in a heterogeneous system so that all results arrive at the same time. We worked on this problem and developed a mathematical distribution algorithm, which was successfully implemented and is functional. This algorithm breaks the task up according to the speed of the CPUs by sending a test application to all nodes and storing the return time of each node in a file. We then worked further on automating the entire system. We were using password-less secure shell (ssh) logins and the network file system (NFS); this was successful to some extent, but the ssh and NFS configuration could not be fully automated. Having to set up new nodes manually every time is a demerit of ssh and NFS. To overcome this demerit we looked at an alternative solution, the Beowulf cluster, but after studying it we concluded that it treats all nodes as having the same configuration and sends tasks equally to all nodes.
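The following C sketch shows the idea behind the mathematical distribution algorithm described above: frames are split in inverse proportion to each node's measured benchmark time, so that all nodes finish at roughly the same calculated time. The benchmark times and frame count here are made-up example values, not measurements from Parallex.

#include <stdio.h>

#define NODES 3

int main(void)
{
    /* return times of the small test application on each node (assumed values) */
    double bench_time[NODES] = { 2.0, 4.0, 8.0 };
    int total_frames = 140, assigned = 0;
    double inv_sum = 0.0;

    for (int i = 0; i < NODES; i++)
        inv_sum += 1.0 / bench_time[i];           /* sum of relative speeds */

    for (int i = 0; i < NODES; i++) {
        /* a faster node (smaller benchmark time) gets a proportionally larger share */
        int share = (int)(total_frames * (1.0 / bench_time[i]) / inv_sum);
        if (i == NODES - 1)
            share = total_frames - assigned;      /* last node takes the remainder */
        printf("node %d gets %d frames\n", i, share);
        assigned += share;
    }
    return 0;
}

With these example times the shares come out as 80, 40 and 20 frames, so each node's total work (frames times per-frame time) is equal, which is exactly the "same calculated time" behaviour described above.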
To improve our system we decided to think differently from the Beowulf cluster and to make the system more cost effective. We adopted the diskless cluster concept in order to get rid of hard disks, cutting cost and enhancing the reliability of each machine; a local storage device affects the performance of the entire system, increases cost (due to replacement of disks) and wastes time in locating faults. So we studied and patched the Beowulf server and the Beowulf distributed process space according to the needs of our system. We made kernel images for running diskless nodes using the RARP protocol. When a node runs the kernel image in its memory, it requests an IP address from the master node (which can also be called the server); the server assigns the IP address and a node number. With this, our diskless cluster system stands ready for parallel computing. We then modified our various programs, including our own distribution algorithm, according to the new design. The best part of our system is that there is no need for any authorization setup; everything is now automatic.
Until now we had been working on CODE LEVEL PARALLELISM, in which the code is modified slightly to run on our system, much as MPI libraries are used to make code executable in parallel. The next challenge was: what if we do not get the source code, but only a binary file to execute on our parallel system? We therefore needed to enhance our system by adding BINARY LEVEL PARALLELISM. We studied openMosix. Once openMosix is installed and all the nodes are booted, the openMosix nodes see each other in the cluster and start exchanging information about their load levels and resource usage. Once the load increases beyond the defined level, a process migrates to another node on the network. There might, however, be a situation where a process demands heavy resource usage, and the process may keep migrating from node to node without ever being serviced. This is the major design flaw of openMosix, and we are working to find a solution to it.
So our design is an ALTERNATIVE to these problems in the world of parallel computing.
1.11 Innovation
Firstly, our system does not require any additional hardware if the existing machines are well connected in a network. Secondly, even in a heterogeneous environment with a few fast CPUs and a few slower ones, the efficiency of the system does not drop by more than 1 to 5%, still maintaining an efficiency of around 80% for suitably adapted applications. This is because the mathematical distribution algorithm considers the relative processing powers of the nodes, distributing to each node only the amount of load it can process in the calculated optimal time of the system. All the nodes process their respective tasks and produce output at this calculated time. The most important point about our system is the ability to use diskless nodes in the cluster, thereby reducing hardware costs, space and the required maintenance. Also, in the case of binary executables (when source code is not available), our system exhibits almost 20% performance gains.
Chapter 2. Requirement Analysis
2.1 Determining the overall mission of Parallex
• User base: Students, educational institutes, small to medium business
organizations.
• Cluster usage: There will be one part of the cluster fully dedicated to solving the problem at hand and an optional part where computing resources from individual workstations are used. In the latter part, the parallel problems will have lower priority.
• Software to be run on cluster: Depends upon the user base. At the cluster
management level, the system software will be Linux.
• Dedicated or shared cluster: As mentioned above it will be both.
• Extent of the cluster: Computers that are all on the same subnet
2.2 Functional Requirements for Parallex system
Functional Requirement 1
The PCs must be connected in a LAN so that the system can be used without any obstacles.
Functional Requirement 2
There will be one master or controlling node, which will distribute the task according to the processing speed of each node.
Services
Three services are to be provided on the master:
1. A network monitoring tool for resource discovery (e.g. IP addresses, MAC addresses, up/down status, etc.).
2. The distribution algorithm, which distributes the task according to the current processing speed of the nodes.
3. The Parallex master script, which sends the distributed tasks to the nodes, gets back the results, integrates them and produces the final output (a minimal sketch of this dispatch-and-collect flow follows below).
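The sketch below shows the general shape of that dispatch-and-collect flow. It is only an illustration under stated assumptions that are not part of the original design: password-less ssh (as in the shared-cluster setup of Chapter 9), hypothetical host names node1 and node2, and a hypothetical ./subtask program on each node.

#include <stdio.h>

#define NODES 2

int main(void)
{
    const char *host[NODES]  = { "node1", "node2" };     /* hypothetical node names */
    const char *range[NODES] = { "1-70", "71-140" };     /* ranges from the distribution step */
    FILE *out[NODES];
    char cmd[256], line[256];

    /* send each node its sub-task; popen() starts them all without waiting */
    for (int i = 0; i < NODES; i++) {
        snprintf(cmd, sizeof cmd, "ssh %s ./subtask %s", host[i], range[i]);
        out[i] = popen(cmd, "r");
        if (!out[i]) { perror("popen"); return 1; }
    }

    /* get back the results and integrate them into one output */
    for (int i = 0; i < NODES; i++) {
        while (fgets(line, sizeof line, out[i]))
            printf("[%s] %s", host[i], line);
        pclose(out[i]);
    }
    return 0;
}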
Functional Requirement 3
The final size of the executable code should be such that it can reside within the limited memory constraints of the machine.
Functional Requirement 4
This product will only be used to speed up applications that already exist in the enterprise.
2.3 Non-Functional Requirements for system
- Performance
Even in a heterogeneous environment with a few fast CPUs and a few slower ones, the efficiency of the system does not drop by more than 1 to 5%, still maintaining an efficiency of around 80% for suitably adapted applications. This is because the mathematical distribution algorithm considers the relative processing powers of the nodes, distributing to each node only the amount of load it can process in the calculated optimal time of the system. All the nodes process their respective tasks and produce output at this calculated time. The most important point about our system is the ability to use diskless nodes in the cluster, thereby reducing hardware costs, space and the required maintenance. Also, in the case of binary executables (when source code is not available), our system exhibits almost 20% performance gains.
- Cost
While a system of n parallel processors is less efficient than one n times faster
processor, the Parallel System is often cheaper to build. Parallel computation is used
for tasks which require very large amounts of computation, take a lot of time, and can
be divided into n independent subtasks. In recent years, most high performance
computing systems, also known as supercomputers, have parallel architectures.
- Manufacturing costs
No extra hardware required. Cost of setting up LAN.
- Benchmarks
There are at least three reasons for running benchmarks. First, a benchmark will
provide us with a baseline. If we make changes to our cluster or if we suspect
problems with our cluster, we can rerun the benchmark to see if performance is really
any different. Second, benchmarks are useful when comparing systems or cluster
configurations. They can provide a reasonable basis for selecting between
alternatives. Finally, benchmarks can be helpful with planning.
For benchmarking we will use a 3D rendering tool named POV-Ray (the Persistence of Vision Raytracer; please see the Appendix for more details).
- Hardware required
686-class PCs (Linux with a 2.6.x kernel installed, connected to the intranet)
Switch (10/100BASE-T)
Serial port connectors
100BASE-T LAN cable, RJ-45 connectors.
- Software Resources Required
Linux (2.6.x kernel)
Intel Compiler suite (Noncommercial)
LSB (Linux Standard Base) Set of GNU Kits with GNU CC/C++/F77/LD/AS
GNU Krell monitor
Number of PC’s connected in LAN
8 NODES in the LAN.
Chapter 3. Project Plan
Plan of execution for the project was as follows:
Serial No. | Activity | Software Used | Number of Days
1 | Project planning: (a) choosing the domain, (b) identifying key areas of work, (c) requirement analysis | - | 10
2 | Basic installation of Linux | Linux (2.6.x kernel) | 3
3 | Brushing up on C programming skills | - | 5
4 | Shell scripting | Linux (2.6.x kernel), GNU Bash | 12
5 | C programming in the Linux environment | GNU C Compiler Suite | 5
6 | A demo project (Universal Sudoku Solver) to familiarize ourselves with the Linux programming environment | GNU C Compiler Suite, Intel Compiler Suite (non-commercial) | 16
7 | Study of advanced Linux tools and installation of packages & Red Hat RPMs | iptraf, mc, tar, rpm, awk, sed, gnuplot, strace, gdb, etc. | 10
8 | Studying networking basics & network configuration in Linux | - | 8
9 | Recompiling, patching and analyzing the system kernel | Linux (kernel 2.6x.x), GNU C compiler | 3
10 | Study & implementation of advanced networking tools: SSH & NFS | ssh & OpenSSH, nfs | 7
11 | (a) Preparing the preliminary design of the total workflow of the project, (b) deciding the modules for overall execution and dividing the areas of concentration among the project group, (c) building the Stage I prototype | All of the above | 17
12 | Build Stage II prototype (replacing ssh with a custom-made application) | All of the above | 15
13 | Build Stage III prototype (making the diskless cluster) | All of the above | 10
14 | Testing & building final packages | All of the above | 10
Table 1.1 Project Plan
Chapter 4. System Design
Generally speaking, the design process of a distributed system involves three main
activities:
(1) designing the communication system that enables the distributed system resources
and objects to exchange information,
(2) defining the system structure (architecture) and the system services that enable
multiple computers to act as a system rather than as a collection of computers, and
(3) defining the distributed computing programming techniques to develop parallel
and distributed applications.
Based on this notion of the design process, the distributed system design framework
can be described in terms of three layers:
(1) network, protocol, and interface (NPI) layer,
(2) system architecture and services (SAS) layer, and
(3) distributed computing paradigms (DCP) layer. In what follows, we describe the
main design issues to be addressed in each layer.
Fig. 4.1 Design Framework
• Communication network, protocol, and interface layer. This layer describes the
main components of the communication system that will be used for passing control
and information among the distributed system resources. This layer is decomposed
into three sub layers: network type, communication protocols, and network interfaces.
• Distributed system architecture and services layer. This layer represents the
designer’s and system manager’s view of the system. SAS layer defines the structure
and architecture and the system services (distributed file system, concurrency control,
redundancy management, load sharing and balancing, security service, etc.) that must
be supported by the distributed system in order to provide a single-image computing
System.
• Distributed computing paradigms layer. This layer represents the programmer
(user) perception of the distributed system. This layer focuses on the programming
paradigms that can be used to develop distributed applications. Distributed computing
paradigms can be broadly characterized based on the computation and communication
models. Parallel and distributed computations can be described in terms of two
paradigms: functional parallel and data parallel. In the functional parallel paradigm, the computations are divided into distinct functions which are then assigned to different computers. In the data parallel paradigm, all the computers run the same program, a single program, multiple data (SPMD) stream, but each computer operates on a different data stream (a minimal SPMD-style sketch follows below).
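A minimal SPMD-style sketch (ours, for illustration; the NODE_RANK and NODE_COUNT environment variables are an assumed convention, not part of Parallex) shows the data parallel paradigm: every node runs the same program, and its rank simply selects which slice of the data it works on.

#include <stdio.h>
#include <stdlib.h>

#define N 1000                                   /* total data items */

int main(void)
{
    const char *r = getenv("NODE_RANK"), *s = getenv("NODE_COUNT");
    int rank = r ? atoi(r) : 0;                  /* which node am I?      */
    int size = s ? atoi(s) : 1;                  /* how many nodes exist? */

    int chunk = (N + size - 1) / size;           /* ceiling division      */
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    long sum = 0;
    for (int i = lo; i < hi; i++)                /* same code, different data stream */
        sum += i;

    printf("rank %d processed [%d, %d) partial sum %ld\n", rank, lo, hi, sum);
    return 0;
}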
With reference to Fig. 4.1, Parallex can be described as follows:
Fig. 4.2 Parallex Design
Chapter 5. Implementation Details
The goal of the project is to provide an efficient system that handles process parallelism with the help of clusters, thereby reducing execution time. Currently we form a cluster of 8 nodes. Executing a heavy process on a single computer takes a lot of time, so we form a cluster and execute such processes in parallel by dividing each process into a number of sub-processes. Depending on the nodes in the cluster, we migrate the sub-processes to those nodes; when execution is over, the outputs they produce are brought back to the master node. By doing this we reduce the process execution time and increase CPU utilization.
5.1 Hardware Architecture
We have implemented a shared-nothing parallel architecture using a coarse-grained cluster structure. The interconnect is an ordinary 8-port switch, optionally on a Class B or Class C network. It is a 3-level architecture:
1. Master topology
2. Slave topology
3. Network interconnect
1. Master is a machine running Linux with a 2.6.x or 2.4.x kernel (both under testing). It runs the parallel server and contains the application interface to drive the remaining machines. The master runs a network scanning script to detect all the slaves that are alive and retrieves all the necessary information about each slave. To determine the load on each slave just before the processing of the main application, the master sends a small diagnostic application to the slave to estimate the load it can take at the present moment. Having collected all the relevant information, it does all the scheduling, implements the parallel algorithms (distributing the tasks according to processing power and current load), makes use of CPU extensions (MMX, SSE, 3DNOW) depending upon the slave architecture, and handles everything except the execution of the program itself. It accepts the input/task to be executed and allocates the tasks to the underlying slave nodes constituting the parallel system, which execute the tasks in parallel and return the output to the master. The master plays the role of a watchdog; it may or may not participate in actual processing but manages the entire task.
2. Slave is a single system cluster image (SSCI) node, basically dedicated to processing. It accepts a sub-task along with the necessary library modules, executes it, and returns the output to the master. In our case, the slaves are multi-boot capable systems: at one point in time they could be diskless cluster hosts, at another time they might behave as general-purpose cluster nodes, and at some other time they could act as normal CPUs handling routine office and home tasks. In the case of diskless machines, the slave boots from a pre-created, appropriately patched kernel image.
3. Network interconnection merges the master and slave topologies. It makes use of an 8-port switch, RJ-45 connectors and CAT 5 cables. It is a star topology in which the master and the slaves are interconnected through the switch.
Fig. 5.1 Parallel System H/W Architecture
Cluster monitoring: Each slave runs a server that collects the kernel, processing, I/O, memory and CPU details from the proc virtual file system and forwards them to the master node (which here acts as a client of the server running on each slave); a user-space program plots them interactively on the master's screen, showing the CPU, memory and I/O details of each node separately.
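The following slave-side sketch conveys the idea of this monitoring path; the master address, port 9000 and the plain-text message format are assumptions made for illustration, not the actual Parallex protocol. It reads the load average from the proc file system and forwards the sample to the master over a TCP socket.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");                /* kernel-exported load statistics */
    if (!f || !fgets(buf, sizeof buf, f))
        return 1;
    fclose(f);

    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in master = { 0 };
    master.sin_family = AF_INET;
    master.sin_port   = htons(9000);                      /* assumed monitoring port  */
    inet_pton(AF_INET, "192.168.0.1", &master.sin_addr);  /* assumed master address   */

    if (connect(s, (struct sockaddr *)&master, sizeof master) == 0)
        write(s, buf, strlen(buf));                       /* forward one sample       */
    close(s);
    return 0;
}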
5.2 Software Architecture
This architecture consists of two parts i.e.
1. Master Architecture
2. Slave Architecture
The master consists of the following levels:
1. Linux BIOS: Linux BIOS usually loads a Linux kernel.
2. Linux: Platform on which Master runs.
3. SSCI + Beoboot: This level extracts a single system cluster image used by
Slave nodes.
4. Fedora Core/ Red Hat: Actual Operating System running on Master.
5. System Services: Essential services running on the master, e.g. the RARP resolver daemon.
The slave inherits from the master the following levels:
1. Linux BIOS
2. Linux
3. SSCI
Fig 5.2 Parallel System S/W Architecture
Parallex is broadly divided into the following modules:
1. Scheduler: this is the heart of our system. With a radically new approach towards data and instruction level distribution, we have implemented a completely optimal heterogeneous cluster technology. We do task allocation based on the actual processing capability of each node, not on the GHz rating given in the system's manual. The task allocation is dynamic, and the scheduling policy is based on the POSIX scheduling implementation. We are also capable of implementing preemption, which we currently do not do, since systems such as Linux and FreeBSD are already capable of industry-level preemption.
2. Job/instruction allocator: this is a set of remote-fork-like utilities that allocate the jobs to the nodes. Unlike traditional cluster technology, this job allocator is capable of executing in disconnected mode, which means that the network latency caused by temporary disconnections is substantially reduced.
3. Accounting: we have written a utility, the remote cluster monitor, which provides us with samples of results from all the nodes, along with information about CPU load, temperature, and memory statistics. We propose that, with less than 0.2% of CPU power consumption, our network monitoring utility can sample over 1000 nodes in less than 3 seconds.
4. Authentication: all transactions between the nodes are 128-bit encrypted and do not require root privileges to run; only a common user must exist on all the standalone nodes. For the diskless part, we remove even this restriction.
5. Resource discovery: we run our own socket-layer resource discovery utility, which discovers any additional nodes and also reports if a resource has been lost. Any additional hardware capable of being used as part of the parallel system, such as an extra processor added to a machine or the replacement of a processor with a dual-core processor, is also reported continually (a minimal discovery sketch follows this list).
6. Synchronizer: the central balancing of the cluster. Since the cluster is capable of simultaneously running both diskless and standalone nodes as part of the same cluster, the synchronizer keeps the results consistent: output is queued in real time so that data is not mixed up. It does instruction dependency analysis and also uses pipelines in the network to make the interconnect more communicative.
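As a rough illustration of the resource discovery idea mentioned in module 5 (the port number 9001 and the "PARALLEX?" probe string are invented for this sketch and are not the real protocol), a master-side probe can broadcast a query and listen briefly for replies from nodes:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    int yes = 1;
    setsockopt(s, SOL_SOCKET, SO_BROADCAST, &yes, sizeof yes);   /* allow broadcast  */

    struct timeval tv = { 2, 0 };                                /* 2-second timeout */
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    struct sockaddr_in bcast = { 0 };
    bcast.sin_family = AF_INET;
    bcast.sin_port   = htons(9001);                              /* assumed discovery port */
    inet_pton(AF_INET, "255.255.255.255", &bcast.sin_addr);

    const char *probe = "PARALLEX?";
    sendto(s, probe, strlen(probe), 0, (struct sockaddr *)&bcast, sizeof bcast);

    /* collect replies until the timeout expires; each reply announces one node */
    char reply[128];
    struct sockaddr_in from;
    socklen_t len = sizeof from;
    ssize_t n;
    while ((n = recvfrom(s, reply, sizeof reply - 1, 0,
                         (struct sockaddr *)&from, &len)) > 0) {
        reply[n] = '\0';
        printf("found node %s: %s\n", inet_ntoa(from.sin_addr), reply);
        len = sizeof from;
    }
    close(s);
    return 0;
}

A matching responder on each node would simply listen on the same port and reply with, for example, its hostname and CPU count.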
5.3 Description for software behavior
If the application is source based, the end user submits the process/application to the administrator, and the cluster administrator owns the responsibility of explicitly parallelizing the application for maximum exploitation of the parallel architectures within the CPU and across the cluster nodes. If the application is binary (non-source), the user may submit the code directly to the master node's program acceptor, which in turn runs the application with somewhat lower efficiency compared to source submissions to the administrator. The system as a whole is then responsible for minimizing the processing time, which in turn increases throughput and speeds up processing.
5.3.1 Events
1. System Installation
2. Network initialization
3. Server and host configuration
4. Take input
5. Parallel execution
6. Send response
5.3.2 States
1. System Ready
2. System Busy
3. System Idle
Chapter 6. Technologies Used
6.1 General terms
We will now briefly define the general terms that will be used in further descriptions
or are related to our system.
Cluster: - Interconnection of a large number of computers working together in a closely synchronized manner to achieve higher performance, scalability and net computational power.
Master: - Server machine which acts as the administrator of the entire parallel Cluster
and executes task scheduling.
Slave: - A client node which executes the task as given by the Master.
SSCI: - Single System Cluster Image is the idea of presenting the cluster nodes as a single image, where each cluster node behaves as if it were an additional processor, add-on RAM, etc. of the controlling master computer. This is the basic theory of cluster-level parallelism. Example implementations are multi-node NUMA (IBM/Sequent) multi-quad computers and SGI Altix servers. However, the idea of a true SSCI remains unimplemented when it comes to heterogeneous clusters for parallel processing, except in supercomputing clusters such as Thunder and the Earth Simulator.
RARP: - Reverse Address Resolution Protocol is a network-layer protocol used to resolve an IP address from a given hardware address (such as an Ethernet / MAC address).
BProc: The Beowulf Distributed Process Space (BProc) is a set of kernel modifications, utilities and libraries which allow a user to start processes on other machines in a Beowulf-style cluster. Remote processes started with this mechanism appear in the process table of the cluster's front-end machine, which allows remote process management using the normal UNIX process-control facilities. Signals are transparently forwarded to remote processes, and exit status is received using the usual wait() mechanisms.
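For example, a remote job started through BProc can be watched and signalled entirely from the front end. The following is a small illustrative sketch using the bpsh tool described in the user manual later in this report; it assumes node 0 is up.
Start a long-running process on node 0 from the front end
# bpsh 0 sleep 600 &
The process appears in the front end's own process table
# ps -ef | grep sleep
It can be signalled from the front end like any local process
# kill %1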
Having discussed the basic concepts of parallel and distributed systems, the problems
in this field, and an overview of Parallex, we now move forward with the requirement
analysis and design details of our system.
Chapter 7. Testing
Logic Coverage / Decision-Based Test Cases
(Columns: Sl. No. | Test case name | Test procedure | Pre-condition | Expected result | Reference to detailed design)
1. Initial_frame_fail | Initial frame not defined | None | Parallex should give an error & exit | Distribution algorithm
2. Final_frame_fail | Final frame not defined | None | Parallex should give an error & exit | Distribution algorithm
3. Initial_final_full | Initial & final frame given | None | Parallex should distribute according to speed | Distribution algorithm
4. Input_file_name_blank | No input file given | None | Input file not found | Parallex Master
5. Input_parameters_blank | No parameters defined at command line | None | Exit on error | Parallex Master
Table 7.1 Logic coverage / decision-based test cases
Initial Functional Test Cases for Parallex
(Columns: Use case | Function being tested | Initial system state | Input | Expected output)
1. System Startup | Master is started when the switch is turned "on" | Master is off | Activate the "on" switch | Master is ON
2. System Startup | Nodes are started when the switch is turned "on" | Nodes are off | Activate the "on" switch | Nodes are ON
3. System Startup | Nodes are assigned IP addresses by the master | Booting | Get boot image from the Master | Master shows that the nodes are UP
4. System Shutdown | System is shut down when the switch is turned "off" | System is on and not servicing a customer | Activate the "off" switch | System is off
5. System Shutdown | Connection to the Master is terminated when the system is shut down | System has just been shut down | - | Verify from the Master side that a connection to the slave no longer exists
6. Session | System reads a customer's program | System is on and not servicing a customer | Insert a readable code/program | Program accepted
7. Session | System rejects an unreadable program | System is on and not servicing a customer | Insert an unreadable code/program | Program is rejected; system displays an error screen; system is ready to start a new session
8. Session | System accepts a customer's program | System is asking for entry of the RANGE of calculation | Enter a RANGE | System gets the RANGE
9. Session | System breaks the task | System is breaking the task according to the processing speed of the nodes | Perform the distribution algorithm | System breaks the task & writes it into a file
10. Session | System feeds the task to the nodes for processing | System feeds tasks to the nodes for execution | Send tasks | System displays a menu of the tasks running on the nodes
11. Session | Session ends when all nodes give their output | System collects the output of all nodes, displays it & ends | Get the output from all nodes | System displays the output & quits
Table 7.2 Functional test cases
Cyclomatic Complexity:
Control Flow Graph of a System:
Fig 7.1 Cyclomatic Diagram for the system
Cyclomatic complexity is a software metric (measurement) used to indicate the complexity of a program. It was developed by Thomas McCabe and directly measures the number of linearly independent paths through a program's source code.
Computation of cyclomatic complexity for the above flow graph:
E = number of edges = 9
N = number of nodes = 7
M = E - N + 2 = 9 - 7 + 2 = 4
Thus there are four linearly independent paths through the system's control flow, so at least four test cases are required to cover them.
Console And Black Box Testing:
CONSOLE TEST CASES
(Columns: Sr. No. | Test procedure | Pre-condition | Expected result | Actual result)
1. Testing in a Linux terminal | Terminal variables have default values | Xterm-related tools are disabled | No graphical information displayed
2. Invalid no. of arguments | All nodes are up | Error message | Proper usage given
3. Pop-up terminals for different nodes | All nodes are up | No. of pop-ups = no. of cores in alive nodes | No. of pop-ups = no. of cores in alive nodes
4. 3D rendering on a single machine | All necessary files in place | Live 3D rendering | Shows frame being rendered
5. 3D rendering on the Parallex system | All nodes are up | Status of rendering | Rendered video
6. MPlayer testing | Rendered frames | Animation in .avi format | Rendered video (.avi)
Table 7.3 Console test cases
BLACK BOX TEST CASES
(Columns: Sr. No. | Test procedure | Pre-condition | Expected result | Actual result)
1. New node up | Node is down | Status message displayed by the NetMon tool | Message "Node UP"
2. Node goes down | Node is up | Status message displayed by the NetMon tool | Message "Node DOWN"
3. Nodes information | Nodes are up | Internal information of nodes | Status, IP, MAC address, RAM, etc.
4. Main task submission | Application is compiled | Next module called (distribution algorithm) | Processing speed of the nodes
5. Main task submission with faulty input | Application is compiled | Error | Displays error & exits
6. Distribution algorithm | Get RANGE | Break the task according to the processing speed of the nodes | Breaks the RANGE & generates scripts
7. Cluster feed script | All nodes up | Task sent to individual machines for execution | Display shows the task executed on each machine
8. Result assembly | All machines have returned results | Final result calculation | Final result displayed on screen
9. Fault tolerance | Machine(s) go down in between execution | Error recovery script is executed | Task resent to all alive machines
Table 7.4 Black-box test cases
System Usage Specification Outline:
Fig 7.2 System usage pattern
Fig 7.3 Histogram
Runtime Benchmark:
Fig 7.4 One frame from Complex Rendering on Parallex: Simulation of an explosion
The following is a comparison of the same application, with the same parameters, run on a standalone machine, an existing Beowulf parallel cluster, and our cluster system, Parallex.
Application: POV-Ray
Hardware specifications:
NODE 0  Pentium 4, 2.8 GHz
NODE 1  Core 2 Duo, 2.8 GHz
NODE 2  AMD 64, 2.01 GHz
NODE 3  AMD 64, 1.80 GHz
NODE 4  Celeron D, 2.16 GHz
Benchmark Results:
            Single Machine    Existing Parallel System (4 nodes)    Parallex Cluster System (4 nodes)
Real Time   14m 44.3 s        3m 41.61 s                            3m 1.62 s
User Time   13m 33.2 s        10m 4.67 s                            9m 30.75 s
Sys Time    2m 2.26 s         0m 2.26 s                             0m 2.31 s
Table 7.5 Benchmark results
Note: the user time of the cluster is approximately the sum of the per-node user times.
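From the real-time row of Table 7.5 the overall speedup can be derived directly (a simple calculation from the measured values above, not an additional measurement):
Single machine: 14m 44.3 s = 884.30 s
Existing parallel system: 3m 41.61 s = 221.61 s, speedup ≈ 884.30 / 221.61 ≈ 3.99
Parallex cluster system: 3m 1.62 s = 181.62 s, speedup ≈ 884.30 / 181.62 ≈ 4.87
The better-than-4x figure on four nodes presumably reflects the heterogeneous node speeds (several nodes are faster than the single reference machine) rather than superlinear scaling on identical hardware.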
Chapter 8. Cost Estimation
Since the growth in processing requirements far outpaces the growth of CPU power, and since the silicon chip is fast approaching its full capacity, the implementation of parallel processing at every level of computing becomes inevitable.
We therefore propose that, in the coming years, parallel processing and the algorithms that support it, like the ones we have designed and implemented, will form the heart of modern computing. Not surprisingly, parallel processing has already begun to penetrate the modern computing market directly in the form of multi-core processors such as Intel's dual-core and quad-core processors.
Two of our primary aims, simple implementation and minimal administrative overhead, make deploying Parallex straightforward and effective. Parallex can easily be deployed in any sector of modern computing where CPU-intensive applications are important to growth.
While a system of n parallel processors is less efficient than one n times faster
processor, the Parallel System is often cheaper to build. Parallel computation is used
for tasks which require very large amounts of computation, take a lot of time, and can
be divided into n independent subtasks. In recent years, most high performance
computing systems, also known as supercomputers, have parallel architectures.
Cost effectiveness is one of the major achievements of the Parallex system. We need no special or expensive hardware or software, so the system is inexpensive to build. The system is based on heterogeneous clusters in which the relative power of each CPU is not an issue, thanks to our mathematical distribution algorithm; overall efficiency drops by no more than about 5% when a few slower machines are included.
In that sense we treat "silicon waste" as an opportunity: outdated, slower CPUs can still be put to productive use, which also makes the design environmentally friendly. Another feature of the system is the use of diskless nodes, which reduces the total cost of the system by roughly 20% since the nodes need no local storage devices; instead of separate per-node storage we use a centralized storage solution. Last but not least, all of our software tools are open source.
Hence, we conclude that our Parallex system is one of the most cost effective
systems in its genre.
Chapter 9. User Manual
9.1 Dedicated cluster setup
For a dedicated cluster with one master and many diskless slaves, all the user has to do is install the RPMs supplied on the installation disk on the master node. The BProc configuration file will then be found at /etc/bproc/config.
9.1.1 BProc Configuration
Main configuration file:
/etc/bproc/config
• Edit with favorite text editor
• Lines consist of comments (starting with #)
• Rest are keyword followed by arguments
• Specify interface:
interface eth0 10.0.4.1 255.255.255.0
• eth0 is interface connected to nodes
• IP of master node is 10.0.4.1
• Netmask of master node is 255.255.255.0
• Interface will be configured when BProc is started
Specify range of IP addresses for nodes:
iprange 0 10.0.4.10 10.0.4.14
• Start assigning IP addresses at node 0
• First address is 10.0.4.10, last is 10.0.4.14
• The size of this range determines the number of nodes in the cluster
• Next entries are default libraries to be installed on nodes
• Can explicitly specify libraries or extract library information from an
executable
• Need to add entry to install extra libraries
librariesfrombinary /bin/ls /usr/bin/gdb
• The bplib command can be used to see libraries that will be loaded
Next line specifies the name of the phase 2 image
bootfile /var/bproc/boot.img
• Should be no need to change this
• Need to add a line to specify kernel command line
• kernelcommandline apm=off console=ttyS0,19200
• Turn APM support off (since these nodes don’t have any)
• Set console to use ttyS0 and speed to 19200
• This is used by beoboot command when building phase 2 image
Final lines specify Ethernet addresses of nodes, examples given
#node 0 00:50:56:00:00:00
#node 00:50:56:00:00:01
• Needed so node can learn its IP address from master
• First 0 is optional, assign this address to node 0
• Can automatically determine and add ethernet addresses using the
nodeadd command
• We will use this command later, so no need to change now
• Save file and exit from editor
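Putting the directives above together, a minimal /etc/bproc/config along these lines might look like the following sketch (all addresses are the illustrative values used in this section; real node entries are normally appended with the nodeadd command):
interface eth0 10.0.4.1 255.255.255.0
iprange 0 10.0.4.10 10.0.4.14
librariesfrombinary /bin/ls /usr/bin/gdb
bootfile /var/bproc/boot.img
kernelcommandline apm=off console=ttyS0,19200
#node 0 00:50:56:00:00:00
#node 00:50:56:00:00:01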
Other configuration files
/etc/bproc/config.boot
• Specifies PCI devices that are going to be used by the nodes at boot time
• Modules are included in phase 1 and phase 2 boot images
• By default the node will try all network interfaces it can find
/etc/bproc/node_up.conf
• Specifies actions to be taken in order to bring a node up
• Load modules
• Configure network interfaces
• Probe for PCI devices
• Copy files and special devices out to node
9.1.2 Bringing up BProc
Check BProc will be started at boot time
# chkconfig --list clustermatic
• Restart master daemon and boot server
# service bjs stop
# service clustermatic restart
# service bjs start
• Load the new configuration
• BJS uses BProc, so needs to be stopped first
• Check interface has been configured correctly
# ifconfig eth0
• Should have IP address we specified in config file
9.1.3 Build a Phase 2 Image
• Run the beoboot command on the master
# beoboot -2 -n --plugin mon
• -2 this is a phase 2 image
• -n image will boot over network
• --plugin add plugin to the boot image
• The following warning messages can be safely ignored
WARNING: Didn’t find a kernel module called gmac.o
WARNING: Didn’t find a kernel module called bmac.o
• Check phase 2 image is available
# ls -l /var/clustermatic/boot.img
9.1.4 Loading the Phase 2 Image
• Two Kernel Monte is a piece of software which will load a new
Linux kernel replacing one that is already running
• This allows you to use Linux as your boot loader!
• Using Linux means you can use any network that Linux supports.
• There is no PXE bios or Etherboot support for Myrinet, Quadrics or Infiniband
• “Pink” network boots on Myrinet which allowed us to avoid buying a 1024
port ethernet network
• Currently supports x86 (including AMD64) and Alpha
9.1.5 Using the Cluster
bpsh
• Migrates a process to one or more nodes
• Process is started on front-end, but is immediately migrated onto nodes
• Effect similar to rsh command, but no login is performed and no shell is
started
• I/O forwarding can be controlled
• Output can be prefixed with node number
• Run date command on all nodes which are up
# bpsh -a -p date
• See other arguments that are available
# bpsh -h
bpcp
• Copies files to a node
• Files can come from master node, or other nodes
• Note that a node only has a ram disk by default
• Copy /etc/hosts from master to /tmp/hosts on node 0
# bpcp /etc/hosts 0:/tmp/hosts
# bpsh 0 cat /tmp/hosts
9.1.6 Managing the Cluster
bpstat
• Shows status of nodes
• up node is up and available
• down node is down or can’t be contacted by master
• boot node is coming up (running node_up)
• error an error occurred while the node was booting
• Shows owner and group of node
• Combined with permissions, determines who can start jobs on the node
• Shows permissions of the node
---x------ execute permission for node owner
------x--- execute permission for users in node group
---------x execute permission for other users
bpctl
• Control a nodes status
• Reboot node 1 (takes about a minute)
# bpctl -S 1 -R
• Set state of node 0
# bpctl -S 0 -s groovy
• Only up, down, boot and error have special meaning, everything else
means not down
• Set owner of node 0
# bpctl -S 0 -u nobody
• Set permissions of node 0 so anyone can execute a job
# bpctl -S 0 -m 111
bplib
• Manage libraries that are loaded on a node
• List libraries to be loaded
# bplib -l
• Add a library to the list
# bplib -a /lib/libcrypt.so.1
• Remove a library from the list
# bplib -d /lib/libcrypt.so.1
9.1.7 Troubleshooting techniques
• The tcpdump command can be used to check for node activity during and after a
node has booted
• Connect a cable to serial port on node to check console output for errors in boot
process
• Once node reaches node_up processing, messages will be logged in
/var/log/bproc/node.N (where N is node number)
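As a concrete sketch of these checks (eth0 and node 0 are the illustrative values used earlier in this chapter):
Watch for traffic from the nodes on the cluster interface
# tcpdump -i eth0 -n
Follow node 0's log once it reaches node_up processing
# tail -f /var/log/bproc/node.0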
9.2 Shared Cluster Setup
Once you have the basic installation completed, you'll need to configure the system.
Many of the tasks are no different for machines in a cluster than for any other system.
For other tasks, being part of a cluster impacts what needs to be done. The following
subsections describe the issues associated with several services that require special
considerations.
9.2.1 DHCP
Dynamic Host Configuration Protocol (DHCP) is used to supply network
configuration parameters, including IP addresses, host names, and other information
to clients as they boot. With clusters, the head node is often configured as a DHCP
server and the compute nodes as DHCP clients. There are two reasons to do this. First,
it simplifies the installation of compute nodes since the information DHCP can supply
is often the only thing that is different among the nodes. Since a DHCP server can
handle these differences, the node installation can be standardized and automated. A
second advantage of DHCP is that it is much easier to change the configuration of the
network. You simply change the configuration file on the DHCP server, restart the
server, and reboot each of the compute nodes.
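On the Red Hat style systems used as examples in this chapter, restarting the server after editing its configuration is typically a single command (assuming the DHCP server package provides the dhcpd service):
# /sbin/service dhcpd restart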
The basic installation is rarely a problem. The DHCP system can be installed as a part
of the initial Linux installation or after Linux has been installed. The DHCP server
configuration file, typically /etc/dhcpd.conf, controls the information distributed to
the clients. If you are going to have problems, the configuration file is the most likely
source.
The DHCP configuration file may be created or changed automatically when some
cluster software is installed. Occasionally, the changes may not be done optimally or
even correctly so you should have at least a reading knowledge of DHCP
configuration files. Here is a heavily commented sample configuration file that
illustrates the basics. (Lines starting with "#" are comments.)
# A sample DHCP configuration file.
# The first commands in this file are global,
# i.e., they apply to all clients.
# Only answer requests from known machines,
# i.e., machines whose hardware addresses are given.
deny unknown-clients;
# Set the subnet mask, broadcast address, and router address.
option subnet-mask 255.255.255.0;
option broadcast-address 172.16.1.255;
option routers 172.16.1.254;
# This section defines individual cluster nodes.
# Each subnet in the network has its own section.
subnet 172.16.1.0 netmask 255.255.255.0 {
group {
# The first host, identified by the given MAC address,
# will be named node1.cluster.int, will be given the
# IP address 172.16.1.1, and will use the default router
# 172.16.1.254 (the head node in this case).
host node1{
hardware ethernet 00:08:c7:07:68:48;
fixed-address 172.16.1.1;
option routers 172.16.1.254;
option domain-name "cluster.int";
}
host node2{
hardware ethernet 00:08:c7:07:c1:73;
fixed-address 172.16.1.2;
option routers 172.16.1.254;
option domain-name "cluster.int";
}
# Additional node definitions go here.
}
}
# For servers with multiple interfaces, this entry says to ignore requests
# on specified subnets.
subnet 10.0.32.0 netmask 255.255.248.0 { not authoritative; }
As shown in this example, you should include a subnet section for each subnet on
your network. If the head node has an interface for the cluster and a second interface
connected to the Internet or your organization's network, the configuration file will
have a group for each interface or subnet. Since the head node should answer DHCP
requests for the cluster but not for the organization, DHCP should be configured so
that it will respond only to DHCP requests from the compute nodes.
9.2.2 NFS
A network filesystem is a filesystem that physically resides on one computer (the file
server), which in turn shares its files over the network with other computers on the
network (the clients). The best-known and most common network filesystem is
Network File System (NFS). In setting up a cluster, designate one computer as your
NFS server. This is often the head node for the cluster, but there is no reason it has to
be. In fact, under some circumstances, you may get slightly better performance if you
use different machines for the NFS server and head node. Since the server is where
your user files will reside, make sure you have enough storage. This machine is a
likely candidate for a second disk drive or raid array and a fast I/O subsystem. You
may even want to consider mirroring the filesystem using a small high-availability
cluster.
Why use an NFS? It should come as no surprise that for parallel programming you'll
need a copy of the compiled code or executable on each machine on which it will run.
You could, of course, copy the executable over to the individual machines, but this
quickly becomes tiresome. A shared filesystem solves this problem. Another
advantage to an NFS is that all the files you will be working on will be on the same
system. This greatly simplifies backups. (You do backups, don't you?) A shared
filesystem also simplifies setting up SSH, as it eliminates the need to distribute keys.
(SSH is described later in this chapter.) For this reason, you may want to set up NFS
before setting up SSH. NFS can also play an essential role in some installation
strategies.
If you have never used NFS before, setting up the client and the server are slightly
different, but neither is particularly difficult. Most Linux distributions come with most
of the work already done for you.
9.2.2.1 Running NFS
Begin with the server; you won't get anywhere with the client if the server isn't
already running. Two things need to be done to get the server running. The file
/etc/exports must be edited to specify which machines can mount which directories,
and then the server software must be started. Here is a single line from the file
/etc/exports on the server amy:
/home basil(rw) clara(rw) desmond(rw) ernest(rw) george(rw)
This line gives the clients basil, clara, desmond, ernest, and george read/write access
to the directory /home on the server. Read access is the default. A number of other
options are available and could be included. For example, the no_root_squash option
could be added if you want to edit root permission files from the nodes.
Had a space been inadvertently included between basil and (rw), read access would
have been granted to basil and read/write access would have been granted to all other
systems. (Once you have the systems set up, it is a good idea to use the command
showmount -a to see who is mounting what.)
Once /etc/exports has been edited, you'll need to start NFS. For testing, you can use
the service command as shown here
[root@fanny init.d]# /sbin/service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
[root@fanny init.d]# /sbin/service nfs status
rpc.mountd (pid 1652) is running...
nfsd (pid 1666 1665 1664 1663 1662 1661 1660 1657) is running...
rpc.rquotad (pid 1647) is running...
(With some Linux distributions, when restarting NFS, you may find it necessary to
explicitly stop and restart both nfslock and portmap as well.) You'll want to change
the system configuration so that this starts automatically when the system is rebooted.
For example, with Red Hat, you could use the serviceconf or chkconfig commands.
For the client, the software is probably already running on your system. You just need
to tell the client to mount the remote filesystem. You can do this several ways, but in
the long run, the easiest approach is to edit the file /etc/fstab, adding an entry for the
server. Basically, you'll add a line to the file that looks something like this:
amy:/home /home nfs rw,soft 0 0
In this example, the local system mounts the /home filesystem located on amy as the
/home directory on the local machine. The filesystems may have different names. You
can now manually mount the filesystem with the mount command
[root@ida /]# mount /home
When the system reboots, this will be done automatically.
When using NFS, you should keep a couple of things in mind. The mount point,
/home, must exist on the client prior to mounting. While the remote directory is
mounted, any files that were stored on the local system in the /home directory will be
inaccessible. They are still there; you just can't get to them while the remote directory
is mounted. Next, if you are running a firewall, it will probably block NFS traffic. If
you are having problems with NFS, this is one of the first things you should check.
File ownership can also create some surprises. User and group IDs should be
consistent among systems using NFS, i.e., each user will have identical IDs on all
systems. Finally, be aware that root privileges don't extend across NFS shared systems
(if you have configured your systems correctly). So if, as root, you change the
directory (cd) to a remotely mounted filesystem, don't expect to be able to look at
every file. (Of course, as root you can always use su to become the owner and do all
the snooping you want.) Details for the syntax and options can be found in the nfs(5),
exports(5), fstab(5), and mount(8) manpages.
9.2.3 SSH
To run software across a cluster, you'll need some mechanism to start processes on
each machine. In practice, a prerequisite is the ability to log onto each machine within
the cluster. If you need to enter a password for each machine each time you run a
program, you won't get very much done. What is needed is a mechanism that allows
logins without passwords.
This boils down to two choices—you can use remote shell (RSH) or secure shell
(SSH). If you are a trusting soul, you may want to use RSH. It is simpler to set up with
less overhead. On the other hand, SSH network traffic is encrypted, so it is safe from
snooping. Since SSH provides greater security, it is generally the preferred approach.
SSH provides mechanisms to log onto remote machines, run programs on remote
machines, and copy files among machines. SSH is a replacement for ftp, telnet, rlogin,
rsh, and rcp. A commercial version of SSH is available from SSH Communications
Security (http://www.ssh.com), a company founded by Tatu Ylönen, an original
developer of SSH. Or you can go with OpenSSH, an open source version from
http://www.openssh.org.
OpenSSH is the easiest since it is already included with most Linux distributions. It
has other advantages as well. By default, OpenSSH automatically forwards the
DISPLAY variable. This greatly simplifies using the X Window System across the
cluster. If you are running an SSH connection under X on your local machine and
execute an X program on the remote machine, the X window will automatically open
on the local machine. This can be disabled on the server side, so if it isn't working,
that is the first place to look.
There are two sets of SSH protocols, SSH-1 and SSH-2. Unfortunately, SSH-1 has a
serious security vulnerability. SSH-2 is now the protocol of choice. This discussion
will focus on using OpenSSH with SSH-2.
Before setting up SSH, check to see if it is already installed and running on your
system. With Red Hat, you can check to see what packages are installed using the
package manager.
[root@fanny root]# rpm -q -a | grep ssh
openssh-3.5p1-6
openssh-server-3.5p1-6
openssh-clients-3.5p1-6
openssh-askpass-gnome-3.5p1-6
openssh-askpass-3.5p1-6
This particular system has the SSH core package, both server and client software as
well as additional utilities. The SSH daemon is usually started as a service. As you
can see, it is already running on this machine.
[root@fanny root]# /sbin/service sshd status
sshd (pid 28190 1658) is running...
Of course, it is possible that it wasn't started as a service but is still installed and
running. You can use ps to double check.
[root@fanny root]# ps -aux | grep ssh
root 29133 0.0 0.2 3520 328 ? S Dec09 0:02 /usr/sbin/sshd
...
Again, this shows the server is running.
With some older Red Hat installations, e.g., the 7.3 workstation, only the client
software is installed by default. You'll need to manually install the server software. If
using Red Hat 7.3, go to the second install disk and copy over the file
RedHat/RPMS/openssh-server-3.1p1-3.i386.rpm. (Better yet, download the latest
version of this software.) Install it with the package manager and then start the
service.
[root@james root]# rpm -vih openssh-server-3.1p1-3.i386.rpm
Preparing... ########################################### [100%]
1:openssh-server ########################################### [100%]
[root@james root]# /sbin/service sshd start
Generating SSH1 RSA host key: [ OK ]
Generating SSH2 RSA host key: [ OK ]
Generating SSH2 DSA host key: [ OK ]
Starting sshd: [ OK ]
When SSH is started for the first time, encryption keys for the system are generated.
Be sure to set this up so that it is done automatically when the system reboots.
Configuration files for both the server, sshd_config, and client, ssh_config, can be
found in /etc/ssh, but the default settings are usually quite reasonable. You shouldn't
need to change these files.
9.2.3.1 Using SSH
To log onto a remote machine, use the command ssh with the name or IP address of
the remote machine as an argument. The first time you connect to a remote machine,
you will receive a message with the remote machines' fingerprint, a string that
identifies the machine. You'll be asked whether to proceed or not. This is normal.
[root@fanny root]# ssh amy
The authenticity of host 'amy (10.0.32.139)' can't be established.
RSA key fingerprint is 98:42:51:3e:90:43:1c:32:e6:c4:cc:8f:4a:ee:cd:86.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'amy,10.0.32.139' (RSA) to the list of known hosts.
root@amy's password:
Last login: Tue Dec 9 11:24:09 2003
[root@amy root]#
The fingerprint will be recorded in a list of known hosts on the local machine. SSH
will compare fingerprints on subsequent logins to ensure that nothing has changed.
You won't see anything else about the fingerprint unless it changes. Then SSH will
warn you and query whether you should continue. If the remote system has changed,
e.g., if it has been rebuilt or if SSH has been reinstalled, it's OK to proceed. But if you
think the remote system hasn't changed, you should investigate further before logging
in.
Notice in the last example that SSH automatically uses the same identity when
logging into a remote machine. If you want to log on as a different user, use the -l
option with the appropriate account name.
You can also use SSH to execute commands on remote systems. Here is an example
of using date remotely.
[root@fanny root]# ssh -l sloanjd hector date
sloanjd@hector's password:
Mon Dec 22 09:28:46 EST 2003
Notice that a different account, sloanjd, was used in this example.
To copy files, you use the scp command. For example,
[root@fanny root]# scp /etc/motd george:/root/
root@george's password:
motd 100% |*****************************| 0 00:00
Here file /etc/motd was copied from fanny to the /root directory on george.
In the examples thus far, the system has asked for a password each time a command
was run. If you want to avoid this, you'll need to do some extra work. You'll need to
generate a pair of authorization keys that will be used to control access and then store
these in the directory ~/.ssh. The ssh-keygen command is used to generate keys.
[sloanjd@fanny sloanjd]$ ssh-keygen -b1024 -trsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/sloanjd/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/sloanjd/.ssh/id_rsa.
Your public key has been saved in /home/sloanjd/.ssh/id_rsa.pub.
The key fingerprint is:
2d:c8:d1:e1:bc:90:b2:f6:6d:2e:a5:7f:db:26:60:3f sloanjd@fanny
[sloanjd@fanny sloanjd]$ cd .ssh
[sloanjd@fanny .ssh]$ ls -a
. .. id_rsa id_rsa.pub known_hosts
The options in this example are used to specify a 1,024-bit key and the RSA
algorithm. (You can use DSA instead of RSA if you prefer.) Notice that SSH will
prompt you for a pass phrase, basically a multi-word password.
Two keys are generated, a public and a private key. The private key should never be
shared and resides only on the client machine. The public key is distributed to remote
machines. Copy the public key to each system you'll want to log onto, renaming it
authorized_keys2.
[sloanjd@fanny .ssh]$ cp id_rsa.pub authorized_keys2
[sloanjd@fanny .ssh]$ chmod go-rwx authorized_keys2
[sloanjd@fanny .ssh]$ chmod 755 ~/.ssh
If you are using NFS, as shown here, all you need to do is copy and rename the file in
the current directory. Since that directory is mounted on each system in the cluster, it
is automatically available.
If you used the NFS setup described earlier, root's home directory, /root, is not shared. If you want to log in as root
without a password, manually copy the public keys to the target
machines. You'll need to decide whether you feel secure setting
up the root account like this.
You will use two utilities supplied with SSH to manage the login process. The first is
an SSH agent program that caches private keys, ssh-agent. This program stores the
keys locally and uses them to respond to authentication queries from SSH clients. The
second utility, ssh-add, is used to manage the local key cache. Among other things, it
can be used to add, list, or remove keys.
[sloanjd@fanny .ssh]$ ssh-agent $SHELL
[sloanjd@fanny .ssh]$ ssh-add
Enter passphrase for /home/sloanjd/.ssh/id_rsa:
Identity added: /home/sloanjd/.ssh/id_rsa (/home/sloanjd/.ssh/id_rsa)
(While this example uses the $SHELL variable, you can substitute the actual name of
the shell you want to run if you wish.) Once this is done, you can log in to remote
machines without a password.
This process can be automated to varying degrees. For example, you can add the call
to ssh-agent as the last line of your login script so that it will be run before you make
any changes to your shell's environment. Once you have done this, you'll need to run
ssh-add only when you log in. But you should be aware that Red Hat console logins
don't like this change.
You can find more information by looking at the ssh(1), ssh-agent(1), and ssh-add(1)
manpages. If you want more details on how to set up ssh-agent, you might look at
SSH, The Secure Shell by Barrett and Silverman, O'Reilly, 2001. You can also find
scripts on the Internet that will set up a persistent agent so that you won't need to
rerun ssh-add each time.
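As a sketch of that automation (one common pattern, assuming a bash login shell; adapt the file name for other shells), add the line
ssh-agent $SHELL
as the last line of ~/.bash_profile, so that every login shell runs under an agent. Then, once per login session, load your key into the agent:
[sloanjd@fanny sloanjd]$ ssh-add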
9.2.4 Hosts file and name services
Life will be much simpler in the long run if you provide appropriate name services.
NIS is certainly one possibility. At a minimum, don't forget to edit /etc/hosts for your
cluster. At the very least, this will reduce network traffic and speed up some software.
And some packages assume it is correctly installed. Here are a few lines from the host
file for amy:
127.0.0.1 localhost.localdomain localhost
10.0.32.139 amy.wofford.int amy
10.0.32.140 basil.wofford.int basil
...
Notice that amy is not included on the line with localhost. Specifying the host name as
an alias for localhost can break some software.
9.3 Working with Parallex
Once the master has been configured and all nodes are up, working with Parallex to utilize all available resources is very easy. Follow these simple steps to harness the power of every node that is up.
• Compile your code and place it in $PARALLEX_DIR/bin/
You can use the Makefile to do this for you.
# make main_app
• After the application is compiled without any errors, first start the network monitoring tool of Parallex
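Expanded into a full shell session, the first step might look like the following (the cd and ls commands are just generic verification, not Parallex-specific tools; $PARALLEX_DIR is assumed to point at the Parallex installation):
Build the application; the supplied Makefile places the binary under bin/
# cd $PARALLEX_DIR
# make main_app
Confirm the executable is in place before feeding it to the cluster
# ls -l $PARALLEX_DIR/bin/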
IoT in Mining for Sensing, Monitoring and Prediction of Underground Mines Roo...
 
Security Vision for Software on Wheels (Autonomous Vehicles)
Security Vision for Software on Wheels (Autonomous Vehicles)Security Vision for Software on Wheels (Autonomous Vehicles)
Security Vision for Software on Wheels (Autonomous Vehicles)
 
Restricted Usage of Anonymous Credentials in VANET for Misbehaviour Detection
Restricted Usage of Anonymous Credentials in VANET for Misbehaviour DetectionRestricted Usage of Anonymous Credentials in VANET for Misbehaviour Detection
Restricted Usage of Anonymous Credentials in VANET for Misbehaviour Detection
 
The Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud ComputingThe Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud Computing
 
The Security and Privacy Requirements in VANET
The Security and Privacy Requirements in VANETThe Security and Privacy Requirements in VANET
The Security and Privacy Requirements in VANET
 
MicazXpl Intelligent Sensors Network Project Presentation
MicazXpl Intelligent Sensors Network Project PresentationMicazXpl Intelligent Sensors Network Project Presentation
MicazXpl Intelligent Sensors Network Project Presentation
 
DO-178B/ED-12B Presentation
DO-178B/ED-12B PresentationDO-178B/ED-12B Presentation
DO-178B/ED-12B Presentation
 
Toilet etiquettes
Toilet etiquettesToilet etiquettes
Toilet etiquettes
 
Indian German Unity
Indian German UnityIndian German Unity
Indian German Unity
 
TINYOS Oscilloscope Application
TINYOS Oscilloscope ApplicationTINYOS Oscilloscope Application
TINYOS Oscilloscope Application
 
Mote Mote Radio Communication
Mote Mote Radio CommunicationMote Mote Radio Communication
Mote Mote Radio Communication
 
TinyOS installation Guide And Manual
TinyOS installation Guide And ManualTinyOS installation Guide And Manual
TinyOS installation Guide And Manual
 
Simple Railroad Command Protocol
Simple Railroad Command ProtocolSimple Railroad Command Protocol
Simple Railroad Command Protocol
 
Dane presentation
Dane presentationDane presentation
Dane presentation
 
Anti Collision Railways System
Anti Collision Railways SystemAnti Collision Railways System
Anti Collision Railways System
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault Tolerance
 

Dernier

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Dernier (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

Parallex - The Supercomputer

Table of Contents

Chapter No. Title Page No.

LIST OF FIGURES I
LIST OF TABLES II

1. A General Introduction
1.1 Basic concepts 1
1.2 Promises and Challenges 5
1.2.1 Processing technology 6
1.2.2 Networking technology 6
1.2.3 Software tools and technology 7
1.3 Current scenario 8
1.3.1 End user perspectives 8
1.3.2 Industrial perspective 8
1.3.3 Developers, researchers & scientists perspective 9
1.4 Obstacles and Why we don't have 10 GHz today 9
1.5 Myths and Realities: 2 x 3 GHz < 6 GHz 10
1.6 The problem statement 11
1.7 About PARALLEX 11
1.8 Motivation 12
1.9 Features of PARALLEX 13
1.10 Why our design is "alternative" to parallel system 13
1.11 Innovation 14

2. REQUIREMENT ANALYSIS 16
2.1 Determining the overall mission of Parallex 16
2.2 Functional requirement for Parallex system 16
2.3 Non-functional requirement for system 17

3. PROJECT PLAN 19

4. SYSTEM DESIGN 21

5. IMPLEMENTATION DETAIL 24
5.1 Hardware architecture 24
5.2 Software architecture 26
5.3 Description for software behavior 28
5.3.1 Events 32
5.3.2 States 32

6. TECHNOLOGIES USED 33
6.1 General terms 33

7. TESTING 35

8. COST ESTIMATION 44

9. USER MANUAL 45
9.1 Dedicated cluster setup 45
9.1.1 BProc configuration 45
9.1.2 Bringing up BProc 47
9.1.3 Build phase 2 image 48
9.1.4 Loading phase 2 image 48
9.1.5 Using the cluster 49
9.1.6 Managing the cluster 50
9.1.7 Troubleshooting techniques 51
9.2 Shared cluster setup 52
9.2.1 DHCP 52
9.2.2 NFS 54
9.2.2.1 Running NFS 55
9.2.3 SSH 57
9.2.3.1 Using SSH 60
9.2.4 Host file and name service 65
9.3 Working with PARALLEX 65

10. CONCLUSION 67
11. FUTURE ENHANCEMENT 68
12. REFERENCE 69

APPENDIX A 70 – 77
APPENDIX B 78 – 88
GLOSSARY 89 – 92
MEMORABLE JOURNEY (PHOTOS) 93 – 95
PARALLEX ACHIEVEMENTS 96 – 97

I. LIST OF FIGURES:
1.1 High-performance distributed system
1.2 Transistor vs. Clock Speed
4.1 Design Framework
4.2 Parallex Design
5.1 Parallel System H/W Architecture
5.2 Parallel System S/W Architecture
7.1 Cyclomatic Diagram for the system
7.2 System Usage pattern
7.3 Histogram
7.4 One frame from Complex Rendering on Parallex: Simulation of an explosion

II. LIST OF TABLES:
1.1 Project Plan
7.1 Logic/coverage/decision Testing
7.2 Functional Test
7.3 Console Test cases
7.4 Black box Testing
7.5 Benchmark Results
Chapter 1. A General Introduction

1.1 BASIC CONCEPTS

The last two decades spawned a revolution in the world of computing: a move away from central mainframe-based computing to network-based computing. Today, servers are fast achieving the levels of CPU performance, memory capacity, and I/O bandwidth once available only in mainframes, at a cost orders of magnitude below that of a mainframe. Servers are being used to solve computationally intensive problems in science and engineering that once belonged exclusively to the domain of supercomputers.

A distributed computing system is the system architecture that makes a collection of heterogeneous computers, workstations, or servers act and behave as a single computing system. In such a computing environment, users can uniformly access and name local or remote resources, and run processes from anywhere in the system, without being aware of which computers their processes are running on. Distributed computing systems have been studied extensively by researchers, and a great many claims and benefits have been made for using such systems. In fact, it is hard to rule out any desirable feature of a computing system that has not been claimed to be offered by a distributed system [24]. However, the current advances in processing and networking technology and software tools make it feasible to achieve the following advantages:

• Increased performance. The existence of multiple computers in a distributed system allows applications to be processed in parallel and thus improves application and system performance. For example, the performance of a file system can be improved by replicating its functions over several computers; the file replication allows several applications to access that file system in parallel. Furthermore, file replication distributes network traffic associated with file access across the various sites and thus reduces network contention and queuing delays.

• Sharing of resources. Distributed systems are cost-effective and enable efficient access to all system resources. Users can share special-purpose and sometimes expensive hardware and software resources such as database servers, compute servers, virtual reality servers, multimedia information servers, and printer servers, to name just a few.

• Increased extendibility. Distributed systems can be designed to be modular and adaptive so that for certain computations the system will configure itself to include a large number of computers and resources, while in other instances it will just consist of a few resources. Furthermore, limitations in file system capacity and computing power can be overcome by adding more computers and file servers to the system incrementally.

• Increased reliability, availability, and fault tolerance. The existence of multiple computing and storage resources in a system makes it attractive and cost-effective to introduce fault tolerance to distributed systems. The system can tolerate the failure of one computer by allocating its tasks to another available computer. Furthermore, by replicating system functions and/or resources, the system can tolerate one or more component failures.

• Cost-effectiveness. The performance of computers has been approximately doubling every two years, while their cost has decreased by half every year during the last decade. Furthermore, the emerging high-speed network technology [e.g., wave-division multiplexing, asynchronous transfer mode (ATM)] will make the development of distributed systems attractive in terms of the price/performance ratio compared to that of parallel computers.

These advantages cannot be achieved easily, because designing a general-purpose distributed computing system is several orders of magnitude more difficult than designing centralized computing systems; designing a reliable general-purpose distributed system involves a large number of options and decisions, such as the physical system configuration, communication network and computing platform characteristics, task scheduling and resource allocation policies and mechanisms, consistency control, concurrency control, and security, to name just a few. The difficulties can be attributed to many factors related to the lack of maturity in the distributed computing field, the asynchronous and independent behavior of the systems, and the geographic dispersion of the system resources. These are summarized in the following points:

• There is a lack of a proper understanding of distributed computing theory: the field is relatively new, and we need to design and experiment with a large number of general-purpose reliable distributed systems with different architectures before we can master the theory of designing such computing systems. One interesting explanation for the lack of understanding of the design process of distributed systems was given by Mullender, who compared the design of a distributed system to the design of a reliable national railway system that took a century and a half to be fully understood and mature. Similarly, distributed systems (which have been around for approximately two decades) need to evolve into several generations of different design architectures before their designs, structures, and programming techniques can be fully understood and mature.

• The asynchronous and independent behavior of the system resources and/or (hardware and software) components complicates the control software that aims at making them operate as one centralized computing system. If the computers are structured in a master-slave relationship, the control software is easier to develop and system behavior is more predictable. However, this structure is in conflict with the distributed system property that requires computers to operate independently and asynchronously.

• The use of a communication network to interconnect the computers introduces another level of complexity. Distributed system designers not only have to master the design of the computing systems and system software and services, but also have to master the design of reliable communication networks, how to achieve synchronization and consistency, and how to handle faults in a system composed of geographically dispersed heterogeneous computers. The number of resources involved in a system can vary from a few to hundreds, thousands, or even hundreds of thousands of computing and storage resources.

Despite these difficulties, there has been limited success in designing special-purpose distributed systems such as banking systems, online transaction systems, and point-of-sale systems. However, the design of a general-purpose reliable distributed system that has the advantages of both centralized systems (accessibility, management, and coherence) and networked systems (sharing, growth, cost, and autonomy) is still a challenging task. Kleinrock makes an interesting analogy between human-made computing systems and the brain. He points out that the brain is organized and structured very differently from our present computing machines. Nature has been extremely successful in implementing distributed systems that are far more intelligent and impressive than any computing machines humans have yet devised. We have succeeded in manufacturing highly complex devices capable of high-speed computation and massive accurate memory, but we have not gained sufficient understanding of distributed systems; our systems are still highly constrained and rigid in their construction and behavior. The gap between natural and man-made systems is huge, and more research is required to bridge this gap and to design better distributed systems.

In the next section we present a design framework to better understand the architectural design issues involved in developing and implementing high-performance distributed computing systems. A high-performance distributed system (HPDS) (Figure 1.1) includes a wide range of computing resources, such as workstations, PCs, minicomputers, mainframes, supercomputers, and other special-purpose hardware units. The underlying network interconnecting the system resources can span LANs, MANs, and even WANs, can have different topologies (e.g., bus, ring, full connectivity, random interconnect), and can support a wide range of communication protocols.
Fig. 1.1 High-performance distributed system.

1.2 PROMISES AND CHALLENGES OF PARALLEL AND DISTRIBUTED SYSTEMS

The proliferation of high-performance systems and the emergence of high-speed networks (terabit networks) have attracted a lot of interest in parallel and distributed computing. The driving forces toward this end will be (1) the advances in processing technology, (2) the availability of high-speed networks, and (3) the increasing research efforts directed toward the development of software support and programming environments for distributed computing. Further, with the increasing requirements for computing power and the diversity in the computing requirements, it is apparent that no single computing platform will meet all these requirements. Consequently, future computing environments need to capitalize on and effectively utilize the existing heterogeneous computing resources. Only parallel and distributed systems provide the potential of achieving such an integration of resources and technologies in a feasible manner while retaining desired usability and flexibility. Realization of this potential, however, requires advances on a number of fronts: processing technology, network technology, and software tools and environments.

1.2.1 Processing Technology

Distributed computing relies to a large extent on the processing power of the individual nodes of the network. Microprocessor performance has been growing at a rate of 35 to 70 percent during the last decade, and this trend shows no indication of slowing down in the current decade. The enormous power of the future generations of microprocessors, however, cannot be utilized without corresponding improvements in memory and I/O systems. Research in main-memory technologies, high-performance disk arrays, and high-speed I/O channels is, therefore, critical to utilize efficiently the advances in processing technology and the development of cost-effective high-performance distributed computing.

1.2.2 Networking Technology

The performance of distributed algorithms depends to a large extent on the bandwidth and latency of communication among work nodes. Achieving high bandwidth and low latency involves not only fast hardware, but also efficient communication protocols that minimize the software overhead. Developments in high-speed networks provide gigabit bandwidths over local area networks as well as wide area networks at moderate cost, thus increasing the geographical scope of high-performance distributed systems.

The problem of providing the required communication bandwidth for distributed computational algorithms is now relatively easy to solve given the mature state of fiber-optic and optoelectronic device technologies. Achieving the low latencies necessary, however, remains a challenge. Reducing latency requires progress on a number of fronts. First, current communication protocols do not scale well to a high-speed environment. To keep latencies low, it is desirable to execute the entire protocol stack, up to the transport layer, in hardware. Second, the communication interface of the operating system must be streamlined to allow direct transfer of data from the network interface to the memory space of the application program. Finally, the speed of light (approximately 5 microseconds per kilometer) poses the ultimate limit to latency. In general, achieving low latency requires a two-pronged approach:

1. Latency reduction. Minimize protocol-processing overhead by using streamlined protocols executed in hardware and by improving the network interface of the operating system.

2. Latency hiding. Modify the computational algorithm to hide latency by pipelining communication and computation (a small sketch of this overlap is given at the end of Section 1.2).

These problems are now perhaps most fundamental to the success of parallel and distributed computing, a fact that is increasingly being recognized by the research community.

1.2.3 Software Tools and Environments

The development of parallel and distributed applications is a nontrivial process and requires a thorough understanding of the application and the architecture. Although a parallel and distributed system provides the user with enormous computing power and a great deal of flexibility, this flexibility implies increased degrees of freedom which have to be optimized in order to fully exploit the benefits of the distributed system. For example, during software development, the developer is required to select the optimal hardware configuration for the particular application, the best decomposition of the problem on the hardware configuration selected, the best communication and synchronization strategy to be used, and so on. The set of reasonable alternatives that have to be evaluated in such an environment is very large, and selecting the best alternative among these is a nontrivial task. Consequently, there is a need for a set of simple and portable software development tools that can assist the developer in appropriately distributing the application computations to make efficient use of the underlying computing resources. Such a set of tools should span the software life cycle and must support the developer during each stage of application development, starting from the specification and design formulation stages, through the programming, mapping, distribution, scheduling, tuning, and debugging stages, up to the evaluation and maintenance stages.
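The latency-hiding idea in item 2 of Section 1.2.2 above can be made concrete with a small sketch. The program below is purely illustrative and is not part of Parallex: it overlaps computation of the next data chunk with the "send" of the previous one using two POSIX threads and double buffering, and the network send is only simulated by a short sleep (a real cluster node would write to a socket instead).

    /* Illustrative sketch only: overlap computation with "communication"
     * using a second thread and double buffering. The send is simulated
     * by usleep(); compile with: gcc overlap.c -o overlap -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK   4096
    #define NCHUNKS 8

    static double buffers[2][CHUNK];

    /* Pretend each chunk takes about 2 ms to transmit. */
    static void *send_chunk(void *arg)
    {
        (void)arg;
        usleep(2000);
        return NULL;
    }

    int main(void)
    {
        pthread_t sender;
        int sending = 0;

        for (int i = 0; i < NCHUNKS; i++) {
            int cur = i % 2;

            /* Compute the current chunk while the previous chunk is
             * still being "sent" in the background thread. */
            for (int j = 0; j < CHUNK; j++)
                buffers[cur][j] = (double)i * j;

            /* Wait for the previous send before starting a new one. */
            if (sending)
                pthread_join(sender, NULL);

            pthread_create(&sender, NULL, send_chunk, buffers[cur]);
            sending = 1;
        }
        if (sending)
            pthread_join(sender, NULL);

        printf("computed and sent %d chunks with compute/communication overlap\n",
               NCHUNKS);
        return 0;
    }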
1.3 Current Scenario

The current scenario of parallel systems can be viewed from three perspectives. A concept common to all of them is the idea of Total Ownership Cost (TOC). TOC is by far the most common scale on which the level of computer processing is assessed worldwide. TOC is defined as the ratio of the total cost of implementation and maintenance to the net throughput the parallel cluster delivers:

          Total cost of implementation and maintenance
    TOC = ---------------------------------------------
          Net system throughput (in floating-point operations per second)

1.3.1 End User Perspectives

Activities such as rendering, Adobe Photoshop workloads and other desktop processes fall in this category. The need for processing power increases day by day, and with it the hardware cost. From the end user's perspective, the parallel system aims to reduce expenses and avoid complexity. At this stage we are trying to implement a parallel system that is more cost effective and user friendly. For the end user, however, TOC is less important in most cases, because a parallel cluster is rarely owned by a single user; in that case the net throughput of the parallel system becomes the most crucial factor.

1.3.2 Industrial Perspective

Parallel systems are extensively deployed in the corporate sector. Such systems consist of machines that may, in theory if not in practice, have to handle millions of nodes. From the industrial point of view, the parallel system aims at resource isolation and at replacing large-scale dedicated commodity hardware and mainframes. Corporate sectors often place TOC as the primary criterion by which a parallel cluster is judged. With increasing scale, the cost of owning parallel clusters shoots up to unmanageable heights, and our primary aim in this area is to bring down the TOC as much as possible.
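To make the TOC ratio above concrete, here is a small worked example. The cost and throughput figures are invented for illustration only; they are not measurements of any Parallex installation.

    /* Illustrative only: compute the TOC ratio defined in Section 1.3.
     * All figures below are made-up example values. */
    #include <stdio.h>

    int main(void)
    {
        double implementation_cost = 150000.0; /* hypothetical setup cost  */
        double maintenance_cost    = 30000.0;  /* hypothetical upkeep cost */
        double throughput_flops    = 2.5e9;    /* hypothetical net FLOP/s  */

        double toc = (implementation_cost + maintenance_cost) / throughput_flops;

        printf("TOC = %.2e cost units per FLOP/s\n", toc);
        return 0;
    }

A lower TOC simply means more delivered throughput per unit of money spent, which is the figure of merit the perspectives above refer to.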
1.3.3 Developers, Researchers & Scientists Perspective

Scientific applications such as 3D simulations, large-scale scientific rendering, intense numerical calculations, complex programming logic, and large-scale implementations of algorithms (BLAS and FFT libraries) require levels of processing and calculation that no modern-day dedicated vector CPU could possibly meet. Consequently, parallel systems have proven to be the only efficient alternative for keeping pace with modern scientific advancement and research. TOC is rarely a matter of concern here.

1.4 Obstacles and Why we don't have 10 GHz today

Fig 1.2 Transistor vs. Clock Speed

CPU performance growth as we have known it has hit a wall. Figure 1.2 graphs the history of Intel chip introductions by clock speed and number of transistors. The number of transistors continues to climb, at least for now. Clock speed, however, is a different story.

Around the beginning of 2003, you'll note a disturbing sharp turn in the previous trend toward ever-faster CPU clock speeds. We have added lines to show the limit trends in maximum clock speed; instead of continuing on the previous path, as indicated by the thin dotted line, there is a sharp flattening. It has become harder and harder to exploit higher clock speeds due to not just one but several physical issues, notably heat (too much of it and too hard to dissipate), power consumption (too high), and current leakage problems. Sure, Intel has samples of its chips running at even higher speeds in the lab, but only by heroic efforts, such as attaching hideously impractical quantities of cooling equipment. You won't have that kind of cooling hardware in your office any day soon, let alone on your lap while computing on the plane.

1.5 Myths and Realities: 2 x 3 GHz < 6 GHz

So a dual-core CPU that combines two 3 GHz cores practically offers 6 GHz of processing power. Right? Wrong. Even having two threads running on two physical processors doesn't mean getting two times the performance. Similarly, most multi-threaded applications won't run twice as fast on a dual-core box. They should run faster than on a single-core CPU; the performance gain just isn't linear, that's all.

Why not? First, there is coordination overhead between the cores to ensure cache coherency (a consistent view of cache, and of main memory) and to perform other handshaking. Today, a two- or four-processor machine isn't really two or four times as fast as a single CPU even for multi-threaded applications. The problem remains essentially the same even when the CPUs in question sit on the same die. Second, unless the two cores are running different processes, or different threads of a single process that are well written to run independently and almost never wait for each other, they won't be well utilized. (Despite this, we will speculate that today's single-threaded applications as actually used in the field could actually see a performance boost for most users by going to a dual-core chip, not because the extra core is actually doing anything useful, but because it is running the adware and spyware that infest many users' systems and are otherwise slowing down the single CPU that user has today. We leave it up to you to decide whether adding a CPU to run your spyware is the best solution to that problem.)
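The report does not put numbers on this effect; as an added illustration (an assumption of this write-up, not a result from the project), Amdahl's law with a small fixed coordination cost per extra core shows why two 3 GHz cores fall short of one 6 GHz processor:

    /* Rough illustration, not from the report: estimated speedup on N cores
     * from Amdahl's law plus a fixed coordination overhead per extra core.
     * The 90% parallel fraction and 2% overhead are assumed values. */
    #include <stdio.h>

    static double speedup(double parallel_fraction, int cores, double overhead)
    {
        double serial = 1.0 - parallel_fraction;
        /* normalized runtime = serial part + parallel part / cores + coordination */
        double t = serial + parallel_fraction / cores + overhead * (cores - 1);
        return 1.0 / t;
    }

    int main(void)
    {
        double p = 0.90;        /* assume 90% of the work parallelizes      */
        double overhead = 0.02; /* assume 2% of runtime lost per extra core */

        for (int n = 1; n <= 8; n *= 2)
            printf("%d core(s): speedup = %.2fx\n", n, speedup(p, n, overhead));
        return 0;
    }

With these assumed numbers, two cores yield roughly a 1.75x speedup rather than 2x, which is exactly the point being made here.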
If you're running a single-threaded application, then the application can only make use of one core. There should be some speedup as the operating system and the application can run on separate cores, but typically the OS isn't going to be maxing out the CPU anyway, so one of the cores will be mostly idle. (Again, the spyware can share the OS's core most of the time.)

1.6 The problem statement

So now let us summarize and define the problem statement:

• Since the growth in processing requirements is far greater than the growth in CPU power, and since the silicon chip is fast approaching its full capacity, the implementation of parallel processing at every level of computing becomes inevitable.

• There is a need for a single, complete clustering solution that requires minimal user intervention but at the same time supports editing and modification to suit the user's requirements.

• There should be no need to modify existing applications.

• The parallel system must be able to support different platforms.

• The system should be able to fully utilize all the available hardware resources without the need to buy any extra or special kind of hardware.

1.7 About PARALLEX

While the term parallel is often used to describe clusters, they are more correctly described as a type of distributed computing. Typically, the term parallel computing refers to tightly coupled sets of computation. Distributed computing is usually used to describe computing that spans multiple machines or multiple locations. When several pieces of data are being processed simultaneously in the same CPU, this might be called a parallel computation, but would never be described as a distributed computation. Multiple CPUs within a single enclosure might be used for parallel computing, but would not be an example of distributed computing. When talking about systems of computers, the term parallel usually implies a homogeneous collection of computers, while distributed computing typically implies a more heterogeneous collection. Computations that are done asynchronously are more likely to be called distributed than parallel. Clearly, the terms parallel and distributed lie at either end of a continuum of possible meanings. In any given instance, the exact meanings depend upon the context. The distinction is more one of connotations than of clearly established usage. Parallex is both a parallel and a distributed cluster because it supports both ideas: multiple CPUs within a single enclosure as well as a heterogeneous collection of computers.

1.8 Motivation

The motivation behind this project is to provide a cheap and easy-to-use solution that caters to the high-performance computing requirements of organizations without the need to install any expensive hardware. In many organizations, including our college, we have observed that when old systems are replaced by newer ones, the older ones are generally dumped or sold at throwaway prices. We also wanted to find a solution to effectively use this "silicon waste". These wasted resources can easily be added to our system as the processing need increases, because the parallel system is linearly scalable and hardware independent. Thus the intent is to have an environment-friendly and effective solution that utilizes all the available CPU power to execute applications faster.

1.9 Features of Parallex

• Parallex simplifies the cluster setup, configuration and management process.

• It supports machines with hard disks as well as diskless machines running at the same time.

• It is flexible in design and easily adaptable.

• Parallex does not require any special kind of hardware.
• It is multi-platform compatible.

• It ensures efficient utilization of silicon waste (old, unused hardware).

• Parallex is scalable.

How these features are achieved and the details of the design will be discussed in subsequent chapters.

1.10 Why our design is an "alternative" parallel system

Every established technology needs to evolve after a time, as each new generation removes the shortcomings of the technology that came before it. What we have achieved is a bare-bones reworking of the semantics of a parallel system. While studying parallel and distributed systems we had the advantage of working on the latest technology; the scientists who designed the existing parallel systems were, no doubt, far more experienced than us. Our system is unique because it actually splits the task according to the processing power of the nodes instead of just load balancing. Hence a slow node gets a smaller task than a faster one, and all nodes deliver their output to the master node at the same calculated time. The difficulty we found was deciding how much work to give each machine of a heterogeneous system so that the results arrive at the same time. We worked on this problem and developed a mathematical distribution algorithm, which has been successfully implemented and is functional. This algorithm splits the task according to the speed of the CPUs by sending a test application to all nodes and storing the return time of each node in a file.

We then worked further on the automation of the entire system. We were using password-less secure shell (SSH) logins and the Network File System (NFS). We were successful up to a point, but the SSH and NFS configuration could not be fully automated; having to set up every new node manually is a demerit of the SSH/NFS approach. To overcome this demerit we looked for an alternative solution, the Beowulf cluster, but after studying it we concluded that it treats all nodes as having the same configuration and sends tasks to all nodes equally. To improve our system we chose to think differently from the Beowulf cluster and to make the system more cost effective. We adopted the diskless-cluster concept in order to get rid of hard disks, cutting cost and enhancing the reliability of the machines: the storage device affects the performance of the entire system, increases the cost (due to replacement of the disks), and increases the time wasted in searching for faults. So we studied and patched the Beowulf server and the Beowulf distributed process space according to the needs of our system. We built kernel images for running diskless clients using the RARP protocol. When a client runs the kernel image in its memory, it requests an IP address from the master node (also called the server), and the server assigns the client its IP address and node number. With this, our diskless cluster system stands up and is ready to use for parallel computing. We then modified our various programs, including our own distribution algorithm, according to the new design. The best part of our system is that no authorization setup is needed; everything is now automatic.

Until now we were working on code-level parallelism, in which the code is modified slightly to run on our system, much as MPI libraries are used to make code executable in parallel. The next challenge was: what if we do not get the source code, but only a binary file to execute on our parallel system? We therefore need to enhance the system by adding binary-level parallelism. We studied openMosix. Once openMosix is installed and all the nodes are booted, the openMosix nodes see each other in the cluster and start exchanging information about their load levels and resource usage. Once the load increases beyond the defined level, a process migrates to another node on the network. However, when a process demands heavy resource usage, it may keep migrating from node to node without ever being serviced. This is the major design flaw of openMosix, and we are working out a solution. In this sense, our design is an "alternative" to the problems found across the world of parallel computing.

1.11 Innovation

Firstly, our system does not require any additional hardware if the existing machines are well connected in a network. Secondly, even in a heterogeneous environment with a few fast CPUs and a few slower ones, the efficiency of the system does not drop by more than 1 to 5%, still maintaining an efficiency of around 80% for suitably adapted applications. This is because the mathematical distribution algorithm considers the relative processing power of each node, distributing only the amount of load that a node can process in the calculated optimal time of the system. All the nodes then process their respective tasks and produce output at this calculated time. The most important point about our system is the ability to use diskless nodes in the cluster, thereby reducing hardware costs, space and the required maintenance. Also, in the case of binary executables (when source code is not available) our system exhibits almost 20% performance gains.
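A minimal sketch of the kind of proportional split described in Sections 1.10 and 1.11 is given below: each node's share of the total work is made inversely proportional to its measured benchmark time, so that all nodes are expected to finish together. The node names, benchmark times and work counts are hypothetical, and this is not the actual Parallex distribution code.

    /* Minimal sketch with hypothetical data: split a workload across
     * heterogeneous nodes in inverse proportion to their benchmark times. */
    #include <stdio.h>

    #define NODES 3

    int main(void)
    {
        const char *node[NODES]  = { "node1", "node2", "node3" };
        double bench_time[NODES] = { 2.0, 4.0, 8.0 }; /* seconds for the test app */
        long total_work = 100000;                     /* e.g. frames, rows, tasks */

        /* A node's relative speed is taken as 1 / benchmark_time. */
        double speed_sum = 0.0;
        for (int i = 0; i < NODES; i++)
            speed_sum += 1.0 / bench_time[i];

        long assigned = 0;
        for (int i = 0; i < NODES; i++) {
            long share = (long)(total_work * (1.0 / bench_time[i]) / speed_sum);
            if (i == NODES - 1)
                share = total_work - assigned; /* last node absorbs rounding */
            assigned += share;
            printf("%s: benchmark %.1f s -> %ld work units\n",
                   node[i], bench_time[i], share);
        }
        return 0;
    }

With the assumed times of 2, 4 and 8 seconds, the three nodes receive roughly 57%, 29% and 14% of the work respectively, so the slowest machine is given the smallest task, as described above.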
Chapter 2. Requirement Analysis

2.1 Determining the overall mission of Parallex

• User base: students, educational institutes, small to medium business organizations.

• Cluster usage: one part of the cluster will be fully dedicated to solving the problem at hand, with an optional part where computing resources from individual workstations are used. In the latter part, the parallel jobs will have lower priority.

• Software to be run on the cluster: depends upon the user base. At the cluster management level, the system software will be Linux.

• Dedicated or shared cluster: as mentioned above, it will be both.

• Extent of the cluster: computers that are all on the same subnet.

2.2 Functional Requirements for the Parallex system

Functional Requirement 1
The PCs must be connected over a LAN so that the system can be used without any obstacles.

Functional Requirement 2
There will be one master (controlling) node, which will distribute the task according to the processing speed of each node.

Services
Three services are to be provided on the master:

1. A network monitoring tool for resource discovery (e.g. IP addresses, MAC addresses, UP/DOWN status, etc.); a minimal probe sketch follows this list.

2. The distribution algorithm, which will distribute the task according to the current processing speed of the nodes.

3. The Parallex master script, which will send the distributed tasks to the nodes, get back the results, integrate them and give out the output.
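As an illustration of the UP/DOWN part of service 1 (this is not the actual Parallex monitoring tool), the sketch below reports a node as up if a TCP connection to its SSH port succeeds. The IP address is hypothetical, and a real tool would scan a list of nodes and add a connection timeout.

    /* Illustrative UP/DOWN probe: try to connect to a node's SSH port. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        const char *node_ip = "192.168.1.10";   /* hypothetical execution node */
        struct sockaddr_in addr;

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(22);            /* SSH port */
        inet_pton(AF_INET, node_ip, &addr.sin_addr);

        /* connect() succeeds only if the node is up and sshd is listening. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            printf("%s: UP\n", node_ip);
        else
            printf("%s: DOWN (or ssh not running)\n", node_ip);

        close(fd);
        return 0;
    }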
Functional Requirement 3
The final size of the executable code should be such that it fits within the limited memory available on the machine.

Functional Requirement 4
This product will only be used to speed up applications that already exist in the enterprise.

2.3 Non-Functional Requirements for the system

- Performance
Even in a heterogeneous environment, with a few fast CPUs and a few slower ones, the efficiency of the system does not drop by more than 1 to 5%, still maintaining an efficiency of around 80% for suitably adapted applications. This is because the mathematical distribution algorithm considers the relative processing power of each node, distributing only the amount of load that a node can process in the calculated optimal time of the system. All the nodes process their respective tasks and produce output at this calculated time. The most important point about our system is the ability to use diskless nodes in the cluster, thereby reducing hardware costs, space and the required maintenance. Also, in the case of binary executables (when source code is not available) our system exhibits almost 20% performance gains.

- Cost
While a system of n parallel processors is less efficient than one processor that is n times faster, the parallel system is often cheaper to build. Parallel computation is used for tasks which require very large amounts of computation, take a lot of time, and can be divided into n independent subtasks. In recent years, most high-performance computing systems, also known as supercomputers, have had parallel architectures.

- Manufacturing costs
No extra hardware is required; the only cost is that of setting up the LAN.

- Benchmarks
There are at least three reasons for running benchmarks. First, a benchmark provides us with a baseline: if we make changes to our cluster or suspect problems with it, we can rerun the benchmark to see if performance is really any different. Second, benchmarks are useful when comparing systems or cluster configurations; they can provide a reasonable basis for selecting between alternatives. Finally, benchmarks can be helpful with planning. For benchmarking we will use a 3D rendering tool named POV-Ray (Persistence of Vision Raytracer; please see the Appendix for more details). A minimal timing sketch is given at the end of this chapter.

- Hardware required
x686-class PCs (Linux 2.6.x kernels installed, with intranet connection)
Switch (100/10T)
Serial port connectors
100BASE-T LAN cable, RJ-45 connectors

- Software resources required
Linux (2.6.x kernel)
Intel compiler suite (non-commercial)
LSB (Linux Standard Base)
Set of GNU kits with GNU CC/C++/F77/LD/AS
GNU Krell monitor

- Number of PCs connected in the LAN
8 nodes in the LAN.
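A minimal sketch of the timing step behind such a benchmark run is shown below (this is not the project's actual harness): it times a test command and appends the elapsed seconds to a results file, in the spirit of the per-node return-time file mentioned in Section 1.10. The POV-Ray command line and the file name are assumptions.

    /* Illustrative benchmark timer: run a test command, measure wall-clock
     * time with gettimeofday(), and append the result to a file. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    int main(void)
    {
        const char *test_cmd =
            "povray +Ibenchmark.pov +H64 +W64 -D > /dev/null 2>&1"; /* assumed */
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        int rc = system(test_cmd);              /* run the test workload */
        gettimeofday(&t1, NULL);

        double elapsed = (t1.tv_sec - t0.tv_sec) +
                         (t1.tv_usec - t0.tv_usec) / 1e6;

        FILE *f = fopen("node_times.txt", "a"); /* hypothetical results file */
        if (f) {
            fprintf(f, "%f\n", elapsed);
            fclose(f);
        }
        printf("test command exited with %d after %.3f s\n", rc, elapsed);
        return 0;
    }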
Chapter 3. Project Plan

The plan of execution for the project was as follows:

No. | Activity | Software used | Days
1 | Project planning: (a) choosing the domain, (b) identifying key areas of work, (c) requirement analysis | - | 10
2 | Basic installation of Linux | Linux (2.6.x kernel) | 3
3 | Brushing up C programming skills | - | 5
4 | Shell scripting | Linux (2.6.x kernel), GNU Bash | 12
5 | C programming in the Linux environment | GNU C compiler suite | 5
6 | A demo project (Universal Sudoku Solver) to become familiar with the Linux programming environment | GNU C compiler suite, Intel compiler suite (non-commercial) | 16
7 | Study of advanced Linux tools and installation of packages and Red Hat RPMs | iptraf, mc, tar, rpm, awk, sed, gnuplot, strace, gdb, etc. | 10
8 | Studying networking basics and network configuration in Linux | - | 8
9 | Recompiling, patching, and analyzing the system kernel | Linux (2.6.x kernel), GNU C compiler | 3
10 | Study and implementation of advanced networking tools: SSH and NFS | ssh/OpenSSH, NFS | 7
11 | (a) Preparing the preliminary design of the overall workflow of the project, (b) deciding the modules for overall execution and dividing the areas of concentration among the project group, (c) building the Stage I prototype | All of the above | 17
12 | Building the Stage II prototype (replacing ssh with a custom-made application) | All of the above | 15
13 | Building the Stage III prototype (making the diskless cluster) | All of the above | 10
14 | Testing and building the final packages | All of the above | 10

Table 1.1 Project Plan
Chapter 4. System Design

Generally speaking, the design process of a distributed system involves three main activities: (1) designing the communication system that enables the distributed system resources and objects to exchange information, (2) defining the system structure (architecture) and the system services that enable multiple computers to act as a system rather than as a collection of computers, and (3) defining the distributed computing programming techniques used to develop parallel and distributed applications. Based on this notion of the design process, the distributed system design framework can be described in terms of three layers: (1) the network, protocol, and interface (NPI) layer, (2) the system architecture and services (SAS) layer, and (3) the distributed computing paradigms (DCP) layer. In what follows, we describe the main design issues to be addressed in each layer.

Fig. 4.1 Design Framework
• Communication network, protocol, and interface (NPI) layer. This layer describes the main components of the communication system used for passing control and information among the distributed system resources. It is decomposed into three sub-layers: network type, communication protocols, and network interfaces.

• Distributed system architecture and services (SAS) layer. This layer represents the designer's and system manager's view of the system. It defines the structure and architecture of the system and the system services (distributed file system, concurrency control, redundancy management, load sharing and balancing, security services, etc.) that must be supported by the distributed system in order to provide a single-image computing system.

• Distributed computing paradigms (DCP) layer. This layer represents the programmer's (user's) perception of the distributed system and focuses on the programming paradigms that can be used to develop distributed applications. Distributed computing paradigms can be broadly characterized by their computation and communication models. Parallel and distributed computations can be described in terms of two paradigms: functional parallel and data parallel. In the functional parallel paradigm, the computations are divided into distinct functions which are then assigned to different computers. In the data parallel paradigm, all computers run the same program (single program, multiple data, or SPMD), but each computer operates on a different data stream. A small sketch of the SPMD style follows this list.
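To make the SPMD style concrete, here is a minimal shell sketch: every node runs exactly the same script, and only the node index passed in decides which slice of the data it works on (the item count and the argument convention are illustrative assumptions).

# worker.sh <node_id> <node_count> -- the same script runs on every node.
NODE_ID=$1
NODES=$2
ITEMS=1000                                    # total data items (assumed)
START=$(( NODE_ID * ITEMS / NODES ))
END=$(( (NODE_ID + 1) * ITEMS / NODES - 1 ))
echo "node $NODE_ID works on items $START..$END"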
With reference to Fig. 4.1, Parallex can be described as follows:

Fig. 4.2 Parallex Design
Chapter 5. Implementation Details

The goal of the project is to provide an efficient system that handles process parallelism with the help of clusters, thereby reducing execution time. Currently we form a cluster of 8 nodes. Using a single computer to execute a heavy process takes a long time, so we form a cluster and execute such processes in parallel by dividing them into a number of sub-processes. Depending on the nodes in the cluster, we migrate the sub-processes to those nodes and, when execution is over, bring the output they produce back to the master node. By doing this we reduce process execution time and increase CPU utilization.

5.1 Hardware Architecture

We have implemented a shared-nothing parallel architecture using a coarse-grained cluster structure. The interconnect is an ordinary 8-port switch and, optionally, a Class B or Class C network. It is a three-level architecture:
1. Master topology
2. Slave topology
3. Network interconnect

1. The master is a Linux machine with a 2.6.x or 2.4.x kernel (both under testing). It runs the parallel server and contains the application interface that drives the remaining machines. The master runs a network scanning script to detect all slaves that are alive and retrieves the necessary information about each slave. To determine the load on each slave just before the main application is processed, the master sends a small diagnostic application to the slave to estimate the load it can take at that moment (a minimal sketch of such a probe follows this paragraph). Having collected all the relevant information, it does all the scheduling: implementing the parallel algorithms (distributing the tasks according to processing power and current load), making use of CPU extensions (MMX, SSE, 3DNow!) depending on the slave architecture, and everything except the execution of the program itself. It accepts the input task to be executed and allocates sub-tasks to the underlying slave nodes constituting the parallel system, which execute them in parallel and return the output to the master. The master plays the role of a watchdog: it may or may not participate in the actual processing, but it manages the entire task.
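The following is a minimal sketch of the kind of pre-run probe described above, assuming the list of alive slaves is already in a file and that passwordless SSH is configured (see Chapter 9); reading the load average and CPU speed from /proc stands in for whatever the actual diagnostic application measures.

# Probe each alive slave for its current 1-minute load and CPU MHz.
for NODE in $(cat alive_nodes.txt); do
    LOAD=$(ssh "$NODE" "cut -d' ' -f1 /proc/loadavg")
    MHZ=$(ssh "$NODE" "grep -m1 'cpu MHz' /proc/cpuinfo" | awk '{print $4}')
    echo "$NODE load=$LOAD mhz=$MHZ"
done

The scheduler can fold such figures into the relative speeds used by the distribution algorithm.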
2. A slave is a single system cluster image (SSCI) node, basically dedicated to processing. It accepts a sub-task along with the necessary library modules, executes it, and returns the output to the master. In our case the slaves are multi-boot-capable systems: at one time they can be diskless cluster hosts, at another they can behave as general-purpose cluster nodes, and at yet another they can act as ordinary CPUs handling routine office and home tasks. In the case of diskless machines, the slave boots from a pre-created, appropriately patched kernel image.

3. The network interconnect merges the master and slave topologies. It makes use of an 8-port switch, RJ-45 connectors, and CAT 5 cables. It is a star topology in which the master and the slaves are interconnected through the switch.

Fig. 5.1 Parallel System H/W Architecture

Cluster monitoring: Each slave runs a server that collects processing, I/O, memory, and CPU details from the /proc virtual filesystem and forwards them to the master node (which here acts as a client of the server running on each slave); a user-space program plots the data interactively on the master's screen, showing the CPU, memory, and I/O details of each node separately. A rough sketch of such a collector follows this paragraph.
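The sketch below is only illustrative: the master address, port, push interval, and the one-line text format are placeholder assumptions, not the project's actual protocol.

# Push a load/memory sample from this slave to the master every few seconds.
MASTER=10.0.4.1
PORT=9999
while true; do
    LOAD=$(cut -d' ' -f1 /proc/loadavg)
    FREE=$(awk '/MemFree/ {print $2}' /proc/meminfo)
    echo "$(hostname) load=$LOAD memfree_kb=$FREE" > /dev/tcp/$MASTER/$PORT
    sleep 3
done

The /dev/tcp redirection is a bash feature; a small C client over ordinary sockets would do the same job.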
5.2 Software Architecture

The software architecture consists of two parts: the master architecture and the slave architecture.

The master consists of the following levels:
1. LinuxBIOS: LinuxBIOS usually loads a Linux kernel.
2. Linux: the platform on which the master runs.
3. SSCI + Beoboot: this level produces the single system cluster image used by the slave nodes.
4. Fedora Core / Red Hat: the actual operating system distribution running on the master.
5. System services: essential services running on the master, e.g. the RARP resolver daemon.

The slave inherits the lower levels of the master:
1. LinuxBIOS
2. Linux
3. SSCI

Fig 5.2 Parallel System S/W Architecture
Parallex is broadly divided into the following modules (a rough sketch of the resource discovery sweep follows this list):

1. Scheduler: this is the heart of our system. With a radically new approach to data- and instruction-level distribution, we have implemented an efficient heterogeneous cluster technology. Task allocation is based on the actual measured processing capability of each node, not on the nominal GHz rating in the system's manual. Task allocation is dynamic, and the scheduling policy is based on the POSIX scheduling implementation. We are also capable of implementing preemption, but currently leave it to the operating system, since systems such as Linux and FreeBSD already provide industry-grade preemption.

2. Job/instruction allocator: this is a set of remote-fork-like utilities that allocate jobs to the nodes. Unlike traditional cluster technology, this job allocator is capable of executing in disconnected mode, so that temporary disconnections do not stall execution and the effect of network latency is substantially reduced.

3. Accounting: we have written a "remote cluster monitor" utility which provides samples of results from all the nodes, along with information about CPU load, temperature, and memory statistics. We estimate that, using less than 0.2% of CPU power, our network monitoring utility can sample over 1000 nodes in less than 3 seconds.

4. Authentication: all transactions between the nodes are encrypted with 128-bit keys and do not require root privileges to run; only a common user must exist on all standalone nodes. For the diskless part, even this restriction is removed.

5. Resource discovery: we run our own socket-layer resource discovery utility, which discovers any additional nodes and also reports when a resource has been lost. Any additional hardware capable of being used as part of the parallel system, such as an extra processor added to a system or the replacement of a processor with a dual-core one, is also reported continually.

6. Synchronizer: the central balancer of the cluster. Since the cluster is capable of simultaneously running both diskless and standalone nodes as part of the same cluster, the synchronizer queues the output in real time so that data is not mixed up and the results remain consistent. It performs instruction dependency analysis and also uses pipelining in the network to make the interconnect more communicative.
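As an illustration of the information the resource discovery module gathers, here is a rough sweep over the cluster subnet; the subnet and address range mirror the example network used later in the manual and are assumptions, and the actual utility is socket-based and runs continually rather than as a one-shot ping loop.

# Mark each address in the node range UP or DOWN and record its MAC address.
SUBNET=10.0.4
for i in $(seq 10 14); do
    IP=$SUBNET.$i
    if ping -c1 -W1 "$IP" > /dev/null 2>&1; then
        MAC=$(arp -n "$IP" | awk '/ether/ {print $3}')
        echo "$IP UP ${MAC:-unknown}"
    else
        echo "$IP DOWN"
    fi
done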
5.3 Description of software behavior

If the application is source-based, the end user submits the process/application to the administrator, and the cluster administrator takes responsibility for explicitly parallelizing the application for maximum exploitation of the parallel architectures within each CPU and across the cluster nodes. If the application is a binary (no source available), the user may submit the code directly to the master node's program acceptor, which runs the application with somewhat lower efficiency than source submissions handled by the administrator. In either case, the system as a whole is responsible for minimizing processing time, which in turn increases throughput and speeds up processing.
5.3.1 Events
1. System installation
2. Network initialization
3. Server and host configuration
4. Take input
5. Parallel execution
6. Send response

5.3.2 States
1. System ready
2. System busy
3. System idle
Chapter 6. Technologies Used

6.1 General terms

We now briefly define the general terms that are used in further descriptions or are related to our system.

Cluster: an interconnection of a large number of computers working together in a closely synchronized manner to achieve higher performance, scalability, and net computational power.

Master: the server machine which acts as the administrator of the entire parallel cluster and performs task scheduling.

Slave: a client node which executes the task given to it by the master.

SSCI: Single System Cluster Image, the idea of presenting the cluster nodes as a single image, where the cluster nodes behave as if they were additional processors, add-on RAM, etc. of the controlling master computer. This is the basic theory of cluster-level parallelism. Example implementations are multi-node NUMA (IBM/Sequent) multi-quad computers and SGI Altix servers. However, a true SSCI remains unimplemented for heterogeneous parallel-processing clusters, except in supercomputing clusters such as Thunder and the Earth Simulator.

RARP: Reverse Address Resolution Protocol, a network-layer protocol used to resolve an IP address from a given hardware address (such as an Ethernet/MAC address).
BProc: the Beowulf Distributed Process Space (BProc) is a set of kernel modifications, utilities, and libraries which allow a user to start processes on other machines in a Beowulf-style cluster. Remote processes started with this mechanism appear in the process table of the cluster's front-end machine, which allows remote process management using the normal UNIX process control facilities. Signals are transparently forwarded to remote processes, and exit status is received using the usual wait() mechanisms.

Having discussed the basic concepts of parallel and distributed systems, the problems in this field, and the requirement analysis and design of Parallex, we now move on to the testing of our system.
Chapter 7. Testing

Logic coverage / decision-based test cases:

Sl. No. | Test case name | Test procedure | Precondition | Expected result | Reference to detailed design
1 | Initial_frame_fail | Initial frame not defined | None | Parallex should give an error and exit | Distribution algorithm
2 | Final_frame_fail | Final frame not defined | None | Parallex should give an error and exit | Distribution algorithm
3 | Initial_final_full | Initial and final frames given | None | Parallex should distribute according to speed | Distribution algorithm
4 | Input_file_name_blank | No input file given | None | Input file not found | Parallex master
5 | Input_parameters_blank | No parameters defined on the command line | None | Exit on error | Parallex master

Table 7.1 Logic Coverage / Decision-Based Testing
Initial functional test cases for Parallex:

Use case | Function being tested | Initial system state | Input | Expected output
System startup | Master is started when the switch is turned on | Master is off | Activate the "on" switch | Master is ON
System startup | Nodes are started when the switch is turned on | Nodes are off | Activate the "on" switch | Nodes are ON
System startup | Nodes are assigned IPs by the master | Booting | Get boot image from the master | Master shows that the nodes are UP
System shutdown | System is shut down when the switch is turned off | System is on and not servicing a customer | Activate the "off" switch | System is off
System shutdown | Connection to the master is terminated when the system is shut down | System has just been shut down | - | Verify from the master side that a connection to the slave no longer exists
Session | System reads a customer's program | System is on and not servicing a customer | Insert a readable code/program | Program accepted
Session | System rejects an unreadable program | System is on and not servicing a customer | Insert an unreadable code/program | Program is rejected; system displays an error screen; system is ready to start a new session
Session | System accepts the customer's program | System is asking for entry of the RANGE of calculation | Enter a RANGE | System gets the RANGE
Session | System breaks the task | System is breaking the task according to the processing speed of the nodes | Perform the distribution algorithm | System breaks the task and writes it into a file
Session | System feeds the tasks to the nodes for processing | System feeds tasks to the nodes for execution | Send tasks | System displays the tasks running on the nodes
Session | Session ends when all nodes give their output | System is collecting the output of all nodes | Get the output from all nodes | System displays the output and quits

Table 7.2 Functional Test Cases
Cyclomatic Complexity

Control flow graph of the system:

Fig 7.1 Cyclomatic diagram for the system

Cyclomatic complexity is a software metric developed by Thomas McCabe and used to measure the complexity of a program. It directly measures the number of linearly independent paths through a program's source code.

Computation of cyclomatic complexity for the above flow graph:
E = number of edges = 9
N = number of nodes = 7
M = E - N + 2 = 9 - 7 + 2 = 4
Console and Black Box Testing

Console test cases:

Sr. No. | Test procedure | Precondition | Expected result | Actual result
1 | Testing in a Linux terminal | Terminal variables have default values | Xterm-related tools are disabled | No graphical information displayed
2 | Invalid number of arguments | All nodes are up | Error message | Proper usage message given
3 | Pop-up terminals for different nodes | All nodes are up | Number of pop-ups = number of cores in alive nodes | Number of pop-ups = number of cores in alive nodes
4 | 3D rendering on a single machine | All necessary files in place | Live 3D rendering | Shows the frame being rendered
5 | 3D rendering on the Parallex system | All nodes are up | Status of rendering | Rendered video
6 | MPlayer testing | Rendered frames | Animation in .avi format | Rendered video (.avi)

Table 7.3 Console Test Cases
Black box test cases:

Sr. No. | Test procedure | Precondition | Expected result | Actual result
1 | New node comes up | Node is down | Status message displayed by the NetMon tool | Message "Node UP"
2 | Node goes down | Node is up | Status message displayed by the NetMon tool | Message "Node DOWN"
3 | Node information | Nodes are up | Internal information of the nodes | Status, IP, MAC address, RAM, etc.
4 | Main task submission | Application is compiled | Next module called (distribution algorithm) | Processing speed of the nodes
5 | Main task submission with faulty input | Application is compiled | Error | Displays error and exits
6 | Distribution algorithm | RANGE obtained | Task broken according to the processing speed of the nodes | Breaks the RANGE and generates scripts
7 | Cluster feed script | All nodes up | Task sent to individual machines for execution | Display shows the task executing on each machine
8 | Result assembly | All machines have returned results | Final result calculation | Final result displayed on screen
9 | Fault tolerance | Machine(s) go down during execution | Error recovery script is executed | Task resent to all alive machines

Table 7.4 Black Box Test Cases
System usage specification outline:

Fig 7.2 System Usage Pattern

Fig 7.3 Histogram
Runtime Benchmark: Fig 7.4

One frame from a complex rendering on Parallex: simulation of an explosion.

The following is a comparison of the same application, with the same parameters, run on a standalone machine, an existing Beowulf parallel cluster, and our cluster system, Parallex.

Application: POV-Ray

Hardware specifications:
NODE 0: Pentium 4, 2.8 GHz
NODE 1: Core 2 Duo, 2.8 GHz
NODE 2: AMD 64, 2.01 GHz
NODE 3: AMD 64, 1.80 GHz
NODE 4: Celeron D, 2.16 GHz
Benchmark results:

Time | Single machine | Existing parallel system (4 nodes) | Parallex cluster system (4 nodes)
Real time | 14m 44.3s | 3m 41.61s | 3m 1.62s
User time | 13m 33.2s | 10m 4.67s | 9m 30.75s
Sys time | 2m 2.26s | 0m 2.26s | 0m 2.31s

Table 7.5 Benchmark Results

Note: the user time of a cluster run is approximately the sum of the per-node user times.
Chapter 8. Cost Estimation

Since the growth in processing requirements is far greater than the growth in CPU power, and since the silicon chip is fast approaching its full capacity, the implementation of parallel processing at every level of computing becomes inevitable. We therefore propose that in the coming years parallel processing, and the algorithms that make it practical, like the ones we have designed and implemented, will form the heart of modern computing. Not surprisingly, parallel processing has already begun to penetrate the modern computing market directly in the form of multi-core processors such as Intel dual-core and quad-core processors.

Two of our primary aims, simple implementation and minimal administrative overhead, make deploying Parallex straightforward and effective. Parallex can easily be deployed in all sectors of modern computing where CPU-intensive applications play an important part in growth. While a system of n parallel processors is less efficient than a single processor that is n times faster, the parallel system is often cheaper to build. Parallel computation is used for tasks which require very large amounts of computation, take a lot of time, and can be divided into n independent subtasks. In recent years, most high-performance computing systems, also known as supercomputers, have had parallel architectures.

Cost effectiveness is one of the major achievements of our Parallex system. We need no additional or expensive hardware or software, so the system is inexpensive. Our system is based on heterogeneous clusters in which raw CPU power is not an issue, thanks to our mathematical distribution algorithm: system efficiency does not drop by more than about 5% when a few slower machines are present. We can therefore treat "silicon waste" as an opportunity, putting outdated, slower CPUs to use, which also makes the design environmentally friendly. A further feature of our system is the use of diskless nodes, which reduces the total system cost by approximately 20% because the nodes need no storage devices; instead of separate storage devices we use a centralized storage solution. Last but not least, all our software tools are open source. Hence we conclude that our Parallex system is one of the most cost-effective systems in its genre.
Chapter 9. User Manual

9.1 Dedicated cluster setup

For the dedicated cluster with one master and many diskless slaves, all the user has to do is install the RPMs supplied on the installation disk on the master. The BProc configuration file will then be found at /etc/bproc/config.

9.1.1 BProc configuration

Main configuration file: /etc/bproc/config
• Edit it with your favorite text editor
• Lines are either comments (starting with #) or a keyword followed by arguments

Specify the interface:
interface eth0 10.0.4.1 255.255.255.0
• eth0 is the interface connected to the nodes
• The IP address of the master node is 10.0.4.1
• The netmask of the master node is 255.255.255.0
• The interface is configured when BProc is started

Specify the range of IP addresses for the nodes:
iprange 0 10.0.4.10 10.0.4.14
• Start assigning IP addresses at node 0
• The first address is 10.0.4.10, the last is 10.0.4.14
• The size of this range determines the number of nodes in the cluster

The next entries are the default libraries to be installed on the nodes:
• Libraries can be specified explicitly, or library information can be extracted from an executable
• Add an entry to install extra libraries:
librariesfrombinary /bin/ls /usr/bin/gdb
• The bplib command can be used to see the libraries that will be loaded

The next line specifies the name of the phase 2 image:
bootfile /var/bproc/boot.img
• There should be no need to change this

Add a line to specify the kernel command line:
kernelcommandline apm=off console=ttyS0,19200
• Turn APM support off (since these nodes don't have any)
• Set the console to ttyS0 at 19200 baud
• This is used by the beoboot command when building the phase 2 image

The final lines specify the Ethernet addresses of the nodes; examples are given commented out:
#node 0 00:50:56:00:00:00
#node 00:50:56:00:00:01
• Needed so a node can learn its IP address from the master
• The leading 0 is optional; it assigns this address to node 0
• Ethernet addresses can be determined and added automatically using the nodeadd command
• We will use this command later, so there is no need to change anything now
• Save the file and exit the editor

Other configuration files:

/etc/bproc/config.boot
• Specifies the PCI devices that the nodes will use at boot time
• These modules are included in the phase 1 and phase 2 boot images
• By default a node will try all the network interfaces it can find

/etc/bproc/node_up.conf
• Specifies the actions taken to bring a node up: load modules, configure network interfaces, probe for PCI devices, and copy files and special devices out to the node

A consolidated example of the main configuration file is shown below.
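Putting the fragments above together, a minimal /etc/bproc/config might look like the following; every value is simply the example value quoted above, not a recommendation for a particular site.

# /etc/bproc/config -- minimal example assembled from the fragments above
interface eth0 10.0.4.1 255.255.255.0
iprange 0 10.0.4.10 10.0.4.14
librariesfrombinary /bin/ls /usr/bin/gdb
bootfile /var/bproc/boot.img
kernelcommandline apm=off console=ttyS0,19200
node 0 00:50:56:00:00:00
node 00:50:56:00:00:01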
9.1.2 Bringing up BProc

• Check that BProc will be started at boot time:
# chkconfig --list clustermatic
• Restart the master daemon and boot server to load the new configuration:
# service bjs stop
# service clustermatic restart
# service bjs start
• BJS uses BProc, so it needs to be stopped first
• Check that the interface has been configured correctly:
# ifconfig eth0
• It should have the IP address we specified in the configuration file

9.1.3 Building a phase 2 image

• Run the beoboot command on the master:
# beoboot -2 -n --plugin mon
• -2: this is a phase 2 image
• -n: the image will boot over the network
• --plugin: add a plugin to the boot image
• The following warning messages can be safely ignored:
WARNING: Didn't find a kernel module called gmac.o
WARNING: Didn't find a kernel module called bmac.o
• Check that the phase 2 image is available:
# ls -l /var/clustermatic/boot.img

9.1.4 Loading the phase 2 image

• Two Kernel Monte is a piece of software which loads a new Linux kernel, replacing the one that is already running
• This allows you to use Linux as your boot loader!
• Using Linux means you can use any network that Linux supports
• There is no PXE BIOS or Etherboot support for Myrinet, Quadrics, or InfiniBand
• "Pink" network boots over Myrinet, which allowed us to avoid buying a 1024-port Ethernet network
• Currently supports x86 (including AMD64) and Alpha

9.1.5 Using the cluster

bpsh
• Migrates a process to one or more nodes
• The process is started on the front end, but is immediately migrated onto the nodes
• The effect is similar to the rsh command, but no login is performed and no shell is started
• I/O forwarding can be controlled
• Output can be prefixed with the node number
• Run the date command on all nodes which are up:
# bpsh -a -p date
• See the other arguments that are available:
# bpsh -h

bpcp
• Copies files to a node
• Files can come from the master node or from other nodes
• Note that a node only has a RAM disk by default
• Copy /etc/hosts from the master to /tmp/hosts on node 0:
# bpcp /etc/hosts 0:/tmp/hosts
# bpsh 0 cat /tmp/hosts

9.1.6 Managing the cluster

bpstat
• Shows the status of the nodes:
  up: node is up and available
  down: node is down or can't be contacted by the master
  boot: node is coming up (running node_up)
  error: an error occurred while the node was booting
• Shows the owner and group of a node; combined with the permissions, this determines who can start jobs on the node
• Shows the permissions of the node:
  ---x------ execute permission for the node owner
  ------x--- execute permission for users in the node group
  ---------x execute permission for other users

bpctl
• Controls a node's status
• Reboot node 1 (takes about a minute):
# bpctl -S 1 -R
• Set the state of node 0:
# bpctl -S 0 -s groovy
• Only up, down, boot, and error have special meaning; everything else simply means "not down"
• Set the owner of node 0:
# bpctl -S 0 -u nobody
• Set the permissions of node 0 so anyone can execute a job:
# bpctl -S 0 -m 111

bplib
• Manages the libraries that are loaded on a node
• List the libraries to be loaded:
# bplib -l
• Add a library to the list:
# bplib -a /lib/libcrypt.so.1
• Remove a library from the list:
# bplib -d /lib/libcrypt.so.1

9.1.7 Troubleshooting techniques

• The tcpdump command can be used to check for node activity during and after a node boot
• Connect a cable to the serial port of a node to check the console output for errors in the boot process
• Once a node reaches node_up processing, messages are logged in /var/log/bproc/node.N (where N is the node number)
9.2 Shared cluster setup

Once you have the basic installation completed, you'll need to configure the system. Many of the tasks are no different for machines in a cluster than for any other system. For other tasks, being part of a cluster impacts what needs to be done. The following subsections describe the issues associated with several services that require special consideration.

9.2.1 DHCP

Dynamic Host Configuration Protocol (DHCP) is used to supply network configuration parameters, including IP addresses, host names, and other information, to clients as they boot. With clusters, the head node is often configured as a DHCP server and the compute nodes as DHCP clients. There are two reasons to do this. First, it simplifies the installation of compute nodes, since the information DHCP can supply is often the only thing that differs among the nodes. Since a DHCP server can handle these differences, the node installation can be standardized and automated. A second advantage of DHCP is that it is much easier to change the configuration of the network: you simply change the configuration file on the DHCP server, restart the server, and reboot each of the compute nodes.

The basic installation is rarely a problem. The DHCP system can be installed as part of the initial Linux installation or after Linux has been installed. The DHCP server configuration file, typically /etc/dhcpd.conf, controls the information distributed to the clients. If you are going to have problems, the configuration file is the most likely source. The DHCP configuration file may be created or changed automatically when some cluster software is installed. Occasionally, the changes may not be done optimally or even correctly, so you should have at least a reading knowledge of DHCP configuration files. Here is a heavily commented sample configuration file that illustrates the basics. (Lines starting with "#" are comments.)
# A sample DHCP configuration file.
# The first commands in this file are global,
# i.e., they apply to all clients.

# Only answer requests from known machines,
# i.e., machines whose hardware addresses are given.
deny unknown-clients;

# Set the subnet mask, broadcast address, and router address.
option subnet-mask 255.255.255.0;
option broadcast-address 172.16.1.255;
option routers 172.16.1.254;

# This section defines individual cluster nodes.
# Each subnet in the network has its own section.
subnet 172.16.1.0 netmask 255.255.255.0 {
    group {
        # The first host, identified by the given MAC address,
        # will be named node1.cluster.int, will be given the
        # IP address 172.16.1.1, and will use the default router
        # 172.16.1.254 (the head node in this case).
        host node1 {
            hardware ethernet 00:08:c7:07:68:48;
            fixed-address 172.16.1.1;
            option routers 172.16.1.254;
            option domain-name "cluster.int";
        }
        host node2 {
            hardware ethernet 00:08:c7:07:c1:73;
            fixed-address 172.16.1.2;
            option routers 172.16.1.254;
            option domain-name "cluster.int";
        }
        # Additional node definitions go here.
    }
}

# For servers with multiple interfaces, this entry says to ignore requests
# on specified subnets.
subnet 10.0.32.0 netmask 255.255.248.0 {
    not authoritative;
}

As shown in this example, you should include a subnet section for each subnet on your network. If the head node has an interface for the cluster and a second interface connected to the Internet or your organization's network, the configuration file will have a group for each interface or subnet. Since the head node should answer DHCP requests for the cluster but not for the organization, DHCP should be configured so that it responds only to DHCP requests from the compute nodes.

9.2.2 NFS

A network filesystem is a filesystem that physically resides on one computer (the file server), which in turn shares its files over the network with the other computers on the network (the clients). The best-known and most common network filesystem is the Network File System (NFS). In setting up a cluster, designate one computer as your NFS server. This is often the head node for the cluster, but there is no reason it has to
be. In fact, under some circumstances, you may get slightly better performance if you use different machines for the NFS server and the head node. Since the server is where your user files will reside, make sure you have enough storage. This machine is a likely candidate for a second disk drive or RAID array and a fast I/O subsystem. You may even want to consider mirroring the filesystem using a small high-availability cluster.

Why use NFS? It should come as no surprise that for parallel programming you'll need a copy of the compiled code or executable on each machine on which it will run. You could, of course, copy the executable over to the individual machines, but this quickly becomes tiresome. A shared filesystem solves this problem. Another advantage of NFS is that all the files you will be working on will be on the same system, which greatly simplifies backups. (You do backups, don't you?) A shared filesystem also simplifies setting up SSH, as it eliminates the need to distribute keys. (SSH is described later in this chapter.) For this reason, you may want to set up NFS before setting up SSH. NFS can also play an essential role in some installation strategies. If you have never used NFS before, setting up the client and the server are slightly different, but neither is particularly difficult. Most Linux distributions come with most of the work already done for you.

9.2.2.1 Running NFS

Begin with the server; you won't get anywhere with the client if the server isn't already running. Two things need to be done to get the server running: the file /etc/exports must be edited to specify which machines can mount which directories, and then the server software must be started. Here is a single line from the file /etc/exports on the server amy:

/home basil(rw) clara(rw) desmond(rw) ernest(rw) george(rw)

This line gives the clients basil, clara, desmond, ernest, and george read/write access to the directory /home on the server. Read access is the default. A number of other
options are available and could be included. For example, the no_root_squash option could be added if you want to edit root-permission files from the nodes. Had a space been inadvertently included between basil and (rw), read access would have been granted to basil and read/write access would have been granted to all other systems. (Once you have the systems set up, it is a good idea to use the command showmount -a to see who is mounting what.)

Once /etc/exports has been edited, you'll need to start NFS. For testing, you can use the service command as shown here:

[root@fanny init.d]# /sbin/service nfs start
Starting NFS services:  [ OK ]
Starting NFS quotas:    [ OK ]
Starting NFS mountd:    [ OK ]
Starting NFS daemon:    [ OK ]
[root@fanny init.d]# /sbin/service nfs status
rpc.mountd (pid 1652) is running...
nfsd (pid 1666 1665 1664 1663 1662 1661 1660 1657) is running...
rpc.rquotad (pid 1647) is running...

(With some Linux distributions, when restarting NFS, you may find it necessary to explicitly stop and restart both nfslock and portmap as well.) You'll want to change the system configuration so that this starts automatically when the system is rebooted. For example, with Red Hat, you could use the serviceconf or chkconfig commands.
For the client, the software is probably already running on your system; you just need to tell the client to mount the remote filesystem. You can do this in several ways, but in the long run the easiest approach is to edit the file /etc/fstab, adding an entry for the server. Basically, you'll add a line to the file that looks something like this:

amy:/home /home nfs rw,soft 0 0

In this example, the local system mounts the /home filesystem located on amy as the /home directory on the local machine. The filesystems may have different names. You can now manually mount the filesystem with the mount command:

[root@ida /]# mount /home

When the system reboots, this will be done automatically.

When using NFS, you should keep a couple of things in mind. The mount point, /home, must exist on the client prior to mounting. While the remote directory is mounted, any files that were stored on the local system in the /home directory will be inaccessible; they are still there, you just can't get to them while the remote directory is mounted. Next, if you are running a firewall, it will probably block NFS traffic. If you are having problems with NFS, this is one of the first things you should check. File ownership can also create some surprises: user and group IDs should be consistent among systems using NFS, i.e., each user must have identical IDs on all systems. Finally, be aware that root privileges don't extend across NFS-shared systems (if you have configured your systems correctly). So if, as root, you change directory (cd) into a remotely mounted filesystem, don't expect to be able to look at every file. (Of course, as root you can always use su to become the owner and do all the snooping you want.) Details of the syntax and options can be found in the nfs(5), exports(5), fstab(5), and mount(8) manpages.

9.2.3 SSH
To run software across a cluster, you'll need some mechanism to start processes on each machine. In practice, a prerequisite is the ability to log onto each machine within the cluster. If you need to enter a password for each machine each time you run a program, you won't get very much done. What is needed is a mechanism that allows logins without passwords. This boils down to two choices: you can use remote shell (RSH) or secure shell (SSH). If you are a trusting soul, you may want to use RSH; it is simpler to set up, with less overhead. On the other hand, SSH network traffic is encrypted, so it is safe from snooping. Since SSH provides greater security, it is generally the preferred approach.

SSH provides mechanisms to log onto remote machines, run programs on remote machines, and copy files among machines. SSH is a replacement for ftp, telnet, rlogin, rsh, and rcp. A commercial version of SSH is available from SSH Communications Security (http://www.ssh.com), a company founded by Tatu Ylönen, an original developer of SSH. Or you can go with OpenSSH, an open source version from http://www.openssh.org. OpenSSH is the easiest, since it is already included with most Linux distributions. It has other advantages as well. By default, OpenSSH automatically forwards the DISPLAY variable, which greatly simplifies using the X Window System across the cluster. If you are running an SSH connection under X on your local machine and execute an X program on the remote machine, the X window will automatically open on the local machine. This can be disabled on the server side, so if it isn't working, that is the first place to look.

There are two sets of SSH protocols, SSH-1 and SSH-2. Unfortunately, SSH-1 has a serious security vulnerability; SSH-2 is now the protocol of choice. This discussion will focus on using OpenSSH with SSH-2.

Before setting up SSH, check to see if it is already installed and running on your system. With Red Hat, you can check which packages are installed using the package manager:

[root@fanny root]# rpm -q -a | grep ssh
openssh-3.5p1-6
openssh-server-3.5p1-6
openssh-clients-3.5p1-6
openssh-askpass-gnome-3.5p1-6
openssh-askpass-3.5p1-6

This particular system has the SSH core package, both server and client software, as well as additional utilities. The SSH daemon is usually started as a service. As you can see, it is already running on this machine:

[root@fanny root]# /sbin/service sshd status
sshd (pid 28190 1658) is running...

Of course, it is possible that it wasn't started as a service but is still installed and running. You can use ps to double-check:

[root@fanny root]# ps -aux | grep ssh
root 29133 0.0 0.2 3520 328 ? S Dec09 0:02 /usr/sbin/sshd
...

Again, this shows that the server is running. With some older Red Hat installations, e.g., the 7.3 workstation, only the client software is installed by default, and you'll need to manually install the server software. If using Red Hat 7.3, go to the second install disk and copy over the file RedHat/RPMS/openssh-server-3.1p1-3.i386.rpm. (Better yet, download the latest
version of this software.) Install it with the package manager and then start the service:

[root@james root]# rpm -vih openssh-server-3.1p1-3.i386.rpm
Preparing...        ########################################### [100%]
1:openssh-server    ########################################### [100%]
[root@james root]# /sbin/service sshd start
Generating SSH1 RSA host key:  [ OK ]
Generating SSH2 RSA host key:  [ OK ]
Generating SSH2 DSA host key:  [ OK ]
Starting sshd:                 [ OK ]

When SSH is started for the first time, encryption keys for the system are generated. Be sure to set this up so that it is done automatically when the system reboots. Configuration files for both the server (sshd_config) and the client (ssh_config) can be found in /etc/ssh, but the default settings are usually quite reasonable and you shouldn't need to change these files.

9.2.3.1 Using SSH

To log onto a remote machine, use the command ssh with the name or IP address of the remote machine as an argument. The first time you connect to a remote machine, you will receive a message with the remote machine's fingerprint, a string that identifies the machine. You'll be asked whether to proceed or not. This is normal.

[root@fanny root]# ssh amy
The authenticity of host 'amy (10.0.32.139)' can't be established.
RSA key fingerprint is 98:42:51:3e:90:43:1c:32:e6:c4:cc:8f:4a:ee:cd:86.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'amy,10.0.32.139' (RSA) to the list of known hosts.
root@amy's password:
Last login: Tue Dec 9 11:24:09 2003
[root@amy root]#

The fingerprint will be recorded in a list of known hosts on the local machine. SSH will compare fingerprints on subsequent logins to ensure that nothing has changed. You won't see anything else about the fingerprint unless it changes; then SSH will warn you and ask whether you should continue. If the remote system has changed, e.g., if it has been rebuilt or if SSH has been reinstalled, it's OK to proceed. But if you think the remote system hasn't changed, you should investigate further before logging in.

Notice in the last example that SSH automatically uses the same identity when logging into a remote machine. If you want to log on as a different user, use the -l option with the appropriate account name. You can also use SSH to execute commands on remote systems. Here is an example of using date remotely:

[root@fanny root]# ssh -l sloanjd hector date
sloanjd@hector's password:
Mon Dec 22 09:28:46 EST 2003

Notice that a different account, sloanjd, was used in this example. To copy files, you use the scp command. For example:

[root@fanny root]# scp /etc/motd george:/root/
root@george's password:
motd 100% |*****************************| 0 00:00

Here the file /etc/motd was copied from fanny to the /root directory on george.

In the examples thus far, the system has asked for a password each time a command was run. If you want to avoid this, you'll need to do some extra work: generate a pair of authorization keys that will be used to control access and then store them in the directory ~/.ssh. The ssh-keygen command is used to generate keys.

[sloanjd@fanny sloanjd]$ ssh-keygen -b1024 -trsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/sloanjd/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/sloanjd/.ssh/id_rsa.
Your public key has been saved in /home/sloanjd/.ssh/id_rsa.pub.
The key fingerprint is:
2d:c8:d1:e1:bc:90:b2:f6:6d:2e:a5:7f:db:26:60:3f sloanjd@fanny
[sloanjd@fanny sloanjd]$ cd .ssh
[sloanjd@fanny .ssh]$ ls -a
. .. id_rsa id_rsa.pub known_hosts

The options in this example specify a 1,024-bit key and the RSA algorithm. (You can use DSA instead of RSA if you prefer.) Notice that SSH will prompt you for a passphrase, basically a multi-word password. Two keys are generated, a public key and a private key. The private key should never be shared and resides only on the client machine; the public key is distributed to remote machines. Copy the public key to each system you'll want to log onto, renaming it authorized_keys2.

[sloanjd@fanny .ssh]$ cp id_rsa.pub authorized_keys2
[sloanjd@fanny .ssh]$ chmod go-rwx authorized_keys2
[sloanjd@fanny .ssh]$ chmod 755 ~/.ssh

If you are using NFS, as shown here, all you need to do is copy and rename the file in the current directory. Since that directory is mounted on each system in the cluster, it is automatically available. If you used the NFS setup described earlier, root's home directory, /root, is not shared. If you want to log in as root
without a password, manually copy the public keys to the target machines. You'll need to decide whether you feel secure setting up the root account like this.

You will use two utilities supplied with SSH to manage the login process. The first is ssh-agent, an SSH agent program that caches private keys; it stores the keys locally and uses them to respond to authentication queries from SSH clients. The second utility, ssh-add, is used to manage the local key cache; among other things, it can be used to add, list, or remove keys.

[sloanjd@fanny .ssh]$ ssh-agent $SHELL
[sloanjd@fanny .ssh]$ ssh-add
Enter passphrase for /home/sloanjd/.ssh/id_rsa:
Identity added: /home/sloanjd/.ssh/id_rsa (/home/sloanjd/.ssh/id_rsa)

(While this example uses the $SHELL variable, you can substitute the actual name of the shell you want to run if you wish.) Once this is done, you can log in to remote machines without a password. This process can be automated to varying degrees. For example, you can add the call to ssh-agent as the last line of your login script so that it will be run before you make any changes to your shell's environment. Once you have done this, you'll need to run ssh-add only when you log in. But you should be aware that Red Hat console logins don't like this change. You can find more information by looking at the ssh(1), ssh-agent(1), and ssh-add(1) manpages. If you want more details on how to set up ssh-agent, you might look at SSH, The Secure Shell by Barrett and Silverman, O'Reilly, 2001. You can also find scripts on the Internet that will set up a persistent agent so that you won't need to rerun ssh-add each time.

9.2.4 Hosts file and name services

Life will be much simpler in the long run if you provide appropriate name services. NIS is certainly one possibility. At a minimum, don't forget to edit /etc/hosts for your cluster. At the very least, this will reduce network traffic and speed up some software, and some packages assume it is correctly installed. Here are a few lines from the hosts file for amy:

127.0.0.1    localhost.localdomain localhost
10.0.32.139  amy.wofford.int amy
10.0.32.140  basil.wofford.int basil
...

Notice that amy is not included on the line with localhost. Specifying the host name as an alias for localhost can break some software.

9.3 Working with Parallex

Once the master has been configured and all nodes are up, working with Parallex to utilize all your available resources is very easy. Follow these simple steps to use the power of all nodes that are up.

• Compile your code and place it in $PARALLEX_DIR/bin/. You can use the Makefile to do this for you:
# make main_app
• After the application has compiled without any errors, first start the networking monitoring tool of Parallex