Contenu connexe
Similaire à 50120140503017 2
Similaire à 50120140503017 2 (20)
Plus de IAEME Publication
Plus de IAEME Publication (20)
50120140503017 2
- 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
148
A NEW MODEL OF A MASSIVELY PARALLEL VIRTUAL COMPUTER IN
A DISTRIBUTED SYSTEM
M. YOUSSFI1
, O. BOUATTANE1
, H. FAKHI1
, M.O. BENSALAH2
, B. CHERRADI1,3
1
Laboratoire SSDIA, ENSET, University Hassan II Mohammedia Casablanca, Morocco
2
FSR, University Mohammed V AGDAL Rabat, Morocco
3
CRMEF El Jadida, Morocco
ABSTRACT
In this paper, we present a new model of a massively parallel single Instruction Multiple Data
(SIMD) structure machines in a distributed system. Among the modelled machines, we distinguish
the linear, 2D, 3D meshes, pyramidal structures and GPU structure. All these computers are based
physically on a multitude of fine grained processing elements (PE) arranged and coupled according
to their associated topological pattern. In this model the host is represented by a distributed agent.
Each virtual host agent, deployed in a physical computer, manages a local parallel virtual computer
composed by a set of virtual processing elements (VPE). Each VPE is represented by a self threaded
object. The distributed virtual host agents are interconnected throw a multi agent system platform.
This feature allows combining the performances of a set of physical computers to build a high
performance massively parallel virtual machine. The developed software is based on a hard kernel of
a parallel virtual machine in which we translate all the physical properties of its different
components. This kernel is based on an abstract layer which can be easily extended to the other
topological structures. To implement a parallel program in the proposed platform, we have
developed a new parallel programming language based on XML and its compiler which allow
editing, compiling and running parallel programs. To illustrate the performance of this model, we
present an example of a parallel program implementation for the edge detection of a brain MRI
image presenting pathology.
Keywords: Parallel Computing, Parallel Programming, SIMD, Emulation, Parallel Virtual Machine,
Parallel Programming Language.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING &
TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 5, Issue 3, March (2014), pp. 148-157
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2014): 8.5328 (Calculated by GISI)
www.jifactor.com
IJCET
© I A E M E
- 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
149
I. INTRODUCTION
Compute-intensive applications, such as data analysis, financial forecasts or scientific
simulation and applications with high volume of data as image processing, multimedia, online video
gaming and the social networking, need more processing power and availability of storage resources
and computing. Analysis tools, the computation methods and their technological computational
models, have known a very high level of progress. This progress has oriented the scientists toward
new computation strategies based on parallel approaches. Due to the large volume of data to be
processed and to the large amount of computations needed to solve a given problem, the basic idea is
to split tasks and data so that we can easily perform their corresponding algorithms concurrently on
different physical computational units. Naturally, the use of the parallel approaches implies
important data exchange between computational units. Subsequently, this generates new problems of
data exchange and communications.
Naturally, the obvious need allow appears new computation strategies based on parallel
approaches. Compared to the sequential approaches, the parallelism is absolutely more efficient. The
highlight of this paradigm is to reduce the execution time for solving a given problems, it is
extensively implemented on image processing [1][2][3], on Tomography[4], and also on the research
area of Multimedia Content Analysis (MMCA)[5].
The need of the new architectures and the processor efficiency improvement has been excited
and encouraged by the VLSI development. As a result, we have seen the new processor technologies
(e.g. Reduced Instruction Set Computer “RISC”, Transputer, Digital Signal Processor “DSP”,
Cellular automata etc.) and the parallel interconnection of fine grained networks (e.g. Linear, 2-D
grid of processors, pyramidal architectures, cubic and hyper cubic connection machines, etc.).
Several parallel architectures were created to solve parallel problems. In the past decade some simple
grids of cellular automata have been created. Due to some technological enhancements, the cellular
automaton became a fine grained processing element and the resulted grid became the well known
mesh connected computer MCC [6]. Using some additional communication Buses, the MCC became
the mesh with multiple broadcast [7] and polymorphic torus [8]. Finally, the reconfigurable mesh
computer (RMC) integrates a reconfiguration network at each processing element [9, 10]. Recently,
the graphical processing unit GPU architectures [11, 12, 13] that are largely used for graphical
parallel computing, are introduced as the most feasible structure to represent some parallel
architectures for more complicated problems.
The theorists innovate more and more by imagining new parallel machines having well
adapted topologies to specific problems. These innovations lead to new algorithms for high
performance calculation. Unfortunately, the technology is not able to follow the scientist’s
imaginations. Due to the unavailability or to the high cost of this kind of real parallel machines, we
conclude that creating emulators for these architectures is the first good way to test and validate the
parallel algorithms on serial machines. In this context, the realization of an emulator for SIMD
parallel machines has been the first exploited model in our research works [14], [15], [16] and [17].
This emulator has been of great use to elaborate, validate and test new parallel algorithms without
need the real parallel machines. However, emulating a parallel machine on a single processor
machine do not introduce any enhancement in terms of performance at runtime. It is therefore
necessary to look for other ways to achieve these emulated parallel applications in a real
environment that offers a high performance gain.
The parallel and distributed systems have become a natural solution for different types of
environments [18]. In such systems many middleware have been designed and used to build grid
computing. Subsequently, new challenges appear, such as load balancing and fault tolerances. The
quality and performance of a grid computing depend on the performance of the machines used in its
nodes, the interconnection networks and mainly the quality of the associated middleware.
Researchers intensify their efforts to find new solutions and new high-performance middleware
- 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
150
having new innovative features to improve the performance of parallel virtual machines based on
grid computing.
In this paper, we present a new model to describe and emulate a polymorphic massively
parallel single Instruction Multiple Data (SIMD) structure machines in a distributed system. Among
the modelled machines, we distinguish the linear, 2D, 3D meshes, pyramidal structures and GPU
structure. All these computers are based physically on a multitude of fine grained processing
elements (PE) arranged and coupled according to their associated topological pattern. In this model
the host is represented by a distributed agent. Each virtual host agent, deployed in a physical
computer, manages a local parallel virtual computer composed by a set of virtual processing
elements (VPE). Each VPE is represented by a self threaded object. The distributed virtual host
agents are interconnected throw a multi agent system platform. This feature allows combining the
performances of a set of physical computers to build a high performance massively parallel virtual
machine. The developed software is based on a hard kernel of a parallel virtual machine in which we
translate all the physical properties of its different components. This kernel can be easily extended to
the other mentioned topological structures. To implement a parallel program in the proposed
platform, we have developed a new parallel programming language based on XML and its compiler
which allow editing, compiling and running parallel programs.
The rest of this paper is organized as follows. In Section 2, we will present the proposed
massively parallel computational model. In the section 3, we present the architecture and the design
this model. In the newt section, we illustrate the performance of this model by presenting an example
of a parallel program implementation for the edge detection of a brain MRI image presenting
pathology. Finally, the last section gives some concluding remarks and some exploiting perspectives.
II. THE PROPOSED PARALLEL COMPUTATIONAL MODEL
A. The massively virtual computer structure
As it mentioned previously, one of the clear trends of high performance is the Parallel
Computing, aimed to substitute and distribute task and data between elementary computers.
Consequently, computer architects have always strived to increase the performance of their computer
architectures, allowing appears several architectures in scientific literature such as the linear, 2D, 3D
meshes, pyramidal structures and GPU structure.
In this work, our virtual Reconfigurable Parallel Computer (PRC) is based on the
reconfigurable 3D mesh easily extensible to the other mentioned architectures.
A 3D Reconfigurable Mesh Computer (3D-RMC) of size M x N x K, is a massively parallel
machine having M x N x K Processing elements (PEs) arranged on a 3-D matrix. The figure 1 shows
a 3-D MCR of size 3 x 3 x 3. It is a Single Instruction Multiple Data (SIMD) structure, in which
each PE (i, j, k) is localized in Cartesian coordinates (i, j, k) and has an identifier defined by
ID = M x i + N x j+ k. A 2D RMC of size n x n has a finite number of registers of size (log2 n) bits.
The PEs can carry out arithmetic and logical operations. They can also carry out reconfiguration
operations to exchange data over the mesh. The PEs can use a shared a memory. Each PE is a self
Thread autonomous component. All the PEs are managed by a virtual host manager represented by a
distributed agent that is designed to load parallel program and global data to distribute them over the
mesh of PEs for any execution. A virtual host agent can communicate with other host agents
deployed in the distributed system. This feature gives the possibility to combine the performances of
a set of distributed physical computers in order to build a massively parallel virtual machine.
- 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
151
Figure 1: A Virtual host Agent representing a 3D Reconfigurable Mesh Computer of size 3 x 3 x 3
B. Processing Element (PE) Model
In this part, we have translated all the physical components of a PE as shown in figure 2, and
its functionalities to the virtual processor model. Thus, the state of the virtual PE is defined by the
following components:
• Identifier registers (iReg, jReg, kReg): Used to identify the PE in virtual PE topology.
• Data registers (Reg[]): Used to load the single data to be performed by the PE. For example,
the data may be a set of pixels representing a part of an image to be processed by the virtual
computer.
• Flags: Each of its flag bits will indicate the PE state related to any instruction.
• Communication Ports: Defined variables: p1, p2, …, pn.
• ALUnit: To perform logical and arithmetical operations,
• Other components: Notice that the PRC machine emulator is an open system. It can be
dynamically and easily extended in terms of register memories, communication bus width, and
special functions according to the problem in query.
In such cases of special advanced programming applications, some algorithms require special
additional registers. The programmer has the ability to extend easily its computational model. The
figure 2 shows a simplified model of two connected virtual processing elements used in our model.
Figure 2. The main virtual PE components
Shared Memory
PE
000
H
O
S
T
P7
P5
P6
P4
P3
P2
P1
P0
iReg iReg iReg
ALU
Reg[0]
Reg[1]
Reg[..] Flags
Bridge
idReg
Other
PE 000
P7
P5
P6
P4
P3
P2
P1
P0
iReg iReg iReg
ALU
Reg[0]
Reg[1]
Reg[..] Flags
Bridge
idReg
Other
PE 010
- 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
152
The behavior of the virtual VE is defined by the following operations:
• Basic PE operations
Like any processor, each processing element (PE) of the RMC can execute a set of
instructions relating to the arithmetic and logical operations. The operands concerned can be the
local data of a PE or the data arising on its communication channels (Bit-Model or Word-model)
after any data exchange operation between the PEs. The PE can be configured as a marked PE to
participate in performing parallel operations.
The PE can also carry out the bridge configurations in order to establish connections between
two or more communication channels. When the PE of the RMC is in a Simple Bridge (SB) state, it
establishes connections between two of its communication channels. This PE can connect itself to
each bit of its channels, either in transmitting mode, or in receiving mode, as it can be isolated from
some of its bits (i.e. neither transmitter, nor receiver). A PE is in a Double Bridge (DB) state when it
carries out a configuration having two independent buses. A PE is in Crossed Bridge (CB) State
when it connects all its active communication channels in only one; each bit with its correspondent.
This operation is generally used when we want to transmit information to a set of PEs at the same
time.
III. ARCHITECTURE AND DESIGN OF THE PROPOSED MODEL
The proposed parallel virtual machine is represented by a set of distributed agents. Each agent
represents the host of its local virtual machine. Each host has a set of virtual processors arranged
according to the topology of the virtual machine (2D, 3D pyramidal, etc...). The virtual PE is a self
threaded objet which is ready to run the basic instructions of the parallel program broadcasted by the
virtual host. In the proposed model, we have defined an abstract layer of the virtual machine which
represents the core of this framework. We have defined some concrete implementations of the virtual
machine, but the programmer can add other implementations extending the features of the model.
The UML class diagram of Figure 3 shows the main components of the proposed model. In the
realized framework, the parallel programmers must edit or open an edited parallel program and
compile it before its execution. In the developed emulator, we have proposed a new parallel
programming language based on XML language according to a specific developed XML schema.
Figure 3: The UML class diagram of the main components of the parallel virtual computer
In figure 4, we represent the architecture of the parallel virtual computer. In this architecture,
the used components are configured using an xml configuration file. The parallel xml program is
Agent
AbstractParallelVirtualComputer
Mesh3 GPU
ALU
Port
*
*
Other
Thread
VPEImpl OtherVPEImpl
AbstractVirtualPE
- 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976
ISSN 0976 - 6375(Online), Volume 5, Issue
parsed and compiled to a parallel object program. This later is executed by the host agent by
broadcasting and synchronizing data and instructions flows to the marked PEs of the parallel virtual
machine.
Figure 4: Reconfigurable Parallel Computer architecture
A parallel program loaded in the platform is composed by a set of instructions. Each
instruction is defined by zero a set attributes. An instruction contains other child instructions. The
UML class diagram of the figure 5 represents the structure of parallel object program.
Figure 5: The UML class diagram of the XML Parallel progra
IV. APPLICATION
In this section, we present an example of a parallel algorithm implementation for contour
detection of a gray levelled image using Sobel operator [
The Sobel operator is used in image processing, particularly within edge detection algor
Technically, it is a discrete differentiation operator, computing an approximation of the gradient of
the image intensity function. At each point in the image, the result of the Sobel operator is either the
corresponding gradient vector or the norm
the image with a small, separable, and integer valued filter in horizontal and vertical direction and is
therefore relatively inexpensive in terms of computations. On the other hand, the gradient
approximation that it produces is relatively crude, in particular for high frequency variations in the
image.
The operator uses two 3×3 kernels which are convolved with the original image to calculate
approximations of the derivatives: one for horizontal c
the source image, and Gx and Gy are two images which at each point contain the horizontal and
vertical derivative approximations, the computations are as follows [20]
ParallelProgram
-fileName : String
+compile () :void
+run()
Computer Engineering and Technology (IJCET), ISSN 0976
6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
153
parsed and compiled to a parallel object program. This later is executed by the host agent by
ing and synchronizing data and instructions flows to the marked PEs of the parallel virtual
Reconfigurable Parallel Computer architecture
A parallel program loaded in the platform is composed by a set of instructions. Each
ruction is defined by zero a set attributes. An instruction contains other child instructions. The
UML class diagram of the figure 5 represents the structure of parallel object program.
The UML class diagram of the XML Parallel program
In this section, we present an example of a parallel algorithm implementation for contour
detection of a gray levelled image using Sobel operator [19].
The Sobel operator is used in image processing, particularly within edge detection algor
Technically, it is a discrete differentiation operator, computing an approximation of the gradient of
the image intensity function. At each point in the image, the result of the Sobel operator is either the
corresponding gradient vector or the norm of this vector. The Sobel operator is based on convolving
the image with a small, separable, and integer valued filter in horizontal and vertical direction and is
therefore relatively inexpensive in terms of computations. On the other hand, the gradient
proximation that it produces is relatively crude, in particular for high frequency variations in the
The operator uses two 3×3 kernels which are convolved with the original image to calculate
approximations of the derivatives: one for horizontal changes, and one for vertical. If we define A as
the source image, and Gx and Gy are two images which at each point contain the horizontal and
vertical derivative approximations, the computations are as follows [20]:
Instruction
- name : String
+execute () :void
* *
Attribute
- name : String
- value : String
+execute () :void
*
Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
© IAEME
parsed and compiled to a parallel object program. This later is executed by the host agent by
ing and synchronizing data and instructions flows to the marked PEs of the parallel virtual
A parallel program loaded in the platform is composed by a set of instructions. Each
ruction is defined by zero a set attributes. An instruction contains other child instructions. The
UML class diagram of the figure 5 represents the structure of parallel object program.
In this section, we present an example of a parallel algorithm implementation for contour
The Sobel operator is used in image processing, particularly within edge detection algorithms.
Technically, it is a discrete differentiation operator, computing an approximation of the gradient of
the image intensity function. At each point in the image, the result of the Sobel operator is either the
of this vector. The Sobel operator is based on convolving
the image with a small, separable, and integer valued filter in horizontal and vertical direction and is
therefore relatively inexpensive in terms of computations. On the other hand, the gradient
proximation that it produces is relatively crude, in particular for high frequency variations in the
The operator uses two 3×3 kernels which are convolved with the original image to calculate
hanges, and one for vertical. If we define A as
the source image, and Gx and Gy are two images which at each point contain the horizontal and
Attribute
: String
: String
:void
- 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
154
Gx ൌ
െ1 0 1
െ2 0 2
െ1 0 1
൩ כ A and Gy ൌ
1 2 1
0 0 0
െ1 െ2 െ1
൩ כ A
Where * here denotes the 2-dimensional convolution operation.
The x-coordinate is defined here as increasing in the right direction, and the y-coordinate is
defined as increasing in the down direction. At each point in the image, the resulting gradient
approximations can be combined to give the gradient magnitude, using:
G ൌ ඥGxଶ Gyଶ
In the sequential algorithm the complexity of this algorithm is evaluated to N. N represent the
pixel count of the image.
We will show that in this parallel implementation the complexity is Ɵ One time.
The structure of a parallel program implemented in the defined XML language is described as
bellow:
<parallelProgram>
<loadImage file="brain1.jpg" reg="0" codage="8"/>
<markAllPEs/>
<exchangeData reg="0" />
<compute expression="reg[9]=Math.abs(-reg[8]-2*reg[1]-
reg[2]+reg[4]+2*reg[3]+reg[6])"/>
<compute expression="reg[10]=Math.abs(reg[8]+2*reg[7]-reg[2]+reg[4]-2*reg[5]-
reg[6])"/>
<compute expression="reg[1]=reg[9]+reg[10]"/>
</parallelProgram>
• Step 1 : Loading the image the massively virtual machine :
<loadImage file="brain1.jpg" reg="0" codage="8"/>
As shown in figure 6, in this step, the gray level value of each pixel of the image is loaded in
the register 0 of each PE of the virtual computer.
Figure 6: Data registers contents of 9 PEs after loading the image
20
0
0
00
PE[i,j]
0
00
0
11
0
0
00
PE[i-1,j]
0
00
0
4
0
0
00
PE[i,j+1]
0
00
0
33
0
0
00
PE[i-1,j+1]
0
00
0
8
0
0
00
PE[i+1,j+1]
0
00
0
44
0
0
00
PE[i+1,j]
0
00
0
22
0
0
00
PE[i+1,j-1]
0
00
0
2
0
0
00
PE[i,j-1]
0
00
0
10
0
0
00
PE[-1,j-1]
0
00
0
- 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
155
• Step 2 : Marking all PEs:
The statement <markAllPEs/> is used by the host agent to mark all the virtual PEs which are
designated to participate to the parallel computing.
• Step 3 : Exchanging data between the PE and its 8 neighbours:
<exchangeData reg="0"/>
As shown in figure 7, in this step each PE will exchange the data stored in register 0 with its 8
neighbours, if they exist. The received data are stored in other registers reg[1], reg[2], reg[3], reg[4],
reg[5], reg[6], reg[7] and reg[8].
Figure 7: Data registers contents of 9 PEs after the exchanging data operation
• Step 3 : Applying Sobel operator :
Each PE compute Sobel operator Gx, Gy and G, then save them, respectively, in reg[9], reg[10] and
reg[1].
<compute expression="reg[9]=Math.abs(-reg[8]-2*reg[1] reg[2]
+ reg[4]+2*reg[3]+reg[6])"/>
<compute expression="reg[10]=Math.abs(reg[8]+2*reg[7]-reg[2]
+reg[4]-2*reg[5]-reg[6])"/>
<compute expression ="reg[1]=Math.sqrt(reg[9]*reg[9]+reg[10] *reg[10])"/>
Figure 8 shows the parallel program result of a component contour detection of a gray
levelled image using Sobel operator.
The image in figure 8.a represents the input image representing a brain magnetic resonance
image. The resulted output image after the parallel Sobel algorithm is shown in figure 8.b.
20
44
11
42
PE[i,j]
33
822
10
11
20
0
3310
PE[i-1,j]
0
42
0
4
8
0
020
PE[i,j+1]
0
044
0
33
4
0
011
PE[i-1,j+1]
0
020
0
8
0
4
044
PE[i+1,j+1]
0
00
20
44
0
20
822
PE[i+1,j]
4
00
2
22
0
2
440
PE[i+1,j-1]
20
00
0
2
22
10
200
PE[i,j-1]
11
440
0
10
2
0
110
PE[-1,j-1]
0
200
0
- 9. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
156
Figure 8: Results of the parallel program for contour detection using parallel Sobel operator
V. CONCLUSION
In this paper, we have presented a new model of a polymorphic massively parallel single
Instruction Multiple Data (SIMD) structure machines in a distributed system. Among the modelled
machines, we distinguish the linear, 2D, 3D meshes, pyramidal structures and GPU structure. In this
model the host is represented by a distributed agent. Each virtual host agent, deployed in a physical
computer, manages a local parallel virtual computer composed by a set of virtual processing
elements (VPE). Each VPE is represented by a self threaded object. The distributed virtual host
agents are interconnected throw a multi agent system platform. The obtained parallel virtual machine
and its programming language compiler can be used as a high performance computing system.
Actually, we have used this tool to resolve problems related to image processing domain using
parallel approach and specially SIMD architecture. However, this platform can be used to resolve
massively parallel applications related to other domains. In this model, we have proposed an abstract
layer representing the core of the framework to allow other developers adding new extensions for
new parallel virtual machine structures. Using a new parallel programming language based on XML
is a good solution for this platform. However, we are working to develop other modules which
allow writing parallel programs using the scientific programming languages like C++, Java or
Fortran.
REFERENCES
[1]. AWG Duller, RH Storer, AR Thomson, EL Dagless « An associative processor array
for image processing» Image and Vision Computing Volume 7, Issue 2, May 1989,
Pages 151–158.
[2]. O.Bouattane, J.Elmesbahi, and A.Rami, « A fast parrallel algorithm for convex hull problem
of multi-leveled images » journal of Intelligent and Robotic Systems, Kluwer academic
Publisher, Vol.33, pp. 285-299, 2002.
[3]. A.Errami, M.Khaldoun, J.Elmesbahi, and O.Bouattane, « O(1) time algorithm for structural
characterization of multi-leveled images and its applications on a re-configurable mesh
computer » journal of Intelligent and Robotic Systems, Springer, Vol .44, pp. 277-290, 2006.
[4]. F.Vázquez, E.M.Garzón, J.J.Fernández « A matrix approach to tomographic reconstruction
and its implementation on GPUs » Journal of Structural Biology Volume 170, Issue 1, April
2010, pp. 146–151.
a) b)
- 10. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME
157
[5]. Timo van Kessela, Ben van Werkhovena, Niels Drostb, Jason Maassenb, Henri E. Bala, Frank
J. Seinstrab, « User transparent data and task parallel multimedia computing with Pyxis-DT »
Future Generation Computer Systems Volume 29, Issue 8, October 2013, Pages 2252–2261
Including Special sections: Advanced Cloud Monitoring Systems & The fourth IEEE International
Conference on e-Science 2011 — e-Science Applications and Tools & Cluster, Grid, and Cloud
Computing.
[6]. R. Miller. et al, “Geometric algorithms for digitized pictures on a mesh connected computer,”
IEEE Transactions on PAMI, Vol. 7, No. 2, pp. 216–228, 1985.
[7]. V. K. Prasanna and D. I. Reisis, “Image computation on meshes with multiple broadcast,” IEEE
Transactions on PAMI, Vol. 11, No. 11, pp. 1194–1201, 1989.
[8]. H. LI, M. Maresca, “Polymorphic torus network,” IEEE Transaction on Computer, Vol. C-38,
No. 9, pp. 1345– 1351, 1989.
[9]. T. Hayachi, K. Nakano, and S. Olariu, “An O ((log log n)²) time algorithm to compute the convex
hull of sorted points on re-configurable meshes,” IEEE Transactions on Parallel and Distributed
Systems, Vol. 9, No. 12, 1167– 1179, 1998.
[10]. R. Miller, V. K. Prasanna-Kummar, D. I. Reisis, and Q. F. Stout, “Parallel computation on re-
configurable meshes,” IEEE Transactions on Computer, Vol. 42, No. 6, pp. 678–692, 1993.
[11]. Yue Zhao, Francis C.M. Lau, "Implementation of Decoders for LDPC Block Codes and LDPC
Convolutional Codes Based on GPUs," IEEE Transactions on Parallel and Distributed Systems,
vol. 25, no. 3, pp. 663-672, March 2014, doi:10.1109/TPDS.2013.52
[12]. Jing Wu, Joseph JaJa, Elias Balaras, "An Optimized FFT-Based Direct Poisson Solver on CUDA
GPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 550-559, March
2014, doi:10.1109/TPDS.2013.53
[13]. Abid Rafique, George A. Constantinides, Nachiket Kapre, "Communication Optimization of
Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs," IEEE Transactions on Parallel and
Distributed Systems, 28 Feb. 2014. IEEE computer Society Digital Library. IEEE Computer
Society, <http://doi.ieeecomputersociety.org/10.1109/TPDS.2014.6>
[14]. M. Youssfi, O. Bouattane and M. O. Bensalah “A Massively Parallel Re-Configurable Mesh
Computer Emulator: Design, Modeling and Realization” J. Software Engineering & Applications,
2010, 3: 11-26 doi:10.4236/jsea.2010.31002 Published Online January 2010.
[15]. Bouattane, B. Cherradi, M. Youssfi and M.O. Bensalah « Parallel c-means algorithm for image
segmentation on a reconfigurable mesh computer » ELSEVIER. Parallel computing, 37 (2011),
pp 230-243.
[16]. M. Youssfi, O. Bouattane, and M.O. Bensalah “ On the Object Modelling of the Massively
Parallel Architecture Computers” Proceedings of the IASTED Inter. Conf. Software engineering,
February 16 - 18, 2010, Innsbruck, AUSTRIA. Pp 71-78.
[17]. M. Youssfi, O. Bouattane, and M.O. Bensalah “Parallelization of the local image processing
operators. Application on the emulating framework of the reconfigurable mesh connected
computer”. Proceeding of scientists meeting in Information Technology and Communication
JOSTIC. Rabat, November 3-4, 2008. pp 81-83.
[18]. C. Kesselman, S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,”
International Journal of High Performance Computing Applications, vol. 15, issue3, pp 200–222,
2001.
[19]. SOBEL, I., An Isotropic 3×3 Gradient Operator, Machine Vision for Three – Dimensional Scenes,
Freeman, H., Academic Pres, NY, 376-379, 1990
[20]. R. Gonzalez and R. Woods Digital Image Processing, Addison Wesley, 1992, pp 414 – 428.
[21]. Arup Kumar Bhattacharjee and Soumen Mukherjee, “Object Oriented Design for Sequential and
Parallel Software Components”, International Journal of Information Technology and
Management Information Systems (IJITMIS), Volume 1, Issue 1, 2010, pp. 32 - 44, ISSN Print:
0976 – 6405, ISSN Online: 0976 – 6413.