Mohammed Mahfooz Sheikh, A M Khan, T Venkatesh, U N Sinha/ International Journal of
       Engineering Research and Applications (IJERA)       ISSN: 2248-9622 www.ijera.com
                         Vol. 2, Issue 4, July-August 2012, pp.1261-1268
           The Assessment of Bandwidth Requirements for
     Meteorological Code VARSHA on a Parallel Computing System
      Mohammed Mahfooz Sheikh*, A M Khan*‡, T Venkatesh** and U N
                               Sinha†
                  *(Dept. of Electronics, Mangalore University, Mangalore, Karnataka 574199.)
        ** (Dept. of Computer Science and Engineering, Ghousia College of Engineering, Ramanagaram)
               † (Distinguished Scientist, CSIR NAL/C-MMACS, Bangalore, Karnataka 560017)


ABSTRACT
Complex scientific problems such as weather forecasting, computational fluid and combustion dynamics, and computational drug design require large scale computational resources to solve the equations governing them. These solutions are obtained by developing large legacy codes and executing them on parallel processing systems. Parallel processing computers generally demand huge bandwidth as they consist of a large number of networked processing elements. One such legacy code, VARSHA, is a meteorological code used for weather forecasting, developed at Flosolver, CSIR-NAL under a joint project from NMITLI (New Millennium Indian Technological Leadership Initiative) and MoES (Ministry of Earth Science). VARSHA is run on a large scale computing platform, the Flosolver Mk8, the latest of the Flosolver series of parallel computers. This paper discusses the bandwidth utilisation of the VARSHA code on an Ethernet based interconnect, in its existing and modified forms, in order to assess the bandwidth requirements of such legacy codes on parallel computing systems.

Keywords - Meteorological code, Bandwidth scaling, Parallel Computing, VARSHA model, Flosolver

1. INTRODUCTION
Parallel processing has become an inevitable tool for solving complex scientific problems that involve large scale computations. Without large scale computational resources genome sequencing would not have been possible [1]. New drug development routinely uses large scale computing [2]. Many new discoveries have been the result of large scale computations. For example, solitary waves were found by Ulam and his colleagues using large scale computing [3]; space missions demand massive computing for re-entry trajectories of space vehicles, and numerical precision exceeding 20 digits is quite common. It is, therefore, not surprising that the requirement of large scale computations has led to the development of parallel machines, with a history dating back to the 1960s [4], [5]. The story of the development of the computers in use till the early 70s is well documented and vividly presented in the references [6–11].

Parallel machines are generally built by interconnecting a number of processors, and their architecture depends on the complexity of the tasks, which dictates the type of coupling required. The parallel processing tasks are divided among various Processing Elements (PEs) that execute the jobs in parallel. It is implicitly assumed here that the task is agreeable with a parallel processing architecture and that a communication mechanism is in place so that the PEs may work on the subtasks of the main task, yet complete the main task as if the process were carried out on a single virtual sequential computing machine. The communication paradigm is at a crossroads at this point. The field equations occurring in science, when appropriately formulated, lend themselves well to distributed parallel processing. A simple example illustrates the point. The solution of the potential equation formulated through Green's function is not naturally amenable to parallel processing, whereas its formulation by finite difference discretisation leads naturally to the domain decomposition technique, which is highly amenable to parallel processing [12].

The PEs in a parallel machine are thus required to cooperate to solve a particular task, and they need an interconnection scheme for communicating with each other. Such an environment offers a faster solution to complex problems than is feasible using sequential machines. Moreover, sequential machines may not be able to solve the problem in a reasonable amount of time. The interconnection network required for the PEs to communicate forms the most important part of a parallel computer next to the Central Processing Units (CPUs).
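The domain decomposition argument above can be illustrated with a minimal sketch. It is not taken from VARSHA or from reference [12]; it assumes a one dimensional potential (Laplace) equation discretised by finite differences and relaxed by Jacobi iteration, and the function names and the two-PE split are hypothetical. Each PE updates only its own grid points and exchanges a single halo value with its neighbour after every sweep, which is the communication that the interconnect has to carry.

```python
def jacobi_sweep(u):
    """One Jacobi sweep for u'' = 0: interior points become the mean of
    their neighbours; the first and last entries are held fixed."""
    interior = [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def solve_single(u, sweeps):
    """Reference solution on one sequential machine."""
    for _ in range(sweeps):
        u = jacobi_sweep(u)
    return u


def solve_decomposed(u, sweeps, split):
    """Same iteration with the grid shared by two PEs (assumed layout).

    The left PE owns points 0..split-1, the right PE owns split..n-1.
    Each PE also keeps one halo point copied from its neighbour.
    """
    left = u[:split + 1]    # last entry is the halo (global index split)
    right = u[split - 1:]   # first entry is the halo (global index split-1)
    for _ in range(sweeps):
        left = jacobi_sweep(left)     # both sweeps are independent and
        right = jacobi_sweep(right)   # could run in parallel
        # Halo exchange: the only communication needed per sweep.
        left[-1] = right[1]
        right[0] = left[-2]
    return left[:-1] + right[1:]


if __name__ == "__main__":
    start = [0.0] * 8 + [1.0]   # boundary values 0 and 1
    print(solve_single(start, 200))
    print(solve_decomposed(start, 200, split=4))
```

In this sketch the decomposed run performs exactly the same arithmetic as the sequential one, so the two results agree; only one value per neighbour crosses the boundary in each sweep.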

2. ESSENCE OF COMMUNICATION TECHNOLOGY IN PARALLEL PROCESSING
As the communication component is a critical part of building a parallel computer, estimating the bandwidth of a parallel processing system [13] is always a prerequisite, and the bandwidth requirement of the application must be well within the system bandwidth specifications [14]. Many real time applications such as meteorological computing, DNS computing, panel techniques for aircraft wing load calculation and many other problems of this class require parallel architectures for their solution. The demonstration of super linear speedup of a Navier Stokes calculation, something of a milestone in judging the effectiveness of parallel computing [15], is indeed an important initiative. Large legacy codes that demand global coupling require high speed communication. It is possible to increase the speed of computation by using more PEs, so that more jobs execute in parallel, but appropriate communication mechanisms must be used depending on the intensity of communication demanded by the application. One such parallelised legacy code, VARSHA [16], operational at the Flosolver lab in NAL, is presented in this paper along with the assessment of its bandwidth requirements. Although VARSHA requires large communication bandwidth, a cluster based architecture is used here for its assessment owing to its ready availability. As the case study is based on the VARSHA model, it is briefly described in section 3.

3. VARSHA MODEL
The VARSHA model is a global/general circulation model of the atmosphere that solves a set of nonlinear partial differential equations to predict the future state of the atmosphere from a given initial state. Since the domain of atmospheric flow is bounded at the bottom by the surface of the earth, exchange of properties takes place at this surface and it is necessary to prescribe appropriate boundary conditions or values for various quantities. The bottom topography plays an important role in controlling the airflow not only close to the ground but also at upper levels through induced vertical motion and momentum transfer by gravity waves. Present day atmospheric models have moisture as one of the variables and take into account diabatic processes like evaporation and condensation. All physical processes involving moisture and others like radiation, turbulence, gravity wave drag, land surface processes etc. are parameterized in terms of variables in the VARSHA model. Detailed discussion can be found in [17], [18] and the details of parallelisation are available in [16]. The governing equations for the VARSHA model are discussed below.

The model is based on the conservation laws for mass, momentum, energy and moisture. The momentum equations are replaced by vorticity and divergence equations so that spectral techniques can be applied in the horizontal direction. The vertical coordinate is $\sigma = p/p_*$, where $p$ is the layer pressure and $p_*$ the surface pressure. Finite differences are used to approximate the differential operators in the vertical direction. The governing equations are cast in the form of evolution equations for the model variables, viz. temperature ($T$), surface pressure ($p_*$), specific humidity ($q$), divergence ($D$) and vorticity ($\eta$). The model prognostic equations are given below.

The thermodynamic equation is

$$ \frac{d \ln\theta}{dt} = \frac{H}{C_p T} $$

where $H$ is the heating rate per unit mass and $\theta$ is the potential temperature. This is rewritten to give the following equation for temperature:

$$ \frac{\partial T}{\partial t} = -\mathbf{V}\cdot\nabla T + \kappa T\left(\frac{\partial}{\partial t} + \mathbf{V}\cdot\nabla\right)\ln p_* + \frac{H}{C_p} - \dot{\sigma}\left(\frac{\partial T}{\partial\sigma} - \frac{\kappa T}{\sigma}\right) \tag{1} $$

where $T = \pi\theta$; $\pi = p^{\kappa}$; $\kappa = R/C_p$; and $\nabla$ is the horizontal gradient in the $\sigma$ system.

The continuity equation is

$$ \frac{\partial}{\partial t}\ln p_* + \mathbf{V}\cdot\nabla\ln p_* + \nabla\cdot\mathbf{V} + \frac{\partial\dot{\sigma}}{\partial\sigma} = 0 \tag{2} $$

On integrating this equation with the boundary conditions $\dot{\sigma}(0) = \dot{\sigma}(1) = 0$ we get the equation for surface pressure,

$$ \frac{\partial}{\partial t}\ln p_* = -\int_0^1\left(\nabla\cdot\mathbf{V} + \mathbf{V}\cdot\nabla\ln p_*\right)d\sigma \tag{3} $$

The conservation law for moisture is $dq/dt = S$, where $q$ is the specific humidity and $S$ represents the sources and sinks. This equation is written as

$$ \frac{\partial q}{\partial t} = -\mathbf{V}\cdot\nabla q - \dot{\sigma}\frac{\partial q}{\partial\sigma} + S \tag{4} $$

The equations for divergence and vorticity are obtained from the momentum equation by taking the dot and cross products respectively. The equation for divergence is

$$ \frac{\partial D}{\partial t} = \frac{1}{a\cos^2\phi}\left[\frac{\partial B}{\partial\lambda} - \cos\phi\,\frac{\partial A}{\partial\phi}\right] - \nabla^2\left(E + \Phi + RT_0\ln p_*\right) \tag{5} $$
and the equation for vorticity is

$$ \frac{\partial\eta}{\partial t} = -\frac{1}{a\cos^2\phi}\left[\frac{\partial A}{\partial\lambda} + \cos\phi\,\frac{\partial B}{\partial\phi}\right] \tag{6} $$

where $D = \nabla\cdot\mathbf{V}$; $\eta = f + \zeta_k$; $\zeta_k = \mathbf{k}\cdot(\nabla\times\mathbf{V})$;

$$ A = \eta U + \frac{RT}{a}\cos\phi\,\frac{\partial\ln p_*}{\partial\phi} + \dot{\sigma}\frac{\partial V}{\partial\sigma} - F_\phi\cos\phi $$

$$ B = \eta V - \frac{RT}{a}\frac{\partial\ln p_*}{\partial\lambda} - \dot{\sigma}\frac{\partial U}{\partial\sigma} + F_\lambda\cos\phi $$

$$ U = u\cos\phi; \quad V = v\cos\phi; \quad E = \frac{\mathbf{V}\cdot\mathbf{V}}{2} $$

$\Phi$ = geopotential height; $f$ = Coriolis parameter; $a$ = radius of the earth; $\phi$ = latitude; $\lambda$ = longitude; $F_\phi$ = zonal component of the dissipative processes; $F_\lambda$ = meridional component of the dissipative processes.

The VARSHA code has a horizontal resolution of 80 waves in triangular truncation and 18 atmospheric layers in the vertical. The Fourier space again has 80 waves in the east-west (EW) direction on 128 quasi-equidistant Gaussian latitudes, which exclude both the poles and the equator. In the physical grid space the Fourier waves are projected onto 256 equispaced longitude points on each Gaussian latitude. The number of longitude points has to be much larger than the azimuthal wave number in order to resolve the shorter waves produced by nonlinear interaction. The VARSHA forecast model is fully diabatic and includes parameterisation of all known physical processes like radiation, convection, turbulent boundary layer processes, land air interaction, surface friction, etc.

4. PARALLELISATION STRATEGY OF VARSHA MODEL
Large application codes developed until the late 70s or early 80s abound; these codes were developed around sequential computers with the von Neumann architecture. Extending these codes in both scope and efficiency is a natural requirement, and the only possible route is parallelisation. Such codes are commonly called legacy codes. The VARSHA code, used for weather prediction, mainly consists of numerical computation of equations in the spectral domain and communication of data at each time step of the forecast, which demands higher bandwidth. The model computations are done in three distinct spaces, namely the two dimensional global spectral space, the one dimensional Fourier space (spectral in the azimuthal direction) for each latitude, and the two dimensional latitude/longitude grid space. Since the basis functions in the spectral expansion are orthogonal to each other, all linear operations in the spectral space involve only the selected basis function, and such computations can be done in parallel. Parts of the nonlinear terms are computed in the Fourier space, while the rest, along with the physical parameterization computations, are carried out on the latitude/longitude grid.

The natural flow of the model code is to compute the Fourier coefficients for each latitude, complete the calculation in the Fourier space and then move to the grid space through the discrete Fast Fourier Transform (FFT). After completion of the computations in the grid space, mostly related to physical parameterizations which work independently at each grid point along a vertical column, the atmospheric variables are again moved to Fourier space through the FFT. Finally, at the end of each time step of integration, the full spectral coefficients are again formed by summing up contributions from all the latitudes. Except for this last operation, all computations involve quantities specific to one latitude only. Thus, the natural scheme of parallelisation considered was to distribute the latitudes to different processors. This essentially implies a domain splitting in the Fourier space only. The latitudes were divided among the processors, and each time step demands global communication in order to sum the full spectral coefficients over all latitudes.

The parallel simulation was done on the Flosolver MK3 parallel computer at NAL, and [16] contains the data on its performance. The key point in this simulation was that a GCM model could be run on the 4 processor Flosolver MK3, which was a remarkable feat, and the efficiency issues were relegated to second place as platforms having a large number of processors were not available. By 2010 the picture had changed: the number of processors available is large and the issue is now that of parallel efficiency. Practically the same VARSHA code running on Flosolver MK8 (1024 Xeon processors @ 2GHz and 4TB RAM) gives the efficiency shown in Fig. 1, which is dismally poor.

5. BANDWIDTH ASSESSMENT FOR LEGACY CODE VARSHA
In the initial portion and the completion phase of the VARSHA model, the computational load is quite different from, and not representative of, the main computational task. Thus, there is a need to profile only a specific segment of the code which is compute intensive. Moreover, complete profiling of VARSHA would generate a huge amount of data. With these considerations in mind, the present study utilises the fourth time step of the forecast for profiling. During each of the time steps, the code computes the Fourier Transform of 256 latitudes of 512 points each for 18 atmospheric levels. It also computes the Legendre Transform for an equal number of points and the nonlinear part of the dynamical calculations. These computations in the fourth time step and their


communication to other boards, depending on the number of boards, are profiled and presented in Table I.

Fig. 1. Efficiency of VARSHA model for various numbers of processors (using MPI communication protocol, Ethernet 1GB rating).

TABLE I SPEEDUP OF ORIGINAL APPLICATION CODE

Boards   CPU Processing   Communication   Actual Processing   Speedup
         Time (msec)      Time (msec)     Time (msec)
  1          3077               2               3079            1
  2          1541             181               1722            1.8
  4           774             352               1126            2.7
  8           392             528                920            3.3
 16           201             700                901            3.4

The speedup trend is as shown in the graph in Fig. 1. The bandwidth assessment techniques for the speedup trend in Fig. 1 are explained as follows.

5.1 Scaling of Bandwidth
It is evident from Table I that the speedup is not appreciable beyond 8 processors. If the bandwidth can be improved, however, the communication time will be reduced and the speedup will increase for larger numbers of processors. In other words, if $t$ is the actual processing time then

$$ t = t_{cpu} + t_{comm} \tag{7} $$

where $t_{cpu}$ is the computation time and $t_{comm}$ is the communication time. If the bandwidth is increased by a factor of $k$, the effective processing time becomes

$$ t = t_{cpu} + \frac{t_{comm}}{k}; \quad \text{where } k \text{ is a positive integer} \tag{8} $$

Here the communication term is inversely proportional to $k$, so any increase in $k$ tends to reduce the time $t$ and increase the speedup, as is evident from Table II.

5.2 Overlapping of Communication and Computation Times
The efficiency of parallel computing is governed by a single factor: the synergy between computation and communication. The basic problem remains the same, but the way it is partitioned determines the efficiency. It is essential to keep a balance between the computation and communication times. It is easily seen that the ratio of communication time to computation time has to be small for parallel computing to be attractive. Achieving this, even in the case of VARSHA, often needs massive rewriting of the code, and the structure of such rewriting is guided by the experiments performed using the profiling tools.

In practical execution of the VARSHA code, one finds that the computation of each physical parameter at each grid point along the vertical column takes place independently. Once the computation for a single time step (i.e. computation of all parameters) is complete, the values are communicated globally. Such scenarios, where communication waits for completion of computation and vice versa, have to be eliminated, and a better method of achieving synergy between computation and communication needs to be followed. At the end of the computation of each physical parameter, its communication can be initiated in the background such that it does not hamper or delay the computation of the next parameter. Towards the end, when the computation of all the parameters is complete and global communication would normally take place, the communication for each parameter has already been completed. Very often legacy codes are parallelised in a simple minded way in which computation and communication are disjoint, although they could be overlapped. Such overlapping can dramatically reduce the overall execution time of the code. The effective processing time of equation (7) then becomes

$$ t = t_{cpu} + t_{comm} - t_{ov} \tag{9} $$

where $t_{cpu}$ is the CPU processing time, $t_{comm}$ is the communication time and $t_{ov}$ is the overlapped communication time, which is presented in Table IV.

The effects of bandwidth scaling and of the overlapping technique were evaluated using profiling tools built in house. The profiling tools can determine the actual computation and communication times during the course of the code execution. However, as the emphasis is on bandwidth scaling and overlapping techniques for the legacy code, the profiling techniques shall not be discussed here. The timing analysis of the VARSHA code for these techniques is explained in the subsequent sections.
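The overlapping idea of section 5.2 can be sketched in a few lines. This is only an illustration under stated assumptions and not the modification actually made to VARSHA: the parameter names, the sleep-based stand-ins for computation and for the network transfer, and the use of a single background thread as the "link" are all hypothetical. In an MPI code the same effect would typically be sought with non-blocking communication.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compute_parameter(name, work_s=0.01):
    """Stand-in for computing one physical parameter on this board."""
    time.sleep(work_s)
    return [float(len(name)), 1.0, 2.0]


def communicate(name, values, link_s=0.01):
    """Stand-in for sending one parameter to the other boards."""
    time.sleep(link_s)
    return name, sum(values)


def run_disjoint(names):
    """Compute every parameter first, then communicate everything."""
    computed = [(n, compute_parameter(n)) for n in names]
    return [communicate(n, v) for n, v in computed]


def run_overlapped(names):
    """Start each transfer in the background as soon as its parameter is
    ready, so it proceeds while the next parameter is being computed."""
    with ThreadPoolExecutor(max_workers=1) as link:
        pending = []
        for n in names:
            values = compute_parameter(n)
            pending.append(link.submit(communicate, n, values))
        return [f.result() for f in pending]


if __name__ == "__main__":
    names = ["temperature", "humidity", "divergence", "vorticity"]
    for runner in (run_disjoint, run_overlapped):
        start = time.perf_counter()
        runner(names)
        print(runner.__name__, round(time.perf_counter() - start, 3), "s")
```

Both functions deliver the same values; the overlapped version hides most of the transfer time behind the computation of the following parameter, which is the role played by $t_{ov}$ in equation (9).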



TABLE II SPEEDUP IN CASE OF BANDWIDTH SCALED BY FACTOR OF 8

Boards   CPU Processing   Communication   Actual Processing   Speedup
         Time (msec)      Time (msec)     Time (msec)
  1          3077              0.3             3077.3           1
  2          1541             22.6             1563.6           2.0
  4           774             44.0              818.0           3.8
  8           392             66.0              458.0           6.7
 16           201             87.5              288.5          10.7

TABLE III SPEEDUP OF ORIGINAL TEST CODE IN TERMS OF BANDWIDTH

Factor of Bandwidth            No. of Boards
Increase (k)            1      2      4      8      16
       1               1.0    1.8    2.7    3.3     3.4
       2               1.0    1.9    3.2    4.7     5.6
       4               1.0    1.9    3.6    5.9     8.2
       8               1.0    2.0    3.8    6.7    10.7
      16               1.0    2.0    3.9    7.2    12.6

6. TIMING ANALYSIS OF VARSHA CODE
The bandwidth requirements of the VARSHA code are assessed using the communication and computation times. The timing analysis using the techniques explained in section 5 is described below.

6.1 Effect of Bandwidth Scaling on VARSHA
Consider an example from TABLE I. For an 8 processor system, if the bandwidth is increased by a factor of 8 (i.e. k = 8), the CPU processing time remains the same, i.e. 392 msec, but the communication time becomes 528/8 = 66 msec, and the speedup becomes 6.7. TABLE I, modified for k = 8 in equation (8), is shown as TABLE II. Similarly, for a 16 processor system the CPU processing time is 201 msec and the communication time becomes 700/8 = 87.5 msec, so the speedup becomes 10.7. Therefore, in an 8 processor system the speedup increases from 3.3 to 6.7, and in a 16 processor system it increases from 3.4 to 10.7. TABLE III gives the speedup values for bandwidth factors $k = 2^m$, where 0 ≤ m ≤ 4. The graph in Fig. 2 shows the change in speedup for various bandwidths. It is observed that as the bandwidth scaling increases, i.e. as the communication time reduces, the speedup of the application code increases when a larger number of processors is used.

Fig. 2. Effect of Bandwidth scaling on speedup for VARSHA model.

Thus, the scaling of bandwidth is extremely significant in deciding the computational efficiency of the legacy code.

6.2 Effect of Bandwidth Scaling on Modified Code
For better parallelisation efficiency, code rearrangement plays a vital role, yet it is very often not considered seriously while handling legacy codes. For example, in the present code, for which data is presented in TABLE I, the operations of computation and communication are disjoint. If the code is rewritten or modified so that computation and communication overlap, one obtains the speedup shown in TABLE IV. Considering again the case of 8 boards in TABLE I, the CPU processing time is 392 msec and the communication time is 528 msec; the overlapped communication time is 392 msec, so the effective processing time is 392 + 528 - 392 = 528 msec and the speedup is 3077/528 = 5.8, as shown in TABLE IV. The graph in Fig. 3 shows the trend in the speedup for different numbers of processors.

Fig. 3. Speedup in case of modified code.

If bandwidth scaling is applied to the modified code, with the bandwidth increased by a factor of $k$, the effective processing time of equation (9) becomes

$$ t = t_{cpu} + \frac{t_{comm}}{k} - t_{ov} \tag{10} $$
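The timing models of equations (8) to (10) can be checked against the published tables with a short sketch. The measured times are those of TABLE I; the function names are hypothetical, and the choice $t_{ov} = \min(t_{cpu}, t_{comm}/k)$ is an assumption: it matches the overlapped times listed in TABLE IV for k = 1, but the paper does not state how $t_{ov}$ behaves for other values of k. Speedup is taken relative to the one-board time under the same model, which is what reproduces the tabulated figures.

```python
# Measured times from TABLE I in msec: boards -> (t_cpu, t_comm)
TABLE_I = {1: (3077, 2), 2: (1541, 181), 4: (774, 352),
           8: (392, 528), 16: (201, 700)}


def time_scaled(t_cpu, t_comm, k=1):
    """Equation (8): disjoint phases, bandwidth scaled by a factor k."""
    return t_cpu + t_comm / k


def time_overlapped(t_cpu, t_comm, k=1):
    """Equations (9) and (10), ASSUMING the overlapped part is
    t_ov = min(t_cpu, t_comm / k)."""
    t_link = t_comm / k
    t_ov = min(t_cpu, t_link)
    return t_cpu + t_link - t_ov


def speedup_table(model, k=1):
    """Speedup per board count, relative to one board under the same model."""
    base = model(*TABLE_I[1], k)
    return {n: round(base / model(*TABLE_I[n], k), 1) for n in TABLE_I}


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print("k =", k, speedup_table(time_scaled, k))
    print("overlapped:", speedup_table(time_overlapped))
```

Under the overlap assumption the effective time reduces to the larger of the computation and communication times, which is why the modified code peaks at 8 boards in TABLE IV: beyond that point the run is limited by communication rather than computation.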


The comparative figures for the different bandwidth scalings of the application code and the modified application code are given in TABLE V.

TABLE IV SPEEDUP OF MODIFIED CODE

Boards   CPU Processing   Communication   Communication Time        Effective processing   Speedup
         Time (msec)      Time (msec)     (Overlapping accounted)   time (msec)
                                          (msec)
1        3077             2               2                         3077                   1
2        1541             181             181                       1541                   2.0
4        774              352             352                       774                    4.0
8        392              528             392                       528                    5.8
16       201              700             201                       700                    4.4

TABLE V SIGNIFICANCE OF BANDWIDTH IN ORIGINAL AND MODIFIED CODES (SMALL PARALLEL PROCESSING)

Factor of Bandwidth              No. of boards              Remarks
Increase (k)           1     2     4     8     16
1                      1.0   1.8   2.7   2.7   3.4          Old
1                      1.0   2.0   4.0   5.8   4.4          Modified
2                      1.0   1.9   3.2   3.2   5.6          Old
2                      1.0   2.0   4.0   7.8   8.8          Modified
4                      1.0   1.9   3.6   3.6   8.2          Old
4                      1.0   2.0   4.0   7.8   15.3         Modified
8                      1.0   2.0   3.8   3.8   10.7         Old
8                      1.0   2.0   4.0   7.8   15.3         Modified
16                     1.0   2.0   3.9   3.9   12.6         Old
16                     1.0   2.0   4.0   7.8   15.3         Modified

Fig. 4 shows the comparison graph for various bandwidth scalings for the original and modified codes. It is observed that the speedup of the modified code is large for sizeable scaling of bandwidth.

Fig. 4. Comparative significance of Bandwidth in original and modified test codes: (a) Bandwidth inference of original code; (b) Bandwidth inference of modified code

It is instructive to have a comparison table for a large number of processors and larger bandwidth scaling, as shown in TABLE VI, so that performance issues may be put into perspective. Fig. 5 shows the comparison graph for higher bandwidth scaling, of the order up to 512, for the original and modified codes. It shows that as the number of processors increases, when the bandwidth is scaled by a considerable factor (say half the number of processors), the speedup of the modified code is almost double that of the original code.

7.       CONCLUSION
The VARSHA code has been assessed for its bandwidth requirements in both its existing and modified forms. It is noted that if the overall processing time (TABLE I) has to be decreased from 3079 msec to around 800 msec, i.e. reduced roughly by a factor of 4, it is not enough to increase the number of boards from 1 to 16, which is far more than the 4 nominally required for the expected reduction. Instead, if the number of boards is increased only from 1 to 4 and the bandwidth scaling factor is increased from 1 to 8 (TABLE II), the timing requirement is well met. This clearly brings out the fact that increasing the number of boards, CPUs or cores is by itself insufficient for reducing processing time; it needs to be backed up by an increase in communication bandwidth.

The assessment suggests that a present-day parallel processing centre needs network hardware which supports bandwidth on demand while keeping overall resources intact. It is understandable that at a given point of time not all tasks will require peak bandwidth. Hence it is meaningful to consider that the resources can be combined. Currently such hardware does not exist, suggesting that there is a need to develop such a class of hardware if these centres are not identified with specific application programs. This idea shall give rise to a completely new paradigm of bandwidth on demand in parallel computing.
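The trade-off described in the first paragraph of the conclusion can be checked with a few lines of arithmetic. This is a hedged sketch under two assumptions that are not stated in this section: that the CPU and communication timings of TABLE I equal those listed in TABLE IV, and that the original code does not overlap communication with computation, so its processing time is t_cpu + t_comm / k. The names used are illustrative only.

```python
# CPU and communication times in msec, taken from TABLE IV and assumed
# here to be the same as the TABLE I timings of the original code.
T_CPU = {1: 3077, 4: 774, 16: 201}
T_COMM = {1: 2, 4: 352, 16: 700}


def original_time(boards, k=1):
    """Processing time of the original code for a bandwidth factor k.

    Assumption: no overlap, so communication time simply adds to CPU time.
    """
    return T_CPU[boards] + T_COMM[boards] / k


if __name__ == "__main__":
    print("1 board,   k = 1:", original_time(1))        # 3079 msec baseline
    print("16 boards, k = 1:", original_time(16))       # still above 800 msec
    print("4 boards,  k = 8:", original_time(4, k=8))   # close to 800 msec
```

Under these assumptions, 16 boards at the original bandwidth give 901 msec, whereas 4 boards with eight times the bandwidth give 818 msec, which is consistent with the argument that adding boards alone does not reach the target.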




TABLE VI SIGNIFICANCE OF BANDWIDTH IN ORIGINAL AND MODIFIED CODES (MODERATE PARALLEL PROCESSING)

Factor of Bandwidth                No. of boards                   Remarks
Increase (k)           32     64     128     256     512
1                      3.1    2.7    2.4     2.1     1.9           Old
1                      3.4    2.8    2.4     2.1     1.9           Modified
16                     20.2   26.6   29.9    30.0    28.5          Old
16                     32.1   45.6   39.1    34.0    30.2          Modified
32                     24.8   37.6   48.6    53.7    54.1          Old
32                     32.1   64.1   78.1    68.0    60.5          Modified
64                     28     47.4   70.4    89.0    97.8          Old
64                     32.1   64.1   128.2   136.0   120.9         Modified
128                    29.9   54.5   90.9    132.0   164.3         Old
128                    32.1   64.1   128.2   256.4   241.7         Modified
256                    30.9   58.9   106.4   174.3   248.9         Old
256                    32.1   64.1   128.2   256.4   483.8         Modified

Fig. 5. Comparative significance of higher Bandwidth in original and modified test codes: (a) Bandwidth inference of original code; (b) Bandwidth inference of modified code

8.         ACKNOWLEDGEMENTS
One of the authors, Mohammed Mahfooz Sheikh, would like to acknowledge CSIR for the support in the form of fellowship as CSIR-SRF vide CSIR award No. 31/3(38)/2009-EMR-I. The authors are extremely thankful to the Director, NAL, the Head, Flosolver Unit, and all the scientists at Flosolver for their unwavering support in carrying out this work.

REFERENCES
[1] Mark Delderfield, Lee Kitching, Gareth Smith, David Hoyle and Iain Buchan, Shared Genomics: Accessible High Performance Computing for Genomic Medical Research, in Proc. of the 2008 Fourth IEEE International Conference on eScience (IEEE Computer Society), Washington DC, USA, 2008, 404-405.
[2] Hausheer F H, Numerical simulation, parallel clusters, and the design of novel pharmaceutical agents for cancer treatment, in Robert Werner (Ed.) Proc. of the 1992 ACM/IEEE conference on Supercomputing, (IEEE Computer Society Press, Los Alamitos, CA, USA) 1992, 636-637.
[3] Fermi E, Pasta J R, Tsingou M and Ulam S, Studies of nonlinear problems I, Technical Report LA-1940 (Los Alamos Scientific Laboratory, Los Alamos, NM, USA) 1955.
[4] Jon Squire S and Sandra Palais M, Programming and design considerations of a highly parallel computer, in Proc. of the spring joint computer conference, ACM, New York, USA, 1963, 395-400.
[5] Koczela L J and Wang G Y, The Design of a Highly Parallel Computer Organization, IEEE Trans. Computers, 18, 1969, 520-529.
[6] Akira Kasahara, Computer Simulations of the Global Circulation of the Earth's Atmosphere, in S Fernbach and A Taub (Eds.), Computers and their role in the physical sciences, Chapter 23, (Gordon and Breach Science Publishers, New York) 1970, 571-594.
[7] Herman Goldstine H, The Computer: from Pascal to Von Neumann, 2nd edn., (Princeton University Press, New Jersey) 1973.
[8] Charles Eames and Ray Eames, A Computer Perspective, Glen Fleck (Ed.), (Harvard University Press, Cambridge, Massachusetts) 1973.
[9] David J Kuck, The structure of Computers and Computation, Vol 1, (John Wiley & Sons Inc., New York) 1978.
[10] Presper Eckert Jr J, The ENIAC, in A History of Computing in the Twentieth Century, N Metropolis, J Howlett and Gian-Carlo Rota (Eds.), (Academic Press, New York) 1980, 525-540.
[11] John Mauchly W, The ENIAC, in A History of Computing in the Twentieth Century, N Metropolis, J Howlett and Gian-Carlo Rota (Eds.), (Academic Press, New York) 1980, 541-550.
[12] Barry Smith F, Petter Bjorstad and William Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, (Cambridge University Press) 1996.
[13] Shioda S and Mase K, A new approach to bandwidth requirement estimation and its accuracy verification, IEEE Intl. Conf. on Communications 2004, Vol 4, 2004, 1953-1957.
[14] Yanping Zhao, Eager D L and Vernon M K, Network Bandwidth Requirements for Scalable On-Demand Streaming, IEEE/ACM Transactions on Networking, 15(4), 2007, 878-891.
[15] T N Venkatesh, V R Sarasamma, S Rajalakshmy, Kirti Chandra Sahu and Rama Govindarajan, Superlinear speedup of a parallel multigrid Navier-Stokes solver on Flosolver, Current Science, 88(4), 2005, 589-593.
[16] U N Sinha, V R Sarasamma, S Rajalakshmy, K R Subramanian, P V R Bharadwaj, C S Chandrashekar, T N Venkatesh, R Sunder, B K Basu, Sulochana Gadgil and A Raju, Monsoon Forecasting on Parallel Computers, Current Science, 67(3), 1994, 178-184.
[17] Holton J R and Hsiu-Chi Tan, The influence of the equatorial quasi-biennial oscillation on the global circulation at 50 mb, J. Atmos. Sci., 37, 1980, 2200-2208.
[18] Holton J R and Tan H C, The quasi-biennial oscillation in the Northern Hemisphere lower stratosphere, J. Meteor. Soc. Japan, 60, 1982, 140-148.




PARALLEL BIJECTIVE PROCESSING OF REGULAR NANO SYSTOLIC GRIDS VIA CARBON FIELD...PARALLEL BIJECTIVE PROCESSING OF REGULAR NANO SYSTOLIC GRIDS VIA CARBON FIELD...
PARALLEL BIJECTIVE PROCESSING OF REGULAR NANO SYSTOLIC GRIDS VIA CARBON FIELD...
 

Dernier

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Dernier (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

BANDWIDTH ASSESSMENT

Mohammed Mahfooz Sheikh, A M Khan, T Venkatesh, U N Sinha / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 4, July-August 2012, pp. 1261-1268

The Assessment of Bandwidth Requirements for Meteorological Code VARSHA on a Parallel Computing System

Mohammed Mahfooz Sheikh*, A M Khan*‡, T Venkatesh** and U N Sinha†
* (Dept. of Electronics, Mangalore University, Mangalore, Karnataka 574199)
** (Dept. of Computer Science and Engineering, Ghousia College of Engineering, Ramanagaram)
† (Distinguished Scientist, CSIR NAL/C-MMACS, Bangalore, Karnataka 560017)

ABSTRACT

Complex scientific problems such as weather forecasting, computational fluid and combustion dynamics, and computational drug design require large scale computational resources to solve the equations governing them. These solutions are obtained by developing large legacy codes and executing them on parallel processing systems. Parallel processing computers generally demand huge bandwidth because they consist of a large number of networked processing elements. One such legacy code, VARSHA, is a meteorological code used for weather forecasting, developed at Flosolver, CSIR-NAL under a joint project from NMITLI (New Millennium Indian Technological Leadership Initiative) and MoES (Ministry of Earth Science). VARSHA is run on a large scale computing platform, the Flosolver Mk8, the latest of the Flosolver series of parallel computers. This paper discusses the bandwidth utilisation of the VARSHA code on an Ethernet based interconnect, in its existing and modified forms, in order to assess the bandwidth requirements of such legacy codes on parallel computing systems.

Keywords - Meteorological code, Bandwidth scaling, Parallel Computing, VARSHA model, Flosolver

1. INTRODUCTION

Parallel processing has become an inevitable tool for solving complex scientific problems that involve large scale computations. Without large scale computational resources, genome sequencing would not have been possible [1]. New drug development routinely uses large scale computing [2]. Many new discoveries have been the result of large scale computations. For example, solitary waves were found by Ulam and his colleagues using large scale computing [3]; space missions demand massive computing for re-entry trajectories of space vehicles, and numerical precision exceeding 20 digits is quite common. It is, therefore, not surprising that the requirement of large scale computations has led to the development of parallel machines, with a history dating back to the 1960s [4], [5]. The story of the development of the computers in use till the early 70s is well documented and vividly presented in the references [6–11].

Parallel machines are generally built by interconnecting a large number of processors, and their architectures depend on the complexity of the tasks, which dictates the type of coupling required. The parallel processing tasks are divided among various Processing Elements (PEs) that execute the jobs in parallel. It is implicitly assumed here that the task is agreeable with the parallel processing architecture and that the communication mechanism is in place so that the PEs may work on the subtasks of the main task, yet complete the main task as if the process were carried out on a single virtual sequential computing machine.

The communication paradigm comes to a crossroads at this point. It is a fact that the field equations occurring in science, when appropriately formulated, are very well suited to distributed parallel processing. A simple example illustrates the viewpoint. The solution of the potential equation formulated through Green's function is not naturally amenable to parallel processing, whereas the same problem formulated by finite difference discretisation leads naturally to the domain decomposition technique, which is highly amenable to parallel processing [12]. The PEs in a parallel machine are thus required to cooperate to solve a particular task and need an interconnection scheme for communicating with each other. Such an environment offers faster solutions to complex problems than are feasible using sequential machines. Moreover, sequential machines may not be able to solve the problem in a reasonable amount of time. The interconnection network required for the PEs to communicate forms the most important part of a parallel computer next to the Central Processing Units (CPUs).
2. ESSENCE OF COMMUNICATION TECHNOLOGY IN PARALLEL PROCESSING

Since the communication component is a critical part of building a parallel computer, the bandwidth estimation of a parallel processing system [13] is always a prerequisite, and the bandwidth requirement of the application must be well within the system bandwidth specifications [14]. Many real time applications like meteorological computing, DNS computing, panel techniques for aircraft wing load calculation and many other problems of this class essentially require parallel architectures for their solutions. The demonstration of superlinear speedup for a Navier-Stokes calculation, something of a milestone in judging the effectiveness of parallel computing [15], is indeed an important initiative. Large legacy codes that demand global coupling essentially require high speed communications. It is possible to increase the speed of computations by using more PEs, so that more jobs can be executed in parallel, but appropriate communication mechanisms have to be used depending on the intensity of communication demanded by the application. One such parallelised version of a legacy code, VARSHA [16], operational at the Flosolver lab in NAL, is presented in this paper along with the assessment of its bandwidth requirements. Although VARSHA requires large communication bandwidth, a cluster based architecture is used here for its assessment because it is readily available. As the case study is based on the VARSHA model, the model is briefly described in section 3.

3. VARSHA MODEL

The VARSHA model is a global/general circulation model of the atmosphere that solves a set of nonlinear partial differential equations to predict the future state of the atmosphere from a given initial state. Since the domain of atmospheric flow is bounded at the bottom by the surface of the earth, exchange of properties takes place at this surface, and it is necessary to prescribe appropriate boundary conditions or values for various quantities. The bottom topography plays an important role in controlling the airflow not only close to the ground but also at upper levels, through induced vertical motion and momentum transfer by gravity waves. Present day atmospheric models have moisture as one of the variables and take into account diabatic processes like evaporation and condensation. All physical processes involving moisture, and others like radiation, turbulence, gravity wave drag, land surface processes etc., are parameterized in terms of variables in the VARSHA model. Detailed discussion can be found in [17], [18] and the details of parallelisation are available in [16]. The governing equations for the VARSHA model are discussed below.

The model is based on the conservation laws for mass, momentum, energy and moisture. The momentum equations are replaced by vorticity and divergence equations so that spectral techniques can be applied in the horizontal direction. The vertical coordinate is $\sigma = p/p_*$, where $p$ is the layer pressure and $p_*$ the surface pressure. Finite differences are used to approximate the differential operators in the vertical direction. The governing equations are cast in the form of evolution equations for the model variables, viz. temperature ($T$), surface pressure ($p_*$), specific humidity ($q$), divergence ($D$) and vorticity ($\eta$). The model prognostic equations are given below.

The thermodynamic equation is

$$\frac{d \ln\theta}{dt} = \frac{H}{C_p T}$$

where $H$ is the heating rate per unit mass and $\theta$ is the potential temperature. This is rewritten to give the following equation for temperature:

$$\frac{\partial T}{\partial t} = -\mathbf{V}\cdot\nabla T + \kappa T\left(\frac{\partial}{\partial t} + \mathbf{V}\cdot\nabla\right)\ln p_* + \frac{H}{C_p} - \dot{\sigma}\left(\frac{\partial T}{\partial \sigma} - \frac{\kappa T}{\sigma}\right) \tag{1}$$

where $T = \pi\theta$; $\pi = p^{\kappa}$; $\kappa = R/C_p$; and $\nabla$ is the horizontal gradient in the $\sigma$ system.

The continuity equation is

$$\frac{\partial}{\partial t}\ln p_* + \mathbf{V}\cdot\nabla \ln p_* + \nabla\cdot\mathbf{V} + \frac{\partial \dot{\sigma}}{\partial \sigma} = 0 \tag{2}$$

On integrating this equation with the boundary conditions $\dot{\sigma}(0) = \dot{\sigma}(1) = 0$ we get the equation for surface pressure,

$$\frac{\partial}{\partial t}\ln p_* = -\int_0^1 \left(\nabla\cdot\mathbf{V} + \mathbf{V}\cdot\nabla \ln p_*\right) d\sigma \tag{3}$$

The conservation law for moisture is $\frac{dq}{dt} = S$, where $q$ is the specific humidity and $S$ represents the sources and sinks. This equation is written as

$$\frac{\partial q}{\partial t} = -\mathbf{V}\cdot\nabla q - \dot{\sigma}\frac{\partial q}{\partial \sigma} + S \tag{4}$$

The equations for divergence and vorticity are obtained from the momentum equation by taking the dot and cross products respectively. The equation for divergence is

$$\frac{\partial D}{\partial t} = \frac{1}{a\cos^2\varphi}\left(\frac{\partial B}{\partial \lambda} - \cos\varphi\,\frac{\partial A}{\partial \varphi}\right) - \nabla^2\left(E + \Phi + R T_0 \ln p_*\right) \tag{5}$$
and the equation for vorticity is

$$\frac{\partial \eta}{\partial t} = -\frac{1}{a\cos^2\varphi}\left(\frac{\partial A}{\partial \lambda} + \cos\varphi\,\frac{\partial B}{\partial \varphi}\right) \tag{6}$$

where

$$D = \nabla\cdot\mathbf{V}; \quad \eta = f + \zeta_k; \quad \zeta_k = \mathbf{k}\cdot\nabla\times\mathbf{V}$$

$$A = \eta U + \frac{RT}{a}\cos\varphi\,\frac{\partial}{\partial \varphi}\ln p_* + \dot{\sigma}\frac{\partial V}{\partial \sigma} - F_\varphi\cos\varphi$$

$$B = \eta V - \frac{RT}{a}\frac{\partial}{\partial \lambda}\ln p_* - \dot{\sigma}\frac{\partial U}{\partial \sigma} + F_\lambda\cos\varphi$$

$$U = u\cos\varphi; \quad V = v\cos\varphi; \quad E = \frac{\mathbf{V}\cdot\mathbf{V}}{2}$$

and $\Phi$ = geopotential height; $f$ = Coriolis parameter; $a$ = radius of the earth; $\varphi$ = latitude; $\lambda$ = longitude; $F_\lambda$ = zonal component of the dissipative processes; $F_\varphi$ = meridional component of the dissipative processes.

The VARSHA code has a horizontal resolution of 80 waves in triangular truncation and 18 atmospheric layers in the vertical. The Fourier space again has 80 waves in the EW direction on 128 quasi-equidistant Gaussian latitudes, which exclude both the poles and the equator. In the physical grid space the Fourier waves are projected onto 256 equispaced longitude points on each Gaussian latitude. The number of longitude points has to be much larger than the azimuthal wave number in order to resolve the shorter waves produced by nonlinear interaction. The VARSHA forecast model is fully diabatic and includes parameterisation of all known physical processes like radiation, convection, turbulent boundary layer processes, land air interaction, surface friction, etc.

4. PARALLELISATION STRATEGY OF VARSHA MODEL

Large application codes developed till the late 70's or early 80's abound; these codes were developed around sequential computers having von Neumann's architecture. Extending these codes in both scope and efficiency is a natural requirement, and the only possible route is parallelisation. Such codes are commonly called legacy codes. The VARSHA code used for weather prediction mainly consists of numerical computation of equations in the spectral domain and communication of data at each time step of the forecast, which demands high bandwidth.

The model computations are done in three distinct spaces, namely the two dimensional global spectral space, the one dimensional Fourier space (spectral in the azimuthal direction) for each latitude, and the two dimensional latitude/longitude grid space. Since the basis functions in the spectral expansion are orthogonal to each other, all linear operations in the spectral space involve only the selected basis function, and such computations can be done in parallel. Parts of the nonlinear terms are computed in the Fourier space, while the rest, along with the physical parameterization computations, are carried out on the latitude/longitude grid. The natural flow of the model code is to compute the Fourier coefficients for each latitude, complete the calculation in the Fourier space and then move to the grid space through the discrete Fast Fourier Transform (FFT). After completion of computations in the grid space, mostly related to physical parameterizations which work independently at each grid point along a vertical column, atmospheric variables are again moved to Fourier space through FFT. Finally, at the end of each time step of integration, the full spectral coefficients are again formed by summing up contributions from all the latitudes. Except for this last operation, all computations involve quantities specific to one latitude only.

Thus, the natural scheme of parallelization considered was to distribute the latitudes to different processors. This essentially implies a domain splitting in the Fourier space only. The latitudes are divided among the processors, and each time step demands global communication in order to sum the full spectral coefficients over the latitudes. The parallel simulation was done on the Flosolver MK3 parallel computer at NAL, and [16] contains the data on its performance. The key point in this simulation was that a GCM model could be run on the 4 processor Flosolver MK3, which was a remarkable feat, and the efficiency issues were relegated to second place as platforms having a large number of processors were not available. By 2010 the picture had changed: the number of processors available is large and the issue is now that of parallel efficiency. Practically the same VARSHA code running on Flosolver MK8 (1024 Xeon processors @ 2GHz and 4TB RAM) gives the efficiency shown in Fig. 1, which is dismally poor.

5. BANDWIDTH ASSESSMENT FOR LEGACY CODE VARSHA

In the initial portion and the completion phase of the VARSHA model, the computational load is quite different and is not representative of the main computation task. Thus, there is a need to profile only a specific segment of the code which is compute intensive. Moreover, complete profiling of VARSHA would generate a huge amount of data. With all these considerations in mind, the present study utilises the fourth time step of the forecast for profiling. During each of the time steps, the code computes the Fourier Transform of 256 latitudes of 512 points each for 18 atmospheric levels. It also computes the Legendre Transform for an equal number of points, as well as the nonlinear part of the dynamical calculations. These computations in the fourth time step and their
communication to other boards, depending on the number of boards, are profiled and presented in Table I.

Fig. 1. Efficiency of VARSHA model for various numbers of processors (using MPI communication protocol, Ethernet 1GB rating).

TABLE I. SPEEDUP OF ORIGINAL APPLICATION CODE

| Boards | CPU Processing Time (msec) | Communication Time (msec) | Actual Processing Time (msec) | Speedup |
| 1 | 3077 | 2 | 3079 | 1 |
| 2 | 1541 | 181 | 1722 | 1.8 |
| 4 | 774 | 352 | 1126 | 2.7 |
| 8 | 392 | 528 | 920 | 3.3 |
| 16 | 201 | 700 | 901 | 3.4 |

The speedup trend is as shown in the graph in Fig. 1. The bandwidth assessment techniques for this speedup trend are explained as follows.

5.1 Scaling of Bandwidth

It is evident from Table I that the speedup is not appreciable beyond 8 processors. If the bandwidth can be improved, the communication time will be reduced and the speedup for larger numbers of processors will increase. In other words, if $t$ is the actual processing time, then

$$t = t_{cpu} + t_{comm} \tag{7}$$

where $t_{cpu}$ is the CPU processing time and $t_{comm}$ is the communication time. If the bandwidth is increased by a factor of $k$, the effective processing time becomes

$$t = t_{cpu} + \frac{t_{comm}}{k} \tag{8}$$

where $k$ is a positive integer. Since the communication term is inversely proportional to $k$, any increase in $k$ reduces the time $t$ and increases the speedup, as is evident from Table II.

5.2 Overlapping of Communication and Computation Times

The efficiency of parallel computing is governed by a single factor: the synergy between computation and communication. The basic problem remains the same, but the way it is partitioned determines the efficiency. It is essential to keep a balance between the computation and communication times. It is easily seen that the ratio of communication time to computation time has to be small for parallel computing to be attractive. But achieving this, even in the case of VARSHA, often needs massive rewriting of the code, and the structure of such rewriting is guided by experiments performed using profiling tools.

In practical execution of the VARSHA code, one finds that the computation of each physical parameter at each grid point along the vertical column takes place independently. Once the computation for a single time step (i.e. computation of all parameters) is complete, the values are communicated globally. Such scenarios, where communication waits for completion of computation and vice versa, have to be eliminated, and a better method for synergy between computation and communication needs to be followed. At the end of the computation of each physical parameter, the communication can be initiated in the background such that it does not hamper or delay the computation of the next parameter. Towards the end, when the computation of all the parameters is complete and global communication would normally take place, the communication for each parameter has already been completed. Very often legacy codes are parallelized with a simple minded approach in which the computation and communication are disjoint, although they could be overlapped. This overlapping provides a dramatic reduction in the overall execution time of the code. The effective processing time in equation (7) then becomes

$$t = t_{cpu} + t_{comm} - t_{ov} \tag{9}$$

where $t_{cpu}$ is the CPU processing time, $t_{comm}$ is the communication time and $t_{ov}$ is the overlapped communication time, presented in Table IV.

The effects of bandwidth scaling and overlapping were studied using profiling tools built in house. The profiling tools can determine the actual computation and communication times during the course of the code execution. However, as the emphasis is on bandwidth scaling and overlapping techniques for the legacy code, the profiling techniques are not discussed here. The timing analysis of the VARSHA code for these techniques is explained in the subsequent sections.
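As a quick numerical check of equations (7) and (8), the following minimal Python sketch recomputes the processing time and speedup from the Table I timings for a chosen bandwidth factor k. The names `TABLE_I`, `scaled_time` and `speedup` are illustrative only, and taking the single-board actual time of 3079 msec as the speedup baseline is an assumption that is consistent with the rounded speedups printed in Table I.

```python
# Timings from Table I: boards -> (t_cpu, t_comm), both in msec.
TABLE_I = {
    1: (3077, 2),
    2: (1541, 181),
    4: (774, 352),
    8: (392, 528),
    16: (201, 700),
}

# Assumed baseline: actual single-board processing time from Table I (msec).
BASELINE_MSEC = 3079


def scaled_time(t_cpu, t_comm, k=1):
    """Equation (8): t = t_cpu + t_comm / k; k = 1 reduces to equation (7)."""
    if k <= 0:
        raise ValueError("bandwidth factor k must be positive")
    return t_cpu + t_comm / k


def speedup(boards, k=1):
    """Speedup over the assumed baseline for a board count and bandwidth factor."""
    t_cpu, t_comm = TABLE_I[boards]
    return BASELINE_MSEC / scaled_time(t_cpu, t_comm, k)


if __name__ == "__main__":
    # One row of speedups per bandwidth factor, one column per board count.
    for k in (1, 2, 4, 8, 16):
        row = [round(speedup(boards, k), 1) for boards in TABLE_I]
        print(f"k={k:2d}: {row}")
```

For example, with k = 8 the 8-board case evaluates to 392 + 528/8 = 458 msec, a speedup of about 6.7 over the baseline.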
TABLE II
SPEEDUP IN CASE OF BANDWIDTH SCALED BY A FACTOR OF 8

Boards   CPU Processing   Communication   Actual Processing   Speedup
         Time (msec)      Time (msec)     Time (msec)
1        3077             0.3             3077.3              1.0
2        1541             22.6            1563.6              2.0
4        774              44.0            818.0               3.8
8        392              66.0            458.0               6.7
16       201              87.5            288.5               10.7

TABLE III
SPEEDUP OF ORIGINAL TEST CODE IN TERMS OF BANDWIDTH

Factor of Bandwidth          No. of Boards
Increase (k)            1      2      4      8      16
1                       1.0    1.8    2.7    3.3    3.4
2                       1.0    1.9    3.2    4.7    5.6
4                       1.0    1.9    3.6    5.9    8.2
8                       1.0    2.0    3.8    6.7    10.7
16                      1.0    2.0    3.9    7.2    12.6

6. TIMING ANALYSIS OF VARSHA CODE

The bandwidth requirements of the VARSHA code are assessed using the communication and computation times. The timing analysis, using the techniques explained in Section 5, is described below.

6.1 Effect of Bandwidth Scaling on VARSHA Code

Consider an example from TABLE I. For an 8-processor system, if the bandwidth is increased by a factor of 8 (i.e. k = 8), the CPU processing time remains the same, i.e. 392 msec, but the communication time becomes $\frac{528}{8} = 66$ msec, and the speedup becomes 6.7. In fact, TABLE I is modified for k = 8 in equation (8) as shown in TABLE II. Similarly, for a 16-processor system, the CPU processing time is 201 msec and the communication time becomes $\frac{700}{8} = 87.5$ msec, so the speedup becomes 10.7. Therefore, the speedup increases from 3.3 to 6.7 in an 8-processor system and from 3.4 to 10.7 in a 16-processor system. TABLE III gives the speedup values for bandwidth scaling factors $k = 2^m$, where $0 \le m \le 4$. The graph in Fig. 2 shows the change in speedup for various bandwidths. It is observed that as the bandwidth scaling increases, i.e. as the communication time reduces, the speedup of the application code increases as more processors are used. Thus, the scaling of bandwidth is extremely significant in deciding the computational efficiency of the legacy code.

Fig. 2. Effect of bandwidth scaling on speedup for the VARSHA model.

6.2 Effect of Bandwidth Scaling on Modified Code

For better parallelisation efficiency, the option of code rearrangement plays a vital role, yet it is very often not considered seriously while handling legacy codes. For example, in the present code, for which data is presented in TABLE I, the operations of computation and communication are disjoint. If the code is rewritten or modified so that computation and communication overlap, one obtains the table of efficiency shown in TABLE V. Considering again the case of 8 boards in TABLE I, the CPU processing time is 392 msec and the communication time is 528 msec; the overlapped communication time is 392 msec, so the effective processing time is 392 + 528 - 392 = 528 msec and the speedup is $\frac{3077}{528} = 5.8$, as shown in TABLE IV. The graph in Fig. 3 shows the trend in the speedup for different numbers of processors.

Fig. 3. Speedup in case of modified code.

If bandwidth scaling is applied to the modified code, with the bandwidth increased by a factor of k, the effective processing time in equation (9) becomes

$t = t_{cpu} + \frac{t_{comm}}{k} - t_{ov}$    (10)
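The bandwidth-scaling arithmetic of Sections 6.1 and 6.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the timings are the TABLE I values quoted in the paper, the function names are hypothetical, the speedup is taken relative to the single-board time under the same settings, and the overlapped time t_ov is assumed to be the smaller of t_cpu and t_comm/k, which agrees with the 8-board example (392 msec).

```python
# TABLE I timings (msec) per number of boards: (t_cpu, t_comm).
TABLE_I = {1: (3077, 2), 2: (1541, 181), 4: (774, 352), 8: (392, 528), 16: (201, 700)}


def effective_time(t_cpu, t_comm, k=1, overlap=False):
    """Processing time (msec) when the bandwidth is scaled by a factor k.

    Without overlap: t = t_cpu + t_comm / k.
    With overlap:    t = t_cpu + t_comm / k - t_ov   (equation (10)),
    where t_ov = min(t_cpu, t_comm / k) is an assumption of this sketch.
    """
    scaled_comm = t_comm / k
    t_ov = min(t_cpu, scaled_comm) if overlap else 0.0
    return t_cpu + scaled_comm - t_ov


def speedup(boards, k=1, overlap=False):
    """Speedup relative to one board under the same k and overlap setting."""
    t_one = effective_time(*TABLE_I[1], k=k, overlap=overlap)
    t_n = effective_time(*TABLE_I[boards], k=k, overlap=overlap)
    return t_one / t_n


if __name__ == "__main__":
    # Columns: boards, speedup for k = 8 without overlap, speedup for k = 1 with overlap.
    for n in TABLE_I:
        print(n, round(speedup(n, k=8), 1), round(speedup(n, overlap=True), 1))
```

With k = 8 and no overlap, the first computed column matches the speedup column of TABLE II (1.0, 2.0, 3.8, 6.7, 10.7); with overlap and k = 1, the second matches TABLE IV (1.0, 2.0, 4.0, 5.8, 4.4).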
The comparative figures for the different bandwidth scalings of the original and modified application codes are given in TABLE V.

TABLE IV
SPEEDUP OF MODIFIED CODE

Boards   CPU Processing   Communication   Communication Time        Effective Processing   Speedup
         Time (msec)      Time (msec)     (Overlapping accounted)   Time (msec)
                                          (msec)
1        3077             2               2                         3077                   1.0
2        1541             181             181                       1541                   2.0
4        774              352             352                       774                    4.0
8        392              528             392                       528                    5.8
16       201              700             201                       700                    4.4

TABLE V
SIGNIFICANCE OF BANDWIDTH IN ORIGINAL AND MODIFIED CODES (SMALL PARALLEL PROCESSING)

Factor of Bandwidth          No. of boards                     Remarks
Increase (k)            1      2      4      8      16
1                       1.0    1.8    2.7    3.3    3.4        Old
                        1.0    2.0    4.0    5.8    4.4        Modified
2                       1.0    1.9    3.2    4.7    5.6        Old
                        1.0    2.0    4.0    7.8    8.8        Modified
4                       1.0    1.9    3.6    5.9    8.2        Old
                        1.0    2.0    4.0    7.8    15.3       Modified
8                       1.0    2.0    3.8    6.7    10.7       Old
                        1.0    2.0    4.0    7.8    15.3       Modified
16                      1.0    2.0    3.9    7.2    12.6       Old
                        1.0    2.0    4.0    7.8    15.3       Modified

Fig. 4 shows the comparison graph for various bandwidth scalings for the original and modified codes. It is observed that the speedup of the modified code is large for sizeable scaling of bandwidth.

(a) Bandwidth inference of original code
(b) Bandwidth inference of modified code
Fig. 4. Comparative significance of bandwidth in original and modified test codes.

It is interesting to have a comparison table for a large number of processors and scaling of bandwidth, as shown in TABLE VI, so that performance issues may be put into perspective. Fig. 5 shows the comparison graph for various higher bandwidth scalings, of the order up to 512, for the original and modified codes. It shows that as the number of processors increases, for bandwidth scaling by a considerable factor (say, half the number of processors), the speedup of the modified code is almost double that of the original code.

7. CONCLUSION

The VARSHA code has been assessed for its bandwidth requirements in both its existing and modified forms. It is noted that if the overall processing time (TABLE I) has to be decreased from 3079 msec to around 800 msec, i.e. reduced roughly by a factor of 4, it is not enough to increase the number of boards from 1 to 16, which is far more than the 4 boards nominally required for the expected reduction. Instead, if the number of boards is increased only from 1 to 4 and the bandwidth is increased by a factor of 8 (TABLE II), the timing requirement is well met. This clearly brings out the fact that increasing the number of boards, CPUs or cores alone is insufficient for reducing processing time; it needs to be backed up by an increase in communication bandwidth.

The assessment suggests that a present-day parallel processing centre needs network hardware that supports bandwidth on demand while keeping overall resources intact. It is understandable that, at any given point of time, not all tasks will require peak bandwidth. Hence it is meaningful to consider that the resources can be combined. Currently such hardware does not exist, suggesting a need to develop this class of hardware if these centres are not identified with specific application programs. This idea shall give rise to a completely new paradigm of bandwidth on demand in parallel computing.
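The arithmetic behind the conclusion can be checked with a short Python sketch. It assumes the non-overlapped model t = t_cpu + t_comm/k used for TABLE II together with the TABLE I timings; the helper names `total_time` and `time_grid` are illustrative and do not come from the paper.

```python
# TABLE I timings (msec) per number of boards: (t_cpu, t_comm).
TABLE_I = {1: (3077, 2), 2: (1541, 181), 4: (774, 352), 8: (392, 528), 16: (201, 700)}

FACTORS = (1, 2, 4, 8, 16)  # bandwidth scaling factors k = 2^m, 0 <= m <= 4


def total_time(boards, k=1):
    """Non-overlapped processing time (msec): t = t_cpu + t_comm / k."""
    t_cpu, t_comm = TABLE_I[boards]
    return t_cpu + t_comm / k


def time_grid(factors=FACTORS):
    """Map each (boards, k) pair to its total processing time in msec."""
    return {(n, k): total_time(n, k) for n in TABLE_I for k in factors}


if __name__ == "__main__":
    grid = time_grid()
    print("boards " + " ".join(f"k={k:<6}" for k in FACTORS))
    for n in TABLE_I:
        print(f"{n:<6} " + " ".join(f"{grid[(n, k)]:<8.1f}" for k in FACTORS))
```

Under these assumptions, sixteen boards at the original bandwidth take 201 + 700 = 901 msec, whereas four boards with k = 8 take 774 + 44 = 818 msec, which is the comparison the conclusion draws.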
TABLE VI
SIGNIFICANCE OF BANDWIDTH IN ORIGINAL AND MODIFIED CODES (MODERATE PARALLEL PROCESSING)

Factor of Bandwidth          No. of boards                          Remarks
Increase (k)            32     64     128     256     512
1                       3.1    2.7    2.4     2.1     1.9           Old
                        3.4    2.8    2.4     2.1     1.9           Modified
16                      20.2   26.6   29.9    30.0    28.5          Old
                        32.1   45.6   39.1    34.0    30.2          Modified
32                      24.8   37.6   48.6    53.7    54.1          Old
                        32.1   64.1   78.1    68.0    60.5          Modified
64                      28     47.4   70.4    89.0    97.8          Old
                        32.1   64.1   128.2   136.0   120.9         Modified
128                     29.9   54.5   90.9    132.0   164.3         Old
                        32.1   64.1   128.2   256.4   241.7         Modified
256                     30.9   58.9   106.4   174.3   248.9         Old
                        32.1   64.1   128.2   256.4   483.8         Modified

(a) Bandwidth inference of original code
(b) Bandwidth inference of modified code
Fig. 5. Comparative significance of higher bandwidth in original and modified test codes.

8. ACKNOWLEDGEMENTS

One of the authors, Mohammed Mahfooz Sheikh, would like to acknowledge CSIR for support in the form of a fellowship as CSIR-SRF vide CSIR award No. 31/3(38)/2009-EMR-I. The authors are extremely thankful to the Director, NAL, the Head, Flosolver Unit, and all the scientists at Flosolver for their unwavering support in carrying out this work.

REFERENCES

[1] Mark Delderfield, Lee Kitching, Gareth Smith, David Hoyle and Iain Buchan, Shared Genomics: Accessible High Performance Computing for Genomic Medical Research, in Proc. of the 2008 Fourth IEEE International Conference on eScience (IEEE Computer Society), Washington DC, USA, 2008, 404-405.
[2] Hausheer F H, Numerical simulation, parallel clusters, and the design of novel pharmaceutical agents for cancer treatment, in Robert Werner (Ed.), Proc. of the 1992 ACM/IEEE conference on Supercomputing, (IEEE Computer Society Press, Los Alamitos, CA, USA) 1992, 636-637.
[3] Fermi E, Pasta J R, Tsingou M and Ulam S, Studies of nonlinear problems I, Technical Report LA-1940 (Los Alamos Scientific Laboratory, Los Alamos, NM, USA) 1955.
[4] Jon Squire S and Sandra Palais M, Programming and design considerations of a highly parallel computer, in Proc. of the spring joint computer conference, ACM, New York, USA, 1963, 395-400.
[5] Koczela L J and Wang G Y, The Design of a Highly Parallel Computer Organization, IEEE Trans. Computers, 18, 1969, 520-529.
[6] Akira Kasahara, Computer Simulations of the Global Circulation of the Earth's Atmosphere, in S Fernbach and A Taub (Eds.), Computers and their role in the physical sciences, Chapter 23, (Gordon and Breach Science Publishers, New York) 1970, 571-594.
[7] Herman Goldstine H, The Computer: from Pascal to Von Neumann, 2nd edn., (Princeton University Press, New Jersey) 1973.
[8] Charles Eames and Ray Eames, A Computer Perspective, Glen Fleck (Ed.), (Harvard University Press, Cambridge, Massachusetts) 1973.
[9] David J Kuck, The structure of Computers and Computation, Vol 1, (John Wiley & Sons Inc., New York) 1978.
[10] Presper Eckert Jr J, The ENIAC, in A History of Computing in the Twentieth Century, N Metropolis, J Howlett and Gian-Carlo Rota (Eds.), (Academic Press, New York) 1980, 525-540.
[11] John Mauchly W, The ENIAC, in A History of Computing in the Twentieth Century, N Metropolis, J Howlett and Gian-Carlo Rota (Eds.), (Academic Press, New York) 1980, 541-550.
[12] Barry Smith F, Petter Bjorstad and William Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, (Cambridge University Press) 1996.
[13] Shioda S and Mase K, A new approach to bandwidth requirement estimation and its accuracy verification, IEEE Intl. Conf. on Communications 2004, Vol 4, 2004, 1953-1957.
[14] Yanping Zhao, Eager D L and Vernon M K, Network Bandwidth Requirements for Scalable On-Demand Streaming, IEEE/ACM Transactions on Networking, 15(4), 2007, 878-891.
[15] T N Venkatesh, V R Sarasamma, S Rajalakshmy, Kirti Chandra Sahu and Rama Govindarajan, Superlinear speedup of a parallel multigrid Navier Stokes solver on Flosolver, Current Science, 88(4), 2005, 589-593.
[16] U N Sinha, V R Sarasamma, S Rajalakshmy, K R Subramanian, P V R Bharadwaj, C S Chandrashekar, T N Venkatesh, R Sunder, B K Basu, Sulochana Gadgil and A Raju, Monsoon Forecasting on Parallel Computers, Current Science, 67(3), 1994, 178-184.
[17] Holton J R and Hsiu-Chi Tan, The influence of the equatorial quasi-biennial oscillation on the global circulation at 50 mb, J. Atmos. Sci., 37, 1980, 2200-2208.
[18] Holton J R and H C Tan, The quasi-biennial oscillation in the Northern Hemisphere lower stratosphere, J. Meteor. Soc. Japan, 60, 1982, 140-148.