Ayman Elnaggar, Mokhtar Aboelaze / International Journal of Engineering Research and
                    Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                     Vol. 3, Issue 1, January -February 2013, pp.1150-1156
  Embedded Reconfigurable Architectures for Multidimensional
                        Transforms
                            Ayman Elnaggar, Mokhtar Aboelaze
       Department of Computer Science & Engineering, German University in Cairo, New Cairo City, Egypt
                 Department of Computer Science, York University, Toronto, Canada M3J 1P3


Abstract
        This paper presents a general approach for generating higher-order (longer-size) multidimensional (m-d) architectures from 2^m lower-order (shorter-size) architectures. The objective of our work is to derive a unified framework and a design methodology that allows direct mapping of the proposed algorithms into embedded reconfigurable architectures such as FPGAs. Our methodology is based on manipulating tensor product forms so that they can be mapped directly into modular parallel architectures. The resulting circuits have a very simple modular structure and regular topology.

Keywords – Reconfigurable architectures, recursive algorithms, multidimensional transforms, tensor products, permutation matrices.

I. INTRODUCTION
        This paper proposes an efficient and cost-effective general methodology for mapping multidimensional transforms onto efficient reconfigurable architectures such as FPGAs. The main objective of this paper is to derive a design methodology and recursive formulation for multidimensional transforms that is useful for the true modularization and parallelization of the resulting computation.
        Our methodology employs tensor product (or Kronecker product) decompositions and permutation matrices as the main tools for expressing the general framework for multidimensional DSP transforms. We employ several techniques to manipulate such decompositions into suitable recursive expressions which can be mapped efficiently onto reconfigurable FPGA structures.
        Our work is based on a non-trivial generalization of the one-dimensional DSP transforms. It has been shown that, when coupled with stride permutation matrices, tensor products provide a unifying framework for describing and programming a wide range of fast recursive algorithms for various transforms. This unifying framework is well suited for parallel processing machines and vector processing machines [6], [10].
        Some of the tensor product properties that will be used throughout this paper are [6], [10]:

AC ⊗ BD = (A ⊗ B)(C ⊗ D)                                        (1)

(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C)                                       (2)

If n = n1·n2, then
A_{n1} ⊗ B_{n2} = P(n, n1) (I_{n2} ⊗ A_{n1}) P(n, n2) (I_{n1} ⊗ B_{n2})   (3)

If n = n1·n2·n3, then
I_{n1} ⊗ A_{n2} ⊗ I_{n3} = P(n, n1·n2) (I_{n1·n3} ⊗ A_{n2}) P(n, n3)      (4)

If 2n = n1·n2, then
P(n, 2) = P(n, n1) P(n, n2)                                     (5)

where ⊗ denotes the tensor product, I_n is the identity matrix of size n, and P(n, s), the permutation matrix, is an n × n binary matrix whose entries are zeros and ones, such that each row or column has a single 1 entry. If n = rs, then P(n, s) is an n × n binary matrix specifying an n/s-shuffle (or s-stride) permutation. The effect of the permutation matrix P(n, s) on an input vector X_n of length n is to shuffle the elements of X_n by grouping together all the r elements separated by distance s. The first r elements will be x_0, x_s, x_{2s}, …, x_{(r−1)s}; the next r elements are x_1, x_{1+s}, x_{1+2s}, …, x_{1+(r−1)s}; and so on.
        The main result reported in this paper shows that a large two-dimensional (2-d) computation for a given DSP transform on an n × n input array can be decomposed recursively into three stages, as shown in Fig. 1 for the case n = 4. The middle stage is constructed recursively from 2^2 parallel (data-independent) blocks, each realizing a smaller-size computation of the same DSP transform. The pre-additions and post-permutations stages serve as "glue" circuits that combine the 2^2 lower-order blocks to construct the higher-order architecture. We also show that the proposed unified approach can be extended such that an m-d DSP transform can be constructed from 2^m smaller-size m-d ones. The objective of our work is to derive a unified framework and a design methodology that allows direct mapping of the proposed algorithms into reconfigurable FPGA architectures.
        Observe that we have drawn our networks such that data flows from right to left. We chose this convention to show the direct correspondence between the derived algorithms and the proposed reconfigurable architecture.
        For the 1-d linear convolution example below, the pre-addition matrix A and the post-addition matrix B are

    | 1 0 |          | 1  0  0 |
A = | 1 1 | ,    B = | -1 1 -1 |
    | 0 1 |          | 0  0  1 |

In this case j = 3 (three parallel blocks of the convolution of smaller size n/2); the realization of the 1-d linear convolution is shown in Fig. 1.
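As a quick numerical illustration (our own sketch, not from the paper), the s-stride permutation P(n, s) and composition property (5) can be checked directly; the helper name stride_perm below is ours. The last check anticipates the cascaded WHT factorization derived later in the 1-d WHT subsection, eq. (15): k = log2(n) identical stages of P(n, 2)(I_{n/2} ⊗ W_2) reproduce the full n-point WHT matrix.

```python
import numpy as np

def stride_perm(n, s):
    """Return the s-stride permutation matrix P(n, s) of size n x n.

    Applied to a vector x, it groups the r = n/s elements separated by
    distance s: x0, xs, x2s, ..., then x1, x1+s, ..., and so on.
    """
    assert n % s == 0
    r = n // s
    idx = np.arange(n).reshape(r, s).T.flatten()  # column-major read of an r x s grid
    P = np.zeros((n, n), dtype=int)
    P[np.arange(n), idx] = 1                      # (P @ x)[i] == x[idx[i]]
    return P

n = 8
print(stride_perm(n, 2) @ np.arange(n))  # [0 2 4 6 1 3 5 7]

# Property (5): if 2n = n1*n2, then P(n,2) = P(n,n1) P(n,n2).
assert np.array_equal(stride_perm(n, 2),
                      stride_perm(n, 4) @ stride_perm(n, 4))  # 2n = 16 = 4*4
assert np.array_equal(stride_perm(n, 2),
                      stride_perm(n, 2) @ stride_perm(n, 8))  # 2n = 16 = 2*8

# Cascade check (cf. eq. (15)): three identical stages P(8,2)(I_4 (x) W2)
# equal the 8-point WHT matrix W2 (x) W2 (x) W2.
W2 = np.array([[1, 1], [1, -1]])
stage = stride_perm(n, 2) @ np.kron(np.eye(n // 2, dtype=int), W2)
assert np.array_equal(np.linalg.matrix_power(stage, 3),
                      np.kron(np.kron(W2, W2), W2))
```

The index-based construction mirrors what the hardware shuffle stages implement: a stride permutation is pure wiring and involves no arithmetic.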
II. A GENERAL FRAMEWORK FOR 1-D RECURSIVE DSP TRANSFORMS
        In this section, we present a general framework to derive recursive formulations for multidimensional transforms. Given a 1-d DSP algorithm in the matrix-vector form

Y_m = T_{m,n} X_n                                               (6)

where T_{m,n} is the transform matrix, and X_n and Y_m are the input vector of size n and the output vector of size m, respectively. Then, using the sparse matrix factorization approach [9], the matrix T_{m,n} can be factorized so that

T_{m,n} = T_1 T_2 … T_k                                         (7)

where each of the matrices T_1, T_2, …, T_k is sparse. Sparseness implies that either most of the elements of the matrix are zeros or the matrix is in block diagonal form. By applying tensor product property (3) to the block diagonal matrices of equation (7), we have

T_{m,n} = (R_i) (I_j ⊗ T_{m/2, n/2}) (Q_k)                      (8)

where Q_k and R_i are the pre- and post-processing glue structures that combine j parallel blocks of the lower-order transform T_{m/2, n/2}.

A. The 1-d Linear Convolution
        The 1-d linear convolution matrix C(n) of size n = 2^k, where k is an integer, can be written as [5], [6]

C(n) = R_n (I_3 ⊗ C(n/2)) Q_n                                   (9)

where

Q_n = P(3·2^{k−1}, 3) (I_{2^{k−1}} ⊗ A) P(2^k, 2^{k−1}),
R_n = R̄_{2^k} P(3(2^k − 1), 3) (I_{2^k − 1} ⊗ B) P(3(2^k − 1), 2^k − 1),

and A and B are the pre- and post-addition matrices given above.

Fig. 1. The realization of the 1-d linear convolution

B. The 1-d DCT
        The 1-d DCT T_n of size n can be written as [3], [7]

T_n = R_n (I_2 ⊗ T_{n/2}) Q_n                                   (10)

where

R_n = P(n, n/2) (I_{n/2} ⊕ L_{n/2}),
Q_n = (I_2 ⊗ V^{−1}_{n/2}) (I_{n/2} ⊕ C_{n/2}) (F_2 ⊗ I_{n/2}) V_n,
C_n = diag( 1 / (2 cos θ_n) ),   θ_n = (4M + 1)π/2n,   M = 0, 1, …, n − 1,

      | 1 1 0 0 … 0 0 |
      | 0 1 1 0 … 0 0 |
L_m = | 0 0 1 1 … 0 0 | ,
      |       …       |
      | 0 0 0 0 … 1 1 |
      | 0 0 0 0 … 0 1 |

V_n = (I_{n/2} ⊕ J_{n/2}) P(n, 2),

      | 1  1 |
F_2 = | 1 −1 | ,

where I_{n/2} is the identity matrix of dimension n/2, ⊕ is the direct sum operator, and J_{n/2} is the exchange matrix of order n/2 defined as




         0             0             0   0                         third stage) will be replaced by the single
         0             0             1   0
                                                                      permutation P8,2 as shown in Fig. 3 (b). Similarly,
                                                                    equation (13) can be simplified to
Jn / 2                                ,                                k 1
                                                                      Wn     Pn , 2 ( I 2k 1  W2 )                   (15)
         0             1             0   0                                  i 0
         1
                       0             0   0
                                                                                  Thus, Wn can be computed by the
                                                                      cascaded product of k similar stages (independent of
In this case j  2 (two parallel blocks of the DCT of                 i) of double matrix products instead of the triple
                                                                      matrix products in equation (8). Alternatively, we
smaller size n / 2) ). The realization of the 1-d DCT
                                                                      can realize (15) by a single block of
is shown in Fig. 2.
                                                                       Pn , 2 ( I 2k 1  W2 ) and take the output after k
                                                                      iterations that allows a hardware saving without
                                                                      slowing down the processing speed and reduction in
                                                                      the hardware size as shown in Fig. 4 for the case
                                                                       n 8.
                                                                                It should be mentioned that we have applied
                                                                      property (5) to reduce the shuffling inherited in the
                                                                      original WHT algorithm to allow a uniform
                                                                      hardware blocks as shown in Fig. 3 (b). We haven’t
                                                                      modified the original complexity of the WHT that
                                                                      are centered in the W2 blocks as shown in Fig. 3
                                                                      and Fig. 4.
Fig. 2. The realization of the 1-d DCT
                                                                      Applying property (1), equation (12) can be
C. The 1-d WHT
                                                                      modified to
         Our last example is the 1-d WHT. The
original 1-d WHT transform matrix is defined as [1],                   Wn  W2  Wn / 2  I 2 W2  Wn / 2 I n / 2
[2]                                                                         ( I 2  Wn / 2 ) (W2  I n / 2 )     (16)
      W       Wn / 2          1 1                                        ( I 2  Wn / 2 ) Qn
Wn   n / 2
      W                , W2  
                               1  1 ,
                                              (11)
       n / 2  Wn / 2              
                                                                      Where, Qn  (W2  I n / 2 )                          (17)
Where, W2 is the 2-point WHT. Let k  log 2 n , we
                                                                                Equation (16) represents the two-stage
can write equation (11) in the iterative tensor-                      recursive tensor product formulation of the 1-d WHT
product form
                                                                      (in this case j  2 ) in which the first stage is the pre-
  Wn  W2  Wn / 2  W2  W2    W2
                                                                      additions ( Qn ), followed by the second stage of the
       k 1                                                    (12)
       ( I i  W2  I k i 1 )
            2           2
                                                                      core computation I 2  Wn / 2  that consists of a
       i 0                                                           parallel blocks of two identical smaller WHT
which using property (4), can be modified to
                                                                      computations each of size n / 2 as shown in Fig. 5.
       k 1
Wn        Pn , 2i 1 ( I 2k 1  W2 ) Pn , 2k i 1         (13)
       i 0                                                           I. A GENERAL FRAMEWORK FOR 2-D RECURSIVE
As an example, we can express W8 as                                   DSP TRANSFORMS
                                                       
W8  P8 , 2 ( I 4  W2 ) P8 , 4 P8 , 4 ( I 4  W2 ) P8 , 2 .          For a 2-d input data, X n1 , n 2 , of size n1  n2 , and
      
     P8 , 8 ( I 4  W2 ) P8 ,1                                       a separable 2-d transform, Tn1 , n 2 , we can write the
                                                               (14)   output, Yn1 , n 2 , in the form
The realization of W8 is shown in Fig. 3 (a).
                                                                      Yn1 , n 2 = Yn1 , n 2 X n1 , n 2                     (18)
             Applying property (5) to equation (14)
and noting that now the permutations in two adjacent                  where, X n1 , n 2 and Yn1 , n 2 are the input and
stages can be grouped together into a single
                                                                      output column-scanned vectors, respectively. For
permutation, the adjacent permutations P8,2 P8,8
                                                                      separable matrices, the 2-d transform matrix
(from the first and the second stage) will be replaced                Tn1 , n 2 can be written in the tensor product form as
by the single permutation P8,2 and the adjacent
                                                                      [9]
permutations P8,4 P8,4 (from the second and the



T_{n1,n2} = T_{n1} ⊗ T_{n2}                                     (19)

where T_{n1} and T_{n2} are the row and column 1-d transforms, respectively, as defined in (8). By replacing T_{n1} and T_{n2} by their corresponding values from equation (8) and applying properties (1) to (4), we derive the 2-d recursive form

T_{n1,n2} = (R̃_{n1,n2}) (I_{j²} ⊗ T_{n1/2, n2/2}) (Q̃_{n1,n2})   (20)
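The separable form (19) itself is easy to sanity-check numerically. The sketch below (Python with NumPy; our own example, using the 2-point WHT as a stand-in 1-d transform) relies on the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ) for a column-scanned image X, which is exactly how (18)-(19) operate on column-scanned vectors.

```python
import numpy as np

# Stand-in 1-d row and column transforms (2-point WHT).
W2 = np.array([[1, 1], [1, -1]])
T_n1, T_n2 = W2, W2

# A 2 x 2 input image and its column-scanned vector x.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
x = X.flatten(order='F')            # column-major (column-scanned) vector

# 2-d transform via the tensor product form (19): T_{n1,n2} = T_n1 (x) T_n2.
y = np.kron(T_n1, T_n2) @ x

# Same result by the row-column method: T_n2 on columns, T_n1 on rows.
Y = T_n2 @ X @ T_n1.T
assert np.allclose(y, Y.flatten(order='F'))
```

The same check works for any sizes n1, n2 and any pair of 1-d transform matrices, which is what makes the tensor product formulation a uniform starting point for the convolution and DCT cases below.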




        Here Q̃_{n1,n2} and R̃_{n1,n2} are the 2-d pre- and post-processing glue structures, respectively, that combine j² of the lower-order (smaller-size) 2-d transforms T_{n1/2, n2/2} of dimension n1/2 × n2/2.

A. The 2-d Convolution
        Let n1 = 2^{α1} and n2 = 2^{α2}. Pratt [9] has shown that for an n1 × n2 input data image, the 2-d convolution output is given by

q = C_{n1,n2} f                                                 (21)

where C_{n1,n2} is the 2-d convolution transform matrix, and q and f are the output and input column-scanned vectors, respectively, of size n = n1·n2. Pratt has also shown that, for separable transforms, the matrix C_{n1,n2} can be decomposed into the tensor form

C_{n1,n2} = C(n1) ⊗ C(n2)                                       (22)

where C(n1) and C(n2) represent row and column 1-d convolution operators on f, respectively, as defined in (8) and (9). From (9) and (22), we can express the 2-d convolution matrix as a function of 1-d convolutions as follows [4]

C_{n1,n2} = [R_1 (I_3 ⊗ C(n1/2)) Q_1] ⊗ [R_2 (I_3 ⊗ C(n2/2)) Q_2]   (23)

Applying property (1) leads to

C_{n1,n2} = (R_1 ⊗ R_2) ((I_3 ⊗ C(n1/2)) ⊗ (I_3 ⊗ C(n2/2))) (Q_1 ⊗ Q_2)
          = R̃ C̃_{n1,n2} Q̃                                      (24)

where

R̃ = (R_1 ⊗ R_2),
C̃_{n1,n2} = (I_3 ⊗ C(n1/2)) ⊗ (I_3 ⊗ C(n2/2)),
Q̃ = (Q_1 ⊗ Q_2).                                                (25)

        Note that the matrix C̃_{n1,n2} contains the 1-d convolution matrices C(n1/2) and C(n2/2) in an involved tensor product expression. By applying property (2), we can write (24) as

C̃_{n1,n2} = ((I_3 ⊗ C(n1/2)) ⊗ I_3) ⊗ C(n2/2)                   (26)

Applying property (4) yields

C̃_{n1,n2} = (P(9(n1−1), 3(n1−1)) (I_9 ⊗ C(n1/2)) P(9(n1/2), 3)) ⊗ C(n2/2)   (27)

Since the convolution matrix C(n2/2) is of dimension (n2 − 1) × (n2/2), we can write C(n2/2) as

C(n2/2) = I_{n2−1} · C(n2/2) · I_{n2/2}                         (28)

Substituting (28) in (27) and applying property (1),

C̃_{n1,n2} = (P(9(n1−1), 3(n1−1)) (I_9 ⊗ C(n1/2)) P(9(n1/2), 3))
              ⊗ (I_{n2−1} · C(n2/2) · I_{n2/2})
          = (P(9(n1−1), 3(n1−1)) ⊗ I_{n2−1})
              (I_9 ⊗ C(n1/2) ⊗ C(n2/2))
              (P(9(n1/2), 3) ⊗ I_{n2/2}).                       (29)

Now, substituting (29) in (24) gives

C_{n1,n2} = R̂ (I_9 ⊗ C_{n1/2, n2/2}) Q̂,                         (30)

where C_{n1/2, n2/2} = C(n1/2) ⊗ C(n2/2) is the lower-order 2-d convolution matrix for an n1/2 × n2/2 input image,

Q̂ = (P(9(n1/2), 3) ⊗ I_{n2/2}) (Q_1 ⊗ Q_2),  and
R̂ = (R_1 ⊗ R_2) (P(9(n1−1), 3(n1−1)) ⊗ I_{n2−1})

are the new 2-d pre- and post-additions, respectively. Equation (30) represents the recursive 2-d convolution algorithm. In this case we use 9 (j² = 3² = 9) of the lower-order C_{n1/2, n2/2} convolution blocks in parallel to generate the higher-order C_{n1,n2} convolution, as shown in Fig. 6.

B. The 2-d DCT
        Since the DCT matrix is separable, the 2-d DCT for an image of dimension n1 × n2 can be computed by a stage of n2 parallel 1-d DCT computations on n1 points each, followed by another stage of n1 parallel 1-d DCT computations on n2 points each. This can be represented in the matrix-vector form

X = T_{n1,n2} x,                                                (31)

where T_{n1,n2} is the 2-d DCT transform matrix for an n1 × n2 image, and X and x are the output and input column-scanned vectors, respectively. By
substituting (10) in (31), we have

X = (T_{n1} ⊗ T_{n2}) x.                                        (32)

By further manipulation of equation (32), in a way similar to what we did with (23) for the 2-d convolution, we can write (32) as [3]

X = (R̃_{n1,n2} (I_4 ⊗ T_{n1/2, n2/2}) Q̃_{n1,n2}) x             (33)

where

Q̃_{n1,n2} = (P(2n1, 2) ⊗ I_{n2/2}) Q_{n1,n2},  and
R̃_{n1,n2} = R_{n1,n2} (P(2n1, n1) ⊗ I_{n2/2}).

Equation (33) represents the truly recursive 2-d DCT, in which Q̃_{n1,n2} and R̃_{n1,n2} are the pre- and post-processing glue structures, respectively, that combine 2² (j² = 2² = 4) identical lower-order 2-d DCT modules, each of size n1/2 × n2/2, in parallel to construct the higher-order 2-d DCT of size n1 × n2.

IV. A GENERAL FRAMEWORK FOR M-D RECURSIVE DSP TRANSFORMS
        We can extend the steps in deriving the recursive formulae of the 1-d and the 2-d transforms to the multidimensional case. For an m-d transform ⊗_{i=1}^{m} T_{ni}, the general form will be

⊗_{i=1}^{m} T_{ni} = R̂ (I_{j^m} ⊗ ⊗_{i=1}^{m} T_{ni/2}) Q̂      (34)

where Q̂ and R̂ are the m-d pre- and post-processing glue structures that combine j^m parallel blocks of the lower-order m-d transforms of size T_{ni/2}.

A. The m-d WHT
        We can extend the 2-d WHT derivation to the m-d case. From (12) and (34), the m-d WHT can be written in the tensor-product form

X = (W_{n1} ⊗ W_{n2} ⊗ … ⊗ W_{nm}) x = (⊗_{i=1}^{m} W_{ni}) x   (35)

where (W_{n1} ⊗ W_{n2} ⊗ … ⊗ W_{nm}) is the m-d WHT transform matrix for an m-d input, W_{ni} is the 1-d WHT coefficient matrix for an input vector of length n_i as defined in (16), and X and x are the output and input column-scanned vectors, respectively. Using properties (1) to (4), we can write (35) in the form [2]

X = [R̂ (I_{2^m} ⊗ ⊗_{i=1}^{m} W_{ni/2}) Q̂] x,                  (36)

where ⊗_{i=1}^{m} W_{ni/2} is the lower-order m-d WHT and

Q̂ = [∏_{i=1}^{m−1} (P(u1, u2) ⊗ I_{u3})] (⊗_{i=1}^{m} Q_i),
R̂ = ∏_{i=1}^{m−1} (P(w1, w2) ⊗ I_{w3}),                         (37)

u1 = 2 ∏_{j=1}^{m−i} n_j,   u2 = 2,   u3 = (1/2) ∏_{j=1}^{i} n_{m−j+1},
w1 = (1/2) ∏_{k=1}^{i} n_k,   w2 = ∏_{k=1}^{i} n_k,   w3 = (1/2) ∏_{j=i+1}^{m} n_j,

and Q_i is the 1-d pre-processing as defined by (17). Equation (37) extends our results by showing that a large m-d WHT can be computed from a single stage of smaller m-d WHTs.

V. CONCLUSIONS
        In this paper, we presented a general approach for decomposing higher-order (longer-size) multidimensional (m-d) architectures into 2^m lower-order (shorter-size) architectures. We have shown several examples for common 1-d and 2-d transforms, such as the linear convolution, the DCT, and the WHT, and we have extended our results to cover the m-d case as well. The objective of our work was to derive a unified framework and a design methodology that allows direct mapping of the proposed algorithms into reconfigurable architectures. The resulting circuits have a very simple modular structure and regular topology that can be mapped directly to FPGAs.

REFERENCES
  1.    E. Cetin, O. N. Gerek, and S. Ulukus, "Block Wavelet Transforms for Image Coding," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 3, pp. 433-435, 1993.
  2.    A. Elnaggar and M. Aboelaze, "A Scalable Formulation for 2-D WHT," Proc. of the IEEE International Symposium on Circuits and Systems (ISCAS 2003), pp. IV484-IV487, Thailand, May 2003.
  3.    A. Elnaggar and H. M. Alnuweiri, "A New Multi-Dimensional Recursive Architecture for Computing The Discrete Cosine Transform," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 1, pp. 113-119, February 2000.
4.    A. Elnaggar and M. Aboelaze, "An Efficient Architecture for Multi-Dimensional Convolution," IEEE Trans. on Circuits and Systems II, Vol. 47, No. 12, pp. 1520-1523, 2000.
5.    A. Elnaggar and M. Aboelaze, "A Modified Shuffle Free Architecture for Linear Convolution," IEEE Trans. on Circuits and Systems II, Vol. 48, No. 9, pp. 862-866, 2001.
6.    J. Granata, M. Conner, and R. Tolimieri, "A Tensor Product Factorization of the Linear Convolution Matrix," IEEE Trans. on Circuits and Systems, Vol. 38, pp. 1364-1366, 1991.
7.    H. S. Hou, "A Fast Recursive Algorithm for Computing the Discrete Cosine Transform," IEEE Trans. on ASSP, Vol. ASSP-35, No. 10, 1987.
8.    K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, 1990.
9.    W. K. Pratt, Digital Image Processing, John Wiley & Sons, Inc., 1991.
10.   R. Tolimieri, M. An, and C. Lu, Algorithms for Discrete Fourier Transform and Convolution, Springer-Verlag, New York, 1989.
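The separable structure in (35) is easy to check numerically. The following NumPy sketch is illustrative only (the helper names are ours, not the paper's, and it uses row-scanned vectors, the row-major analogue of the paper's column-scanned convention). It builds W_n by the tensor-product recursion W_n = W_2 ⊗ W_{n/2} and verifies that the single Kronecker-product matrix of (35) agrees with applying the 1-d WHT along each axis of the m-d input:

```python
import numpy as np
from functools import reduce

def wht(n):
    """1-d Walsh-Hadamard matrix W_n built by the tensor-product
    recursion W_n = W_2 (x) W_{n/2}; n must be a power of 2."""
    W2 = np.array([[1, 1], [1, -1]])
    W = np.array([[1]])
    while W.shape[0] < n:
        W = np.kron(W2, W)
    return W

def md_wht_direct(x, dims):
    """Direct form of (35): X = (W_n1 (x) ... (x) W_nm) x,
    with x the row-scanned vector of an array of shape `dims`."""
    T = reduce(np.kron, (wht(n) for n in dims))
    return T @ x

def md_wht_separable(x, dims):
    """Equivalent separable form: apply the 1-d WHT along each axis."""
    A = x.reshape(dims)
    for axis, n in enumerate(dims):
        # Contract W_n with axis `axis`, then restore the axis order.
        A = np.moveaxis(np.tensordot(wht(n), A, axes=(1, axis)), 0, axis)
    return A.reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims = (4, 2, 8)                      # a 3-d input, each n_i a power of 2
    x = rng.integers(-5, 5, np.prod(dims))
    assert np.array_equal(md_wht_direct(x, dims), md_wht_separable(x, dims))
```

The axis-wise form never materializes the (Πn_i)×(Πn_i) Kronecker matrix, mirroring how the recursive factorizations above decompose the large transform into small identical W_2 blocks glued by permutations.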