Low Power Architecture for
     JPEG 2000

Dr. P. R. Panda, Associate Professor, IIT-Delhi
Rahul Jain (2004JVL2433), M.Tech (VDTT), IIT-Delhi
S. Krishnakumar, Cypress Semiconductor, Bangalore
Agenda
   JPEG2000 and 2-D DWT
   Memory Power Optimization
   Existing 2D-DWT Scan Based Architectures
   Proposed Architectures
       Low Power Z-Scan
       Low Power Block Scan
   Optimization and Pipelining Exploration for 2D-DWT
       Proposed DFG Optimization
       Pipeline Study
JPEG2000 Computation Blocks

   Pre-processing (Image Tiling)
   Discrete Wavelet Transform
   Quantization
   Tier-1 Coding (EBCOT)
   Tier-2 Coding (File Formatting and Packing)
Discrete Wavelet Transform
   2D wavelet transform:
       1st: apply the 1D wavelet transform to all rows
       2nd: apply the 1D wavelet transform to all columns
   Each Row/Column can be computed
    independently
   [Figure: Image decomposed into LL, HL, LH and HH subbands by a 1-Level DWT; in the 2-Level DWT the LL subband is decomposed again into LL, HL, LH and HH]
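The row-then-column procedure can be sketched as follows. This is a minimal illustration, not the architecture from the slides: `haar_1d` is a stand-in 1D transform chosen for brevity (the deck targets the (9,7) filter), and the names `dwt_2d`, `haar_1d` and `transpose` are hypothetical.

```python
from typing import Callable, List, Tuple

Row = List[float]
Image = List[Row]


def haar_1d(x: Row) -> Tuple[Row, Row]:
    """Stand-in 1D transform (pairwise average/difference), NOT the (9,7) filter."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def dwt_2d(image: Image,
           dwt_1d: Callable[[Row], Tuple[Row, Row]] = haar_1d
           ) -> Tuple[Image, Image, Image, Image]:
    """One decomposition level: 1D transform on every row, then on every column."""
    # 1st: every row is transformed independently.
    rows = [dwt_1d(r) for r in image]
    row_low = [lo for lo, _ in rows]
    row_high = [hi for _, hi in rows]

    def column_pass(half: Image) -> Tuple[Image, Image]:
        # 2nd: every column is transformed independently.
        cols = [dwt_1d(c) for c in transpose(half)]
        return (transpose([lo for lo, _ in cols]),
                transpose([hi for _, hi in cols]))

    ll, lh = column_pass(row_low)    # low-pass along rows
    hl, hh = column_pass(row_high)   # high-pass along rows
    return ll, hl, lh, hh
```

Applying `dwt_2d` again to the returned LL band gives the 2-level decomposition.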
Importance of Optimizing Memory System
Energy
   Many emerging media applications like
    JPEG2000 are data intensive
   For ASICs and embedded systems, memory
    system can contribute up to 90% energy
   Multiple memories exist in a SoC design
Optimization approaches
   Fixed memory access patterns
       Optimize memory architecture
   Fixed memory architecture
       Optimize memory access patterns
   Concurrently optimize Memory Architecture and
    Accesses
       Highest Potential
       Algorithm Level
           Reduce memory requirement
           Improve regularity of accesses
       Build optimized memory architecture
           Memory Partitioning
           Custom Circuits
       Option Explored in this Work
Memory Partitioning

   Partition the memory array into smaller banks
    so that only the addressed bank is activated
       improves speed and lowers power
       bit line capacitance reduced
       number of bit cells activated reduced
   At some point the delay and power overhead
    associated with the bank decoding circuit
    dominates (2 to 8 banks typical)
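A small sketch of the banking idea, assuming equal-sized banks and a hypothetical address split (the function names are illustrative, not from the slides). It maps a row address to a bank and counts how often a different bank has to be activated for a given access sequence.

```python
from typing import Iterable, Tuple


def locate(address: int, total_rows: int, banks: int) -> Tuple[int, int]:
    """Map a row address to (bank index, row inside that bank)."""
    rows_per_bank = total_rows // banks  # assumes banks divides total_rows
    return address // rows_per_bank, address % rows_per_bank


def count_bank_activations(addresses: Iterable[int],
                           total_rows: int, banks: int) -> int:
    """Count how often a bank other than the currently active one is addressed."""
    activations = 0
    active_bank = None
    for address in addresses:
        bank, _ = locate(address, total_rows, banks)
        if bank != active_bank:
            activations += 1
            active_bank = bank
    return activations
```

With sequential addresses each bank is activated once, while an access pattern that jumps between banks activates a bank on almost every access. This is why regular access patterns pair well with partitioning.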
2D-DWT Architectures
   Direct
   Line Based
   Z-Scan
   Optimal Z-Scan (Ref: Mu-Yu Chiu, Kun-Bin Lee, Chein-Wei Jen, "Optimal data
    transfer and buffering schemes for JPEG2000 encoder", IEEE Workshop on
    Signal Processing Systems (SIPS 2003), 27-29 Aug. 2003, pp. 177-182)
Direct DWT

   Straightforward Architecture
   First Read the Image Row wise computing
    Row-wise 1-D DWT
   Then Read the Image Column wise
    computing Column-wise 1-D DWT
   No On-Chip Buffer Required
   Reads + Writes to Off-Chip Memory =
    2MN+2MN (M =Image Tile ht, N = Image Tile wd)
Data Dependency in (9,7)DWT

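   [Figure: data-dependency graph. Input samples X(i), i = 0..8, feed Y(2i+1) at odd positions 1, 3, 5, 7 and Y(2i) at even positions 0, 2, 4, 6, 8; these in turn feed Z(2i+1) at the odd positions and Z(2i) at the even positions]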
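The dependency pattern above matches the usual lifting formulation of the (9,7) filter: odd samples are updated from their even neighbours, then even samples from the updated odd neighbours, and this is done twice. A minimal sketch follows, under these assumptions: the constants are the commonly published (9,7) lifting coefficients (not given on the slides), boundaries use whole-sample symmetric extension, and the final scaling multiplies low-pass by k and high-pass by 1/k as a later slide states. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Commonly published (9,7) lifting constants (assumed, not from the slides).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def _at(x: Sequence[float], i: int) -> float:
    """Read x[i] with whole-sample symmetric extension at both ends."""
    n = len(x)
    if i < 0:
        i = -i
    if i >= n:
        i = 2 * (n - 1) - i
    return x[i]


def _lift(x: List[float], start: int, coeff: float) -> None:
    """Update every second sample, from `start`, using its two neighbours."""
    for i in range(start, len(x), 2):
        x[i] += coeff * (_at(x, i - 1) + _at(x, i + 1))


def dwt97_1d(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    x = list(signal)
    _lift(x, 1, ALPHA)   # Y(2i+1) from X
    _lift(x, 0, BETA)    # Y(2i)   from Y(2i+1)
    _lift(x, 1, GAMMA)   # Z(2i+1) from Y(2i)
    _lift(x, 0, DELTA)   # Z(2i)   from Z(2i+1)
    low = [v * K for v in x[0::2]]
    high = [v / K for v in x[1::2]]
    return low, high


def idwt97_1d(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Inverse: undo the scaling, then the lifting steps in reverse order."""
    x = [0.0] * (len(low) + len(high))
    x[0::2] = [v / K for v in low]
    x[1::2] = [v * K for v in high]
    _lift(x, 0, -DELTA)
    _lift(x, 1, -GAMMA)
    _lift(x, 0, -BETA)
    _lift(x, 1, -ALPHA)
    return x
```

Each lifting step only reads samples of the opposite parity, so the updates can be done in place. This is also why a suspended row computation only needs a handful of intermediate values to resume.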
Line-Based DWT
   Read pixels line by line
   Keep the min required number of lines in
    memory
   Row Operation gets full line data
   Column operation is activated as it gets
    Column data to reduce buffer
   On-Chip Buffer Required = 6*N
   Reads + Writes to Off-Chip Memory =
    MN+MN (M =Image Tile ht, N = Image Tile wd)
Z-Scan DWT
   Do a Z-Scan instead of Line by Line Scan
   Column Processing can start early
   On-Chip Buffer Required = 4*M
   Reads + Writes to Off-Chip Memory =
    MN+MN (M =Image Tile ht, N = Image Tile wd)
Optimal Z-Scan
    Considers the Code-Block size (CW*CH) required by
     Encoding Block in the next phase

   On-Chip Buffer Required = 4*M+4*2*CW
   Reads + Writes to Off-Chip Memory =
    MN+MN (M =Image Tile ht, N = Image Tile wd)
   [Figure: scan region of width 2*CW and height 2*CH]
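The buffer and off-chip access figures quoted for the four existing architectures can be tabulated with a short helper. This is only a restatement of the formulas on the slides; the function name and the example tile and code-block sizes are hypothetical, and the buffer figures are in the same (unstated) units the slides use.

```python
from typing import Dict


def architecture_costs(M: int, N: int, CW: int) -> Dict[str, Dict[str, int]]:
    """On-chip buffer size and off-chip reads+writes per architecture.

    M: image tile height, N: image tile width, CW: code-block width.
    """
    return {
        "Direct":         {"buffer": 0,                  "off_chip": 2 * M * N + 2 * M * N},
        "Line-Based":     {"buffer": 6 * N,              "off_chip": M * N + M * N},
        "Z-Scan":         {"buffer": 4 * M,              "off_chip": M * N + M * N},
        "Optimal Z-Scan": {"buffer": 4 * M + 4 * 2 * CW, "off_chip": M * N + M * N},
    }


if __name__ == "__main__":
    # Illustrative sizes only.
    for name, cost in architecture_costs(M=256, N=256, CW=64).items():
        print(f"{name:15s} buffer={cost['buffer']:6d} off_chip={cost['off_chip']:7d}")
```

The table makes the trade-off explicit: the scan-based schemes halve off-chip traffic relative to Direct DWT at the cost of an on-chip buffer.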
Low Power Z-Scan
   Compute r elements in a row before starting
    with the next row
   For Z Scan r =1
   For Optimal Z-Scan r = 2*CW
   On-Chip Buffer Required = 4*M+4*2*CW
   Reads + Writes to Off-Chip Memory =
    MN+MN (M =Image Tile ht, N = Image Tile wd)
   [Figure: scan in strips of width r over a region of height 2*CH]
Low Power Z-Scan
   r is an integer sub-multiple (divisor) of 2*CW
       This takes the Code Block size into account
   Number of wakeups of the Column Buffer banks depends
    on r
       Large value of r not desirable
   Between resuming a row computation and storing back
    the intermediate values after calculating r row
    elements, the buffer can go into a Low Power
    state
       Large value of r is desirable
   Access to the buffers
       Row Buffer = 2 per ‘r’ element computation
       Column Buffer = 1 per element computation
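The scan order and the access counts above can be sketched as follows. Assumptions, not confirmed by the slides: every strip of r columns spans the full tile height, and r divides the tile width N. The function names are hypothetical.

```python
from typing import Dict, Iterator, Tuple


def low_power_z_scan(M: int, N: int, r: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col), visiting r elements of a row before the next row."""
    for strip_start in range(0, N, r):
        for row in range(M):
            for col in range(strip_start, min(strip_start + r, N)):
                yield row, col


def buffer_accesses(M: int, N: int, r: int) -> Dict[str, float]:
    """Total buffer accesses for one tile, using the per-element rates above."""
    elements = M * N
    return {
        "row_buffer": 2 * elements / r,  # 2 accesses per r elements
        "column_buffer": elements,       # 1 access per element
    }
```

With r = 1 this degenerates to the plain Z-Scan order described earlier, and with r = N it becomes a line-by-line scan. Increasing r lowers the row-buffer access rate in proportion.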
Low Power Block Scan
   Extend the concept of ‘r’ for column processing also
   Reduces the access to column buffer from 1 per
    element to 2/s per element
   To maintain the throughput introduce 2 Transpose
    Buffers (TB1 & TB2)
   Transpose Buffer Accesses
       Row Processor Writes
       Column Processor Reads
       i.e. 2 accesses per element
   TB must be much smaller
    than Column Buffer
   [Figure: tile divided into blocks B1, B2, B3, B4, each r wide and s high; B1 and B3 in the top row, B2 and B4 below them]
Working: Low Power Block Scan
   2D-DWT computed in blocks of r*s
   Step 1: Row Processor (RP) computes 1D-DWT on B1
    and writes into TB1
   Step 2: Column Processor (CP) computes 1D-DWT on
    the data in TB1 (B1) and RP computes on B2 and
    writes into TB2
   Similarly RP and CP
    alternate between
    TB1 and TB2
   [Figure: timeline of RP and CP alternating between TB1 and TB2 over blocks B1 to B4]
   B: Block, RP/CP: Row/Column Processor, TB: Transpose Buffer
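The ping-pong use of the two transpose buffers can be written out as a step-by-step schedule. This is a sketch of the steps listed above with hypothetical names. It assumes the blocks are processed in index order and that RP and CP each take one step per block.

```python
from typing import List, Optional, Tuple

Task = Optional[Tuple[str, str]]  # (block name, transpose buffer name) or idle


def block_scan_schedule(num_blocks: int) -> List[Tuple[Task, Task]]:
    """Return one (RP task, CP task) pair per step."""
    buffers = ("TB1", "TB2")
    schedule = []
    for step in range(num_blocks + 1):
        # RP writes block step+1 into one buffer ...
        rp = (f"B{step + 1}", buffers[step % 2]) if step < num_blocks else None
        # ... while CP reads the block RP finished in the previous step.
        cp = (f"B{step}", buffers[(step - 1) % 2]) if step >= 1 else None
        schedule.append((rp, cp))
    return schedule


if __name__ == "__main__":
    for i, (rp, cp) in enumerate(block_scan_schedule(4), start=1):
        print(f"Step {i}: RP={rp} CP={cp}")
```

In every step the two processors touch different buffers, so neither has to wait for the other.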
Memory Power Analysis
   Memory can be in 3 modes
     Active (Read/Write being done): Pa(n)
     Standby (No Access being done): Pstandby(n)
     Sleep Mode (Data Retention Mode and Cannot Access): Psleep(n)
         To Access from this mode, first wakeup the memory
         Wakeup incurs energy penalty PWakeup(n)
         Let ‘T’ be the minimum number of clock cycles the memory must stay in sleep mode to
          get any power advantage
   To account for memory banking overhead, multiplexer power is
    considered
     Let Pmux(i,j) be the power for an i:1 multiplexer of bit width j
   Assumption: on-chip memory access latency fits into the clock
    period, equal to 15ns
   Power values refer to average power dissipation per coefficient
    computation for the corresponding memory component
Row and Column Buffer Power
   With 4-Stage pipelined DWT, 10 16-bit registers need to
    be stored/transferred in case of suspension/resumption
    of line computation
   Row Buffer
       Size = 160*M (M: Ht of Image Tile)
       ‘b’ banks, each having 160 columns and M/b rows
       One b:1 Mux of 160 bits required
   Column Buffer
       Size = 160*2*CW (CW: EBCOT code block width, usually 128)
       ‘c’ banks, each having 160 columns and 2*CW/c rows
       One c:1 Mux of 160 bits required
   Column Buffer Power analysis Similar to Row Buffer
    Power analysis
Row Buffer Power
   Accesses to Row Buffer
       2 per ‘r’ elements, i.e. 2/r per element computation
       Only one Bank active at a time, others in Sleep Mode
   Row Buffer Power is:
       Prow= [2*Pa(M/b)+Pmux(b,160)+(r-2)*Ps(M/b)]/r +
        Psleep(M/b)* (b-1)
       Ps = Psleep if (r-2) >= ‘T’ else Ps = Pstandby
   Due to sequential access to the Row Buffer each
    Bank is woken up Once
   Total Row Buffer Power
   PTotal_Row = Prow + [Pw(M/b) * b/(M*r) ]
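The row buffer expressions translate directly into a small function. This is a sketch: the power models are passed in as callables because the slides obtain them from a separate memory model, and all parameter names are hypothetical.

```python
from typing import Callable

PowerModel = Callable[[float], float]        # power as a function of bank rows
MuxModel = Callable[[int, int], float]       # power of an i:1 mux of width j


def row_buffer_power(M: int, b: int, r: int, T: int,
                     p_active: PowerModel, p_standby: PowerModel,
                     p_sleep: PowerModel, p_wakeup: PowerModel,
                     p_mux: MuxModel) -> float:
    """PTotal_Row per element computation, following the formulas above."""
    n = M / b  # rows per bank
    # Idle cycles between the two accesses use sleep only if long enough.
    p_idle = p_sleep(n) if (r - 2) >= T else p_standby(n)
    p_row = (2 * p_active(n) + p_mux(b, 160) + (r - 2) * p_idle) / r \
        + p_sleep(n) * (b - 1)
    # Each of the b banks is woken up once over the M*r elements.
    return p_row + p_wakeup(n) * b / (M * r)
```

Sweeping `b` and `r` with real power models would reproduce the kind of exploration the slides describe. Any constant models plugged in here are placeholders.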
Transpose Buffer Power
   2 buffers required, each of size r*s*16 bits and partitioned into ‘d’ banks
   Access and No of Wakeups
     RP: Sequential Order hence d wakeups for r*s elements
     CP: Sequential Order, but in jumps of r elements
          CP reads s elements from d banks
          Each bank has s/d elements
          If s-s/d >= ‘T’, then put banks in Sleep mode and no of wakeups per
           element = d/s
   Power
     If (s-s/d >= T) PBuffer = 2*Pa(r*s/d) + Mux Power + 2*(d-1)*Psleep(r*s/d)
     Else PBuffer = 2*Pa(r*s/d) + Mux Power + (d-1)*Psleep(r*s/d) +
      (d-1)*Pstandby(r*s/d)
     Mux Power = Pmux(d,16) + Pmux(2,16)
     Wakeup Power = Pw(r*s/d) * PBuffer_Wake
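The transpose buffer power cases can be sketched the same way. The power models are again passed in as hypothetical callables. The wakeup term is left out because the slides do not spell out how the wakeup count is obtained.

```python
from typing import Callable

PowerModel = Callable[[float], float]
MuxModel = Callable[[int, int], float]


def transpose_buffer_power(r: int, s: int, d: int, T: int,
                           p_active: PowerModel, p_standby: PowerModel,
                           p_sleep: PowerModel, p_mux: MuxModel) -> float:
    """PBuffer (without wakeup power), following the two cases above."""
    n = r * s / d                           # elements per bank
    mux_power = p_mux(d, 16) + p_mux(2, 16)
    if s - s / d >= T:
        # Idle banks of both buffers can sleep.
        return 2 * p_active(n) + mux_power + 2 * (d - 1) * p_sleep(n)
    # Otherwise the idle banks of one buffer only reach standby.
    return (2 * p_active(n) + mux_power
            + (d - 1) * p_sleep(n) + (d - 1) * p_standby(n))
```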
Memory Architecture
   Row and Column Buffers
       Used as Circular FIFOs
       Replace General Row Decoder with Custom Circuit for
        Addressing
       Similar observation for Transpose Buffer
   Custom Row Decoder
       [Figure: Log(n)-bit counter driving a Log(n)-to-n row decoder]

       Counter and a Decoder
       Circular Shift Register (CSR)
           Flip Flop corresponding to the accessed row stores ‘1’
           A lot of power dissipated at FF clock pins
       Proposed Power Efficient CSR
           During shifting only 2 FF
            need to be enabled
           Use Clock Gating for others
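A behavioural sketch of why clock gating helps the circular shift register: in a one-hot register only the flip-flop that currently holds the ‘1’ and its successor change on a shift, so only those two need a clock edge. The model below is illustrative (hypothetical function names). It counts clocked flip-flops, not actual power.

```python
from typing import List, Tuple


def shift_plain(state: List[int]) -> Tuple[List[int], int]:
    """Plain CSR: every flip-flop is clocked on each shift."""
    return [state[-1]] + state[:-1], len(state)


def shift_clock_gated(state: List[int]) -> Tuple[List[int], int]:
    """Gated CSR: only the flip-flop holding '1' and the next one are clocked."""
    n = len(state)
    current = state.index(1)
    nxt = (current + 1) % n
    new_state = list(state)
    new_state[current] = 0
    new_state[nxt] = 1
    return new_state, 2


def clocked_flops_per_rotation(n: int, gated: bool) -> int:
    """Total flip-flop clock events for one full rotation through n rows."""
    state = [1] + [0] * (n - 1)
    step = shift_clock_gated if gated else shift_plain
    total = 0
    for _ in range(n):
        state, clocked = step(state)
        total += clocked
    return total
```

For n rows the plain register sees n*n clock events per rotation and the gated one 2*n, which is consistent with the gap widening as the number of rows grows.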
Comparison of 3 Row Decoders
   [Figure: Power Comparison, Power (uW) vs. Bits (8 to 512), and Area Comparison, Area (um^2) vs. Bits (8 to 512), for CSR, ClockGated CSR and Cntr+RD]
   Proposed Row Decoder is up to 90% and 84% more
    power efficient than CSR and Cntr+Decoder,
    respectively
   Area Penalty of about 15%
Memory Energy Modeling
    Active Energy modeled using eCACTI
        eCACTI models leakage current also
        Models Cache Power
        Modified to get SRAM power
    Standby Energy
        IStandby = 1.83 nA at Vdd = 1V [Qin05]
    Sleep Mode Energy
        ISleep = 0.55 nA at Vdd = 0.49V [Qin05]
    Wakeup Energy
        Ewakeup = 0.57 fJ * no of bits in SRAM
[Qin05] H. Qin et al., "Standby supply voltage minimization for deep sub-micron
SRAM", IEEE Microelectronics Journal, Aug 2005, vol. 36, pp. 789-800
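The figures above allow a rough estimate of the break-even sleep time ‘T’. The sketch below rests on one loud assumption that the slides do not state: the standby and sleep currents are taken to be per bit cell, like the wakeup energy. The 15 ns clock period is the one assumed earlier in the deck. Function names are hypothetical.

```python
I_STANDBY_A = 1.83e-9       # standby current at Vdd = 1 V (ASSUMED per bit cell)
VDD_STANDBY_V = 1.0
I_SLEEP_A = 0.55e-9         # sleep current at Vdd = 0.49 V (ASSUMED per bit cell)
VDD_SLEEP_V = 0.49
E_WAKEUP_PER_BIT_J = 0.57e-15
CLOCK_PERIOD_S = 15e-9


def standby_power_w(bits: int) -> float:
    return bits * I_STANDBY_A * VDD_STANDBY_V


def sleep_power_w(bits: int) -> float:
    return bits * I_SLEEP_A * VDD_SLEEP_V


def wakeup_energy_j(bits: int) -> float:
    return bits * E_WAKEUP_PER_BIT_J


def break_even_cycles(bits: int = 1) -> float:
    """Cycles in sleep after which the saving exceeds one wakeup penalty."""
    saving_w = standby_power_w(bits) - sleep_power_w(bits)
    return wakeup_energy_j(bits) / saving_w / CLOCK_PERIOD_S
```

Under the per-bit assumption the result does not depend on the memory size, since both the saving and the wakeup penalty scale with the number of bits.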
Architecture Comparison




   8 Banks for row and column buffer in all the 3
    architectures
   Low Power Block Scan
       r =16 and s = 16
Optimization and Pipeline
Exploration
DFG Optimization
4 Stage Pipelining
   Critical Path is Ta + Tm
   Initiation Interval = 1
       Resource Requirement
           4 Multipliers
           8 Adders
           11 Registers
               6 Pipelining Registers
               4 for e1-e4
               1 for Z4
   Initiation Interval = 2
       Resource Requirement
           2 Multipliers
           4 Adders
           9 Registers
Reducing Scaling Step Multipliers
   After each 1D DWT, multiply Low Pass Coeffs with k
    and High Pass with 1/k
   Delay the De-Interleaving of coefficients to save
    75% Multiplications
   With Throughput of 2,
    1 multiplication per cycle,
    hence 1 multiplier required
   Other Architectures require
    4 multipliers, 2 each for
    row and column processor
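The 75% figure can be checked with a small count. Scaling after each 1D pass costs two multiplications per coefficient. If scaling is deferred until after both passes, the factors combine to k*k for LL, 1/(k*k) for HH, and k*(1/k) = 1 for HL and LH, so only two of every four coefficients need one multiplication each. The sketch below is illustrative. The value of k is a commonly used (9,7) scaling constant, assumed here rather than taken from the slides, and the function names are hypothetical.

```python
from typing import Tuple

K = 1.149604398  # assumed scaling constant

Quad = Tuple[float, float, float, float]  # one coefficient each of LL, HL, LH, HH


def scale_after_each_pass(ll: float, hl: float, lh: float, hh: float,
                          k: float = K) -> Tuple[Quad, int]:
    """Scale in the row pass and again in the column pass: 8 multiplications."""
    ll = (ll * k) * k              # low-pass in rows and in columns
    hl = (hl * (1 / k)) * k        # high-pass in rows, low-pass in columns
    lh = (lh * k) * (1 / k)        # low-pass in rows, high-pass in columns
    hh = (hh * (1 / k)) * (1 / k)  # high-pass in rows and in columns
    return (ll, hl, lh, hh), 8


def scale_deferred(ll: float, hl: float, lh: float, hh: float,
                   k: float = K) -> Tuple[Quad, int]:
    """Scale once after both passes: 2 multiplications by precomputed constants."""
    k2 = k * k
    return (ll * k2, hl, lh, hh * (1 / k2)), 2
```

Two multiplications instead of eight per group of four coefficients is the 75% saving. At two coefficients per cycle that is one multiplication per cycle, hence a single multiplier.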
Pipeline Study
   Optimized DFG pipelined from 2-Stages to 8-
    Stages
   Study done to get the most power efficient
    strategy
   Impact of Pipelining on Clock Network Power
    also Accounted
Clock Tree Power Model

    H-Tree Network Assumed
    Buffer Energy also considered
    Number of levels increases with
     increasing registers
          More Interconnect
          More Buffers




http://www.acsel-lab.com/Projects/detclocking/power_comparison.htm
Energy Components of Different Pipeline
Schemes
Conclusion
   “Low-Power Z-Scan” and “Low Power
    Block Scan” derived using different memory
    subsystem optimization techniques
   Optimizing the memory subsystem can result
    in up to 90% power savings
   1D-DWT DFG optimization proposed
   4-Stage pipelining on the optimized DFG is
    most energy efficient pipelined architecture
Thank You
   “A Power-Efficient Architecture for the 2-D
    Discrete Wavelet Transform”, Submitted to IEEE
    VLSI Design and Test Symposium, 2006
   “Memory Architecture Exploration for Power-
    Efficient 2D-Discrete Wavelet Transform”,
    Submitted to CODES+ISSS 2006
   “Optimization and Pipeline Exploration of 2D-
    Discrete Wavelet Transform”, Submitted to
    CASES 2006

Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Dernier (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Low Power Architecture for JPEG2000

  • 1. Low Power Architecture for JPEG 2000  Dr. P. R. Panda (Associate Professor, IIT-Delhi)  Rahul Jain (2004JVL2433, M.Tech (VDTT), IIT-Delhi)  S. Krishnakumar (Cypress Semiconductor, Bangalore)
  • 2. Agenda  JPEG2000 and 2-D DWT  Memory Power Optimization  Existing 2D-DWT Scan Based Architectures  Proposed Architectures  Low Power Z-Scan  Low Power Block Scan  Optimization and Pipelining Exploration for 2D-DWT  Proposed DFG Optimization  Pipeline Study
  • 3. JPEG2000 Computation Blocks  Pre-processing (Image Tiling)  Discrete Wavelet Transform  Quantization  Tier-1 Coding (EBCOT)  Tier-2 Coding (File Formatting and Packing)
  • 4. Discrete Wavelet Transform  2D wavelet transform:  1st: 1D wavelet transform applied to all rows  2nd: 1D wavelet transform applied to all columns  Each row/column can be computed independently  [Figure: an image and its subband layout (LL, HL, LH, HH) after 1-level and 2-level DWT; at level 2 the LL subband is split again into LL, HL, LH, HH]
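The row-then-column order on this slide can be sketched in a few lines. This is only an illustration: it assumes a simple averaging/differencing (Haar-style) 1D step instead of the (9,7) filter used in the deck, and the names `haar_1d` and `dwt_2d` are hypothetical.

```python
def haar_1d(x):
    """One level of an unnormalised Haar-style 1D transform (illustrative stand-in).

    Returns the low-pass half (pairwise averages) followed by the
    high-pass half (pairwise half-differences).
    """
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def dwt_2d(image):
    """1-level 2D DWT: 1D transform on every row, then on every column."""
    rows = [haar_1d(list(r)) for r in image]          # 1st: all rows
    cols = [haar_1d(list(c)) for c in zip(*rows)]     # 2nd: all columns
    return [list(r) for r in zip(*cols)]              # transpose back to row-major


# A flat 4x4 image: all energy ends up in the top-left (LL) quadrant.
result = dwt_2d([[8, 8, 8, 8] for _ in range(4)])
```

Because each row (and later each column) is transformed independently, the two passes can be scheduled in many different orders, which is what the scan-based architectures in the following slides exploit.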
  • 5. Importance of Optimizing Memory System Energy  Many emerging media applications like JPEG2000 are data intensive  For ASICs and embedded systems, the memory system can contribute up to 90% of the energy  Multiple memories exist in an SoC design
  • 6. Optimization approaches  Fixed memory access patterns  Optimize memory architecture  Fixed memory architecture  Optimize memory access patterns  Concurrently optimize Memory Architecture and Accesses  Highest Potential  Algorithm Level  Reduce memory requirement  Improve regularity of accesses  Build optimized memory architecture  Memory Partitioning  Custom Circuits  Option Explored in this Work
  • 7. Memory Partitioning  Partition the memory array into smaller banks so that only the addressed bank is activated  improves speed and lowers power  bit line capacitance reduced  number of bit cells activated reduced  At some point the delay and power overhead associated with the bank decoding circuit dominates (2 to 8 banks typical)
  • 8. 2D-DWT Architectures  Direct  Line Based  Z-Scan  Optimal Z-Scan (Ref: Mu-Yu Chiu, Kun-Bin Lee, Chein-Wei Jen, "Optimal data transfer and buffering schemes for JPEG2000 encoder", IEEE Workshop on Signal Processing Systems (SIPS 2003), 27-29 Aug. 2003, pp. 177-182)
  • 9. Direct DWT  Straightforward architecture  First read the image row-wise, computing the row-wise 1-D DWT  Then read the image column-wise, computing the column-wise 1-D DWT  No on-chip buffer required  Reads + writes to off-chip memory = 2MN + 2MN (M = image tile height, N = image tile width)
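  • 10. Data Dependency in (9,7) DWT  [Figure: data-dependency graph over input samples X(i), i = 0..8; first stage produces Y(2i+1) at odd indices 1, 3, 5, 7 and then Y(2i) at even indices 0, 2, 4, 6, 8; second stage produces Z(2i+1) at odd indices and then Z(2i) at even indices]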
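The dependency pattern in this figure matches the usual four-step lifting form of the (9,7) filter. The sketch below assumes that Y(2i+1), Y(2i), Z(2i+1), Z(2i) correspond to those four lifting steps, uses the commonly quoted lifting constants, applies symmetric extension at the borders, and leaves out the final scaling step. The function names are hypothetical.

```python
# Commonly quoted lifting constants for the (9,7) filter (assumed here).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522


def lift(x, parity, coeff):
    """Update every sample of the given parity from its two neighbours, in place.

    At the borders the missing neighbour is mirrored (symmetric extension).
    Requires len(x) >= 2.
    """
    n = len(x)
    for i in range(parity, n, 2):
        left = x[i - 1] if i - 1 >= 0 else x[i + 1]
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += coeff * (left + right)


def lifting_97(samples):
    """Four lifting steps of a 1D (9,7) DWT, without the final scaling."""
    x = [float(v) for v in samples]
    lift(x, 1, ALPHA)   # Y(2i+1) from X(2i), X(2i+1), X(2i+2)
    lift(x, 0, BETA)    # Y(2i)   from Y(2i-1), X(2i), Y(2i+1)
    lift(x, 1, GAMMA)   # Z(2i+1) from Y(2i), Y(2i+1), Y(2i+2)
    lift(x, 0, DELTA)   # Z(2i)   from Z(2i-1), Y(2i), Z(2i+1)
    return x[0::2], x[1::2]  # (low-pass, high-pass)


# A constant signal should produce (near) zero high-pass coefficients.
low, high = lifting_97([5] * 9)
```

Each output depends only on its immediate neighbours from the previous step, which is why a row or column computation can be suspended and resumed by saving a small, fixed number of intermediate values.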
  • 11. Line-Based DWT  Read pixels line by line  Keep the minimum required number of lines in memory  Row operation gets full line data  Column operation starts as soon as it has enough column data, which reduces the buffer  On-chip buffer required = 6*N  Reads + writes to off-chip memory = MN + MN (M = image tile height, N = image tile width)
  • 12. Z-Scan DWT  Do a Z-scan instead of a line-by-line scan  Column processing can start early  On-chip buffer required = 4*M  Reads + writes to off-chip memory = MN + MN (M = image tile height, N = image tile width)
  • 13. Optimal Z-Scan  Considers the code-block size (CW*CH) required by the encoding block in the next phase  On-chip buffer required = 4*M + 4*2*CW  Reads + writes to off-chip memory = MN + MN (M = image tile height, N = image tile width)  [Figure: scan region marked 2*CW by 2*CH]
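The buffer and off-chip access figures quoted for the Direct, Line-Based, Z-Scan and Optimal Z-Scan architectures can be tabulated side by side. The function below simply evaluates the expressions from the slides; the tile size and code-block width in the example are arbitrary assumptions, and buffer sizes are in whatever unit the slides use (not stated there).

```python
def dwt_architecture_costs(M, N, CW):
    """On-chip buffer and off-chip accesses per architecture, per the slide formulas.

    M: image tile height, N: image tile width, CW: code-block width.
    """
    return {
        "Direct":         {"buffer": 0,                  "offchip": 2 * M * N + 2 * M * N},
        "Line-Based":     {"buffer": 6 * N,              "offchip": M * N + M * N},
        "Z-Scan":         {"buffer": 4 * M,              "offchip": M * N + M * N},
        "Optimal Z-Scan": {"buffer": 4 * M + 4 * 2 * CW, "offchip": M * N + M * N},
    }


# Example values (assumed): a 512x512 tile with 64-wide code blocks.
costs = dwt_architecture_costs(M=512, N=512, CW=64)
for name, c in costs.items():
    print(f"{name:15s} buffer={c['buffer']:6d} offchip={c['offchip']:8d}")
```

All three buffered schemes halve the off-chip traffic of the direct approach; they differ only in how much on-chip storage they need to do so.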
  • 14. Low Power Z-Scan  Compute r elements in a row before starting with the next row  For Z-Scan r = 1  For Optimal Z-Scan r = 2*CW  On-chip buffer required = 4*M + 4*2*CW  Reads + writes to off-chip memory = MN + MN (M = image tile height, N = image tile width)  [Figure: scan proceeding in strips r elements wide, with 2*CH marked]
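A small generator makes the role of r concrete. This is a sketch of one plausible reading of the slide: the tile is visited in vertical strips that are r elements wide, and within a strip each row contributes r elements before the scan moves to the next row. The name `strip_scan` and the coordinate convention are assumptions.

```python
def strip_scan(height, width, r):
    """Yield (row, col) in strip order: r elements of a row, then the next row.

    After the last row of a strip, the scan moves right to the next strip.
    """
    for col0 in range(0, width, r):                     # one strip at a time
        for row in range(height):                       # walk down the strip
            for col in range(col0, min(col0 + r, width)):
                yield row, col


# A 2-row, 4-column tile scanned with r = 2.
order = list(strip_scan(height=2, width=4, r=2))
```

With r equal to the full width this degenerates to a plain line-by-line scan; smaller r lets column processing start earlier at the cost of suspending and resuming each row more often.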
  • 15. Low Power Z-Scan
      r is an integral sub-multiple of 2*CW (2*CW is a whole multiple of r)
          This takes the code-block size into account
      Number of wakeups of the column buffer banks depends on r
          From this point of view a large value of r is not desirable
      Between resuming a row computation and storing back the intermediate values after calculating r row elements, the buffer can go into a low-power state
          From this point of view a large value of r is desirable
      Accesses to the buffers
          Row buffer = 2 per ‘r’ element computation
          Column buffer = 1 per element computation
  • 16. Low Power Block Scan
      Extend the concept of ‘r’ to column processing as well
      Reduces the accesses to the column buffer from 1 per element to 2/s per element
      To maintain throughput, introduce 2 transpose buffers (TB1 and TB2)
      Transpose buffer accesses
          Row processor writes
          Column processor reads
          i.e. 2 accesses per element
      TB must be much smaller than the column buffer
      [Figure: blocks B1 to B4, each r wide and s high; B1 above B2, B3 above B4]
  • 17. Working: Low Power Block Scan
      2D-DWT is computed in blocks of r*s
      Step 1: Row Processor (RP) computes the 1D-DWT on B1 and writes into TB1
      Step 2: Column Processor (CP) computes the 1D-DWT on the data in TB1 (B1), while RP computes on B2 and writes into TB2
      Similarly, RP and CP alternate between TB1 and TB2
      [Figure: timeline of RP writing blocks B1, B2, B3, B4 alternately into TB1 and TB2 while CP reads the previously written block from the other buffer]
      B: Block, RP/CP: Row/Column Processor, TB: Transpose Buffer
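The alternation between TB1 and TB2 is a ping-pong (double-buffering) schedule, which can be sketched as below. The function name and the tuple layout are hypothetical; the step numbering follows the slide, with one extra final step in which only the column processor is busy.

```python
def block_scan_schedule(num_blocks):
    """Return a list of (rp, cp) pairs, one per step.

    rp = (block, buffer) the Row Processor writes in this step, or None.
    cp = (block, buffer) the Column Processor reads in this step, or None.
    """
    steps = []
    for step in range(num_blocks + 1):
        # RP works on block step+1 and writes to TB1, TB2, TB1, ...
        rp = (f"B{step + 1}", f"TB{step % 2 + 1}") if step < num_blocks else None
        # CP consumes the block RP wrote in the previous step.
        cp = (f"B{step}", f"TB{(step - 1) % 2 + 1}") if step > 0 else None
        steps.append((rp, cp))
    return steps


schedule = block_scan_schedule(4)
for i, (rp, cp) in enumerate(schedule, start=1):
    print(f"step {i}: RP={rp} CP={cp}")
```

In every step where both processors are active they touch different transpose buffers, so neither has to wait for the other.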
  • 18. Memory Power Analysis
      Memory can be in 3 modes (n denotes the size of the memory or bank)
          Active (read/write being done): Pa(n)
          Standby (no access being done): Pstandby(n)
          Sleep mode (data retention mode, cannot be accessed): Psleep(n)
              To access the memory from this mode, first wake it up
              Wakeup incurs an energy penalty: Pwakeup(n) (written Pw in the power formulas)
      Let ‘T’ be the minimum number of clock cycles the memory must stay in sleep mode to get any power advantage
      To account for memory banking overhead, multiplexer power is considered
          Pmux(i,j) is the power of an i:1 multiplexer of bit width j
      Assumption: on-chip memory access latency fits within the clock period of 15 ns
      Power values refer to average power dissipation per coefficient computation for the corresponding memory component
  • 19. Row and Column Buffer Power
      With a 4-stage pipelined DWT, 10 16-bit registers (160 bits) need to be stored/transferred in case of suspension/resumption of a line computation
      Row buffer
          Size = 160*M (M: height of image tile)
          ‘b’ banks, each having 160 columns and M/b rows
          One b:1 mux of 160 bits required
      Column buffer
          Size = 160*2*CW (CW: EBCOT code-block width, usually 128)
          ‘c’ banks, each having 160 columns and 2*CW/c rows
          One c:1 mux of 160 bits required
      Column buffer power analysis is similar to the row buffer power analysis
  • 20. Row Buffer Power
      Accesses to the row buffer
          2 per ‘r’ elements, i.e. 2/r per element computation
      Only one bank is active at a time; the others are in sleep mode
      Row buffer power:
          Prow = [2*Pa(M/b) + Pmux(b,160) + (r-2)*Ps(M/b)]/r + Psleep(M/b)*(b-1)
          Ps = Psleep if (r-2) >= ‘T’, else Ps = Pstandby
      Because the row buffer is accessed sequentially, each bank is woken up once
      Total row buffer power:
          PTotal_Row = Prow + [Pw(M/b) * b/(M*r)]
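The row buffer expressions can be evaluated directly once the per-mode power models are available. In the sketch below those models are passed in as plain Python callables; the constant values used in the example are made-up placeholders, not numbers from the deck, and `row_buffer_power` is a hypothetical name.

```python
def row_buffer_power(M, b, r, T, p_active, p_standby, p_sleep, p_wakeup, p_mux):
    """Evaluate PTotal_Row from the slide.

    M: tile height, b: number of banks, r: row elements per burst,
    T: minimum sleep duration (cycles) for sleep mode to pay off.
    p_active, p_standby, p_sleep, p_wakeup: callables taking the bank size n.
    p_mux: callable taking (inputs, bit_width).
    """
    n = M / b                                    # rows per bank
    # The active bank idles for (r - 2) cycles between its two accesses.
    p_idle = p_sleep(n) if (r - 2) >= T else p_standby(n)
    p_row = (2 * p_active(n) + p_mux(b, 160) + (r - 2) * p_idle) / r \
        + p_sleep(n) * (b - 1)                   # the other b-1 banks sleep
    # Each bank is woken once, amortised over the M*r element computations.
    return p_row + p_wakeup(n) * b / (M * r)


# Placeholder power models (arbitrary units, assumed for illustration only).
models = dict(
    p_active=lambda n: 10.0,
    p_standby=lambda n: 2.0,
    p_sleep=lambda n: 1.0,
    p_wakeup=lambda n: 100.0,
    p_mux=lambda i, j: 4.0,
)
total_sleep = row_buffer_power(M=64, b=8, r=16, T=4, **models)     # idle gap >= T
total_standby = row_buffer_power(M=64, b=8, r=16, T=20, **models)  # idle gap < T
```

Sweeping r and b with realistic models is one way to explore the trade-off described on the slides between fewer accesses (large r) and more banks (large b).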
  • 21. Transpose Buffer Power
      2 buffers required, each of size r*s*16 bits and partitioned into ‘d’ banks
      Accesses and number of wakeups
          RP: sequential order, hence d wakeups for r*s elements
          CP: sequential order, but in jumps of r elements
              CP reads s elements from d banks
              Each bank has s/d elements
              If s - s/d >= ‘T’, put the banks in sleep mode; number of wakeups per element = d/s
      Power
          If (s - s/d >= T): PBuffer = 2*Pa(r*s/d) + Mux Power + 2*(d-1)*Psleep(r*s/d)
          Else: PBuffer = 2*Pa(r*s/d) + Mux Power + (d-1)*Psleep(r*s/d) + (d-1)*Pstandby(r*s/d)
          Mux Power = Pmux(d,16) + Pmux(2,16)
          Wakeup Power = Pw(r*s/d) * PBuffer_Wake
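The two-branch transpose buffer expression can be evaluated the same way as the row buffer one. This sketch covers only PBuffer and the mux term; the wakeup term is left out because its second factor is not defined on the slide. Power models are hypothetical callables with placeholder values.

```python
def transpose_buffer_power(r, s, d, T, p_active, p_standby, p_sleep, p_mux):
    """Evaluate PBuffer (including mux power) for the pair of transpose buffers.

    r, s: block width and height, d: banks per buffer,
    T: minimum sleep duration (cycles) for sleep mode to pay off.
    """
    n = r * s / d                                   # elements per bank
    mux_power = p_mux(d, 16) + p_mux(2, 16)
    if s - s / d >= T:
        # Idle gap is long enough: idle banks of both buffers can sleep.
        idle = 2 * (d - 1) * p_sleep(n)
    else:
        # Otherwise one buffer's idle banks stay in standby.
        idle = (d - 1) * p_sleep(n) + (d - 1) * p_standby(n)
    return 2 * p_active(n) + mux_power + idle


# Placeholder power models (arbitrary units, assumed for illustration only).
tb_models = dict(
    p_active=lambda n: 10.0,
    p_standby=lambda n: 2.0,
    p_sleep=lambda n: 1.0,
    p_mux=lambda i, j: 4.0,
)
tb_sleep = transpose_buffer_power(r=16, s=16, d=4, T=8, **tb_models)
tb_standby = transpose_buffer_power(r=16, s=16, d=4, T=20, **tb_models)
```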
  • 22. Memory Architecture
      Row and column buffers
          Used as circular FIFOs
          Replace the general row decoder with a custom circuit for addressing
          Similar observation for the transpose buffer
      Custom row decoder options
          Counter and a decoder (log(n)-bit counter feeding a log(n)-to-n row decoder)
          Circular Shift Register (CSR)
              Flip-flop corresponding to the accessed row stores ‘1’
              A lot of power is dissipated at the FF clock pins
          Proposed power-efficient CSR
              During shifting only 2 FFs need to be enabled
              Use clock gating for the others
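A behavioural model shows why only two flip-flops need a clock on each shift of a one-hot circular shift register: the one currently holding '1' (which clears) and its successor (which sets). The sketch below is a software illustration under that assumption, not a description of the actual circuit; `csr_shift` is a hypothetical name.

```python
def csr_shift(state):
    """Advance a one-hot circular shift register by one position.

    Returns (new_state, ffs_clocked_plain, ffs_clocked_gated):
    a plain CSR clocks every flip-flop, a clock-gated CSR clocks only
    the flip-flop that clears and the one that sets.
    """
    n = len(state)
    hot = state.index(1)            # row currently selected
    nxt = (hot + 1) % n             # next row, wrapping around
    new_state = [0] * n
    new_state[nxt] = 1
    ffs_clocked_plain = n
    ffs_clocked_gated = 2 if n > 1 else 1
    return new_state, ffs_clocked_plain, ffs_clocked_gated


state, plain, gated = csr_shift([1, 0, 0, 0])
wrapped, _, _ = csr_shift([0, 0, 0, 1])
```

The number of clocked flip-flops stays at 2 regardless of n in the gated version, which is consistent with the power gap widening as the number of rows grows.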
  • 23. Comparison of 3 Row Decoders  [Figure: two charts, power (uW) and area (um^2) versus number of bits (8, 16, 32, 64, 128, 256, 512), for CSR, clock-gated CSR, and counter + row decoder (Cntr+RD)]  Proposed row decoder consumes up to 90% and 84% less power than the CSR and Cntr+Decoder, respectively  Area penalty of about 15%
  • 24. Memory Energy Modeling
      Active energy modeled using eCACTI
          eCACTI also models leakage current
          Models cache power
          Modified to get SRAM power
      Standby energy
          IStandby = 1.83 nA at Vdd = 1 V [Qin05]
      Sleep mode energy
          ISleep = 0.55 nA at Vdd = 0.49 V [Qin05]
      Wakeup energy
          Ewakeup = 0.57 fJ * number of bits in SRAM
      [Qin05] H. Qin, et al., "Standby supply voltage minimization for deep sub-micron SRAM", IEEE Microelectronics Journal, Aug 2005, vol. 36, pp. 789-800
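These numbers allow a rough break-even estimate for sleep mode. The sketch assumes (this is not stated on the slide) that the quoted standby and sleep currents are per bit cell, so that they can be compared directly with the per-bit wakeup energy, and that the clock period is 15 ns as assumed in the memory power analysis. The function name is hypothetical.

```python
import math


def break_even_cycles(i_standby, v_standby, i_sleep, v_sleep, e_wakeup, t_clk):
    """Cycles a cell must stay asleep before sleep beats standby.

    Assumes currents are per bit cell (an assumption, see lead-in).
    """
    p_standby = i_standby * v_standby            # W per bit in standby
    p_sleep = i_sleep * v_sleep                  # W per bit in sleep
    # Sleeping saves (p_standby - p_sleep) per second but costs e_wakeup once.
    t_break_even = e_wakeup / (p_standby - p_sleep)
    return math.ceil(t_break_even / t_clk)


cycles = break_even_cycles(
    i_standby=1.83e-9, v_standby=1.0,            # 1.83 nA at 1 V
    i_sleep=0.55e-9, v_sleep=0.49,               # 0.55 nA at 0.49 V
    e_wakeup=0.57e-15,                           # 0.57 fJ per bit
    t_clk=15e-9,                                 # 15 ns clock period
)
print(cycles)
```

Under these assumptions the result plays the role of the threshold ‘T’ used in the buffer power formulas; the deck itself does not state how T was obtained.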
  • 25. Architecture Comparison  8 Banks for row and column buffer in all the 3 architectures  Low Power Block Scan  r =16 and s = 16
  • 26.
  • 27.
  • 30. 4 Stage Pipelining
      Critical path is Ta + Tm
      Initiation interval = 1, resource requirement
          4 multipliers
          8 adders
          11 registers
              6 pipelining registers
              4 for e1-e4
              1 for Z4
      Initiation interval = 2, resource requirement
          2 multipliers
          4 adders
          9 registers
  • 31. Reducing Scaling Step Multipliers  After each 1D DWT, multiply the low-pass coefficients by k and the high-pass coefficients by 1/k  Delay the de-interleaving of coefficients to save 75% of the multiplications  With a throughput of 2, 1 multiplication per cycle, hence 1 multiplier required  Other architectures require 4 multipliers, 2 each for the row and column processors
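One way to arrive at the 75% figure is to merge the row and column scaling factors per subband: LL is scaled by k*k, HH by 1/(k*k), while HL and LH are scaled by k*(1/k) = 1 and need no multiplication at all. The sketch below counts multiplications per 2x2 group of output coefficients under that reading; it is an interpretation, not necessarily the authors' exact scheme, and the value of k is only an example.

```python
def combined_scale(row_band, col_band, k):
    """Overall scale factor for a subband when row and column scaling are merged."""
    factor = {"L": k, "H": 1.0 / k}
    return factor[row_band] * factor[col_band]


def count_multiplications(k):
    """Multiplications per group of four coefficients (one each of LL, HL, LH, HH)."""
    # Separate scaling: every coefficient is multiplied after the row pass
    # and again after the column pass.
    separate = 2 * 4
    # Merged scaling: only subbands whose combined factor differs from 1.
    merged = sum(
        1
        for row_band in "LH"
        for col_band in "LH"
        if abs(combined_scale(row_band, col_band, k) - 1.0) > 1e-12
    )
    return separate, merged


separate, merged = count_multiplications(k=1.149604398)
saving = 1 - merged / separate
```

With two coefficients produced per cycle and two of every four needing a multiplication, one multiplier is enough, matching the slide.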
  • 32.
  • 33.
  • 34. Pipeline Study  Optimized DFG pipelined from 2 stages to 8 stages  Study done to find the most power-efficient strategy  Impact of pipelining on clock network power also accounted for
  • 35. Clock Tree Power Model  H-tree network assumed  Buffer energy also considered  Number of levels increases with the number of registers  More interconnect  More buffers  http://www.acsel-lab.com/Projects/detclocking/power_comparison.htm
  • 36.
  • 37. Energy Components of Different Pipeline Schemes
  • 38. Conclusion  “Low-Power Z-Scan” and “Low Power Block Scan” derived using different memory subsystem optimization techniques  Optimizing the memory subsystem can result in up to 90% power savings  1D-DWT DFG optimization proposed  4-stage pipelining on the optimized DFG is the most energy-efficient pipelined architecture
  • 39. Thank You  “A Power-Efficient Architecture for the 2-D Discrete Wavelet Transform”, submitted to IEEE VLSI Design and Test Symposium, 2006  “Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Transform”, submitted to CODES+ISSS 2006  “Optimization and Pipeline Exploration of 2D-Discrete Wavelet Transform”, submitted to CASES 2006