Elysium Technologies Private Limited
Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad |
Pondicherry | Trivandrum | Salem | Erode | Tirunelveli
http://www.elysiumtechnologies.com, info@elysiumtechnologies.com
13 Years of Experience
Automated Services
24/7 Help Desk Support
Experienced & Expert Developers
Advanced Technologies & Tools
Legitimate Member of all Journals
Having 1,50,000 Successful Records in
all Languages
More than 12 Branches in Tamilnadu,
Kerala & Karnataka.
Ticketing & Appointment Systems.
Individual Care for every Student.
Around 250 Developers & 20
Researchers
227-230 Church Road, Anna Nagar, Madurai – 625020.
0452-4390702, 4392702, + 91-9944793398.
info@elysiumtechnologies.com, elysiumtechnologies@gmail.com
S.P.Towers, No.81 Valluvar Kottam High Road, Nungambakkam,
Chennai - 600034. 044-42072702, +91-9600354638,
chennai@elysiumtechnologies.com
15, III Floor, SI Towers, Melapudur main Road, Trichy – 620001.
0431-4002234, + 91-9790464324.
trichy@elysiumtechnologies.com
577/4, DB Road, RS Puram, Opp to KFC, Coimbatore – 641002
0422- 4377758, +91-9677751577.
coimbatore@elysiumtechnologies.com
Plot No: 4, C Colony, P&T Extension, Perumal puram, Tirunelveli-
627007. 0462-2532104, +919677733255,
tirunelveli@elysiumtechnologies.com
1st Floor, A.R.IT Park, Rasi Color Scan Building, Ramanathapuram
- 623501. 04567-223225,
+919677704922. ramnad@elysiumtechnologies.com
74, 2nd floor, K.V.K Complex,Upstairs Krishna Sweets, Mettur
Road, Opp. Bus stand, Erode-638 011. 0424-4030055, +91-
9677748477 erode@elysiumtechnologies.com
No: 88, First Floor, S.V.Patel Salai, Pondicherry – 605 001. 0413–
4200640 +91-9677704822
pondy@elysiumtechnologies.com
TNHB A-Block, D.no.10, Opp: Hotel Ganesh Near Busstand. Salem
– 636007, 0427-4042220, +91-9894444716.
salem@elysiumtechnologies.com
ETPL
VLSI-001
Pragmatic Integration of an SRAM Row Cache in Heterogeneous 3-D DRAM
Architecture Using TSV
Abstract: As scaling DRAM cells becomes more challenging and energy-efficient DRAM chips are in
high demand, the DRAM industry has started to undertake an alternative approach to address these
looming issues, namely, to vertically stack DRAM dies with through-silicon vias (TSVs) using 3-D-IC
technology. Furthermore, this emerging integration technology also makes heterogeneous die stacking in
one DRAM package possible. Such a heterogeneous DRAM chip provides a unique, promising
opportunity for computer architects to contemplate a new memory hierarchy for future system design. In
this paper, we study how to design such a heterogeneous DRAM chip for improving both performance
and energy efficiency. In particular, we found that, if we want to design an SRAM row cache in a DRAM
chip, simple stacking alone cannot address the majority of traditional SRAM row cache design issues. In
this paper, to address these issues, we propose a novel floorplan and several architectural techniques that
fully exploit the benefits of 3-D stacking technology. Our multi-core simulation results with memory-
intensive applications suggest that, by tightly integrating a small row cache with its corresponding DRAM
array, we can improve performance by 30% while saving dynamic energy by 31%.
ETPL
VLSI-002
A Low-Complexity Turbo Decoder Architecture for Energy-Efficient Wireless Sensor
Networks
Abstract: Turbo codes have recently been considered for energy-constrained wireless communication
applications, since they facilitate a low transmission energy consumption. However, in order to reduce the
overall energy consumption, lookup table-log-BCJR (LUT-Log-BCJR) architectures having a low
processing energy consumption are required. In this paper, we decompose the LUT-Log-BCJR
architecture into its most fundamental add compare select (ACS) operations and perform them using a
novel low-complexity ACS unit. We demonstrate that our architecture employs an order of magnitude
fewer gates than the most recent LUT-Log-BCJR architectures, facilitating a 71% energy consumption
reduction. Compared to state-of-the-art maximum logarithmic Bahl-Cocke-Jelinek-Raviv
implementations, our approach facilitates a 10% reduction in the overall energy consumption at ranges
above 58 m.
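The add compare select (ACS) operation mentioned in the VLSI-002 abstract can be illustrated with a minimal sketch. It assumes the usual Jacobian-logarithm form of the Log-BCJR recursion, in which the exact correction term ln(1 + e^-|a-b|) is replaced by a small lookup table. The table size, bin width, and function names here are hypothetical choices for illustration and are not taken from the paper.

```python
import math

# Hypothetical 8-entry correction table: one value per 0.5-wide bin of |a - b|,
# sampled at the bin midpoint. Real designs choose their own table.
BIN_WIDTH = 0.5
CORRECTION_LUT = [math.log1p(math.exp(-BIN_WIDTH * (k + 0.5))) for k in range(8)]


def max_star_exact(a, b):
    """Exact Jacobian logarithm: ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_lut(a, b):
    """LUT approximation: compare, select the larger, add a tabulated correction."""
    index = int(abs(a - b) / BIN_WIDTH)
    correction = CORRECTION_LUT[index] if index < len(CORRECTION_LUT) else 0.0
    return max(a, b) + correction


def acs(metric0, branch0, metric1, branch1):
    """Add the branch metrics, then compare and select with the LUT correction."""
    return max_star_lut(metric0 + branch0, metric1 + branch1)


if __name__ == "__main__":
    print(max_star_exact(1.0, 1.2), max_star_lut(1.0, 1.2))
```

Dropping the correction term entirely gives the max-log approximation, which is the other family of implementations the abstract compares against.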
ETPL
VLSI-003
Pipelined Radix-2^k Feedforward FFT Architectures
Abstract: The appearance of radix-2^2 was a milestone in the design of pipelined FFT hardware
architectures. Later, radix-2^2 was extended to radix-2^k. However, radix-2^k was only proposed for
single-path delay feedback (SDF) architectures, but not for feedforward ones, also called multi-path delay
commutator (MDC). This paper presents the radix-2^k feedforward (MDC) FFT architectures. In
feedforward architectures radix-2^k can be used for any number of parallel samples which is a power of
two. Furthermore, both decimation in frequency (DIF) and decimation in time (DIT) decompositions can
be used. In addition to this, the designs can achieve very high throughputs, which makes them suitable for
the most demanding applications. Indeed, the proposed radix-2^k feedforward architectures require fewer
hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when
several samples in parallel must be processed. As a result, the proposed radix-2^k feedforward
architectures not only offer an attractive solution for current applications, but also open up a new research
line on feedforward structures.
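As background for the decimation in frequency (DIF) decomposition named in the VLSI-003 abstract, the following is a minimal software sketch of a plain radix-2 DIF FFT. It is not the radix-2^k feedforward architecture of the paper; it only shows the butterfly-then-twiddle ordering that DIF implies. The function names are illustrative.

```python
import cmath


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly first: sums feed the even outputs, differences feed the odd outputs.
    sums = [x[i] + x[i + half] for i in range(half)]
    # Twiddle factors are applied after the subtraction, which is what makes this DIF.
    diffs = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
             for i in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(sums)
    out[1::2] = fft_dif(diffs)
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    print(fft_dif([1, 2, 3, 4, 0, 0, 0, 0]))
```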
ETPL
VLSI-004
Algorithm and Architecture Design of Bandwidth-Oriented Motion Estimation for
Real-Time Mobile Video Applications
Abstract: This paper proposes a data bandwidth-oriented motion estimation design for resource-limited
mobile video applications using an integrated bandwidth rate distortion optimization framework. This
framework predicts and allocates the appropriate data bandwidth for motion estimation under a limited
bandwidth supply to fit a dynamically changing bandwidth supply. The simulation results show that our
proposed algorithm can achieve 66% and 41% memory bandwidth savings while maintaining an
equivalent rate-distortion performance and meeting real-time targets, when compared with conventional
approaches for low-motion and high-motion D1 (704 × 576)-size video, respectively.
The final implementation costs 122 K gate counts with TSMC 0.13-μm CMOS technology and consumes
74 mW of power for D1 resolution at 30 frames/s which is 40% of that achieved in previous designs.
ETPL
VLSI-005
STBC-OFDM Downlink Baseband Receiver for Mobile WMAN
Abstract: This paper proposes a space time block code-orthogonal frequency division multiplexing
downlink baseband receiver for mobile wireless metropolitan area network. The proposed baseband
receiver applied in the system with two transmit antennas and one receive antenna aims to provide high
performance in outdoor mobile environments. It provides a simple and robust synchronizer and an
accurate but hardware affordable channel estimator to overcome the challenge of multipath fading
channels. The coded bit error rate performance for 16 quadrature amplitude modulation can achieve less
than 10^-6 at a vehicle speed of 120 km/h. The proposed baseband receiver designed in 90-nm
CMOS technology can support up to 27.32 Mb/s uncoded data transmission under 10 MHz channel
bandwidth. It requires a core area of 2.41 × 2.41 mm^2 and dissipates 68.48 mW at 78.4 MHz with 1 V
power supply.
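The two-transmit, one-receive antenna arrangement in the VLSI-005 abstract can be illustrated with a short sketch. It assumes the Alamouti code, the standard space time block code for that antenna configuration, which the abstract does not name explicitly. It also assumes a noiseless channel that stays constant over two symbol slots and perfectly known channel gains. All function names are hypothetical.

```python
def alamouti_encode(s1, s2):
    """Return (antenna 1, antenna 2) symbols for two consecutive slots."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def receive(slots, h1, h2):
    """Single receive antenna: sum of both transmit paths, no noise (assumption)."""
    return [h1 * a + h2 * b for a, b in slots]


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that separates the two symbols using the channel estimates."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


if __name__ == "__main__":
    h1, h2 = 0.8 - 0.3j, -0.2 + 0.9j
    r1, r2 = receive(alamouti_encode(1 + 1j, -1 + 1j), h1, h2)
    print(alamouti_combine(r1, r2, h1, h2))
```

In a real receiver the gains h1 and h2 come from the channel estimator, so the quality of that estimate directly limits how well the two symbols separate.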
ETPL
VLSI-006
Glitch-Free NAND-Based Digitally Controlled Delay-Lines
Abstract: The recently proposed NAND-based digitally controlled delay-lines (DCDL) present a glitching
problem which may limit their use in many applications. This paper presents a glitch-free NAND-based
DCDL which overcomes this limitation, opening the use of NAND-based DCDLs in a wide
range of applications. The proposed NAND-based DCDL maintains the same resolution and minimum
delay as the previously proposed NAND-based DCDL. A theoretical demonstration of the glitch-free
operation of the proposed DCDL is also derived in the paper. Following this analysis, three driving circuits
for the delay control-bits are also proposed. The proposed DCDLs have been designed in a 90-nm CMOS
technology and compared, in this technology, to the state of the art. Simulation results show that the novel
circuits result in the lowest resolution, with a slight worsening of the minimum delay with respect to the
previously proposed DCDL with the lowest delay. Simulations also confirm the correctness of the developed
glitching model and sizing strategy. As an example application, the proposed DCDL is used to realize an
all-digital spread-spectrum clock generator (SSCG). The use of the proposed DCDL in this circuit reduces
the peak-to-peak absolute output jitter by more than 40% with respect to an SSCG using three-state
inverter based DCDLs.
ETPL
VLSI-007
A High-Efficiency, Wide Workload Range, Digital Off-Time Modulation (DOTM) DC-
DC Converter With Asynchronous Power Saving Technique
Abstract: Conventionally for wide workload range applications, to keep good stability and high
efficiency, a switching converter with multi-mode operation is necessary. With the advanced digital
signal processing, this work presents an asynchronous digital controller with dynamic power saving
technique to achieve high power efficiency. The regulation is based on the off-time modulation, in which
an adaptive resolution adjustment is proposed for the extension toward light-loaded range. The DC-DC
converter is fabricated in a 0.18-μm CMOS process. The input voltage is from 2.7 to 3.6 V and the
regulated output is 1.8 V. The switching frequency is from 44 kHz to 1.65 MHz and the maximum output
ripple is 20 mV with a 10-μF capacitor and a 2.2-μH inductor. The power efficiency is higher than 91%
for the workload range from 3 to 400 mA.
ETPL
VLSI-008
Formal Verification of Architectural Power Intent
Abstract: This paper presents a verification framework that attempts to bridge the disconnect between
high-level properties capturing the architectural power management strategy and the implementation of
the power management control logic using low-level per-domain control signals. The novelty of the
proposed framework is in demonstrating that the architectural power intent properties developed using
high-level artifacts can be automatically translated into properties over low-level control sequences
gleaned from UPF specifications of power domains, and that the resulting properties can be used to
formally verify the global on-chip power management logic. The proposed translation uses a considerable
amount of domain knowledge and is also not purely syntactic, because it requires formal extraction of
timing information for the low-level control sequences. We present a tool, called POWER-TRUCTOR
which enables the proposed framework, and several test cases of significant complexity to demonstrate
the feasibility of the proposed framework.
ETPL
VLSI-009
Statistical SRAM Read Access Yield Improvement Using Negative Capacitance
Circuits
Abstract: SRAM has become the dominant block in modern ICs and constitutes more than 50% of the die
area. The increase of process variations with continued CMOS technology scaling is considered one of
the major challenges for SRAM designers. This increase in process variations causes the SRAM cells to
functionally fail and reduces the chip functional yield considering the static noise margin stability failures
(i.e., cell flips when accessed), write failures (i.e., cell is not written within the write window), and read
access failures (i.e., incorrect read operation). In this paper, novel negative capacitance circuits are
developed, for the first time, to statistically improve the SRAM read access yield under process variations
by reducing the bitline parasitic capacitance. Post-layout simulation results, referring to an industrial
hardware-calibrated TSMC 65-nm CMOS technology, show that adding the negative capacitance
circuit to a column of 512 SRAM cells is capable of improving the read access yield from 61.9% to 100%.
ETPL
VLSI-010
An Energy-Efficient L2 Cache Architecture Using Way Tag Information Under Write-
Through Policy
Abstract: Many high-performance microprocessors employ a cache write-through policy to improve
performance while achieving good tolerance to soft errors in on-chip caches. However,
write-through policy also incurs large energy overhead due to the increased accesses to caches at the
lower level (e.g., L2 caches) during write operations. In this paper, we propose a new cache architecture
referred to as way-tagged cache to improve the energy efficiency of write-through caches. By maintaining
the way tags of L2 cache in the L1 cache during read operations, the proposed technique enables L2 cache
to work in an equivalent direct-mapping manner during write hits, which account for the majority of L2
cache accesses. This leads to significant energy reduction without performance degradation. Simulation
results on the SPEC CPU2000 benchmarks demonstrate that the proposed technique achieves 65.4%
energy savings in L2 caches on average with only 0.02% area overhead and no performance degradation.
Similar results are also obtained under different L1 and L2 cache configurations. Furthermore, the idea of
way tagging can be applied to existing low-power cache design techniques to further improve energy
efficiency.
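The way-tagging idea in the VLSI-010 abstract can be made concrete with a toy counting model. It is only a sketch under strong assumptions: the L1 cache never evicts, each L2 way activation costs the same, a read that hits in L1 does not touch L2, and the L2 way of a line is a placeholder value. The function name, trace format, and parameters are hypothetical.

```python
def count_l2_way_activations(trace, num_ways=8, line_bytes=64):
    """Return (conventional, way_tagged) L2 way activations for a trace.

    trace is a list of (op, address) pairs with op in {"read", "write"}.
    """
    l1_way_tags = {}  # line address -> L2 way recorded when the line was read into L1
    conventional = 0
    way_tagged = 0
    for op, address in trace:
        line = address // line_bytes
        if op == "read":
            if line not in l1_way_tags:
                # L1 read miss: both designs probe every way of the L2 set.
                conventional += num_ways
                way_tagged += num_ways
                l1_way_tags[line] = line % num_ways  # placeholder placement
        elif op == "write":
            # Write-through: every write reaches L2.
            conventional += num_ways
            # With a stored way tag, only the known way needs to be activated.
            way_tagged += 1 if line in l1_way_tags else num_ways
        else:
            raise ValueError(f"unknown operation: {op}")
    return conventional, way_tagged


if __name__ == "__main__":
    demo = [("read", 0), ("write", 0), ("write", 8), ("write", 4096)]
    print(count_l2_way_activations(demo))
```

The gap between the two counts grows with the share of writes that hit in L1, which matches the abstract's observation that write hits account for the majority of L2 accesses.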
ETPL
VLSI-011
An Analytical Latency Model for Networks-on-Chip
Abstract: We propose an analytical model based on queueing theory for delay analysis in a wormhole-
switched network-on-chip (NoC). The proposed model takes as input an application communication
graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency
and router blocking time. It works for arbitrary network topology with deterministic routing under
arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus
enabling fast design space exploration of various design parameters in NoC designs. Experimental results
show that the proposed analytical model can predict the average packet latency more than four orders of
magnitude faster than an accurate simulation, while the computation error is less than 10% in non-
saturated networks for different system-on-chip platforms.
ETPL
VLSI-012
Built-In Generation of Functional Broadside Tests Using a Fixed Hardware Structure
Abstract: Functional broadside tests are two-pattern scan-based tests that avoid overtesting by ensuring
that a circuit traverses only reachable states during the functional clock cycles of a test. In addition, the
power dissipation during the fast functional clock cycles of functional broadside tests does not exceed that
possible during functional operation. On-chip test generation has the added advantage that it reduces test
data volume and facilitates at-speed test application. This paper shows that on-chip generation of
functional broadside tests can be done using a simple and fixed hardware structure, with a small number
of parameters that need to be tailored to a given circuit, and can achieve high transition fault coverage for
testable circuits. With the proposed on-chip test generation method, the circuit is used for generating
reachable states during test application. This alleviates the need to compute reachable states offline.
ETPL
VLSI-013
Checkpointing for Virtual Platforms and SystemC-TLM
Abstract: Integrating simulation models created using different simulation systems is a common problem
when constructing virtual platforms. Different companies and different departments can create models,
and virtual platforms for different purposes using different tools. There are also existing models that need
to be integrated into new tools, or the other way around. The simulators can be quite different in details,
even in the case of transaction-level models. We present work in integrating SystemC transaction-level
models into two typical full-system simulation environments, QEMU and Simics. We present issues in
reconciling the semantics of the different platforms, and our proposed solutions. In the Simics integration,
we additionally enable checkpointing in the models, based on the Simics checkpoint mechanism.
ETPL
VLSI-014
Design of a Practical Nanometer-Scale Redundant Via-Aware Standard Cell Library
for Improved Redundant Via1 Insertion Rate
Abstract: Despite the rapid advances in process technology, via failure is still problematic in nanometer-
scale semiconductor manufacturing. Adding redundant vias is a typical approach for improving yield and
reliability. Cell-based design methodologies are widely adopted in the industry for application-specific
integrated circuits. Standard cells are effective for increasing the insertion rate of redundant via1s in cell-
based designs. This study proposes an efficient library check and staggered pin arrangement approach that
compares redundant via1 insertion rate in different configurations such as double-via and rectangle-via.
To compare the variability in standard cell (SC) libraries, accurate characterization results are provided.
Moreover, the proposed SC library is easily implemented in all currently available routers. The
experimental results reveal that the proposed library improves total inserted redundant vias, total inserted
redundant via1s, and total run time by 20.2%, 51.9%, and 42.3%, respectively. In double-via pattern, the
proposed approach improves average via1 insertion rate by 14.6%. In rectangle-via pattern, the proposed
approach achieves a 100% via1 insertion rate.
ETPL
VLSI-015
Scaling Energy Per Operation via an Asynchronous Pipeline
Abstract: A statistical analysis of computations per unit energy in processors over the last 30 years is given.
It illustrates a sharp reduction in the rate of energy efficiency improvements over the last several years,
resulting in the formation of an asymptotic “wall” in our dataset; we use the measure of giga multiply
accumulates per Joule. We have developed an energy model which takes into account the realities of
scaling, specifically for asynchronous systems. Studies of an energy efficient asynchronous pipeline show
fabricated results of 17 Giga Operations per Joule in 0.6 μm at subthreshold when fully pipelined, and
simulations at a more modern 65 nm process show a further order of magnitude improvement on that.
ETPL
VLSI-016
A High Speed Low Power CAM With a Parity Bit and Power-Gated ML Sensing
Abstract: Content addressable memory (CAM) offers high-speed search function in a single clock cycle.
Due to its parallel match-line (ML) comparison, CAM is power-hungry. Thus, robust, high-speed and
low-power ML sense amplifiers are highly sought-after in CAM designs. In this paper, we introduce a
parity bit that leads to 39% sensing delay reduction at a cost of less than 1% area and power overhead.
Furthermore, we propose an effective gated-power technique to reduce the peak and average power
consumption and enhance the robustness of the design against process variations. A feedback loop is
employed to auto-turn off the power supply to the comparison elements and hence reduce the average
power consumption by 64%. The proposed design can work at a supply voltage down to 0.5 V.
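One way to see why a parity bit can help match-line sensing in VLSI-016 is to count mismatching cells. If a stored word and a search word differ in exactly one data bit, their parities also differ, so the match line sees two mismatches instead of one. The sketch below checks that property exhaustively for a small word width. Relating this to the reported delay reduction is an interpretation, since the abstract does not spell out the mechanism, and the function names are hypothetical.

```python
def parity(word):
    """Even-parity bit of a non-negative integer word."""
    return bin(word).count("1") & 1


def mismatches_with_parity(stored, search, width):
    """Number of mismatching cells on a match line that includes one parity cell."""
    mask = (1 << width) - 1
    data_mismatches = bin((stored ^ search) & mask).count("1")
    parity_mismatch = parity(stored & mask) ^ parity(search & mask)
    return data_mismatches + parity_mismatch


if __name__ == "__main__":
    # 0b1010 and 0b1000 differ in one data bit, so the parity cell also mismatches.
    print(mismatches_with_parity(0b1010, 0b1000, 4))
```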
ETPL
VLSI-017
Error Detection in Majority Logic Decoding of Euclidean Geometry Low Density
Parity Check (EG-LDPC) Codes
Abstract: In a recent paper, a method was proposed to accelerate the majority logic decoding of difference
set low density parity check codes. This is useful as majority logic decoding can be implemented serially
with simple hardware but requires a large decoding time. For memory applications, this increases the
memory access time. The method detects whether a word has errors in the first iterations of majority logic
decoding, and when there are no errors the decoding ends without completing the rest of the iterations.
Since most words in a memory will be error-free, the average decoding time is greatly reduced. In this
brief, we study the application of a similar technique to a class of Euclidean geometry low density parity
check (EG-LDPC) codes that are one step majority logic decodable. The results obtained show that the
method is also effective for EG-LDPC codes. Extensive simulation results are given to accurately
estimate the probability of error detection for different code sizes and numbers of errors.
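The early-termination idea described in the VLSI-017 abstract can be sketched on a very small code. The example below uses the (7,3) cyclic code built from the perfect difference set {0, 1, 3} modulo 7 as a stand-in, not one of the EG-LDPC codes studied in the brief. The number of detection iterations (three) is an assumed parameter, and for simplicity the check sums are computed on the received word rather than on a shifting register. All names are illustrative.

```python
N = 7
TAPS = (0, 1, 3)  # perfect difference set modulo 7


def check_sums(word, bit):
    """The three parity checks that are orthogonal on the given bit position."""
    sums = []
    for tap in TAPS:
        start = (bit - tap) % N
        sums.append(sum(word[(start + u) % N] for u in TAPS) % 2)
    return sums


def decode(received, detect_iterations=3):
    """One-step majority logic decoding with early exit.

    Returns (decoded_word, iterations_used).
    """
    word = list(received)
    error_seen = False
    for bit in range(N):
        sums = check_sums(received, bit)
        if any(sums):
            error_seen = True
        if sum(sums) > len(sums) // 2:
            word[bit] ^= 1  # a majority of the checks fail: flip this bit
        if bit + 1 == detect_iterations and not error_seen:
            return word, bit + 1  # every check so far was zero: stop early
    return word, N


if __name__ == "__main__":
    codeword = [0, 0, 1, 0, 1, 1, 1]
    print(decode(codeword))
    corrupted = list(codeword)
    corrupted[3] ^= 1
    print(decode(corrupted))
```

For an error-free word the loop stops after three of the seven iterations, which is the source of the average decoding time reduction the abstract describes.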
ETPL
VLSI-018
Techniques for Compensating Memory Errors in JPEG2000
Abstract: This paper presents novel techniques to mitigate the effects of SRAM memory failures caused
by low voltage operation in JPEG2000 implementations. We investigate error control coding schemes,
specifically single error correction double error detection code based schemes, and propose an unequal
error protection scheme tailored for JPEG2000 that reduces memory overhead with minimal effect in
performance. Furthermore, we propose algorithm-specific techniques that exploit the characteristics of the
discrete wavelet transform coefficients to identify and remove SRAM errors. These techniques do not
require any additional memory, have low circuit overhead, and more importantly, reduce the memory
power consumption significantly with only a small reduction in image quality.
ETPL
VLSI-019
Spatial Distribution Measurement of Dynamic Voltage Drop Caused by Pulse and
Periodic Injection of Spot Noise
Abstract: This paper presents measured results of dynamic voltage drop caused by pulse and periodic
injection of spot noise. The test structure, fabricated in a 45-nm low-power process, has 1024 delay
probes to measure spatial distributions in response to the spot-noise generation. The test structure is an
advanced version of our predecessor design, fabricated at the 65-nm node, and can trace changes in the
spatial distributions with time after the noise injection. The measured results are compared with SPICE
simulations, in which package/socket LCR as well as power-line RC within the die is modeled. It is found
that the simple model agrees well with the measured results.
ETPL
VLSI-020
Low-Complexity Multiplier for GF(2^m) Based on All-One Polynomials
Abstract: This paper presents an area-time-efficient systolic structure for multiplication over GF(2^m)
based on irreducible all-one polynomial (AOP). We have used a novel cut-set retiming to reduce the
duration of the critical-path to one XOR gate delay. It is further shown that the systolic structure can be
decomposed into two or more parallel systolic branches, where the pair of parallel systolic branches has
the same input operand, and they can share the same input operand registers. From the application-
specific integrated circuit and field-programmable gate array synthesis results we find that the proposed
design provides significantly less area-delay and power-delay complexities over the best of the existing
designs.
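The arithmetic behind multiplication modulo an all-one polynomial (VLSI-020) can be shown in a few lines. Because (x + 1)(1 + x + ... + x^m) = x^(m+1) + 1 over GF(2), the product can be formed as a cyclic convolution of length m + 1 and then reduced with at most one extra XOR. This is a bit-level software sketch of that identity, not the systolic structure of the paper. Operands are integers whose bit i is the coefficient of x^i, and the function names are hypothetical.

```python
def aop_multiply(a, b, m):
    """Multiply a and b in GF(2^m) defined by the all-one polynomial of degree m."""
    n = m + 1
    mask = (1 << n) - 1
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            # Multiplying by x^i modulo x^n + 1 is a cyclic left shift by i.
            acc ^= ((a << i) | (a >> (n - i))) & mask
    if (acc >> m) & 1:
        acc ^= mask  # remove the x^m term by adding the all-one polynomial
    return acc


def reference_multiply(a, b, m):
    """Schoolbook polynomial multiplication followed by long division (reference)."""
    modulus = (1 << (m + 1)) - 1
    product = 0
    for i in range(m):
        if (b >> i) & 1:
            product ^= a << i
    for degree in range(2 * m - 2, m - 1, -1):
        if (product >> degree) & 1:
            product ^= modulus << (degree - m)
    return product


if __name__ == "__main__":
    # In GF(4) with x^2 + x + 1, x * x reduces to x + 1.
    print(bin(aop_multiply(0b10, 0b10, 2)))
```

The example uses m = 2 and m = 4 because the all-one polynomials of those degrees are irreducible over GF(2).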
ETPL
VLSI-021
Design and Implementation of an On-Chip Permutation Network for Multiprocessor
System-On-Chip
Abstract: This paper presents the silicon-proven design of a novel on-chip network to support guaranteed
traffic permutation in multiprocessor system-on-chip applications. The proposed network employs a
pipelined circuit-switching approach combined with a dynamic path-setup scheme under a multistage
network topology. The dynamic path-setup scheme enables runtime path arrangement for arbitrary traffic
permutations. The circuit-switching approach offers a guarantee of permuted data and its compact
overhead enables the benefit of stacking multiple networks. A 0.13-μm CMOS test-chip validates the
feasibility and efficiency of the proposed design. Experimental results show that the proposed on-chip
network achieves 1.9× to 8.2× reduction of silicon overhead compared to other design approaches.
ETPL
VLSI-022
An On-Chip Network Fabric Supporting Coarse-Grained Processor Array
Abstract: Coarse grained arrays (CGAs) with run-time reconfigurability play an important role in
accelerating reconfigurable computing applications. It is challenging to design on-chip communication
networks (OCNs) for such CGAs with dynamic run-time reconfigurability whilst satisfying the tight
budgets of power and area for an embedded system. This paper presents a silicon-proven design of a 64-
PE circuit-switched OCN fabric with a dynamic path-setup scheme capable of supporting an embedded
coarse-grained processor array. A proof-of-concept test chip fabricated in a 0.13 μm CMOS process
occupies a silicon area of 23 mm^2 and consumes a peak power of 200 mW @ 128 MHz and 1.2 Vcc, at
room temperature. The OCN overhead consumes 9.4% of the area and 18% of the power of the total chip.
Experimental results and analysis show that the proposed OCN fabric with its dynamic path-setup is
suitable for use in an embedded CGA supporting fast run-time reconfigurability.
ETPL
VLSI-023
A Very Linear Low-Pass Filter with Automatic Frequency Tuning
Abstract: A Gm-C third-order Chebyshev low-pass filter with a novel switched capacitor frequency
tuning technique for a zero-IF Bluetooth receiver has been designed. The frequency tuning scheme is
simpler and has more relaxed specifications than conventional ones. Furthermore, a highly linear pseudo-
differential transconductor with a compact feedback loop able to operate with low supply voltage has
been used. This control loop holds the input transistors in triode region and provides high output
resistance, keeping high linearity in a wide range of transconductance. The filter bandwidth is 0.5 MHz
and the overall scheme consumes 1.1 mA from a 1.8-V supply. The measured third-order intermodulation
(IM3) distortion of the filter for a 1 Vpp two-tone signal centered at 300 kHz is -65 dB.
ETPL
VLSI-024
A High-Speed Low-Complexity Modified Radix-2^5 FFT Processor for High
Rate WPAN Applications
Abstract: This paper presents a high-speed low-complexity modified radix-2^5 512-point fast Fourier
transform (FFT) processor using an eight data-path pipelined approach for high rate wireless personal
area network applications. A novel modified radix-2^5 FFT algorithm that reduces the hardware
complexity is proposed. This method can reduce the number of complex multiplications and the size of
the twiddle factor memory. It also uses a complex constant multiplier instead of a complex Booth
multiplier. The proposed FFT processor achieves a signal-to-quantization noise ratio of 35 dB at 12 bit
internal word length. The proposed processor has been designed and implemented using 90-nm CMOS
technology with a supply voltage of 1.2 V. The results demonstrate that the total gate count of the
proposed FFT processor is 290 K. Furthermore, the highest throughput rate is up to 2.5 GS/s at 310 MHz
while requiring much less hardware complexity.
ETPL
VLSI-025
Application Space Exploration of a Heterogeneous Run-Time Configurable Digital
Signal Processor
Abstract: This paper describes the application space exploration of a heterogeneous digital signal
processor with dynamic reconfiguration capabilities. The device is built around three reconfigurable
engines featuring different flavours and computation granularities that make it suitable for a wide range of
signal processing application domains such as video coding, image processing, telecommunications, and
cryptography. Performance of signal processing applications is evaluated from measurements performed
on a CMOS 90 nm prototype. In order to characterize the application space of the processor, performance
is compared with state-of-the-art devices, taking programmability, computational capabilities, and energy
efficiency as the main metrics. The device delivers significantly higher performance and energy efficiency
than general purpose processors, while still maintaining a user-friendly programming approach that
mainly relies on software-oriented languages. The device is able to achieve 1.2 to 15 GOPS with an
energy efficiency from 2 to 50 GOPS/W when running the selected applications.
ETPL
VLSI-026
A Unified Graphics and Vision Processor With a 0.89 μW/fps Pose Estimation
Engine for Augmented Reality
Abstract: A unified vision and graphics processor with three layers is shown to provide a fast pipeline for
augmented reality. In the image-level layer, a 153.6 GOPS massively parallel processing unit with eight
SIMD processors, each containing 128 processing elements, performs highly data-parallel operations. In
the sub-image layer, a rasterizer and a pixel arranger respectively generate and reduce data-level
parallelism. In the descriptor-level layer, a pose estimation engine executes sequential programs. Our
processor can provide images for augmented reality at 100 fps, for a power consumption of 413 mW. This
is 39% faster than a comparable smartphone implementation. Our chip is fabricated in a 0.18 μm CMOS
process and contains 0.95 M gates.
ETPL
VLSI-027
CORDIC Designs for Fixed Angle of Rotation
Abstract: Rotation of vectors through fixed and known angles has wide applications in robotics, digital
signal processing, graphics, games, and animation. But, we do not find any optimized coordinate rotation
digital computer (CORDIC) design for vector-rotation through specific angles. Therefore, in this paper,
we present optimization schemes and CORDIC circuits for fixed and known rotations with different
levels of accuracy. For reducing the area- and time-complexities, we have proposed a hardwired pre-
shifting scheme in barrel-shifters of the proposed circuits. Two dedicated CORDIC cells are proposed for
the fixed-angle rotations. In one of those cells, micro-rotations and scaling are interleaved, and in the
other they are implemented in two separate stages. Pipelined schemes are suggested further for cascading
dedicated single-rotation units and bi-rotation CORDIC units for high-throughput and reduced latency
implementations. We have obtained the optimized set of micro-rotations for fixed and known angles. The
optimized scale-factors are also derived and dedicated shift-add circuits are designed to implement the
scaling. The fixed-point mean-squared-error of the proposed CORDIC circuit is analyzed statistically, and
strategies for reducing the error are given. We have synthesized the proposed CORDIC cells by Synopsys
Design Compiler using TSMC 90-nm library, and shown that the proposed designs offer higher
throughput, less latency and less area-delay product than the reference CORDIC design for fixed and
known angles of rotation. We find similar results of synthesis for different Xilinx field-programmable
gate-array platforms.
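The fixed-angle idea can be illustrated with a minimal Python sketch. It assumes conventional CORDIC micro-rotations (one per shift index) whose directions are precomputed once for the known angle; it does not reproduce the optimized micro-rotation sets, the hardwired pre-shifting, or the shift-add scaling circuits described in the abstract. The function names and the 16-bit fixed-point format are illustrative assumptions, and the final scaling is done in floating point for brevity.

```python
import math

def micro_rotation_signs(angle_rad, iterations=16):
    """Precompute rotation directions for a known angle (done once, offline)."""
    signs, residual = [], angle_rad
    for i in range(iterations):
        sigma = 1 if residual >= 0 else -1
        signs.append(sigma)
        residual -= sigma * math.atan(2.0 ** -i)
    return signs

def cordic_rotate(x, y, signs, frac_bits=16):
    """Rotate (x, y) with shift-add micro-rotations on fixed-point integers."""
    scale = 1 << frac_bits
    xi = int(round(x * scale))
    yi = int(round(y * scale))
    for i, sigma in enumerate(signs):
        # Both updates use the values from before this micro-rotation.
        xi, yi = xi - sigma * (yi >> i), yi + sigma * (xi >> i)
    # Compensate the CORDIC gain; hardware would use a shift-add scaler here.
    k = 1.0
    for i in range(len(signs)):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return xi * k / scale, yi * k / scale

signs_30deg = micro_rotation_signs(math.radians(30))
print(cordic_rotate(1.0, 0.0, signs_30deg))
```

Because the angle is known in advance, the sign list is a constant of the design, so no angle-accumulator datapath is needed at run time. The conventional iteration used here only converges for angles within roughly ±99.9 degrees.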
ETPL
VLSI-028
Application-Driven End-to-End Traffic Predictions for Low Power NoC Design
Abstract: As chip multiprocessors keep increasing the number of cores on the chip, the network-on-chip
(NoC) technology is becoming essential for interconnecting the cores. While NoCs result in noticeable
performance boost over conventional bus systems, they consume a non-negligible fraction of the system
power. One promising solution is to dynamically adjust the working frequencies/voltages of the switches
as well as the links between switches in the NoC to match the traffic flows. The question is when to adjust
and by how much. Most previous works take a passive approach by reacting to fluctuations in local traffic
flows. Unfortunately, this approach may be too slow and too conservative in adjusting the working
frequencies/voltages. Since applications often exhibit periodic behaviors, we propose a hardware
mechanism to proactively adjust the frequencies/voltages of switches and/or links in NoC by predicting
the application runtime traffic. The evaluations show that our design achieves 86% dynamic power
savings of the links in the on-chip network, and the resulting overheads from mispredictions are tolerable.
ETPL
VLSI-029
Thermal-Constrained Task Allocation for Interconnect Energy Reduction in 3-D
Homogeneous MPSoCs
Abstract: 3-D technology that stacks silicon dies with through silicon vias (TSVs) is a promising solution
to overcome the interconnect scaling problem in giga-scale integrated circuits (ICs). Thermal dissipation
is a major challenge for 3-D integration and prior thermal-balanced task scheduling methods for 3-D
multiprocessor system-on-chips (MPSoCs) typically balance power gradient across vertical stacks based
on the assumption of strong thermal correlation among processing cores within a stack. On the other
hand, 3-D MPSoCs typically employ network-on-chip (NoC) as the communication infrastructure which
consumes a large portion of the energy budget. As TSVs consume much less energy than horizontal links
in 3-D MPSoCs when transmitting the same amount of data due to the reduced interconnect distance
between vertically adjacent cores, it is attractive to allocate heavily communicating tasks within the same
vertical stack as much as possible, so that traffic is confined to the third dimension to reduce
interconnect energy. However, aggregating active tasks within the same stack is likely to exacerbate the
power density and result in hot spots. In this paper, we explore the tradeoff between thermal and
interconnect energy when allocating tasks in 3-D Homogeneous MPSoCs, and propose an efficient
heuristic. Experimental results show that the proposed technique can reduce interconnect energy by more
than 25% on average with almost the same peak temperature when compared with prior thermal-balanced
solutions.
ETPL
VLSI-030
A Wide-Range PLL Using Self-Healing Prescaler/VCO in 65-nm CMOS
Abstract: The variability and leakage current in nanoscale CMOS technology may degrade the circuit
performances significantly. To accommodate the above issues in a wide-range phase-locked loop (PLL), a
self-healing prescaler, a self-healing voltage-controlled oscillator (VCO), and a calibrated charge pump
(CP) are presented. This PLL is fabricated in a 65-nm CMOS technology and its active area is 0.0182
mm². For the self-healing VCO, its measured frequency range is from 60 to 1489 MHz. When this PLL
operates at 855 MHz, the measured rms and peak-to-peak jitters are 8.03 and 55.6 ps, respectively. The
measured reference spur is -52.89 dBc. This PLL consumes 4.3 mW from 1.2 V supply without buffers.
ETPL
VLSI-031
A Clock Control Strategy for Peak Power and RMS Current Reduction Using Path
Clustering
Abstract: Peak power reduction has been a critical challenge in the design of integrated circuits impacting
the chip's performance and reliability. The reduction of peak power also reduces the power density of
integrated circuits. Due to large IR-voltage drops in circuits, transistor switching slows down giving rise
to timing violations and logic failures. In this paper, we present a new clock control strategy for peak-
power reduction in VLSI circuits. In the proposed method, the simultaneous switching of combinational
paths is minimized by taking advantage of the delay slacks among the paths and clustering the paths with
similar slack values. Once the paths are identified based on the path delays and their slack values, the
clustering algorithm determines the ideal number of clusters for the given circuit and for each cluster the
maximum possible phase shift that can be applied to the clock. The paths are assigned to clusters in a load
balanced manner based on the slack values and each cluster will have a phase shift possible on its clock
depending on the slack. Thus, the proposed register-transfer level (RTL) method takes advantage of the
logic-path timing slack to re-schedule circuit activities at optimal intervals within the unaltered clock
period. When switching activities are redistributed more evenly across the clock period, the IC supply-
current consumption is also spread across a wider range of time within the clock period. This has the
beneficial effect of reducing peak-current draw in addition to reducing RMS power draw without having
to change the operating frequency and without utilizing additional power supply voltages as in dual or
multi VT approaches. The proposed method is implemented and tested through simulations using an
experimental setup with Synopsys Tools Suite and Cadence Tools on the ISCAS'85 benchmark circuits,
OpenCore circuits and LEON processor multiplier circuit. Experimental results indicate that peak power
can be reduced significantly to at least 72% depending on the number of clusters and the phase-shifted
clock identified as suitable for the given circuit by the proposed algorithms. Although the proposed
method incurs some power overhead compared to the traditional clocking method, the overhead can be
made negligible compared to the peak-power reduction as seen in the experimental results presented.
ETPL
VLSI-032
A Fast-Locking All-Digital Deskew Buffer With Duty-Cycle Correction
Abstract: In this paper, a fast-locking all-digital deskew buffer with duty cycle correction is proposed and
implemented. A cyclic time-to-digital converter is introduced to decrease the locking time in conventional
register-controlled delay-locked loop to only two input clock cycles in coarse tuning. With the aid of the
three half delay lines technique, the mismatch between half delay lines causing the duty cycle distortion
can be alleviated by interpolation. A balanced edge combiner to achieve a precise 50% output clock is
also presented. A test chip is fabricated in 0.18-μm technology to demonstrate the feasibility of the
proposed architecture. The circuit can accept the input clock rates from 250 to 625 MHz with the duty
cycle variation within 30% and 70% to generate 50% output clocks. It preserves the capability of closed-
loop control with a small area and power consumption.
ETPL
VLSI-033
A Built-In Repair Analyzer With Optimal Repair Rate for Word-Oriented Memories
Abstract: This paper presents a built-in self repair analyzer with the optimal repair rate for memory arrays
with redundancy. The proposed method requires only a single test, even in the worst case. By performing
the must-repair analysis on the fly during the test, it selectively stores fault addresses, and the final
analysis to find a solution is performed on the stored fault addresses. To enumerate all possible solutions,
existing techniques use depth first search using a stack and a finite-state machine. Instead, we propose a
new algorithm and its combinational circuit implementation. Since our formulation for the circuit allows
us to use the parallel prefix algorithm, it can be configured in various ways to meet area and test time
requirements. The total area of our infrastructure is dominated by the number of content addressable
memory entries to store the fault addresses, and it only grows quadratically with respect to the number of
repair elements. The infrastructure is also extended to support various types of word-oriented memories.
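The must-repair step mentioned above can be sketched in Python. This is a software illustration under stated assumptions, not the paper's combinational circuit: it processes a complete fault list in a batch rather than on the fly during the test, and the function name and the (row, column) fault representation are hypothetical. A row is forced onto a spare row when it holds more faults than there are spare columns left, and symmetrically for columns.

```python
from collections import Counter

def must_repair(faults, spare_rows, spare_cols):
    """Return (forced_rows, forced_cols, faults_left_for_final_analysis)."""
    rows, cols = set(), set()
    while True:
        # Faults not yet covered by a forced row or column.
        remaining = sorted({(r, c) for r, c in faults
                            if r not in rows and c not in cols})
        rows_left = spare_rows - len(rows)
        cols_left = spare_cols - len(cols)
        row_count = Counter(r for r, _ in remaining)
        col_count = Counter(c for _, c in remaining)
        # A line with more faults than the opposite spares can cover is forced.
        forced_rows = {r for r, n in row_count.items() if n > cols_left}
        forced_cols = {c for c, n in col_count.items() if n > rows_left}
        if forced_rows:
            rows |= forced_rows
        elif forced_cols:
            cols |= forced_cols
        else:
            return rows, cols, remaining

rows, cols, rest = must_repair([(0, 0), (0, 1), (0, 2), (3, 5)],
                               spare_rows=1, spare_cols=2)
print(rows, cols, rest)
```

If the returned sets need more spares than are available, the memory is unrepairable. Otherwise only the faults in the third return value need to be stored for the final solution search, which is what keeps the fault-address storage small.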
ETPL
VLSI-034
System-Level Modeling and Analysis of Thermal Effects in Optical Networks-on-Chip
Abstract: The performance of multiprocessor systems, such as chip multiprocessors (CMPs), is
determined not only by individual processor performance, but also by how efficiently the processors
collaborate with one another. It is the communication architecture that determines the collaboration
efficiency on the hardware side. Optical networks-on-chip (ONoCs) are emerging communication
architectures that can potentially offer ultra-high communication bandwidth and low latency to
multiprocessor systems. Thermal sensitivity is an intrinsic characteristic of photonic devices used by
ONoCs as well as a potential issue. This paper systematically modeled and quantitatively analyzed the
thermal effects in ONoCs. We used an 8 × 8 mesh-based ONoC as a case study and evaluated the impacts
of thermal effects in the average power efficiency for real MPSoC applications. We revealed three
important factors regarding ONoC power efficiency under temperature variations, and proposed several
techniques to reduce the temperature sensitivity of ONoCs. These techniques include the optimal initial
setting of microresonator resonant wavelength, increasing the 3-dB bandwidth of optical switching
elements by parallel coupling multiple microresonators, and the use of passive-routing optical router Crux
to minimize the number of switching stages in mesh-based ONoCs. We gave a mathematical analysis of
periodically parallel coupling of multiple microresonators and showed that the 3-dB bandwidth of optical
switching elements can be widened nearly linearly with the ring number. Evaluation results for different
real MPSoC applications show that, on the basis of thermal tuning, the optimal device setting improves
the average power efficiency by 54% to 1.2 pJ/bit when chip temperature reaches 85 °C. The findings in
this paper can help support the further development of this emerging technology.
ETPL
VLSI-035
A Study of Tapered 3-D TSVs for Power and Thermal Integrity
Abstract: 3-D integration presents a path to higher performance, greater density, increased functionality
and heterogeneous technology implementation. However, 3-D integration introduces many challenges for
power and thermal integrity due to large switching currents, longer power delivery paths, and increased
parasitics compared to 2-D integration. In this work, we provide an in-depth study of power and thermal
issues while incorporating the physical design characteristics unique to 3-D integration. We provide a
qualitative perspective of the power and thermal dissipation issues in 3-D and study the impact of
Through Silicon Vias (TSVs) size for their mitigation. We investigate and discuss the design implications
of power and thermal issues in the presence of decoupling capacitors, TSV/on-die/package parasitics,
various resonance effects and power gating. Our study is based on a ten-tier system utilizing existing 3-D
technology specifications. Based on detailed power distribution and heat dissipation models, we present a
comprehensive analysis of TSV tapering for alleviating power and thermal integrity issues in 3-D ICs.
ETPL
VLSI-036
Improved Trace Buffer Observation via Selective Data Capture Using 2-D Compaction
for Post-Silicon Debug
Abstract: This paper presents a novel technique for extending the capacity of trace buffers when capturing
debug data during post-silicon debug. It exploits the fact that it is not necessary to capture error-free data
in the trace buffer since that information can be obtained from simulation. A selective data capture
method is proposed in this paper that only captures debug data during clock cycles in which errors are
present. The proposed debug method requires only three debug sessions. The first session estimates a
rough error rate, the second session identifies a set of suspect clock cycles where errors may be present,
and the third session captures the suspect clock cycles in the trace buffer. The suspect clock cycles are
determined through a 2-D compaction technique using multiple-input signature register signatures and
cycling register signatures. Intersecting both signatures generates a small number of suspect clock cycles
that the trace buffer needs to capture. The effective observation window of the trace buffer can be
expanded significantly, by up to orders of magnitude. Experimental results indicate that very significant
increases in the effective observation window for a trace buffer can be obtained.
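The intersection step can be pictured with a small Python model. It is a simplified sketch under explicit assumptions: a plain XOR stands in for the multiple-input signature register (a real MISR is LFSR-based), one signature covers each block of consecutive cycles, and a second set covers cycles that share the same index modulo a fixed value, which is how the cycling register is modeled here. The function names, block size, and modulus are all hypothetical.

```python
def bin_signatures(trace, bin_of, n_bins):
    """XOR-compact every word of the trace into the bin chosen by bin_of(cycle)."""
    sigs = [0] * n_bins
    for cycle, word in enumerate(trace):
        sigs[bin_of(cycle)] ^= word
    return sigs

def mismatched_bins(observed, golden, bin_of, n_bins):
    """Bins whose observed signature differs from the simulated (golden) one."""
    obs = bin_signatures(observed, bin_of, n_bins)
    ref = bin_signatures(golden, bin_of, n_bins)
    return {i for i in range(n_bins) if obs[i] != ref[i]}

def suspect_cycles(observed, golden, block, modulus):
    """Cycles flagged by both the block-wise and the modulo-wise signatures."""
    n = len(golden)
    n_blocks = (n + block - 1) // block
    bad_blocks = mismatched_bins(observed, golden, lambda c: c // block, n_blocks)
    bad_residues = mismatched_bins(observed, golden, lambda c: c % modulus, modulus)
    return [c for c in range(n)
            if c // block in bad_blocks and c % modulus in bad_residues]

golden = list(range(100, 164))   # 64 cycles of simulated error-free data
observed = list(golden)
observed[5] ^= 0x1               # inject errors at cycles 5 and 42
observed[42] ^= 0x4
print(suspect_cycles(observed, golden, block=8, modulus=8))
```

In this toy run two erroneous cycles out of 64 yield four suspects, so the trace buffer would only need to capture four cycles in the third session. The suspect set always contains the true error cycles unless signatures alias, which the XOR stand-in makes more likely than a real MISR would.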
ETPL
VLSI-037
AC-Plus Scan Methodology for Small Delay Testing and Characterization
Abstract: Small delay defects escaping traditional delay testing could cause a device to malfunction in the
field and thus detecting these defects is often necessary. To address this issue, we propose three test
modes in a new methodology called AC-plus scan, in which versatile test clocks can be generated on the
chip by embedding an all-digital phase-locked loop (ADPLL) into the circuit under test (CUT). AC-plus
scan can be executed on an in-house wireless test platform called HOY system. The first test mode of our
AC-plus scan provides a more efficient way to measure the longest path delay associated with each test
pattern. Experimental results show that our method could greatly reduce the test time by 81.8%. The
second test mode is designed for volume production test. It could effectively detect small delay defects
and provide fast characterization on those defective chips for further processing. This mode could be used
to help predict which chips are more likely to fall victim to operational failure in the field. The third test
mode is to extract the waveform of each flip-flop's output in a real chip. This is made possible by taking
advantage of the almost unlimited test memory our HOY test platform provides, so that we could easily
store a great volume of data and reconstruct the waveform for post-silicon debugging. We have
successfully fabricated a Viterbi decoder chip with such an AC-plus scan methodology inside to
demonstrate its capability.
ETPL
VLSI-038
A Variation Tolerant Current-Mode Signaling Scheme for On-Chip Interconnects
Abstract: Current-mode signaling (CMS) with dynamic overdriving is one of the most promising schemes
for high-speed low-power communication over long on-chip interconnects. However, such schemes are sensitive to
parameter variations due to reduced voltage swings on the line. In this paper, we propose a variation
tolerant dynamic overdriving CMS scheme. The proposed CMS scheme and a competing CMS scheme
(CMS-Fb) are fabricated in 180-nm CMOS technology. Measurement results show that the proposed
scheme offers 34% reduction in energy/bit and 42% reduction in energy-delay-product over CMS-Fb
scheme for a 10 mm line operating at 0.64 Gbps of data rate. Simulations indicate that the proposed CMS
scheme consumes 0.297 pJ/bit for data transfer over the 10 mm line at 2.63 Gb/s. Measurements indicate
that the delay of CMS-Fb becomes 2.5 times its nominal value in the presence of intra-die variations
whereas the delay of the proposed scheme changes by only 5% for the same amount of intra-die
variations. Measurement and simulation results show that both the schemes are robust against inter-die
variations. Experiments and simulations also indicate that the proposed CMS scheme is more robust
against practical variations in supply and temperature as compared to CMS-Fb scheme.
ETPL
VLSI-039
Modeling and Analysis of Power Distribution Networks in 3-D ICs
Abstract: This paper addresses the modeling and analysis problems for power distribution networks
(PDNs) in 3-D ICs. An on-chip distributed model is proposed for 3-D power grids, in which the details of
metal layers are considered. The distributed model is demonstrated to be essential to identifying the
unique noise behavior of 3-D PDNs. A lumped model is proposed based on the distributed model. The
lumped model features the connection impedance between tiers and is proven to be useful for designers to
understand the global effects of 3-D PDNs. Based on the models, an analysis flow is designed for 3-D
PDNs in both frequency domain and time domain. With the analysis flow, the electrical characteristics of
3-D PDNs are studied systematically for the first time. The frequency-domain analysis identifies the
global and local resonance phenomena in 3-D PDNs that are distinct from those in 2-D PDNs. The
physical mechanisms behind the resonance phenomena are investigated. The time-domain analysis
predicts the worst-case supply noise based on distributed current constraints. The “Rogue Wave” concept
is introduced to explain the spatial and temporal relations of the worst-case on-chip noise responses in 3-
D PDNs.
ETPL
VLSI-040
A Low-Cost, Systematic Methodology for Soft Error Robustness of Logic Circuits
Abstract: Due to current technology scaling trends such as shrinking feature sizes and decreasing supply
voltages, circuit reliability is becoming more susceptible to radiation-induced transient faults (soft errors).
Soft errors, which have been a great concern in memories, are now a main factor in reliability degradation
of logic circuits as well. In this paper, we present a systematic and integrated methodology for circuit
robustness to soft errors. The proposed soft error rate (SER) reduction framework, based on redundancy
addition and removal (RAR), aims at eliminating those gates with large contribution to the overall SER.
Several metrics and constraints are introduced to guide the RAR-based approach toward SER reduction.
Furthermore, we integrate a resizing strategy into our framework, as post-RAR additive SER
optimization. The strategy can identify most critical gates to be upsized and thereby, minimize area and
power overheads while maintaining a high level of soft error robustness. Experimental results show that
the proposed RAR-based framework can achieve up to 70% reduction in output failure probability. On
average, about 23% SER reduction is obtained with less than 4% area overhead.
ETPL
VLSI-041
Low Complexity Out-of-Order Issue Logic Using Static Circuits
Abstract: In this paper a single-cycle issue queue circuit architecture that simplifies the wakeup and
selection logic is proposed. The micro-architecture and fully static CMOS circuits are presented for a 32-
entry queue that issues four instructions per cycle. The instruction-ready signals are divided into groups
and processed in parallel to issue the four oldest ready instructions. The complete issue queue and
prioritization logic requires 20 inversions, allowing simulated circuit operation at over 4 GHz in a foundry
45 nm SOI fabrication process.
ETPL
VLSI-042
Low Latency Systolic Montgomery Multiplier for Finite Field GF(2^m) Based on
Pentanomials
Abstract: In this paper, we present a low latency systolic Montgomery multiplier over GF(2^m) based on
irreducible pentanomials. An efficient algorithm is presented to decompose the multiplication into a
number of independent units to facilitate parallel processing. Besides, a novel so-called “pre-computed
addition” technique is introduced to further reduce the latency. The proposed design involves significantly
less area-delay and power-delay complexities compared with the best of the existing designs. It has the
same or shorter critical-path and involves nearly one-fourth of the latency of the other in case of the
National Institute of Standards and Technology recommended irreducible pentanomials.
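For readers unfamiliar with the operation being accelerated, the following Python sketch shows plain bit-serial Montgomery multiplication over GF(2^m), which returns a·b·x^(-m) mod f. It is not the systolic decomposition or the "pre-computed addition" technique from the abstract. The small field GF(2^8) with the pentanomial x^8 + x^4 + x^3 + x + 1 (encoded as 0x11B) is chosen only to keep the example checkable by hand, and the function names are illustrative.

```python
def mont_mul(a, b, f, m):
    """Bit-serial Montgomery product a*b*x^(-m) mod f over GF(2)."""
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b          # add b when bit i of a is set
        if c & 1:
            c ^= f          # f has constant term 1, so this clears bit 0
        c >>= 1             # exact division by x
    return c

def gf_mul(a, b, f, m):
    """Plain shift-and-add reference multiplication modulo f."""
    c = 0
    for i in range(m):
        if (b >> i) & 1:
            c ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= f          # reduce as soon as degree m appears
    return c

F, M = 0x11B, 8                      # x^8 + x^4 + x^3 + x + 1
R = F ^ (1 << M)                     # x^m mod f
R2 = gf_mul(R, R, F, M)              # x^(2m) mod f
product = mont_mul(mont_mul(0x57, 0x83, F, M), R2, F, M)
print(hex(product), hex(gf_mul(0x57, 0x83, F, M)))
```

Multiplying the Montgomery result by x^(2m) through a second Montgomery step cancels the x^(-m) factor, so both printed values are the ordinary field product. The Montgomery loop needs only a conditional XOR and a one-bit right shift per iteration.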
ETPL
VLSI-043
Power-Up Sequence Control for MTCMOS Designs
Abstract: Power gating is effective for reducing standby leakage power as multi-threshold CMOS
(MTCMOS) designs have become popular in the industry. However, a large inrush current and dynamic
IR drop may occur when a circuit domain is powered up with MTCMOS switches. This could in turn lead
to improper circuit operation. We propose a novel framework for generating a proper power-up sequence
of the switches to control the inrush current of a power-gated domain while minimizing the power-up
time and reducing the dynamic IR drop of the active domains. We also propose a configurable domino-
delay circuit for implementing the sequence. Experimental results based on state-of-the-art industrial
designs demonstrate the effectiveness of the proposed framework in limiting the inrush current,
minimizing the power-up time, and reducing the dynamic IR drop. Results further confirm the efficiency
of the framework in handling large-scale designs with more than 80 K power switches and 100 M
transistors.
ETPL
VLSI-044
Architecture and Design Flow for a Highly Efficient Structured ASIC
Abstract: As fabrication process technology continues to advance, mask set costs have become
prohibitively expensive. Structured application specific integrated circuits (sASICs) offer a middle ground
in price and performance between ASICs and field-programmable gate arrays (FPGAs) by sharing masks
across different designs. In this paper, two sASIC architectures are proposed, the first being based on
three-input lookup-tables, and the second on AOI22 gates. The sASICs are programmed using a standard-
cell compatible design flow. They are customized using a minimum of three masks, i.e., two metals and
one via. The area and delay of the sASIC are compared with ASICs and FPGAs. Results over a set of
benchmark circuits show that our AOI22-based sASIC had an average of 1.76x/1.41x increase in
area/delay compared to ASICs, a considerable improvement compared with the 26.56x/5.09x increase for
FPGAs. This is, to the best of our knowledge, the best performance reported in the literature for a
practical sASIC. A prototype using the sASIC was fabricated using a UMC 0.13-μm
mixed-mode/RF process. It was fully verified using scan and functional tests, and used in a demonstration
system.
ETPL
VLSI-045
Secure Dual-Core Cryptoprocessor for Pairings Over Barreto-Naehrig Curves on
FPGA Platform
Abstract: This paper is devoted to the design and the physical security of a parallel dual-core flexible
cryptoprocessor for computing pairings over Barreto-Naehrig (BN) curves. The proposed design is
specifically optimized for field-programmable gate-array (FPGA) platforms. The design explores the in-
built features of an FPGA device for achieving an efficient cryptoprocessor for computing 128-bit secure
pairings. The work further pinpoints the vulnerability of those pairing computations against side-channel
attacks and demonstrates experimentally that power consumptions of such devices can be used to attack
these ciphers. Finally, we suggest a suitable countermeasure to overcome the respective weaknesses. The
proposed secure cryptoprocessor needs 1 730 000, 1 206 000, and 821 000 cycles for the computation of
Tate, ate, and optimal-ate pairings, respectively. The implementation results on a Virtex-6 FPGA device
show that it consumes 23 k Slices and computes the respective pairings in 11.93, 8.32, and 5.66 ms.
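The cycle counts and run times quoted above can be cross-checked with a few lines of Python. The abstract does not state a clock frequency; the figure computed below is only what the quoted numbers imply, and the helper name is hypothetical.

```python
# Cycle counts and run times as quoted in the abstract.
cycles = {"Tate": 1_730_000, "ate": 1_206_000, "optimal-ate": 821_000}
time_ms = {"Tate": 11.93, "ate": 8.32, "optimal-ate": 5.66}

def implied_clock_mhz(n_cycles, milliseconds):
    """Clock frequency in MHz implied by a cycle count and a run time."""
    return n_cycles / (milliseconds * 1e-3) / 1e6

for name in cycles:
    print(name, round(implied_clock_mhz(cycles[name], time_ms[name]), 2))
```

All three pairings imply a clock of about 145 MHz, so the reported cycle counts and timings are mutually consistent.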
ETPL
VLSI-046
In-Situ Method for TSV Delay Testing and Characterization Using Input Sensitivity
Analysis
Abstract: In this paper, we propose a method and the required architecture for characterizing the
propagation delays of the through-silicon vias (TSVs) in a 3-D IC. First of all, every two TSVs are paired
up to form an oscillation ring with some peripheral circuits. Their joint performance can thus be measured
roughly by the oscillation period of the ring. Next, we utilize a technique called sensitivity analysis to
further derive the propagation delay of each individual TSV participating in an oscillation ring, a distilling
process. In this process, we perturb the strength of the two TSV drivers, and then measure their effects in
terms of the change of the oscillation ring's period. Through subsequent analysis, the propagation delay of
each TSV can be revealed. On top of this scheme, we also present an architecture that can activate the
performance characterization process of each test unit (which consists of two TSVs) one at a time in a
proper sequence. The area overhead is only 18.97 equivalent two-input NAND gates per TSV, by which
one can gain the ability to profile the capacitances and the propagation delays of the TSVs on a 3-D IC.
ETPL
VLSI-047
Low-Resolution DAC-Driven Linearity Testing of Higher Resolution ADCs Using
Polynomial Fitting Measurements
Abstract: A low-cost linearity test methodology for high-resolution analog-to-digital converters (ADCs) is
presented in this paper. Linearity testing of ADCs requires high-precision digital-to-analog conversion
(DAC) capability, commonly 3-bit higher resolution than the ADC under test. Further, a large number of
ADC output data samples must be collected making conventional histogram testing impractical for high-
resolution ADCs with 18-24 bit precision. In the proposed test methodology, two low-precision and low-
cost DACs are used to generate a high-resolution ADC test stimulus. Significant reductions in test cost
and test time are achieved by using low-cost instrumentation and by making fewer measurements than
required for conventional histogram test. A least-squares-based polynomial fitting approach is used to
determine the transfer function of the ADC under test. The generated transfer function is used to compute
the non-linearity of the ADC accurately. No assumption is made regarding the linearity of the lower
precision signal generators (DACs) used in the testing procedure. Software simulations and hardware
experiments are performed to validate the proposed test methodology.
ETPL
VLSI-048
Low-Cost Error Tolerance Scheme for 3-D CMOS Imagers
Abstract: This paper presents an error tolerance scheme for 3-D CMOS imagers that are constructed by
stacking a pixel array of imager sensors, an analog-to-digital converter (ADC) array, and an image signal
processor (ISP) array using microbumps (μbumps) and through silicon vias (TSVs). To deliver high-
quality images in the presence of single or multiple μbump, ADC, or TSV failures, we propose to
interleave the connections from pixels to ADCs and recover the corrupted data in the ISPs. Key design
parameters, such as the interleaving stride and the grouping ratio are determined by analyzing the
employed error correction algorithm. Architectural simulation results demonstrate that the error tolerance
scheme enhances the effective yield of an exemplar 3-D imager from 44% to 97%.
ETPL
VLSI-049
Computing Two-Pattern Test Cubes for Transition Path Delay Faults
Abstract: Considering full-scan circuits, incompletely-specified tests, or test cubes, are used for test data
compression. When considering path delay faults, certain specified input values in a test cube are needed
only for determining the lengths of the paths associated with detected faults. Path delay faults, and
therefore, small delay defects, would still be detected if such values are unspecified. The goal of this
paper is to explore the possibility of increasing the number of unspecified input values in a test set for
path delay faults by unspecifying such values in order to make the test set more amenable to test data
compression. Experimental results indicate that significant numbers of such values exist. The proposed
procedure unspecifies them gradually to obtain a series of test sets with increasing numbers of unspecified
values and decreasing path lengths. Experimental results also indicate that filling the unspecified values
randomly (as with some test data compression methods) recovers some or all of the path lengths
associated with detected path delay faults. The procedure uses a matching of the sets of detected faults for
the comparison of path lengths.
ETPL
VLSI-050
Integrated Energy-Harvesting Photodiodes With Diffractive Storage Capacitance
Abstract: Integrating energy-harvesting photodiodes with logic and exploiting on-die interconnect
capacitance for energy storage can enable new, ultraminiaturized wireless systems. Unlike CMOS imager
pixels, the proposed photodiode designs utilize p-diffusion fingers and are implemented in a conventional
logic process. Also unlike specialized solar cell processes, the designs utilize the on-chip metal
interconnect to form a diffraction grating above the p-diffusion fingers which also provides capacitive
energy storage. To explore the tradeoffs between optical efficiency and energy storage for integrated
photodiodes, an array of photovoltaics with various diffractive storage capacitors was designed in a 90-
nm CMOS logic process. The diffractive effects can be exploited to increase the photodiodes' response to
off-axis illumination. Transient effects from interfacing the photodiodes with switched-capacitor DC-DC
converters were examined, with measurements indicating a 50% reduction in the output voltage ripple
due to the diffractive storage capacitance. A quantitative comparison between 90-nm and 0.35-μm CMOS
logic processes for energy-harvesting capabilities was carried out. Measurements show an increase in
power generation for the newer CMOS technology, however at the cost of reduced output voltage. One
potential application for the integrated photodiodes is harvesting energy for a subdermal biomedical
device.
ETPL
VLSI-051
Fast Fixed-Outline 3-D IC Floorplanning With TSV Co-Placement
Abstract: Through-silicon vias (TSVs) are used to connect inter-die signals in a 3-D IC. Unlike
conventional vias, TSVs occupy device area and are very large compared to logic gates. However, most
previous 3-D floorplanners only view TSVs as points. As a result, whitespace redistribution is necessary
for TSV insertion after the initial floorplan is computed, which leads to suboptimal layouts. In this paper,
we propose a very efficient 3-D floorplanner to simultaneously floorplan the functional modules and
place the TSVs and to optimize the total wirelength under fixed-outline constraint. Compared to the state-
of-the-art 3-D floorplanner with TSV planning, our design consistently produces better floorplans with
15% shorter wirelength and 31% fewer TSVs on average. Our algorithm is extremely fast and only takes
a few seconds to floorplan benchmarks with hundreds of modules compared to hours as required by the
previous state-of-the-art floorplanner.
ETPL
VLSI-052
Reactivation Noise Suppression With Sleep Signal Slew Rate Modulation in MTCMOS
Circuits
Abstract: Multi-threshold CMOS (MTCMOS) is commonly used for suppressing leakage currents in idle
integrated circuits. Power and ground distribution network noise produced during SLEEP to ACTIVE
mode transitions is an important reliability concern in MTCMOS circuits. Sleep signal slew rate
modulation techniques for suppressing mode-transition noise are explored in this paper. A triple-phase
sleep signal slew rate modulation (TPS) technique with a novel digital sleep signal generator is proposed.
Reactivation time, mode-transition energy consumption, leakage power consumption, and layout area of
different MTCMOS circuits are characterized under an equal-noise constraint. Influences of within-die
and die-to-die parameter variations on the reactivation noise, time, and energy consumption of sleep
signal slew rate modulated MTCMOS circuits are evaluated with a process imperfections aware
robustness metric. The proposed triple-phase sleep signal slew rate modulation technique enhances the
tolerance to process parameter fluctuations by up to 183.1× as compared to various alternative MTCMOS
noise suppression techniques in a UMC 80-nm CMOS technology.
ETPL
VLSI-053
Sub-mW LC Dual-Input Injection-Locked Oscillator for Autonomous WBSNs
Abstract: This paper presents a sub-mW, current-reused first-harmonic LC injection-locked oscillator
(ILO) using in-phase dual-input injection technique. It can be used as a power oscillator in the injection-
locked transmitter of wireless biomedical sensor nodes (WBSNs) integrated into a wireless body area
network. A prototype chip, implemented in a standard 0.13-μm CMOS process occupying 200 × 380 μm,
operates in the medical implantable communications service (MICS) band for medical implants.
Measurement results show that the proposed ILO features a wide locking range of 800 MHz (150-950
MHz) at an input power of 0 dBm. More importantly, it has a high input sensitivity of -30 dBm to lock
the 3-MHz bandwidth of the MICS band, while consuming only 660 μW at 1-V supply. This ultralow
power consumption enables autonomous WBSNs.
ETPL
VLSI-054
Constant Delay Logic Style
Abstract: A constant delay (CD) logic style is proposed in this paper, targeting full-custom high-speed
applications. The CD characteristic of this logic style regardless of the logic type makes it suitable for
implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic
where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature offers
performance advantage over static and dynamic domino logic styles in a single-cycle multistage circuit
block. Several design considerations including timing window width adjustment and clock distribution
are discussed. Using 65-nm general-purpose CMOS technology, the proposed logic demonstrates an
average speedup of 94% and 56% over static and dynamic domino logic, respectively, in five different
logic gates. Simulation results of 8-bit ripple carry adders show that CD logic is 39% and 23% faster than
the static and dynamic-based adders, respectively. CD logic also demonstrates 39% speedup and 64%
(22%) energy-delay product (EDP) reduction from static logic at 100% (10%) data activity in 32-bit carry
lookahead adders. For 8-bit Wallace tree multiplier, CD logic achieves a similar speedup with at least
50% EDP reduction across all data activities.
ETPL
VLSI-055
A Compact Clock Generator for Heterogeneous GALS MPSoCs in 65-nm CMOS
Technology
Abstract: This paper presents an all-digital phase-locked loop (ADPLL) clock generator for globally
asynchronous locally synchronous (GALS) multiprocessor systems-on-chip (MPSoCs). With its low
power consumption of 2.7 mW and ultrasmall chip area of 0.0078 mm², it can be instantiated per core for
fine-grained power management like DVFS. It is based on an ADPLL providing a multiphase clock
signal from which core frequencies from 83 to 666 MHz with 50% duty cycle are generated by phase
rotation and frequency division. The clock meets the specification for DDR2/DDR3 memory interfaces.
Additionally, it provides a dedicated high-speed clock up to 4 GHz for serial network-on-chip data links.
Core frequencies can be changed arbitrarily within one clock cycle for fast dynamic frequency scaling
applications. The performance including statistical analysis of mismatch has been verified by a prototype
in 65-nm CMOS technology.
ETPL
VLSI-056
A Colpitts CMOS Quadrature VCO Using Direct Connection of Substrates for
Coupling
Abstract: A new low-phase noise low-power quadrature voltage-controlled oscillator (QVCO) using
differential Colpitts oscillator is presented. The proposed QVCO is composed of two identical current-
switching differential Colpitts VCOs in which the first core VCO is coupled to the second in an in-phase
manner, and the second core VCO is coupled to the first in an anti-phase manner. To couple the two core
VCOs, the substrates of the cross-connected transistors as well as the substrates of MOS varactors are
used, alleviating the need for any extra elements for coupling, which could add noise and increase power
dissipation. A linear (sinusoidal) analysis is presented that confirms that the proposed circuit generates
quadrature waveforms. The proposed coupling technique can be generalized to N differential Colpitts
VCOs for multiphase signal generation.
ETPL
VLSI-057
A Self-Calibrated DLL-Based Clock Generator for an Energy-Aware EISC Processor
Abstract: This paper describes a low-jitter delay-locked loop (DLL)-based clock generator for dynamic
frequency scaling in the extendable instruction set computing (EISC) processor. The DLL-based clock
generator provides the system clock with frequencies of 0.5× to 8× of the reference clock, according to
the workload of the EISC processor. The proposed analog self-calibration method and a phase detector
with an auxiliary charge pump can effectively reduce the delay mismatch between delay cells in the
voltage-controlled delay line and the static phase offset due to the current mismatch in the charge pump,
respectively. The self-calibrated output waveform exhibits 9.7 ps of RMS jitter and 73.7 ps of peak-to-
peak jitter at 120 MHz. The prototype clock generator implemented in a 0.18-μm CMOS process
occupies an active area of 0.27 mm² and consumes 15.56 mA.
ETPL
VLSI-058
Clamping Virtual Supply Voltage of Power-Gated Circuits for Active Leakage
Reduction and Gate-Oxide Reliability
Abstract: In an integrated circuit (IC) adopting a power-gating (PG) technique, the virtual supply voltage
(VVDD) is susceptible to: 1) negative-bias temperature instability (NBTI) degradation that weakens the
PG device over time and 2) temporal temperature variation that affects active leakage current (thus total
current) of the IC. The PG device is sized to guarantee a minimum VVDD level over the chip lifetime.
Thus, the NBTI degradation and the worst-case total current at high-temperature must be considered for
sizing the PG device. This leads to higher VVDD (thus active leakage power) than necessary in early chip
lifetime and/or at low temperature, negatively impacting the gate-oxide reliability of transistors. To
reduce active leakage power increase and improve the gate-oxide reliability due to these effects, we
propose two techniques that adjust the strength of a PG device based on its usage and IC's temperature at
runtime. We demonstrate the efficacy of these techniques with an experimental setup using a 32-nm
technology model in the presence of within-die spatial process and temperature variations. On an average
of 100 die samples, they can reduce dynamic and active leakage power by up to 3.7% and 10% in early
chip lifetime. Finally, these techniques also reduce the oxide failure rate by up to 5% across process
corners over a period of 7 years.
ETPL
VLSI-059
10-bit 30-MS/s SAR ADC Using a Switchback Switching Method
Abstract: This brief presents a 10-bit 30-MS/s successive-approximation-register analog-to-digital
converter (ADC) that uses a power efficient switchback switching method. With respect to the monotonic
switching method, the input common-mode voltage variation is reduced, which improves the dynamic offset
and the parasitic capacitance variation of the comparator. The proposed switchback switching method
does not consume any power at the first digital-to-analog converter switching, which can reduce the
power consumption and design effort of the reference buffer. The prototype was fabricated in a 90-nm
1P9M CMOS technology. At 1-V supply and 30 MS/s, the ADC achieves a signal-to-noise-and-distortion
ratio (SNDR) of 56.89 dB and consumes 0.98 mW, resulting in a figure-of-merit (FOM) of 57
fJ/conversion-step. The ADC core occupies an active area of only 190 × 525 μm².
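The quoted figure of merit can be cross-checked from the other numbers in the abstract. The sketch below assumes the widely used Walden definition, FOM = P / (2^ENOB × fs), with ENOB derived from the 56.89 dB figure treated as the SNDR; the abstract does not state which definition the authors used, and the function names are illustrative only.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a given SNDR in dB (standard sine-wave formula)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, sample_rate_hz: float, sndr_db: float) -> float:
    """Energy per conversion step in joules: P / (2**ENOB * fs). Assumed definition."""
    return power_w / (2 ** enob_from_sndr(sndr_db) * sample_rate_hz)


# Values taken from the abstract: 0.98 mW, 30 MS/s, 56.89 dB.
enob = enob_from_sndr(56.89)
fom = walden_fom(power_w=0.98e-3, sample_rate_hz=30e6, sndr_db=56.89)
print(f"ENOB = {enob:.2f} bits, FOM = {fom * 1e15:.1f} fJ/conversion-step")
```

Under this assumption the result lands close to the 57 fJ/conversion-step reported in the abstract.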
ETPL
VLSI-060
Spur-Reduction Frequency Synthesizer Exploiting Randomly Selected PFD
Abstract: This brief presents a low-spur phase-locked loop (PLL) system for wireless applications. The
low-spur frequency synthesizer randomizes the periodic ripples on the control voltage of the voltage-
controlled oscillator to reduce the reference spur at the output of the PLL. A novel random clock
generator is presented to perform the random selection of the phase frequency detector control for the
charge pump in locked state. The proposed frequency synthesizer was fabricated in a TSMC 0.18-μm
CMOS process. The proposed PLL achieved phase noise of -93 dBc/Hz with a 600-kHz offset frequency
and reference spurs below -72 dBc.
ETPL
VLSI-061
Gain-Enhanced Monolithic Charge Pump With Simultaneous Dynamic Gate and
Substrate Control
Abstract: This brief presents a gain-enhanced complementary metal-oxide-semiconductor (CMOS) charge
pump (CP) circuit via dynamically controlling the gate and substrate terminals of each pMOS pass
transistor. The proposed control strategy frees the CP circuit from the threshold-voltage drops, the
body effect, and the floating substrate terminals of pass devices. The on-resistance of each pass device is
also reduced to improve the gain and the power efficiency of the CP circuit. Implemented in a 0.35-μm
single n-well CMOS process, the proposed four-stage monolithic CP circuit can operate with a supply
voltage down to 0.9 V and deliver a maximum output current of about 100 μA. The proposed CP circuit
also achieves a high voltage gain of 4 with two complementary-phase nonoverlapping clock signals.
ETPL
VLSI-062
Embedding Repeaters in Silicon IPs for Cross-IP Interconnections
Abstract: During systems-on-a-chip (SoC) integration, silicon intellectual properties (IPs) are generally
regarded as blockages to long interconnections that connect different IPs. With this constraint,
conventional designs are forced to place those repeaters that drive long interconnections outside the IP.
These designs either lead to a longer interconnection distance requiring more repeaters or result in a
longer signal delay, since the interconnection wire is not appropriately segmented by the repeaters. To
solve these problems, we designed the IPs such that designers can embed the repeaters in the IP for the
SoC integration. In other words, it allows the cross-IP interconnections to be routed over the IP using
repeaters inserted in the IP. The design concept, physical implementation, and application examples of the
embedded repeaters are described in this brief.
ETPL
VLSI-063
RATS: Restoration-Aware Trace Signal Selection for Post-Silicon Validation
Abstract: Post-silicon validation is one of the most important and expensive tasks in modern integrated
circuit design methodology. The primary problem governing post-silicon validation is the limited
observability due to storage of a small number of signals in a trace buffer. The signals to be traced should
be carefully selected in order to maximize restoration of the remaining signals. Existing approaches have
two major drawbacks. They depend on partial restorability computations that are not effective in restoring
maximum signal states. They also require long signal selection time due to inefficient computation as well
as operating on gate-level netlist. We have proposed a signal selection approach based on total
restorability at gate-level, which is computationally more efficient (10 times faster) and can restore up to
three times more signals compared to existing methods. We have also developed a register transfer level
signal selection approach, which reduces both memory requirements and signal selection time by several
orders-of-magnitude.
ETPL
VLSI-064
Test Patterns of Multiple SIC Vectors: Theory and Application in BIST Schemes
Abstract: This paper proposes a novel test pattern generator (TPG) for built-in self-test. Our method
generates multiple single-input change (MSIC) vectors in a pattern, i.e., each vector applied to a scan
chain is an SIC vector. A reconfigurable Johnson counter and a scalable SIC counter are developed to
generate a class of minimum transition sequences. The proposed TPG is flexible to both the test-per-clock
and the test-per-scan schemes. A theory is also developed to represent and analyze the sequences and to
extract a class of MSIC sequences. Analysis results show that the produced MSIC sequences have the
favorable features of uniform distribution and low input transition density. The performances of the
designed TPGs and the circuits under test with 45 nm are evaluated. Simulation results with ISCAS
benchmarks demonstrate that MSIC can save test power and impose no more than 7.5% overhead for a
scan design. It also achieves the target fault coverage without increasing the test length.
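The single-input-change property that the abstract relies on can be illustrated with a plain Johnson (twisted-ring) counter. This is only a minimal sketch of that basic property; the paper's reconfigurable Johnson counter and scalable SIC counter are not described in the abstract, and the helper names below are hypothetical.

```python
def johnson_sequence(width: int):
    """Yield the 2*width states of a Johnson counter, starting from all zeros."""
    state = [0] * width
    for _ in range(2 * width):
        yield tuple(state)
        # Shift by one position and feed the inverted last bit back into the front.
        state = [1 - state[-1]] + state[:-1]


def hamming(a, b) -> int:
    """Number of bit positions in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


states = list(johnson_sequence(4))
# Distance between each state and its successor, including the wrap-around step.
distances = [hamming(states[i], states[(i + 1) % len(states)]) for i in range(len(states))]
for s in states:
    print("".join(map(str, s)))
```

Every consecutive pair of vectors differs in exactly one bit, which is why such sequences keep the input transition density low.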
ETPL
VLSI-065
Effective and Efficient Approach for Power Reduction by Using Multi-Bit Flip-Flops
Abstract: Power has become a burning issue in modern VLSI design. In modern integrated circuits, the
power consumed by clocking gradually takes a dominant part. Given a design, we can reduce its power
consumption by replacing some flip-flops with fewer multi-bit flip-flops. However, this procedure may
affect the performance of the original circuit. Hence, the flip-flop replacement without timing and
placement capacity constraints violation becomes a quite complex problem. To deal with the difficulty
efficiently, we have proposed several techniques. First, we perform a co-ordinate transformation to
identify those flip-flops that can be merged and their legal regions. Besides, we show how to build a
combination table to enumerate possible combinations of flip-flops provided by a library. Finally, we use
a hierarchical way to merge flip-flops. Besides power reduction, the objective of minimizing the total
wirelength is also considered. The time complexity of our algorithm is $\Theta(n^{1.12})$, less than
the empirical complexity of $\Theta(n^{2})$. According to the experimental results, our algorithm
significantly reduces clock power by 20–30% and the running time is very short. In the largest test case,
which contains 1 700 000 flip-flops, our algorithm only takes about 5 min to replace flip-flops and the
power reduction can achieve 21%.
ETPL
VLSI-066
Reconfigurable Accelerator for the Word-Matching Stage of BLASTN
Abstract: BLAST is one of the most popular sequence analysis tools used by molecular biologists. It is
designed to efficiently find similar regions between two sequences that have biological significance.
However, because the size of genomic databases is growing rapidly, the computation time of BLAST,
when performing a complete genomic database search, is continuously increasing. Thus, there is a clear
need to accelerate this process. In this paper, we present a new approach for genomic sequence database
scanning utilizing reconfigurable field programmable gate array (FPGA)-based hardware. In order to
derive an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate the
computation of the word-matching stage. The experimental results show that the FPGA implementation
achieves a speedup around one order of magnitude compared to the NCBI BLASTN software running on
a general purpose computer.
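For readers unfamiliar with the word-matching stage, the following software sketch shows the general idea: index every length-w word of the query, then scan the database sequence for exact hits. It is an illustration under stated assumptions, not the FPGA architecture from the paper; the function names are hypothetical, and w = 11 is used as the default because it is the word size commonly associated with BLASTN.

```python
from collections import defaultdict


def index_words(query: str, w: int):
    """Map every length-w substring of the query to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return table


def word_matches(query: str, subject: str, w: int = 11):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    table = index_words(query, w)
    hits = []
    for j in range(len(subject) - w + 1):
        # .get avoids creating empty entries in the defaultdict during the scan.
        for i in table.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


# Tiny example with a short word size so the hits are easy to follow by hand.
hits = word_matches("ACGTACGT", "TTACGTAC", w=4)
print(hits)
```

Each hit would then seed the later extension stages. The inner scan over the database is the part that repeats for every position, which is what makes this stage a natural candidate for hardware acceleration.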
ETPL
VLSI-067
Architecturally Homogeneous Power-Performance Heterogeneous Multicore Systems
Abstract: Dynamic voltage and frequency scaling (DVFS), a widely adopted technique to ensure safe
thermal characteristics while delivering superior energy efficiency, is rapidly becoming inefficient with
technology scaling due to two critical factors: 1) inability to scale the supply voltage due to reliability
concerns and 2) dynamic adaptations through DVFS cannot alter underlying power hungry circuit
characteristics, designed for the nominal frequency. In this paper, we show that DVFS scaled circuits
substantially lag in energy efficiency, by 22%–86%, compared to ground up designs for target frequency
levels. We propose architecturally homogeneous power-performance heterogeneous multicore systems, a
fundamentally alternate means to design energy efficient multicore systems. Using a system level
computer-aided design (CAD) approach, we seamlessly integrate architecturally identical cores, designed
for different voltage-frequency domains. We use a combination of standard cell library based CAD flow
and full system architectural simulation to demonstrate 11%–22% improvement in energy efficiency
using our design paradigm.
ETPL
VLSI-068
Active Filter-Based Hybrid On-Chip DC–DC Converter for Point-of-Load Voltage
Regulation
Abstract: An active filter-based on-chip DC–DC voltage converter for application to distributed on-chip
power supplies in multivoltage systems is described in this paper. No inductor or output capacitor is
required in the proposed converter. The area of the voltage converter is therefore significantly less than
that of a conventional low-dropout (LDO) regulator. Hence, the proposed circuit is appropriate for point-
of-load voltage regulation for noise sensitive portions of an integrated circuit. The performance of the
circuit has been verified with Cadence Spectre simulations and fabricated with a commercial 110 nm
complementary metal oxide semiconductor (CMOS) technology. The voltage regulator occupies
0.015 mm² and delivers up to 80 mA of output current. The transient response with no output
capacitor ranges from 72 to 192 ns. The parameter sensitivity of the active filter is also described. The
advantages and disadvantages of the active filter-based, conventional switching, linear, and switched
capacitor voltage converters are compared. The proposed circuit is an alternative to classical LDO voltage
regulators, providing a means for distributing multiple local power supplies across an integrated circuit
while maintaining high current efficiency and fast response time within a small area.
ETPL
VLSI-069
CusNoC: Fast Full-Chip Custom NoC Generation
Abstract: We propose a full-chip synthesis methodology to construct custom network-on-chips
(CusNoCs) for NoC-based systems. The proposed scheme generates irregular network topologies for
application-specific designs with known communication demands. In this method, processors and the
communication architecture can be synthesized simultaneously in the floorplanning process, and thus it is
called CusNoC. CusNoC synthesizes a custom NoC in two steps. The target network topology is first generated
based on communication analysis. Processing elements are partitioned into groups such that the utility of
routers will be maximized if a router is assigned to each group. In this way, the number of routers passed
by a packet, or hops, is minimized, and so is the power consumption in the network. The final network
topology is formed by properly connecting these groups. Wirelength-aware floorplanning is then
carried out to optimize circuit size as well as wirelength. Experimental results show that CusNoC
produces custom NoCs with better performance than previous methods while the computation time is
significantly shorter. This method is also more scalable, which makes it ideal for complicated systems.
ETPL
VLSI-070
Cooperating Virtual Memory and Write Buffer Management for Flash-Based Storage
Systems
Abstract: Flash memory is becoming the preferred choice of secondary storage in mobile devices and
embedded systems. The performance of Flash memory is dictated by asymmetric speeds of read and
write, limited number of erase times, and the absence of in-place updates. To improve the performance of
Flash-based storage systems, the write buffer has been provided in Flash memories recently. At the same
time, new virtual memory management strategies have been proposed in recent studies that consider the
characteristics of Flash memory. Currently, approaches on these two memory layers are considered
separately, which fail to explore the full potential of these two layers. In this paper, we propose
cooperative management schemes for virtual memory and write buffer to maximize the performance of
Flash-memory-based systems. Management on virtual memory is designed to exploit write buffer status
via reordering of the write sequences. The proposed write buffer management scheme works seamlessly
with the proposed virtual memory management scheme. Experimental results show that significant
improvement in I/O performance and reduction of the number of erase and write operations can be
achieved compared to the state-of-the-art approaches.
ETPL
VLSI-071
MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM Systems
Abstract: This paper presents a multipath delay commutator (MDC)-based architecture and memory
scheduling to implement fast Fourier transform (FFT) processors for multiple input multiple output-
orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the
MDC architecture, we propose to use radix-$N_{s}$ butterflies at each stage, where $N_{s}$ is the
number of data streams, so that there is only one butterfly needed in each stage. Consequently, a 100%
utilization rate in computational elements is achieved. Moreover, thanks to the simple control mechanism
of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing,
which again results in a full utilization rate in memory usage. Since the memory requirements usually
dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can
effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed
scheme in practical applications, we let $N_{s}=4$ and implement a 4-stream FFT/IFFT processor with
variable length including 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be
used in IEEE 802.16 WiMAX and 3GPP long term evolution applications. The processor was
implemented with a UMC 90-nm CMOS technology with a core area of 3.1 mm². The
power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for 2048/1024/512/128-FFT,
respectively in the post-layout simulation. Finally, we analyze the complexity and performance of the
implemented processor and compare it with other processors. The results show advantages of the
proposed scheme in terms of area and power consumption.
ETPL
VLSI-072
Current-Reused 2.4-GHz Direct-Modulation Transmitter With On-Chip Automatic
Tuning
Abstract: This paper presents the design, analysis, and experimental verification of a self-calibrating
current-reused 2.4-GHz direct-modulation transmitter for short-range wireless applications. The key
contributions are the design/analysis of a stacked power amplifier (PA)/voltage-controlled oscillator
(VCO) architecture, the nonlinear frequency-dependent analysis of a Gilbert-cell-based root-mean-square
detector, and an on-chip $LC$-tank calibration circuit that needs no analog-to-digital converter
(ADC)/digital signal processor. The stacked architecture reduces the number of required regulators,
utilizes supply headroom effectively, and allows for an “ADC-less” calibration loop that can dynamically
tune the PA center frequency by sensing the transmitted signal. The very nature of direct-modulation
architecture obviates additional high-purity signal generators, reducing complexity and allowing online
calibration. The system was implemented in TSMC 0.18-μm CMOS, occupies 0.7 mm² (TX) + 0.1 mm²
(self-tuning), and was measured in a QFN48 package on an
FR4 PCB. Automatically correcting PA/VCO tank misalignment in this case yielded a >4 dB
increase in output power. With the automatic tuning active, the transmitter delivers a measured
output power >0 dBm to a 100-Ω differential load, and the system consumes
22.9 mA from a 1.8-V core-circuit supply.
ETPL
VLSI-073
Reconfigurable Adaptive Singular Value Decomposition Engine Design for High-
Throughput MIMO-OFDM Systems
Abstract: Singular value decomposition (SVD) is an optimal method to obtain spatial multiplexing gain in
multi-input multi-output (MIMO) channels. However, the high cost of implementation and high
decomposing latency of the SVD restricts its usage in current wireless communication applications. In
this paper, we present a complete adaptive SVD algorithm and a reconfigurable architecture for high-
throughput MIMO-orthogonal frequency division multiplexing systems. There are several proposed
architectural design techniques: reconfigurable scheme, division-free adaptive step size scheme, early
termination scheme, and data interleaving scheme. The reconfigurable scheme can support all antenna
configurations in a MIMO system. The division-free adaptive step size and early termination schemes are
used to effectively reduce the decomposing latency and improve hardware utilization. The data
interleaving scheme helps to deal with several channel matrices concurrently. Besides, we propose an
orthogonal reconstruction scheme to obtain more accurate SVD outputs, and then the system performance
will be greatly enhanced. We apply our SVD design to IEEE 802.11n applications. This design is
implemented and fabricated in UMC 90 nm 1P9M CMOS technology. The maximum operating
frequency is measured to be at 101.2 MHz, and the corresponding power dissipation is at 125 mW. The
core size is 2.17 mm² and the die size is 4.93 mm². The chip result shows
that the average latency is only 0.33% of the wireless local area network coherence time. Hence, the
proposed reconfigurable adaptive SVD engine design is very suitable for high-throughput wireless
communication applications.
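As background for why the SVD yields spatial multiplexing gain, the standard textbook decomposition can be written out as follows. The symbols are assumptions introduced here for illustration (received vector y, channel matrix H, transmitted vector x, noise n) and do not come from the abstract, and the sketch assumes the channel is known at both transmitter and receiver.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{H}\mathbf{x} + \mathbf{n},
\qquad \mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{H},\\
\mathbf{x} &= \mathbf{V}\tilde{\mathbf{x}},
\qquad \tilde{\mathbf{y}} = \mathbf{U}^{H}\mathbf{y}
 = \boldsymbol{\Sigma}\tilde{\mathbf{x}} + \mathbf{U}^{H}\mathbf{n}.
\end{aligned}
```

Because Σ is diagonal, precoding with V and combining with U^H turns the MIMO channel into independent parallel streams, one per nonzero singular value. This is the reason the decomposition has to be recomputed within the channel coherence time, which is the latency constraint the abstract refers to.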
ETPL
VLSI-074
The LUT-SR Family of Uniform Random Number Generators for FPGA Architectures
Abstract: Field-programmable gate array (FPGA) optimized random number generators (RNGs) are more
resource-efficient than software-optimized RNGs because they can take advantage of bitwise operations
and FPGA-specific features. However, it is difficult to concisely describe FPGA-optimized RNGs, so
they are not commonly used in real-world designs. This paper describes a type of FPGA RNG called a
LUT-SR RNG, which takes advantage of bitwise xor operations and the ability to turn lookup tables
(LUTs) into shift registers of varying lengths. This provides a good resource–quality balance compared to
previous FPGA-optimized generators, between the previous high-resource high-period LUT-FIFO RNGs
and low-resource low-quality LUT-OPT RNGs, with quality comparable to the best software generators.
The LUT-SR generators can also be expressed using a simple C++ algorithm contained within this paper,
allowing 60 fully-specified LUT-SR RNGs with different characteristics to be embedded in this paper,
backed up by an online set of very high speed integrated circuit hardware description language (VHDL)
generators and test benches.
ETPL
VLSI-075
Exploring the Use of Emerging Nonvolatile Memory Technologies in Future FPGAs
Abstract: As new nonvolatile memory technologies become increasingly mature, there has been a
growing interest in investigating their use in future field-programmable gate arrays (FPGAs). Similar to
existing FPGAs with embedded Flash memory, future FPGAs can embed these new nonvolatile
memories to persistently store configuration data. By comparing with prior work, we first propose the
more appropriate design style for new nonvolatile configuration data storage memory. Moreover, this
brief studies a dynamic random-access memory (DRAM)-based FPGA design strategy enabled by high-
density embedded nonvolatile memory. Existing FPGAs do not use on-chip DRAM cells for
configuration data storage, mainly because DRAM self-refresh involves a destructive DRAM read. This
problem can be solved if we use embedded nonvolatile memory as the primary FPGA configuration data
storage and externally refresh the on-chip DRAM cells. Analysis and simulations have been carried out to
demonstrate the potential advantages of such a design strategy.
ETPL
VLSI-076
Broadside and Skewed-Load Tests Under Primary Input Constraints
Abstract: Tester limitations may impose certain constraints on the primary input vectors applicable as part
of a two-pattern test for delay faults. Under these constraints, the primary input vectors may be held
constant, or the second primary input vector of a test may be obtained by a single shift of a scan chain
relative to the first. The goal of this brief is to study the differences in achievable transition fault coverage
between various primary input constraints that are similar to the commonly used ones of holding or
shifting primary input vectors. This brief also studies the possibility of combining the constraints in order
to increase the transition fault coverage. The combination requires a fixed and circuit-independent
hardware structure similar to the case where shifting of primary input vectors is used. This study is done
using test sets that consist of both broadside and skewed-load tests in order to maximize the transition
fault coverage.
ETPL
VLSI-078
Supply Noise Suppression by Triple-Well Structure
Abstract: This brief discusses the impact of twin- and triple-well structures on power supply noise, and a
substrate model for simulating the power supply noise. We observed $V_{\rm ss}$ noise reduction by the
resistive network of the p-substrate and $V_{\rm dd}$ noise reduction by the junction capacitance of a
triple-well structure on a 90-nm test chip. Measurement results also showed that the total noise reduction
of a triple-well structure is superior to that of a twin-well structure. The measurement results correlate
well with the results obtained from the power supply noise simulation using a hierarchical resistive mesh
model. Our simulation-based verification indicates that in common CMOS design, a triple-well structure
can reduce the power supply drop by 10%–40% or the decoupling capacitance area by 5%–10%. We also
verified that supply drop sensitivity to variation of the well junction capacitance is sufficiently small and
that supply noise reduction using a triple-well structure is robust to process variation.
ETPL
VLSI-079
Software-Based Self Test Methodology for On-Line Testing of L1 Caches in
Multithreaded Multicore Architectures
Abstract: The flexibility that allows the application of different March tests is a critical requirement for
on-line testing of memory arrays. In a previous study, we have introduced a low-cost software-based self
test (SBST) program development methodology for on-line periodic testing of L1 caches that utilizes
direct cache access (DCA) instructions and exploits the native monitoring hardware available in modern
architectures. In this brief, we discuss a multithreaded optimization of this SBST methodology that
exploits the thread level parallelism of multithreaded multicore architectures in order to speed up March
test execution by elaborating the low level multiple sub-bank cache organization. The effectiveness of the
methodology and its multithreaded optimization is demonstrated on the L1 caches of the OpenSPARC T1
processor. Our results showed a speedup of more than 1.7× when the multithreaded optimization is applied
and an acceptable performance overhead (less than 11%), even in intensive periodic test scenarios.
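The abstract above is about running March tests on cache arrays. As a hedged illustration of what a March test does, the sketch below runs the well-known March C- sequence on a small software memory model in C++. Everything here is an assumption for illustration: the `BitMemory` model, the injected stuck-at fault, and the helper names are hypothetical, and the code does not use direct cache access instructions or reflect the paper's SBST program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-bit-per-cell memory model with an optional stuck-at fault.
struct BitMemory {
    std::vector<std::uint8_t> cells;
    long stuck_index = -1;          // -1 means fault-free
    std::uint8_t stuck_value = 0;   // value a faulty cell always returns

    explicit BitMemory(std::size_t n) : cells(n, 0) {}

    void write(std::size_t i, std::uint8_t v) { cells[i] = v & 1u; }
    std::uint8_t read(std::size_t i) const {
        if (static_cast<long>(i) == stuck_index) return stuck_value;
        return cells[i];
    }
};

// One March element: visit every cell in the given order, optionally read and
// compare against `expect`, then optionally write `write_val`.
// A value of -1 skips the read or the write.
bool march_element(BitMemory& m, bool ascending, int expect, int write_val) {
    const std::size_t n = m.cells.size();
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = ascending ? k : n - 1 - k;
        if (expect >= 0 && m.read(i) != expect) return false;
        if (write_val >= 0) m.write(i, static_cast<std::uint8_t>(write_val));
    }
    return true;
}

// March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
bool march_c_minus(BitMemory& m) {
    assert(!m.cells.empty());
    return march_element(m, true, -1, 0) &&
           march_element(m, true, 0, 1) &&
           march_element(m, true, 1, 0) &&
           march_element(m, false, 0, 1) &&
           march_element(m, false, 1, 0) &&
           march_element(m, true, 0, -1);
}
```

In this model a fault-free memory passes, while a cell stuck at 0 or 1 makes one of the read-and-compare steps fail. Parallelizing such a sequence across cache sub-banks, as the abstract describes, is outside the scope of this sketch.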
Final Year IEEE Project 2013-2014 - VLSI Project Title and Abstract

  • 4. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com ETPL VLSI-001 Pragmatic Integration of an SRAM Row Cache in Heterogeneous 3-D DRAM Architecture Using TSV Abstract: As scaling DRAM cells becomes more challenging and energy-efficient DRAM chips are in high demand, the DRAM industry has started to undertake an alternative approach to address these looming issues-that is, to vertically stack DRAM dies with through-silicon-vias (TSVs) using 3-D-IC technology. Furthermore, this emerging integration technology also makes heterogeneous die stacking in one DRAM package possible. Such a heterogeneous DRAM chip provides a unique, promising opportunity for computer architects to contemplate a new memory hierarchy for future system design. In this paper, we study how to design such a heterogeneous DRAM chip for improving both performance and energy efficiency. In particular, we found that, if we want to design an SRAM row cache in a DRAM chip, simple stacking alone cannot address the majority of traditional SRAM row cache design issues. In this paper, to address these issues, we propose a novel floorplan and several architectural techniques that fully exploit the benefits of 3-D stacking technology. Our multi-core simulation results with memory- intensive applications suggest that, by tightly integrating a small row cache with its corresponding DRAM array, we can improve performance by 30% while saving dynamic energy by 31%. ETPL VLSI-002 A Low-Complexity Turbo Decoder Architecture for Energy-Efficient Wireless Sensor Networks Abstract: Turbo codes have recently been considered for energy-constrained wireless communication applications, since they facilitate a low transmission energy consumption. 
However, in order to reduce the overall energy consumption, lookup table-log-BCJR (LUT-Log-BCJR) architectures having a low processing energy consumption are required. In this paper, we decompose the LUT-Log-BCJR architecture into its most fundamental add compare select (ACS) operations and perform them using a novel low-complexity ACS unit. We demonstrate that our architecture employs an order of magnitude fewer gates than the most recent LUT-Log-BCJR architectures, facilitating a 71% energy consumption reduction. Compared to state-of-the-art maximum logarithmic Bahl-Cocke-Jelinek-Raviv implementations, our approach facilitates a 10% reduction in the overall energy consumption at ranges above 58 m. ETPL VLSI-003 Pipelined Radix- 2k Feedforward FFT Architectures Abstract: The appearance of radix-22 was a milestone in the design of pipelined FFT hardware architectures. Later, radix-22 was extended to radix-2k . However, radix-2k was only proposed for single- path delay feedback (SDF) architectures, but not for feedforward ones, also called multi-path delay commutator (MDC). This paper presents the radix-2k feedforward (MDC) FFT architectures. In feedforward architectures radix-2k can be used for any number of parallel samples which is a power of two. Furthermore, both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be used. In addition to this, the designs can achieve very high throughputs, which makes them suitable for the most demanding applications. Indeed, the proposed radix-2k feedforward architectures require fewer hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when several samples in parallel must be processed. As a result, the proposed radix-2k feedforward architectures not only offer an attractive solution for current applications, but also open up a new research line on feedforward structures.
  • 5. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com ETPL VLSI-004 Algorithm and Architecture Design of Bandwidth-Oriented Motion Estimation for Real-Time Mobile Video Applications Abstract: This paper proposes a data bandwidth-oriented motion estimation design for resource-limited mobile video applications using an integrated bandwidth rate distortion optimization framework. This framework predicts and allocates the appropriate data bandwidth for motion estimation under a limited bandwidth supply to fit a dynamically changing bandwidth supply. The simulation results show that our proposed algorithm can achieve 66% and 41% memory bandwidth savings while maintaining an equivalent rate-distortion performance and meeting real-time targets, when compared with conventional approaches for low-motion and high-motion D1 (704 ×  576)-size video, respectively. The final implementation costs 122 K gate counts with TSMC 0.13-μ m CMOS technology and consumes 74 mW of power for D1 resolution at 30 frames/s which is 40% of that achieved in previous designs. ETPL VLSI-005 STBC-OFDM Downlink Baseband Receiver for Mobile WMAN Abstract: This paper proposes a space time block code-orthogonal frequency division multiplexing downlink baseband receiver for mobile wireless metropolitan area network. The proposed baseband receiver applied in the system with two transmit antennas and one receive antenna aims to provide high performance in outdoor mobile environments. It provides a simple and robust synchronizer and an accurate but hardware affordable channel estimator to overcome the challenge of multipath fading channels. The coded bit error rate performance for 16 quadrature amplitude modulation can achieve less than 10-6 under the vehicle speed of 120 km/hr. 
The proposed baseband receiver designed in 90-nm CMOS technology can support up to 27.32 Mb/s uncoded data transmission under 10 MHz channel bandwidth. It requires a core area of 2.41 × 2.41 mm2 and dissipates 68.48 mW at 78.4 MHz with 1 V power supply. ETPL VLSI-006 Glitch-Free NAND-Based Digitally Controlled Delay-Lines Abstract: The recently proposed NAND-based digitally controlled delay-lines (DCDL) present a glitching problem which may limit their employ in many applications. This paper presents a glitch-free NAND- based DCDL which overcame this limitation by opening the employ of NAND-based DCDLs in a wide range of applications. The proposed NAND-based DCDL maintains the same resolution and minimum delay of previously proposed NAND-based DCDL. The theoretical demonstration of the glitch-free operation of proposed DCDL is also derived in the paper. Following this analysis, three driving circuits for the delay control-bits are also proposed. Proposed DCDLs have been designed in a 90-nm CMOS technology and compared, in this technology, to the state-of-the-art. Simulation results show that novel circuits result in the lowest resolution, with a little worsening of the minimum delay with respect to the previously proposed DCDL with the lowest delay. Simulations also confirm the correctness of developed glitching model and sizing strategy. As example application, proposed DCDL is used to realize an All- digital spread-spectrum clock generator (SSCG). The employ of proposed DCDL in this circuit allows to reduce the peak-to-peak absolute output jitter of more than the 40% with respect to a SSCG using three- state inverter based DCDLs.
  • 6. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com ETPL VLSI-007 A High-Efficiency, Wide Workload Range, Digital Off-Time Modulation (DOTM) DC- DC Converter With Asynchronous Power Saving Technique Abstract: Conventionally for wide workload range applications, to keep good stability and high efficiency, a switching converter with multi-mode operation is necessary. With the advanced digital signal processing, this work presents an asynchronous digital controller with dynamic power saving technique to achieve high power efficiency. The regulation is based on the off-time modulation, in which an adaptive resolution adjustment is proposed for the extension toward light-loaded range. The DC-DC converter is fabricated in a 0.18- μm CMOS process. The input voltage is from 2.7 to 3.6 V and the regulated output is 1.8 V. The switching frequency is from 44 kHz to 1.65 MHz and the maximum output ripple is 20 mV with a 10-μF capacitor and a 2.2-μH inductor. The power efficiency is higher than 91% for the workload range from 3 to 400 mA. ETPL VLSI-008 Formal Verification of Architectural Power Intent Abstract: This paper presents a verification framework that attempts to bridge the disconnect between high-level properties capturing the architectural power management strategy and the implementation of the power management control logic using low-level per-domain control signals. The novelty of the proposed framework is in demonstrating that the architectural power intent properties developed using high-level artifacts can be automatically translated into properties over low-level control sequences gleaned from UPF specifications of power domains, and that the resulting properties can be used to formally verify the global on-chip power management logic. 
The proposed translation uses a considerable amount of domain knowledge and is also not purely syntactic, because it requires formal extraction of timing information for the low-level control sequences. We present a tool, called POWER-TRUCTOR, which enables the proposed framework, and several test cases of significant complexity to demonstrate the feasibility of the proposed framework.
ETPL VLSI-009 Statistical SRAM Read Access Yield Improvement Using Negative Capacitance Circuits
Abstract: SRAM has become the dominant block in modern ICs and constitutes more than 50% of the die area. The increase of process variations with continued CMOS technology scaling is considered one of the major challenges for SRAM designers. This increase in process variations causes SRAM cells to fail functionally and reduces the chip functional yield through static noise margin stability failures (i.e., the cell flips when accessed), write failures (i.e., the cell is not written within the write window), and read access failures (i.e., incorrect read operation). In this paper, novel negative capacitance circuits are developed, for the first time, to statistically improve the SRAM read access yield under process variations by reducing the bitline parasitic capacitance. Post-layout simulation results, referring to an industrial hardware-calibrated TSMC 65-nm CMOS technology, show that adding the negative capacitance circuit to a column of 512 SRAM cells improves the read access yield from 61.9% to 100%.
ETPL VLSI-010 An Energy-Efficient L2 Cache Architecture Using Way Tag Information Under Write-Through Policy
Abstract: Many high-performance microprocessors employ a cache write-through policy for performance improvement while at the same time achieving good tolerance to soft errors in on-chip caches. However,
  • 7. write-through policy also incurs large energy overhead due to the increased accesses to caches at the lower level (e.g., L2 caches) during write operations. In this paper, we propose a new cache architecture referred to as way-tagged cache to improve the energy efficiency of write-through caches. By maintaining the way tags of L2 cache in the L1 cache during read operations, the proposed technique enables L2 cache to work in an equivalent direct-mapping manner during write hits, which account for the majority of L2 cache accesses. This leads to significant energy reduction without performance degradation. Simulation results on the SPEC CPU2000 benchmarks demonstrate that the proposed technique achieves 65.4% energy savings in L2 caches on average with only 0.02% area overhead and no performance degradation. Similar results are also obtained under different L1 and L2 cache configurations. Furthermore, the idea of way tagging can be applied to existing low-power cache design techniques to further improve energy efficiency.
ETPL VLSI-011 An Analytical Latency Model for Networks-on-Chip
Abstract: We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topology with deterministic routing under arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus enabling fast design space exploration of various design parameters in NoC designs.
Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms.
ETPL VLSI-012 Built-In Generation of Functional Broadside Tests Using a Fixed Hardware Structure
Abstract: Functional broadside tests are two-pattern scan-based tests that avoid overtesting by ensuring that a circuit traverses only reachable states during the functional clock cycles of a test. In addition, the power dissipation during the fast functional clock cycles of functional broadside tests does not exceed that possible during functional operation. On-chip test generation has the added advantage that it reduces test data volume and facilitates at-speed test application. This paper shows that on-chip generation of functional broadside tests can be done using a simple and fixed hardware structure, with a small number of parameters that need to be tailored to a given circuit, and can achieve high transition fault coverage for testable circuits. With the proposed on-chip test generation method, the circuit is used for generating reachable states during test application. This alleviates the need to compute reachable states offline.
ETPL VLSI-013 Checkpointing for Virtual Platforms and SystemC-TLM
Abstract: Integrating simulation models created using different simulation systems is a common problem when constructing virtual platforms. Different companies and different departments can create models and virtual platforms for different purposes using different tools. There are also existing models that need to be integrated into new tools, or the other way around. The simulators can be quite different in details, even in the case of transaction-level models.
We present work in integrating SystemC transaction-level models into two typical full-system simulation environments, QEMU and Simics. We present issues in
  • 8. reconciling the semantics of the different platforms, and our proposed solutions. In the Simics integration, we additionally enable checkpointing in the models, based on the Simics checkpoint mechanism.
ETPL VLSI-014 Design of a Practical Nanometer-Scale Redundant Via-Aware Standard Cell Library for Improved Redundant Via1 Insertion Rate
Abstract: Despite the rapid advances in process technology, via failure is still problematic in nanometer-scale semiconductor manufacturing. Adding redundant vias is a typical approach for improving yield and reliability. Cell-based design methodologies are widely adopted in the industry for application-specific integrated circuits. Standard cells are effective for increasing the insertion rate of redundant via1s in cell-based designs. This study proposes an efficient library check and staggered pin arrangement approach that compares redundant via1 insertion rate in different configurations such as double-via and rectangle-via. To compare the variability in standard cell (SC) libraries, accurate characterization results are provided. Moreover, the proposed SC library is easily implemented in all currently available routers. The experimental results reveal that the proposed library improves total inserted redundant vias, total inserted redundant via1s, and total run time by 20.2%, 51.9%, and 42.3%, respectively. In double-via pattern, the proposed approach improves average via1 insertion rate by 14.6%. In rectangle-via pattern, the proposed approach achieves a 100% via1 insertion rate.
ETPL VLSI-015 Scaling Energy Per Operation via an Asynchronous Pipeline
Abstract: A statistical analysis of computations per unit energy in processors over the last 30 years is given. It illustrates a sharp reduction in the rate of energy efficiency improvements over the last several years, resulting in the formation of an asymptotic “wall” in our dataset; we use the measure of giga multiply-accumulates per joule. We have developed an energy model which takes into account the realities of scaling, specifically for asynchronous systems. Studies of an energy-efficient asynchronous pipeline show fabricated results of 17 giga operations per joule in 0.6 μm at subthreshold when fully pipelined, and simulations at a more modern 65 nm process show a further order of magnitude improvement on that.
ETPL VLSI-016 A High Speed Low Power CAM With a Parity Bit and Power-Gated ML Sensing
Abstract: Content addressable memory (CAM) offers a high-speed search function in a single clock cycle. Due to its parallel match-line (ML) comparison, CAM is power-hungry. Thus, robust, high-speed and low-power ML sense amplifiers are highly sought-after in CAM designs. In this paper, we introduce a parity bit that leads to 39% sensing delay reduction at a cost of less than 1% area and power overhead. Furthermore, we propose an effective gated-power technique to reduce the peak and average power consumption and enhance the robustness of the design against process variations. A feedback loop is employed to automatically turn off the power supply to the comparison elements and hence reduce the average power consumption by 64%. The proposed design can work at a supply voltage down to 0.5 V.
ETPL VLSI-017 Error Detection in Majority Logic Decoding of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes
Abstract: In a recent paper, a method was proposed to accelerate the majority logic decoding of difference set low density parity check codes.
This is useful as majority logic decoding can be implemented serially
  • 9. with simple hardware but requires a large decoding time. For memory applications, this increases the memory access time. The method detects whether a word has errors in the first iterations of majority logic decoding, and when there are no errors the decoding ends without completing the rest of the iterations. Since most words in a memory will be error-free, the average decoding time is greatly reduced. In this brief, we study the application of a similar technique to a class of Euclidean geometry low density parity check (EG-LDPC) codes that are one-step majority logic decodable. The results obtained show that the method is also effective for EG-LDPC codes. Extensive simulation results are given to accurately estimate the probability of error detection for different code sizes and numbers of errors.
ETPL VLSI-018 Techniques for Compensating Memory Errors in JPEG2000
Abstract: This paper presents novel techniques to mitigate the effects of SRAM memory failures caused by low voltage operation in JPEG2000 implementations. We investigate error control coding schemes, specifically single error correction double error detection code based schemes, and propose an unequal error protection scheme tailored for JPEG2000 that reduces memory overhead with minimal effect on performance. Furthermore, we propose algorithm-specific techniques that exploit the characteristics of the discrete wavelet transform coefficients to identify and remove SRAM errors. These techniques do not require any additional memory, have low circuit overhead, and more importantly, reduce the memory power consumption significantly with only a small reduction in image quality.
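The single error correction double error detection (SECDED) coding mentioned in the VLSI-018 abstract can be illustrated with a textbook extended Hamming (8,4) code. The sketch below is a generic example and not the unequal error protection scheme of the paper; the function names `secded_encode` and `secded_decode` and the bit layout are assumptions made for illustration only.

```python
def secded_encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 follow the classic
    Hamming layout (parity at 1, 2, 4 and data at 3, 5, 6, 7).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    for bit in code[1:]:
        code[0] ^= bit  # overall parity over positions 1..7
    return code


def secded_decode(code):
    """Return (value, status); status is 'ok', 'corrected' or 'double_error'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = 0
    for bit in code:
        overall ^= bit  # 0 when the overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume a single error and repair it.
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        # Non-zero syndrome with consistent overall parity: two bits flipped.
        return None, "double_error"
    value = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return value, status


word = secded_encode(0b1011)
word[6] ^= 1  # inject a single-bit memory error
print(secded_decode(word))  # prints (11, 'corrected')
```

The overall parity bit is what separates the two cases: a single flip makes the overall parity odd and is corrected, while two flips leave it even with a non-zero syndrome and are only flagged.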
ETPL VLSI-019 Spatial Distribution Measurement of Dynamic Voltage Drop Caused by Pulse and Periodic Injection of Spot Noise
Abstract: This paper presents measured results of dynamic voltage drop caused by pulse and periodic injection of spot noise. The test structure, fabricated in a 45-nm low-power process, has 1024 delay probes to measure spatial distributions in response to the spot-noise generation. The test structure is an advanced version of its predecessor, which was fabricated in a 65-nm node, and can trace changes in the spatial distributions with time after the noise injection. The measured results are compared with SPICE simulations, in which package/socket LCR as well as power-line RC within the die is modeled. It is found that the simple model agrees well with the measured results.
ETPL VLSI-020 Low-Complexity Multiplier for GF(2^m) Based on All-One Polynomials
Abstract: This paper presents an area-time-efficient systolic structure for multiplication over GF(2^m) based on an irreducible all-one polynomial (AOP). We have used a novel cut-set retiming to reduce the duration of the critical path to one XOR gate delay. It is further shown that the systolic structure can be decomposed into two or more parallel systolic branches, where the pair of parallel systolic branches has the same input operand, and they can share the same input operand registers. From the application-specific integrated circuit and field-programmable gate array synthesis results we find that the proposed design provides significantly lower area-delay and power-delay complexities than the best of the existing designs.
ETPL VLSI-021 Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip
  • 10. Abstract: This paper presents the silicon-proven design of a novel on-chip network to support guaranteed traffic permutation in multiprocessor system-on-chip applications. The proposed network employs a pipelined circuit-switching approach combined with a dynamic path-setup scheme under a multistage network topology. The dynamic path-setup scheme enables runtime path arrangement for arbitrary traffic permutations. The circuit-switching approach offers a guarantee of permuted data and its compact overhead enables the benefit of stacking multiple networks. A 0.13-μm CMOS test chip validates the feasibility and efficiency of the proposed design. Experimental results show that the proposed on-chip network achieves 1.9× to 8.2× reduction of silicon overhead compared to other design approaches.
ETPL VLSI-022 An On-Chip Network Fabric Supporting Coarse-Grained Processor Array
Abstract: Coarse grained arrays (CGAs) with run-time reconfigurability play an important role in accelerating reconfigurable computing applications. It is challenging to design on-chip communication networks (OCNs) for such CGAs with dynamic run-time reconfigurability whilst satisfying the tight budgets of power and area for an embedded system. This paper presents a silicon-proven design of a 64-PE circuit-switched OCN fabric with a dynamic path-setup scheme capable of supporting an embedded coarse-grained processor array. A proof-of-concept test chip fabricated in a 0.13 μm CMOS process occupies a silicon area of 23 mm² and consumes a peak power of 200 mW @ 128 MHz and 1.2 Vcc, at room temperature. The OCN overhead consumes 9.4% of the area and 18% of the power of the total chip.
Experimental results and analysis show that the proposed OCN fabric with its dynamic path-setup is suitable for use in an embedded CGA supporting fast run-time reconfigurability.
ETPL VLSI-023 A Very Linear Low-Pass Filter with Automatic Frequency Tuning
Abstract: A Gm-C third-order Chebyshev low-pass filter with a novel switched capacitor frequency tuning technique for a zero-IF Bluetooth receiver has been designed. The frequency tuning scheme is simpler and has more relaxed specifications than conventional ones. Furthermore, a highly linear pseudo-differential transconductor with a compact feedback loop able to operate with low supply voltage has been used. This control loop holds the input transistors in triode region and provides high output resistance, keeping high linearity in a wide range of transconductance. The filter bandwidth is 0.5 MHz and the overall scheme consumes 1.1 mA from a 1.8-V supply. The measured third-order intermodulation (IM3) distortion of the filter for a 1 Vpp two-tone signal centered at 300 kHz is -65 dB.
ETPL VLSI-024 A High-Speed Low-Complexity Modified Radix-2^5 FFT Processor for High Rate WPAN Applications
Abstract: This paper presents a high-speed low-complexity modified radix-2^5 512-point fast Fourier transform (FFT) processor using an eight data-path pipelined approach for high rate wireless personal area network applications. A novel modified radix-2^5 FFT algorithm that reduces the hardware complexity is proposed. This method can reduce the number of complex multiplications and the size of the twiddle factor memory. It also uses a complex constant multiplier instead of a complex Booth multiplier. The proposed FFT processor achieves a signal-to-quantization noise ratio of 35 dB at 12 bit internal word length. The proposed processor has been designed and implemented using 90-nm CMOS technology with a supply voltage of 1.2 V. The results demonstrate that the total gate count of the
  • 11. proposed FFT processor is 290 K. Furthermore, the highest throughput rate is up to 2.5 GS/s at 310 MHz while requiring much less hardware complexity.
ETPL VLSI-025 Application Space Exploration of a Heterogeneous Run-Time Configurable Digital Signal Processor
Abstract: This paper describes the application space exploration of a heterogeneous digital signal processor with dynamic reconfiguration capabilities. The device is built around three reconfigurable engines featuring different flavours and computation granularities that make it suitable for a wide range of signal processing application domains such as video coding, image processing, telecommunications, and cryptography. Performance of signal processing applications is evaluated from measurements performed on a 90-nm CMOS prototype. In order to characterize the application space of the processor, performance is compared with state-of-the-art devices, taking programmability, computational capabilities, and energy efficiency as the main metrics. The device exploits performance and energy efficiency significantly more than general purpose processors, while still maintaining a user-friendly programming approach that mainly relies on software-oriented languages. The device is able to achieve 1.2 to 15 GOPS with an energy efficiency from 2 to 50 GOPS/W when running the selected applications.
ETPL VLSI-026 A Unified Graphics and Vision Processor With a 0.89 μW/fps Pose Estimation Engine for Augmented Reality
Abstract: A unified vision and graphics processor with three layers is shown to provide a fast pipeline for augmented reality.
In the image-level layer, a 153.6 GOPS massively parallel processing unit with eight SIMD processors, each containing 128 processing elements, performs highly data-parallel operations. In the sub-image layer, a rasterizer and a pixel arranger respectively generate and reduce data-level parallelism. In the descriptor-level layer, a pose estimation engine executes sequential programs. Our processor can provide images for augmented reality at 100 fps, for a power consumption of 413 mW. This is 39% faster than a comparable smartphone implementation. Our chip is fabricated in a 0.18 μm CMOS process and contains 0.95 M gates.
ETPL VLSI-027 CORDIC Designs for Fixed Angle of Rotation
Abstract: Rotation of vectors through fixed and known angles has wide applications in robotics, digital signal processing, graphics, games, and animation. However, we do not find any optimized coordinate rotation digital computer (CORDIC) design for vector rotation through specific angles. Therefore, in this paper, we present optimization schemes and CORDIC circuits for fixed and known rotations with different levels of accuracy. To reduce the area and time complexities, we propose a hardwired pre-shifting scheme in the barrel-shifters of the proposed circuits. Two dedicated CORDIC cells are proposed for the fixed-angle rotations. In one of those cells, micro-rotations and scaling are interleaved, and in the other they are implemented in two separate stages. Pipelined schemes are further suggested for cascading dedicated single-rotation units and bi-rotation CORDIC units for high-throughput and reduced-latency implementations. We have obtained the optimized set of micro-rotations for fixed and known angles. The optimized scale-factors are also derived and dedicated shift-add circuits are designed to implement the scaling. The fixed-point mean-squared-error of the proposed CORDIC circuit is analyzed statistically, and strategies for reducing the error are given.
We have synthesized the proposed CORDIC cells by Synopsys Design Compiler using TSMC 90-nm library, and shown that the proposed designs offer higher
  • 12. throughput, lower latency and lower area-delay product than the reference CORDIC design for fixed and known angles of rotation. We find similar synthesis results for different Xilinx field-programmable gate-array platforms.
ETPL VLSI-028 Application-Driven End-to-End Traffic Predictions for Low Power NoC Design
Abstract: As chip multiprocessors keep increasing the number of cores on the chip, the network-on-chip (NoC) technology is becoming essential for interconnecting the cores. While NoCs result in a noticeable performance boost over conventional bus systems, they consume a non-negligible fraction of the system power. One promising solution is to dynamically adjust the working frequencies/voltages of the switches as well as the links between switches in the NoC to match the traffic flows. The question is when to adjust and by how much. Most previous works take a passive approach by reacting to fluctuations in local traffic flows. Unfortunately, this approach may be too slow and too conservative in adjusting the working frequencies/voltages. Since applications often exhibit periodic behaviors, we propose a hardware mechanism to proactively adjust the frequencies/voltages of switches and/or links in the NoC by predicting the application runtime traffic. The evaluations show that our design achieves 86% dynamic power savings of the links in the on-chip network, and the resulting overheads from mispredictions are tolerable.
ETPL VLSI-029 Thermal-Constrained Task Allocation for Interconnect Energy Reduction in 3-D Homogeneous MPSoCs
Abstract: 3-D technology that stacks silicon dies with through silicon vias (TSVs) is a promising solution to overcome the interconnect scaling problem in giga-scale integrated circuits (ICs).
Thermal dissipation is a major challenge for 3-D integration, and prior thermal-balanced task scheduling methods for 3-D multiprocessor system-on-chips (MPSoCs) typically balance the power gradient across vertical stacks based on the assumption of strong thermal correlation among processing cores within a stack. On the other hand, 3-D MPSoCs typically employ a network-on-chip (NoC) as the communication infrastructure, which consumes a large portion of the energy budget. Because TSVs consume much less energy than horizontal links in 3-D MPSoCs when transmitting the same amount of data, due to the reduced interconnect distance between vertically adjacent cores, this motivates allocating heavily communicating tasks within the same vertical stack as much as possible, so that traffic is restricted to the third dimension to reduce interconnect energy. However, aggregating active tasks within the same stack is likely to exacerbate the power density and result in hot spots. In this paper, we explore the tradeoff between thermal and interconnect energy when allocating tasks in 3-D homogeneous MPSoCs, and propose an efficient heuristic. Experimental results show that the proposed technique can reduce interconnect energy by more than 25% on average with almost the same peak temperature when compared with prior thermal-balanced solutions.
ETPL VLSI-030 A Wide-Range PLL Using Self-Healing Prescaler/VCO in 65-nm CMOS
Abstract: The variability and leakage current in nanoscale CMOS technology may degrade circuit performance significantly. To address these issues in a wide-range phase-locked loop (PLL), a self-healing prescaler, a self-healing voltage-controlled oscillator (VCO), and a calibrated charge pump (CP) are presented. This PLL is fabricated in a 65-nm CMOS technology and its active area is 0.0182 mm². For the self-healing VCO, the measured frequency range is from 60 to 1489 MHz. When this PLL
  • 13. operates at 855 MHz, the measured rms and peak-to-peak jitters are 8.03 and 55.6 ps, respectively. The measured reference spur is -52.89 dBc. This PLL consumes 4.3 mW from a 1.2 V supply without buffers.
ETPL VLSI-031 A Clock Control Strategy for Peak Power and RMS Current Reduction Using Path Clustering
Abstract: Peak power reduction has been a critical challenge in the design of integrated circuits, impacting the chip's performance and reliability. The reduction of peak power also reduces the power density of integrated circuits. Due to large IR-voltage drops in circuits, transistor switching slows down, giving rise to timing violations and logic failures. In this paper, we present a new clock control strategy for peak-power reduction in VLSI circuits. In the proposed method, the simultaneous switching of combinational paths is minimized by taking advantage of the delay slacks among the paths and clustering the paths with similar slack values. Once the paths are identified based on the path delays and their slack values, the clustering algorithm determines the ideal number of clusters for the given circuit and, for each cluster, the maximum possible phase shift that can be applied to the clock. The paths are assigned to clusters in a load-balanced manner based on the slack values, and each cluster has a possible phase shift on its clock that depends on the slack. Thus, the proposed register-transfer level (RTL) method takes advantage of the logic-path timing slack to re-schedule circuit activities at optimal intervals within the unaltered clock period. When switching activities are redistributed more evenly across the clock period, the IC supply-current consumption is also spread across a wider range of time within the clock period.
This has the beneficial effect of reducing peak-current draw in addition to reducing RMS power draw without having to change the operating frequency and without utilizing additional power supply voltages as in dual or multi VT approaches. The proposed method is implemented and tested through simulations using an experimental setup with Synopsys Tools Suite and Cadence Tools on the ISCAS'85 benchmark circuits, OpenCore circuits and a LEON processor multiplier circuit. Experimental results indicate that peak power can be reduced significantly to at least 72% depending on the number of clusters and the phase-shifted clock identified as suitable for the given circuit by the proposed algorithms. Although the proposed method incurs some power overhead compared to the traditional clocking method, the overhead can be made negligible compared to the peak-power reduction, as seen in the experimental results presented.
ETPL VLSI-032 A Fast-Locking All-Digital Deskew Buffer With Duty-Cycle Correction
Abstract: In this paper, a fast-locking all-digital deskew buffer with duty cycle correction is proposed and implemented. A cyclic time-to-digital converter is introduced to decrease the locking time of a conventional register-controlled delay-locked loop to only two input clock cycles in coarse tuning. With the aid of the three half delay lines technique, the mismatch between half delay lines causing the duty cycle distortion can be alleviated by interpolation. A balanced edge combiner to achieve a precise 50% output clock is also presented. A test chip is fabricated in 0.18-μm technology to demonstrate the feasibility of the proposed architecture. The circuit can accept input clock rates from 250 to 625 MHz with a duty cycle variation between 30% and 70% to generate 50% output clocks. It preserves the capability of closed-loop control with a small area and power consumption.
ETPL VLSI-033 A Built-In Repair Analyzer With Optimal Repair Rate for Word-Oriented Memories
  • 14. Abstract: This paper presents a built-in self-repair analyzer with the optimal repair rate for memory arrays with redundancy. The proposed method requires only a single test, even in the worst case. By performing the must-repair analysis on the fly during the test, it selectively stores fault addresses, and the final analysis to find a solution is performed on the stored fault addresses. To enumerate all possible solutions, existing techniques use depth first search using a stack and a finite-state machine. Instead, we propose a new algorithm and its combinational circuit implementation. Since our formulation for the circuit allows us to use the parallel prefix algorithm, it can be configured in various ways to meet area and test time requirements. The total area of our infrastructure is dominated by the number of content addressable memory entries to store the fault addresses, and it only grows quadratically with respect to the number of repair elements. The infrastructure is also extended to support various types of word-oriented memories.
ETPL VLSI-034 System-Level Modeling and Analysis of Thermal Effects in Optical Networks-on-Chip
Abstract: The performance of multiprocessor systems, such as chip multiprocessors (CMPs), is determined not only by individual processor performance, but also by how efficiently the processors collaborate with one another. It is the communication architecture that determines the collaboration efficiency on the hardware side. Optical networks-on-chip (ONoCs) are emerging communication architectures that can potentially offer ultra-high communication bandwidth and low latency to multiprocessor systems.
Thermal sensitivity is an intrinsic characteristic of photonic devices used by ONoCs as well as a potential issue. This paper systematically modeled and quantitatively analyzed the thermal effects in ONoCs. We used an 8 × 8 mesh-based ONoC as a case study and evaluated the impacts of thermal effects on the average power efficiency for real MPSoC applications. We revealed three important factors regarding ONoC power efficiency under temperature variations, and proposed several techniques to reduce the temperature sensitivity of ONoCs. These techniques include the optimal initial setting of microresonator resonant wavelength, increasing the 3-dB bandwidth of optical switching elements by parallel coupling multiple microresonators, and the use of passive-routing optical router Crux to minimize the number of switching stages in mesh-based ONoCs. We gave a mathematical analysis of periodically parallel coupling of multiple microresonators and showed that the 3-dB bandwidth of optical switching elements can be widened nearly linearly with the ring number. Evaluation results for different real MPSoC applications show that, on the basis of thermal tuning, the optimal device setting improves the average power efficiency by 54% to 1.2 pJ/bit when chip temperature reaches 85 °C. The findings in this paper can help support the further development of this emerging technology.
ETPL VLSI-035 A Study of Tapered 3-D TSVs for Power and Thermal Integrity
Abstract: 3-D integration presents a path to higher performance, greater density, increased functionality and heterogeneous technology implementation. However, 3-D integration introduces many challenges for power and thermal integrity due to large switching currents, longer power delivery paths, and increased parasitics compared to 2-D integration. In this work, we provide an in-depth study of power and thermal issues while incorporating the physical design characteristics unique to 3-D integration.
We provide a qualitative perspective of the power and thermal dissipation issues in 3-D and study the impact of Through Silicon Vias (TSVs) size for their mitigation. We investigate and discuss the design implications of power and thermal issues in the presence of decoupling capacitors, TSV/on-die/package parasitics, various resonance effects and power gating. Our study is based on a ten-tier system utilizing existing 3-D
  • 15. technology specifications. Based on detailed power distribution and heat dissipation models, we present a comprehensive analysis of TSV tapering for alleviating power and thermal integrity issues in 3-D ICs.

ETPL VLSI-036 Improved Trace Buffer Observation via Selective Data Capture Using 2-D Compaction for Post-Silicon Debug
Abstract: This paper presents a novel technique for extending the capacity of trace buffers when capturing debug data during post-silicon debug. It exploits the fact that it is not necessary to capture error-free data in the trace buffer since that information can be obtained from simulation. A selective data capture method is proposed in this paper that only captures debug data during clock cycles in which errors are present. The proposed debug method requires only three debug sessions. The first session estimates a rough error rate, the second session identifies a set of suspect clock cycles where errors may be present, and the third session captures the suspect clock cycles in the trace buffer. The suspect clock cycles are determined through a 2-D compaction technique using multiple-input signature register signatures and cycling register signatures. Intersecting both signatures generates a small number of suspect clock cycles that the trace buffer needs to capture. The effective observation window of the trace buffer can be expanded significantly, by up to orders of magnitude. Experimental results indicate that very significant increases in the effective observation window of a trace buffer can be obtained.

ETPL VLSI-037 AC-Plus Scan Methodology for Small Delay Testing and Characterization
Abstract: Small delay defects escaping traditional delay testing could cause a device to malfunction in the field and thus detecting these defects is often necessary. To address this issue, we propose three test modes in a new methodology called AC-plus scan, in which versatile test clocks can be generated on the chip by embedding an all-digital phase-locked loop (ADPLL) into the circuit under test (CUT). AC-plus scan can be executed on an in-house wireless test platform called HOY system. The first test mode of our AC-plus scan provides a more efficient way to measure the longest path delay associated with each test pattern. Experimental results show that our method reduces the test time by 81.8%. The second test mode is designed for volume production test. It could effectively detect small delay defects and provide fast characterization on those defective chips for further processing. This mode could be used to help predict which chips are more likely to fall victim to operational failure in the field. The third test mode is to extract the waveform of each flip-flop's output in a real chip. This is made possible by taking advantage of the almost unlimited test memory our HOY test platform provides, so that we could easily store a great volume of data and reconstruct the waveform for post-silicon debugging. We have successfully fabricated a Viterbi decoder chip with such an AC-plus scan methodology inside to demonstrate its capability.

ETPL VLSI-038 A Variation Tolerant Current-Mode Signaling Scheme for On-Chip Interconnects
Abstract: Current-mode signaling (CMS) with dynamic overdriving is one of the most promising schemes for high-speed low-power communication over long on-chip interconnects. However, such schemes are sensitive to parameter variations due to reduced voltage swings on the line. In this paper, we propose a variation tolerant dynamic overdriving CMS scheme. 
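
The intersection step described in the VLSI-036 abstract can be illustrated with a small sketch. The abstract does not say how clock cycles are grouped under each signature type, so the grouping below is an assumption made purely for illustration: one signature covers each contiguous block of `row_len` cycles (a "row"), and the other covers all cycles sharing the same offset within a block (a "column"). The function names are hypothetical, and the signatures themselves are not computed; a group is simply treated as failing when it contains an erroneous cycle, which also assumes no signature aliasing.

```python
def failing_groups(error_cycles, row_len):
    """Return the sets of row and column groups that contain an error.

    Assumption for illustration: a group's signature mismatches exactly
    when at least one erroneous cycle falls inside it (no aliasing).
    """
    rows = {c // row_len for c in error_cycles}
    cols = {c % row_len for c in error_cycles}
    return rows, cols


def suspect_cycles(n_cycles, rows, cols, row_len):
    """Intersect failing rows and columns to get the suspect clock cycles."""
    return [
        c
        for c in range(n_cycles)
        if c // row_len in rows and c % row_len in cols
    ]


# Hypothetical debug run: 64 cycles, errors in cycles 5 and 38.
true_errors = {5, 38}
rows, cols = failing_groups(true_errors, row_len=8)
suspects = suspect_cycles(64, rows, cols, row_len=8)
print(suspects)  # [5, 6, 37, 38]
```

Under these assumptions the suspect set always contains the true error cycles, and only 4 of the 64 cycles need to be captured in the final session, which is the effect the abstract describes as an expanded observation window.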
The proposed CMS scheme and a competing CMS scheme (CMS-Fb) are fabricated in 180-nm CMOS technology. Measurement results show that the proposed scheme offers 34% reduction in energy/bit and 42% reduction in energy-delay-product over CMS-Fb
  • 16. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com scheme for a 10 mm line operating at 0.64 Gbps of data rate. Simulations indicate that the proposed CMS scheme consumes 0.297 pJ/bit for data transfer over the 10 mm line at 2.63 Gb/s. Measurements indicate that the delay of CMS-Fb becomes 2.5 times its nominal value in the presence of intra-die variations whereas the delay of the proposed scheme changes by only 5% for the same amount of intra-die variations. Measurement and simulation results show that both the schemes are robust against inter-die variations. Experiments and simulations also indicate that the proposed CMS scheme is more robust against practical variations in supply and temperature as compared to CMS-Fb scheme. ETPL VLSI-039 Modeling and Analysis of Power Distribution Networks in 3-D ICs Abstract: This paper addresses the modeling and analysis problems for power distribution networks (PDNs) in 3-D ICs. An on-chip distributed model is proposed for 3-D power grids, in which the details of metal layers are considered. The distributed model is demonstrated to be essential to identifying the unique noise behavior of 3-D PDNs. A lumped model is proposed based on the distributed model. The lumped model features the connection impedance between tiers and is proven to be useful for designers to understand the global effects of 3-D PDNs. Based on the models, an analysis flow is designed for 3-D PDNs in both frequency domain and time domain. With the analysis flow, the electrical characteristics of 3-D PDNs are studied systematically for the first time. The frequency-domain analysis identifies the global and local resonance phenomena in 3-D PDNs that are distinct from those in 2-D PDNs. The physical mechanisms behind the resonance phenomena are investigated. 
The time-domain analysis predicts the worst-case supply noise based on distributed current constraints. The “Rogue Wave” concept is introduced to explain the spatial and temporal relations of the worst-case on-chip noise responses in 3- D PDNs. ETPL VLSI-040 A Low-Cost, Systematic Methodology for Soft Error Robustness of Logic Circuits Abstract: Due to current technology scaling trends such as shrinking feature sizes and decreasing supply voltages, circuit reliability is becoming more susceptible to radiation-induced transient faults (soft errors). Soft errors, which have been a great concern in memories, are now a main factor in reliability degradation of logic circuits as well. In this paper, we present a systematic and integrated methodology for circuit robustness to soft errors. The proposed soft error rate (SER) reduction framework, based on redundancy addition and removal (RAR), aims at eliminating those gates with large contribution to the overall SER. Several metrics and constraints are introduced to guide the RAR-based approach toward SER reduction. Furthermore, we integrate a resizing strategy into our framework, as post-RAR additive SER optimization. The strategy can identify most critical gates to be upsized and thereby, minimize area and power overheads while maintaining a high level of soft error robustness. Experimental results show that the proposed RAR-based framework can achieve up to 70% reduction in output failure probability. On average, about 23% SER reduction is obtained with less than 4% area overhead. ETPL VLSI-041 Low Complexity Out-of-Order Issue Logic Using Static Circuits Abstract: In this paper a single-cycle issue queue circuit architecture that simplifies the wakeup and selection logic is proposed. The micro-architecture and fully static CMOS circuits are presented for a 32- entry queue that issues four instructions per cycle. 
The instruction-ready signals are divided into groups and processed in parallel to issue the four oldest ready instructions. The complete issue queue and prioritization logic requires 20 inversions, allowing simulated circuit operation at over 4 GHz in a foundry
  • 17. 45 nm SOI fabrication process.

ETPL VLSI-042 Low Latency Systolic Montgomery Multiplier for Finite Field GF(2^m) Based on Pentanomials
Abstract: In this paper, we present a low latency systolic Montgomery multiplier over GF(2^m) based on irreducible pentanomials. An efficient algorithm is presented to decompose the multiplication into a number of independent units to facilitate parallel processing. Besides, a novel so-called “pre-computed addition” technique is introduced to further reduce the latency. The proposed design involves significantly less area-delay and power-delay complexities compared with the best of the existing designs. It has the same or shorter critical path and involves nearly one-fourth of the latency of the other designs in the case of the National Institute of Standards and Technology recommended irreducible pentanomials.

ETPL VLSI-043 Power-Up Sequence Control for MTCMOS Designs
Abstract: Power gating is effective for reducing standby leakage power as multi-threshold CMOS (MTCMOS) designs have become popular in the industry. However, a large inrush current and dynamic IR drop may occur when a circuit domain is powered up with MTCMOS switches. This could in turn lead to improper circuit operation. We propose a novel framework for generating a proper power-up sequence of the switches to control the inrush current of a power-gated domain while minimizing the power-up time and reducing the dynamic IR drop of the active domains. We also propose a configurable domino-delay circuit for implementing the sequence. Experimental results based on state-of-the-art industrial designs demonstrate the effectiveness of the proposed framework in limiting the inrush current, minimizing the power-up time, and reducing the dynamic IR drop. 
Results further confirm the efficiency of the framework in handling large-scale designs with more than 80 K power switches and 100 M transistors.

ETPL VLSI-044 Architecture and Design Flow for a Highly Efficient Structured ASIC
Abstract: As fabrication process technology continues to advance, mask set costs have become prohibitively expensive. Structured application specific integrated circuits (sASICs) offer a middle ground in price and performance between ASICs and field-programmable gate arrays (FPGAs) by sharing masks across different designs. In this paper, two sASIC architectures are proposed, the first being based on three-input lookup-tables, and the second on AOI22 gates. The sASICs are programmed using a standard-cell compatible design flow. They are customized using a minimum of three masks, i.e., two metals and one via. The area and delay of the sASIC are compared with ASICs and FPGAs. Results over a set of benchmark circuits show that our AOI22-based sASIC had an average of 1.76x/1.41x increase in area/delay compared to ASICs, a considerable improvement compared with the 26.56x/5.09x increase for FPGAs. This is, to the best of our knowledge, the best performance reported in the literature for a practical sASIC. A prototype using the sASIC was fabricated using a UMC 0.13-μm mixed-mode/RF process. It was fully verified using scan and functional tests, and used in a demonstration system.

ETPL VLSI-045 Secure Dual-Core Cryptoprocessor for Pairings Over Barreto-Naehrig Curves on FPGA Platform
Abstract: This paper is devoted to the design and the physical security of a parallel dual-core flexible cryptoprocessor for computing pairings over Barreto-Naehrig (BN) curves. The proposed design is specifically optimized for field-programmable gate-array (FPGA) platforms. The design explores the in-built features of an FPGA device for achieving an efficient cryptoprocessor for computing 128-bit secure
  • 18. pairings. The work further pinpoints the vulnerability of those pairing computations against side-channel attacks and demonstrates experimentally that power consumptions of such devices can be used to attack these ciphers. Finally, we suggest a suitable countermeasure to overcome the respective weaknesses. The proposed secure cryptoprocessor needs 1 730 000, 1 206 000, and 821 000 cycles for the computation of Tate, ate, and optimal-ate pairings, respectively. The implementation results on a Virtex-6 FPGA device show that it consumes 23 k Slices and computes the respective pairings in 11.93, 8.32, and 5.66 ms.

ETPL VLSI-046 In-Situ Method for TSV Delay Testing and Characterization Using Input Sensitivity Analysis
Abstract: In this paper, we propose a method and the required architecture for characterizing the propagation delays of the through-silicon vias (TSVs) in a 3-D IC. First of all, every two TSVs are paired up to form an oscillation ring with some peripheral circuits. Their joint performance can thus be measured roughly by the oscillation period of the ring. Next, we utilize a technique called sensitivity analysis to further derive the propagation delay of each individual TSV participating in an oscillation ring, a distilling process. In this process, we perturb the strength of the two TSV drivers, and then measure their effects in terms of the change of the oscillation ring's period. Through subsequent analysis, the propagation delay of each TSV can be revealed. On top of this scheme, we also present an architecture that can activate the performance characterization process of each test unit (consisting of two TSVs) one at a time in a proper sequence. 
The area overhead is only 18.97 equivalent two-input NAND gate per TSV, by which one can gain the ability to profile the capacitances and the propagation delays of the TSVs on a 3-D IC. ETPL VLSI-047 Low-Resolution DAC-Driven Linearity Testing of Higher Resolution ADCs Using Polynomial Fitting Measurements Abstract: A low-cost linearity test methodology for high-resolution analog-to-digital converters (ADCs) is presented in this paper. Linearity testing of ADCs requires high-precision digital-to-analog conversion (DAC) capability, commonly 3-bit higher resolution than the ADC under test. Further, a large number of ADC output data samples must be collected making conventional histogram testing impractical for high- resolution ADCs with 18-24 bit precision. In the proposed test methodology, two low-precision and low- cost DACs are used to generate a high-resolution ADC test stimulus. Significant reductions in test cost and test time are achieved by using low-cost instrumentation and by making fewer measurements than required for conventional histogram test. A least-squares-based polynomial fitting approach is used to determine the transfer function of the ADC under test. The generated transfer function is used to compute the non-linearity of the ADC accurately. No assumption is made regarding the linearity of the lower precision signal generators (DACs) used in the testing procedure. Software simulations and hardware experiments are performed to validate the proposed test methodology ETPL VLSI-048 Low-Cost Error Tolerance Scheme for 3-D CMOS Imagers Abstract: This paper presents an error tolerance scheme for 3-D CMOS imagers that are constructed by stacking a pixel array of imager sensors, an analog-to-digital converter (ADC) array, and an image signal processor (ISP) array using microbumps (μbumps) and through silicon vias (TSVs). 
To deliver high- quality images in the presence of single or multiple μbump, ADC, or TSV failures, we propose to interleave the connections from pixels to ADCs and recover the corrupted data in the ISPs. Key design parameters, such as the interleaving stride and the grouping ratio are determined by analyzing the employed error correction algorithm. Architectural simulation results demonstrate that the error tolerance
  • 19. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com scheme enhances the effective yield of an exemplar 3-D imager from 44% to 97%. ETPL VLSI-049 Computing Two-Pattern Test Cubes for Transition Path Delay Faults Abstract: Considering full-scan circuits, incompletely-specified tests, or test cubes, are used for test data compression. When considering path delay faults, certain specified input values in a test cube are needed only for determining the lengths of the paths associated with detected faults. Path delay faults, and therefore, small delay defects, would still be detected if such values are unspecified. The goal of this paper is to explore the possibility of increasing the number of unspecified input values in a test set for path delay faults by unspecifying such values in order to make the test set more amenable to test data compression. Experimental results indicate that significant numbers of such values exist. The proposed procedure unspecifies them gradually to obtain a series of test sets with increasing numbers of unspecified values and decreasing path lengths. Experimental results also indicate that filling the unspecified values randomly (as with some test data compression methods) recovers some or all of the path lengths associated with detected path delay faults. The procedure uses a matching of the sets of detected faults for the comparison of path lengths ETPL VLSI-050 Integrated Energy-Harvesting Photodiodes With Diffractive Storage Capacitance Abstract: Integrating energy-harvesting photodiodes with logic and exploiting on-die interconnect capacitance for energy storage can enable new, ultraminiaturized wireless systems. Unlike CMOS imager pixels, the proposed photodiode designs utilize p-diffusion fingers and are implemented in a conventional logic process. 
Also unlike specialized solar cell processes, the designs utilize the on-chip metal interconnect to form a diffraction grating above the p-diffusion fingers which also provides capacitive energy storage. To explore the tradeoffs between optical efficiency and energy storage for integrated photodiodes, an array of photovoltaics with various diffractive storage capacitors was designed in a 90- nm CMOS logic process. The diffractive effects can be exploited to increase the photodiodes' response to off-axis illumination. Transient effects from interfacing the photodiodes with switched-capacitor DC-DC converters were examined, with measurements indicating a 50% reduction in the output voltage ripple due to the diffractive storage capacitance. A quantitative comparison between 90-nm and 0.35-μm CMOS logic processes for energy-harvesting capabilities was carried out. Measurements show an increase in power generation for the newer CMOS technology, however at the cost of reduced output voltage. One potential application for the integrated photodiodes is harvesting energy for a subdermal biomedical device. ETPL VLSI-051 Fast Fixed-Outline 3-D IC Floorplanning With TSV Co-Placement Abstract: Through-silicon vias (TSVs) are used to connect inter-die signals in a 3-D IC. Unlike conventional vias, TSVs occupy device area and are very large compared to logic gates. However, most previous 3-D floorplanners only view TSVs as points. As a result, whitespace redistribution is necessary for TSV insertion after the initial floorplan is computed, which leads to suboptimal layouts. In this paper, we propose a very efficient 3-D floorplanner to simultaneously floorplan the functional modules and place the TSVs and to optimize the total wirelength under fixed-outline constraint. Compared to the state-
  • 20. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com of-the-art 3-D floorplanner with TSV planning, our design consistently produces better floorplans with 15% shorter wirelength and 31% fewer TSVs on average. Our algorithm is extremely fast and only takes a few seconds to floorplan benchmarks with hundreds of modules compared to hours as required by the previous state-of-the-art floorplanner. ETPL VLSI-052 Reactivation Noise Suppression With Sleep Signal Slew Rate Modulation in MTCMOS Circuits Abstract: Multi-threshold CMOS (MTCMOS) is commonly used for suppressing leakage currents in idle integrated circuits. Power and ground distribution network noise produced during SLEEP to ACTIVE mode transitions is an important reliability concern in MTCMOS circuits. Sleep signal slew rate modulation techniques for suppressing mode-transition noise are explored in this paper. A triple-phase sleep signal slew rate modulation (TPS) technique with a novel digital sleep signal generator is proposed. Reactivation time, mode-transition energy consumption, leakage power consumption, and layout area of different MTCMOS circuits are characterized under an equal-noise constraint. Influences of within-die and die-to-die parameter variations on the reactivation noise, time, and energy consumption of sleep signal slew rate modulated MTCMOS circuits are evaluated with a process imperfections aware robustness metric. The proposed triple-phase sleep signal slew rate modulation technique enhances the tolerance to process parameter fluctuations by up to 183.1× as compared to various alternative MTCMOS noise suppression techniques in a UMC 80-nm CMOS technology. 
ETPL VLSI-053 Sub-mW LC Dual-Input Injection-Locked Oscillator for Autonomous WBSNs
Abstract: This paper presents a sub-mW, current-reused first-harmonic LC injection-locked oscillator (ILO) using in-phase dual-input injection technique. It can be used as a power oscillator in the injection-locked transmitter of wireless biomedical sensor nodes (WBSNs) integrated into a wireless body area network. A prototype chip, implemented in a standard 0.13-μm CMOS process occupying 200 × 380 μm, operates in the medical implantable communications service (MICS) band for medical implants. Measurement results show that the proposed ILO features a wide locking range of 800 MHz (150-950 MHz) at an input power of 0 dBm. More importantly, it has a high input sensitivity of -30 dBm to lock the 3-MHz bandwidth of the MICS band, while consuming only 660 μW at 1-V supply. This ultralow power consumption enables autonomous WBSNs.

ETPL VLSI-054 Constant Delay Logic Style
Abstract: A constant delay (CD) logic style is proposed in this paper, targeting full-custom high-speed applications. The CD characteristic of this logic style regardless of the logic type makes it suitable for implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature offers a performance advantage over static and dynamic domino logic styles in a single-cycle multistage circuit block. Several design considerations including timing window width adjustment and clock distribution are discussed. Using 65-nm general-purpose CMOS technology, the proposed logic demonstrates an average speedup of 94% and 56% over static and dynamic domino logic, respectively, in five different logic gates. Simulation results of 8-bit ripple carry adders show that CD logic is 39% and 23% faster than the static and dynamic-based adders, respectively. CD logic also demonstrates 39% speedup and 64%
  • 21. (22%) energy-delay product (EDP) reduction from static logic at 100% (10%) data activity in 32-bit carry lookahead adders. For 8-bit Wallace tree multiplier, CD logic achieves a similar speedup with at least 50% EDP reduction across all data activities.

ETPL VLSI-055 A Compact Clock Generator for Heterogeneous GALS MPSoCs in 65-nm CMOS Technology
Abstract: This paper presents an all-digital phase-locked loop (ADPLL) clock generator for globally asynchronous locally synchronous (GALS) multiprocessor systems-on-chip (MPSoCs). With its low power consumption of 2.7 mW and ultra small chip area of 0.0078 mm², it can be instantiated per core for fine-grained power management like DVFS. It is based on an ADPLL providing a multiphase clock signal from which core frequencies from 83 to 666 MHz with 50% duty cycle are generated by phase rotation and frequency division. The clock meets the specification for DDR2/DDR3 memory interfaces. Additionally, it provides a dedicated high-speed clock up to 4 GHz for serial network-on-chip data links. Core frequencies can be changed arbitrarily within one clock cycle for fast dynamic frequency scaling applications. The performance including statistical analysis of mismatch has been verified by a prototype in 65-nm CMOS technology.

ETPL VLSI-056 A Colpitts CMOS Quadrature VCO Using Direct Connection of Substrates for Coupling
Abstract: A new low-phase noise low-power quadrature voltage-controlled oscillator (QVCO) using differential Colpitts oscillator is presented. The proposed QVCO is composed of two identical current-switching differential Colpitts VCOs in which the first core VCO is coupled to the second in an in-phase manner, and the second core VCO is coupled to the first in an anti-phase manner. 
To couple the two core VCOs, the substrates of the cross-connected transistors as well as the substrates of MOS varactors are used; alleviating the need for any extra elements for coupling, which could add noise and increase power dissipation. A linear (sinusoidal) analysis is presented that confirms that the proposed circuit generates quadrature waveforms. The proposed coupling technique can be generalized to N differential Colpitts VCOs for multiphase signals generation ETPL VLSI-057 A Self-Calibrated DLL-Based Clock Generator for an Energy-Aware EISC Processor Abstract: This paper describes a low-jitter delay-locked loop (DLL)-based clock generator for dynamic frequency scaling in the extendable instruction set computing (EISC) processor. The DLL-based clock generator provides the system clock with frequencies of 0.5× to 8× of the reference clock, according to the workload of the EISC processor. The proposed analog self-calibration method and a phase detector with an auxiliary charge pump can effectively reduce the delay mismatch between delay cells in the voltage-controlled delay line and the static phase offset due to the current mismatch in the charge pump, respectively. The self-calibrated output waveform exhibits 9.7 ps of RMS jitter and 73.7 ps of peak-to- peak jitter at 120 MHz. The prototype clock generator implemented in a 0.18-μm CMOS process occupies an active area of 0.27 mm2 and consumes 15.56 mA ETPL VLSI-058 Clamping Virtual Supply Voltage of Power-Gated Circuits for Active Leakage Reduction and Gate-Oxide Reliability
  • 22. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com Abstract: In an integrated circuit (IC) adopting a power-gating (PG) technique, the virtual supply voltage (VVDD) is susceptible to: 1) negative-bias temperature instability (NBTI) degradation that weakens the PG device over time and 2) temporal temperature variation that affects active leakage current (thus total current) of the IC. The PG device is sized to guarantee a minimum VVDD level over the chip lifetime. Thus, the NBTI degradation and the worst-case total current at high-temperature must be considered for sizing the PG device. This leads to higher VVDD (thus active leakage power) than necessary in early chip lifetime and/or at low temperature, negatively impacting the gate-oxide reliability of transistors. To reduce active leakage power increase and improve the gate-oxide reliability due to these effects, we propose two techniques that adjust the strength of a PG device based on its usage and IC's temperature at runtime. We demonstrate the efficacy of these techniques with an experimental setup using a 32-nm technology model in the presence of within-die spatial process and temperature variations. On an average of 100 die samples, they can reduce dynamic and active leakage power by up to 3.7% and 10% in early chip lifetime. Finally, these techniques also reduce the oxide failure rate by up to 5% across process corners over a period of 7 years. ETPL VLSI-059 10-bit 30-MS/s SAR ADC Using a Switchback Switching Method Abstract: This brief presents a 10-bit 30-MS/s successive-approximation-register analog-to-digital converter (ADC) that uses a power efficient switchback switching method. 
With respect to the monotonic switching method, the input common-mode voltage variation is reduced, which improves the dynamic offset and the parasitic capacitance variation of the comparator. The proposed switchback switching method does not consume any power at the first digital-to-analog converter switching, which can reduce the power consumption and design effort of the reference buffer. The prototype was fabricated in a 90-nm 1P9M CMOS technology. At 1-V supply and 30 MS/s, the ADC achieves a signal-to-noise-and-distortion ratio (SNDR) of 56.89 dB and consumes 0.98 mW, resulting in a figure-of-merit (FOM) of 57 fJ/conversion-step. The ADC core occupies an active area of only 190 × 525 μm².

ETPL VLSI-060 Spur-Reduction Frequency Synthesizer Exploiting Randomly Selected PFD
Abstract: This brief presents a low-spur phase-locked loop (PLL) system for wireless applications. The low-spur frequency synthesizer randomizes the periodic ripples on the control voltage of the voltage-controlled oscillator to reduce the reference spur at the output of the PLL. A novel random clock generator is presented to perform the random selection of the phase frequency detector control for the charge pump in locked state. The proposed frequency synthesizer was fabricated in a TSMC 0.18-μm CMOS process. The proposed PLL achieved phase noise of -93 dBc/Hz with a 600-kHz offset frequency and reference spurs below -72 dBc.

ETPL VLSI-061 Gain-Enhanced Monolithic Charge Pump With Simultaneous Dynamic Gate and Substrate Control
Abstract: This brief presents a gain-enhanced complementary metal-oxide-semiconductor (CMOS) charge pump (CP) circuit via dynamically controlling the gate and substrate terminals of each pMOS pass transistor. The proposed control strategy frees the CP circuit of the threshold-voltage drops, the body effect, and the floating substrate terminals of pass devices. The on-resistance of each pass device is
  • 23. Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com also reduced to improve the gain and the power efficiency of the CP circuit. Implemented in a 0.35-μm single n-well CMOS process, the proposed four-stage monolithic CP circuit can operate with a supply voltage down to 0.9 V and deliver a maximum output current of about 100 μA. The proposed CP circuit also achieves a high voltage gain of 4 with two complementary-phase nonoverlapping clock signals. ETPL VLSI-062 Embedding Repeaters in Silicon IPs for Cross-IP Interconnections Abstract: During systems-on-a-chip (SoC) integration, silicon intellectual properties (IPs) are generally regarded as blockages to long interconnections that connect different IPs. With this constraint, conventional designs are forced to place those repeaters that drive long interconnections outside the IP. These designs either lead to a longer interconnection distance requiring more repeaters or result in a longer signal delay, since the interconnection wire is not appropriately segmented by the repeaters. To solve these problems, we designed the IPs such that designers can embed the repeaters in the IP for the SoC integration. In other words, it allows the cross-IP interconnections to be routed over the IP using repeaters inserted in the IP. The design concept, physical implementation, and application examples of the embedded repeaters are described in this brief ETPL VLSI-063 RATS: Restoration-Aware Trace Signal Selection for Post-Silicon Validation Abstract: Post-silicon validation is one of the most important and expensive tasks in modern integrated circuit design methodology. The primary problem governing post-silicon validation is the limited observability due to storage of a small number of signals in a trace buffer. 
The signals to be traced should be carefully selected in order to maximize restoration of the remaining signals. Existing approaches have two major drawbacks. They depend on partial restorability computations that are not effective in restoring maximum signal states. They also require long signal selection times because of inefficient computation and because they operate on a gate-level netlist. We have proposed a signal selection approach based on total restorability at the gate level, which is computationally more efficient (10 times faster) and can restore up to three times more signals compared to existing methods. We have also developed a register transfer level signal selection approach, which reduces both memory requirements and signal selection time by several orders of magnitude.

ETPL VLSI-064 Test Patterns of Multiple SIC Vectors: Theory and Application in BIST Schemes
Abstract: This paper proposes a novel test pattern generator (TPG) for built-in self-test. Our method generates multiple single-input change (MSIC) vectors in a pattern, i.e., each vector applied to a scan chain is an SIC vector. A reconfigurable Johnson counter and a scalable SIC counter are developed to generate a class of minimum transition sequences. The proposed TPG is flexible to both the test-per-clock and the test-per-scan schemes. A theory is also developed to represent and analyze the sequences and to extract a class of MSIC sequences. Analysis results show that the produced MSIC sequences have the favorable features of uniform distribution and low input transition density. The performance of the designed TPGs and of the circuits under test is evaluated at the 45-nm node. Simulation results with ISCAS benchmarks demonstrate that MSIC can save test power and impose no more than 7.5% overhead for a scan design. It also achieves the target fault coverage without increasing the test length.
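The single-input-change property behind the MSIC generator in ETPL VLSI-064 can be illustrated with a plain Johnson (twisted-ring) counter. The sketch below is not the paper's reconfigurable Johnson counter or its scalable SIC counter; the function names `johnson_sequence` and `hamming` are hypothetical. It only shows that consecutive Johnson-counter states differ in exactly one bit, which is what makes each successive vector an SIC vector.

```python
def johnson_sequence(width):
    """Return the 2*width states of a Johnson (twisted-ring) counter as bit tuples."""
    state = [0] * width
    states = []
    for _ in range(2 * width):
        states.append(tuple(state))
        # Shift by one position and feed the inverted last bit back to the front.
        state = [1 - state[-1]] + state[:-1]
    return states


def hamming(a, b):
    """Count the bit positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    seq = johnson_sequence(4)
    for s in seq:
        print("".join(map(str, s)))
    # Every step, including the wrap-around to the first state, flips one bit.
    assert all(hamming(seq[i], seq[(i + 1) % len(seq)]) == 1 for i in range(len(seq)))
```

A one-bit change per applied vector keeps the input transition density low, which is the mechanism the abstract credits for the test-power saving.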
  • 24. ETPL VLSI-065 Effective and Efficient Approach for Power Reduction by Using Multi-Bit Flip-Flops
Abstract: Power has become a burning issue in modern VLSI design. In modern integrated circuits, the power consumed by clocking gradually takes a dominant part. Given a design, we can reduce its power consumption by replacing some flip-flops with fewer multi-bit flip-flops. However, this procedure may affect the performance of the original circuit. Hence, replacing flip-flops without violating timing and placement capacity constraints becomes quite a complex problem. To deal with the difficulty efficiently, we have proposed several techniques. First, we perform a coordinate transformation to identify those flip-flops that can be merged and their legal regions. Second, we show how to build a combination table to enumerate possible combinations of flip-flops provided by a library. Finally, we use a hierarchical way to merge flip-flops. Besides power reduction, the objective of minimizing the total wirelength is also considered. The time complexity of our algorithm is $\Theta(n^{1.12})$, which is lower than the empirical complexity of $\Theta(n^{2})$. According to the experimental results, our algorithm reduces clock power by 20–30% and the running time is very short. In the largest test case, which contains 1 700 000 flip-flops, our algorithm takes only about 5 min to replace flip-flops and the power reduction reaches 21%.

ETPL VLSI-066 Reconfigurable Accelerator for the Word-Matching Stage of BLASTN
Abstract: BLAST is one of the most popular sequence analysis tools used by molecular biologists. It is designed to efficiently find similar regions between two sequences that have biological significance.
However, because the size of genomic databases is growing rapidly, the computation time of BLAST, when performing a complete genomic database search, is continuously increasing. Thus, there is a clear need to accelerate this process. In this paper, we present a new approach for genomic sequence database scanning utilizing reconfigurable field programmable gate array (FPGA)-based hardware. In order to derive an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate the computation of the word-matching stage. The experimental results show that the FPGA implementation achieves a speedup of around one order of magnitude compared to the NCBI BLASTN software running on a general-purpose computer.

ETPL VLSI-067 Architecturally Homogeneous Power-Performance Heterogeneous Multicore Systems
Abstract: Dynamic voltage and frequency scaling (DVFS), a widely adopted technique to ensure safe thermal characteristics while delivering superior energy efficiency, is rapidly becoming inefficient with technology scaling due to two critical factors: 1) the inability to scale the supply voltage due to reliability concerns and 2) the inability of dynamic adaptations through DVFS to alter the underlying power-hungry circuit characteristics, which are designed for the nominal frequency. In this paper, we show that DVFS-scaled circuits substantially lag in energy efficiency, by 22%–86%, compared to ground-up designs for target frequency levels. We propose architecturally homogeneous power-performance heterogeneous multicore systems, a fundamentally alternative means to design energy-efficient multicore systems. Using a system-level computer-aided design (CAD) approach, we seamlessly integrate architecturally identical cores, designed for different voltage-frequency domains. We use a combination of a standard-cell-library-based CAD flow and full system architectural simulation to demonstrate 11%–22% improvement in energy efficiency using our design paradigm.
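To make the word-matching stage named in ETPL VLSI-066 more concrete, here is a minimal software sketch of the general idea: index every length-`w` word of the query, then scan the database sequence for exact occurrences of those words. The function names, the toy sequences, and the short word length are assumptions for illustration; this is neither the NCBI implementation nor the FPGA architecture from the paper.

```python
from collections import defaultdict


def build_word_index(query, w):
    """Map each length-w word of the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def find_word_hits(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = build_word_index(query, w)
    hits = []
    for j in range(len(subject) - w + 1):
        # .get avoids inserting empty entries for words absent from the query.
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy example (hypothetical sequences): the word "ACGT" occurs in both.
    print(find_word_hits("ACGTAC", "TTACGTT", 4))  # [(0, 2)]
```

The inner loop performs one table lookup per database position, which is the kind of regular, data-parallel work that maps well onto reconfigurable hardware.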
  • 25. ETPL VLSI-068 Active Filter-Based Hybrid On-Chip DC–DC Converter for Point-of-Load Voltage Regulation
Abstract: An active filter-based on-chip DC–DC voltage converter for application to distributed on-chip power supplies in multivoltage systems is described in this paper. No inductor or output capacitor is required in the proposed converter. The area of the voltage converter is therefore significantly less than that of a conventional low-dropout (LDO) regulator. Hence, the proposed circuit is appropriate for point-of-load voltage regulation for noise-sensitive portions of an integrated circuit. The circuit has been verified with Cadence Spectre simulations and fabricated in a commercial 110 nm complementary metal oxide semiconductor (CMOS) technology. The voltage regulator occupies 0.015 ${\rm mm}^{2}$ and delivers up to 80 mA of output current. The transient response with no output capacitor ranges from 72 to 192 ns. The parameter sensitivity of the active filter is also described. The advantages and disadvantages of the active filter-based, conventional switching, linear, and switched-capacitor voltage converters are compared. The proposed circuit is an alternative to classical LDO voltage regulators, providing a means for distributing multiple local power supplies across an integrated circuit while maintaining high current efficiency and fast response time within a small area.

ETPL VLSI-069 CusNoC: Fast Full-Chip Custom NoC Generation
Abstract: We propose a full-chip synthesis methodology to construct custom networks-on-chip (CusNoCs) for NoC-based systems. The proposed scheme generates irregular network topologies for application-specific designs with known communication demands.
In this method, processors and the communication architecture can be synthesized simultaneously in the floorplanning process, and thus it is called CusNoC. The method synthesizes a CusNoC in two steps. The target network topology is first generated based on communication analysis. Processing elements are partitioned into groups such that the utility of routers will be maximized if a router is assigned to each group. In this way, the number of routers a packet passes through (hops) is minimized, and so is the power consumption in the network. The final network topology is formed by properly connecting these groups. Wirelength-aware floorplanning is then carried out to optimize circuit size as well as wirelength. Experimental results show that CusNoC produces custom NoCs with better performance than previous methods while the computation time is significantly shorter. This method is also more scalable, which makes it well suited to complicated systems.

ETPL VLSI-070 Cooperating Virtual Memory and Write Buffer Management for Flash-Based Storage Systems
Abstract: Flash memory is becoming the preferred choice of secondary storage in mobile devices and embedded systems. The performance of Flash memory is dictated by asymmetric read and write speeds, a limited number of erase cycles, and the absence of in-place updates. To improve the performance of Flash-based storage systems, a write buffer has recently been provided in Flash memories. At the same time, new virtual memory management strategies that consider the characteristics of Flash memory have been proposed in recent studies. Currently, approaches on these two memory layers are designed separately and therefore fail to exploit the full potential of the two layers. In this paper, we propose cooperative management schemes for virtual memory and write buffer to maximize the performance of Flash-memory-based systems. Management on virtual memory is designed to exploit write buffer status via reordering of the write sequences.
The proposed write buffer management scheme works seamlessly with the proposed virtual memory management scheme. Experimental results show that significant
  • 26. improvement in I/O performance and reduction of the number of erase and write operations can be achieved compared to the state-of-the-art approaches.

ETPL VLSI-071 MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM Systems
Abstract: This paper presents a multipath delay commutator (MDC)-based architecture and memory scheduling to implement fast Fourier transform (FFT) processors for multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use radix-$N_{s}$ butterflies at each stage, where $N_{s}$ is the number of data streams, so that only one butterfly is needed in each stage. Consequently, a 100% utilization rate in computational elements is achieved. Moreover, thanks to the simple control mechanism of the MDC, we propose simple memory scheduling methods for input data and output bit/set-reversing, which again results in a full utilization rate in memory usage. Since the memory requirements usually dominate the die area of FFT/inverse fast Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the memory size and thus the die area as well. Furthermore, to apply the proposed scheme in practical applications, we let $N_{s}=4$ and implement a 4-stream FFT/IFFT processor with variable length including 2048, 1024, 512, and 128 for MIMO-OFDM systems. This processor can be used in IEEE 802.16 WiMAX and 3GPP long term evolution applications. The processor was implemented in a UMC 90-nm CMOS technology with a core area of 3.1 ${\rm mm}^{2}$. The power consumption at 40 MHz was 63.72/62.92/57.51/51.69 mW for 2048/1024/512/128-FFT, respectively, in post-layout simulation.
Finally, we analyze the complexity and performance of the implemented processor and compare it with other processors. The results show advantages of the proposed scheme in terms of area and power consumption.

ETPL VLSI-072 Current-Reused 2.4-GHz Direct-Modulation Transmitter With On-Chip Automatic Tuning
Abstract: This paper presents the design, analysis, and experimental verification of a self-calibrating current-reused 2.4-GHz direct-modulation transmitter for short-range wireless applications. The key contributions are the design/analysis of a stacked power amplifier (PA)/voltage-controlled oscillator (VCO) architecture, the nonlinear frequency-dependent analysis of a Gilbert-cell-based root-mean-square detector, and an on-chip $LC$-tank calibration circuit that needs no analog-to-digital converter (ADC)/digital signal processor. The stacked architecture reduces the number of required regulators, utilizes supply headroom effectively, and allows for an “ADC-less” calibration loop that can dynamically tune the PA center frequency by sensing the transmitted signal. The very nature of direct-modulation architecture obviates additional high-purity signal generators, reducing complexity and allowing online calibration. The system was implemented in TSMC 0.18 $\mu{\rm m}$ CMOS, occupies 0.7 ${\rm mm}^{2}$ (TX) + 0.1 ${\rm mm}^{2}$ (self-tuning), and was measured in a QFN48 package on an FR4 PCB. Automatically correcting PA/VCO tank misalignment in this case yielded a ${>}4~{\rm dB}$ increase in output power. With the automatic tuning active, the transmitter delivers a measured output power ${>}0~{\rm dBm}$ to a 100-$\Omega$ differential load, and the system consumes 22.9 mA from a 1.8-V core-circuit supply.
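As a quick consistency check on the ETPL VLSI-072 figures, the DC power follows directly from the quoted supply current and voltage. The efficiency bound below is an inference, not a number reported in the abstract: it assumes the 22.9 mA covers the whole transmitter and uses only the stated lower bound on output power (0 dBm = 1 mW).

```latex
\begin{align*}
P_{\mathrm{DC}} &= V_{\mathrm{supply}} \cdot I_{\mathrm{supply}}
                 = 1.8~\mathrm{V} \times 22.9~\mathrm{mA} = 41.22~\mathrm{mW} \\
P_{\mathrm{out}} &> 0~\mathrm{dBm} = 1~\mathrm{mW} \\
\eta &= \frac{P_{\mathrm{out}}}{P_{\mathrm{DC}}}
      > \frac{1~\mathrm{mW}}{41.22~\mathrm{mW}} \approx 2.4\%
\end{align*}
```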
  • 27. ETPL VLSI-073 Reconfigurable Adaptive Singular Value Decomposition Engine Design for High-Throughput MIMO-OFDM Systems
Abstract: Singular value decomposition (SVD) is an optimal method to obtain spatial multiplexing gain in multi-input multi-output (MIMO) channels. However, the high implementation cost and high decomposing latency of the SVD restrict its usage in current wireless communication applications. In this paper, we present a complete adaptive SVD algorithm and a reconfigurable architecture for high-throughput MIMO-orthogonal frequency division multiplexing systems. Several architectural design techniques are proposed: a reconfigurable scheme, a division-free adaptive step size scheme, an early termination scheme, and a data interleaving scheme. The reconfigurable scheme can support all antenna configurations in a MIMO system. The division-free adaptive step size and early termination schemes are used to effectively reduce the decomposing latency and improve hardware utilization. The data interleaving scheme helps to deal with several channel matrices concurrently. In addition, we propose an orthogonal reconstruction scheme to obtain more accurate SVD outputs, which greatly enhances system performance. We apply our SVD design to IEEE 802.11n applications. This design is implemented and fabricated in UMC 90 nm 1P9M CMOS technology. The maximum operating frequency is measured to be 101.2 MHz, and the corresponding power dissipation is 125 mW. The core size is 2.17 ${\rm mm}^{2}$ and the die occupies 4.93 ${\rm mm}^{2}$. The chip result shows that the average latency is only 0.33% of the wireless local area network coherence time.
Hence, the proposed reconfigurable adaptive SVD engine design is very suitable for high-throughput wireless communication applications.

ETPL VLSI-074 The LUT-SR Family of Uniform Random Number Generators for FPGA Architectures
Abstract: Field-programmable gate array (FPGA) optimized random number generators (RNGs) are more resource-efficient than software-optimized RNGs because they can take advantage of bitwise operations and FPGA-specific features. However, it is difficult to concisely describe FPGA-optimized RNGs, so they are not commonly used in real-world designs. This paper describes a type of FPGA RNG called a LUT-SR RNG, which takes advantage of bitwise xor operations and the ability to turn lookup tables (LUTs) into shift registers of varying lengths. This provides a good resource–quality balance relative to previous FPGA-optimized generators, sitting between the high-resource, high-period LUT-FIFO RNGs and the low-resource, low-quality LUT-OPT RNGs, with quality comparable to the best software generators. The LUT-SR generators can also be expressed using a simple C++ algorithm contained within this paper, allowing 60 fully-specified LUT-SR RNGs with different characteristics to be embedded in this paper, backed up by an online set of very high speed integrated circuit hardware description language (VHDL) generators and test benches.

ETPL VLSI-075 Exploring the Use of Emerging Nonvolatile Memory Technologies in Future FPGAs
Abstract: As new nonvolatile memory technologies become increasingly mature, there has been a growing interest in investigating their use in future field-programmable gate arrays (FPGAs). Similar to existing FPGAs with embedded Flash memory, future FPGAs can embed these new nonvolatile memories to persistently store configuration data. By comparing with prior work, we first propose a more appropriate design style for new nonvolatile configuration data storage memory.
Moreover, this brief studies a dynamic random-access memory (DRAM)-based FPGA design strategy enabled by high-density embedded nonvolatile memory.
  • 28. Existing FPGAs do not use on-chip DRAM cells for configuration data storage mainly because DRAM self-refresh involves destructive DRAM read. This problem can be solved if we use embedded nonvolatile memory as primary FPGA configuration data storage and externally refresh on-chip DRAM cells. Analysis and simulations have been carried out to demonstrate the potential advantages of such a design strategy.

ETPL VLSI-076 Broadside and Skewed-Load Tests Under Primary Input Constraints
Abstract: Tester limitations may impose certain constraints on the primary input vectors applicable as part of a two-pattern test for delay faults. Under these constraints, the primary input vectors may be held constant, or the second primary input vector of a test may be obtained by a single shift of a scan chain relative to the first. The goal of this brief is to study the differences in achievable transition fault coverage between various primary input constraints that are similar to the commonly used ones of holding or shifting primary input vectors. This brief also studies the possibility of combining the constraints in order to increase the transition fault coverage. The combination requires a fixed and circuit-independent hardware structure similar to the case where shifting of primary input vectors is used. This study is done using test sets that consist of both broadside and skewed-load tests in order to maximize the transition fault coverage.

ETPL VLSI-078 Supply Noise Suppression by Triple-Well Structure
Abstract: This brief discusses the impact of twin- and triple-well structures on power supply noise, and a substrate model for simulating the power supply noise.
We observed $V_{\rm ss}$ noise reduction by the resistive network of the p-substrate and $V_{\rm dd}$ noise reduction by the junction capacitance of a triple-well structure on a 90-nm test chip. Measurement results also showed that the total noise reduction of a triple-well structure is superior to that of a twin-well structure. The measurement results correlate well with the results obtained from the power supply noise simulation using a hierarchical resistive mesh model. Our simulation-based verification indicates that in common CMOS design, a triple-well structure can reduce the power supply drop by 10%–40% or the decoupling capacitance area by 5%–10%. We also verified that supply drop sensitivity to variation of the well junction capacitance is sufficiently small and that supply noise reduction using a triple-well structure is robust to process variation.

ETPL VLSI-079 Software-Based Self Test Methodology for On-Line Testing of L1 Caches in Multithreaded Multicore Architectures
Abstract: The flexibility that allows the application of different March tests is a critical requirement for on-line testing of memory arrays. In a previous study, we introduced a low-cost software-based self test (SBST) program development methodology for on-line periodic testing of L1 caches that utilizes direct cache access (DCA) instructions and exploits the native monitoring hardware available in modern architectures. In this brief, we discuss a multithreaded optimization of this SBST methodology that exploits the thread-level parallelism of multithreaded multicore architectures in order to speed up March test execution by elaborating the low-level multiple sub-bank cache organization. The effectiveness of the methodology and its multithreaded optimization is demonstrated on the L1 caches of the OpenSPARC T1 processor.
Our results show a speedup of more than 1.7× when the multithreaded optimization is applied and an acceptable performance overhead (less than 11%), even in intensive periodic test scenarios.
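For readers unfamiliar with the March tests mentioned in ETPL VLSI-079, the sketch below runs the well-known March C- element sequence over a simulated bit array. It is a plain software model: the `FaultyMemory` class, the injected stuck-at fault, and all function names are hypothetical, and nothing here uses direct cache access instructions or reflects the paper's SBST routines or its multithreaded optimization.

```python
class FaultyMemory:
    """Bit-addressable memory model with an optional stuck-at cell (hypothetical)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at  # (address, value) or None for a fault-free array

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        # A stuck-at cell always returns its stuck value, whatever was written.
        if self.stuck_at is not None and self.stuck_at[0] == addr:
            return self.stuck_at[1]
        return self.cells[addr]


# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
# Elements with arbitrary address order are run in ascending order here.
MARCH_C_MINUS = [
    ("up", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("up", [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("up", [("r", 0)]),
]


def run_march(memory, size, elements=MARCH_C_MINUS):
    """Apply the march elements; return the sorted addresses where a read mismatched."""
    failures = set()
    for direction, ops in elements:
        addresses = range(size) if direction == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            for op, value in ops:
                if op == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    failures.add(addr)
    return sorted(failures)


if __name__ == "__main__":
    print(run_march(FaultyMemory(8), 8))                   # []
    print(run_march(FaultyMemory(8, stuck_at=(5, 1)), 8))  # [5]
```

Because each march element is a simple sweep over addresses, independent banks can in principle be swept in parallel, which is the kind of parallelism the abstract's multithreaded optimization targets.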