This document discusses testing strategies for system-on-chip (SoC) integrated circuits and their embedded memories. It provides an overview of built-in self-test (BIST) techniques for testing logic blocks, processor cores, and memory arrays on SoCs. Specific BIST strategies discussed include memory BIST (MBIST), which tests embedded memories using March test algorithms, and memory built-in self-repair (MBISR), which repairs faulty memory locations using redundant rows or columns. The document also briefly describes the Concurrent Autonomous Self-Test using Stored Patterns (CASP) and Virtualization-Assisted Concurrent Autonomous Self-Test (VAST) approaches for concurrently testing core and noncore elements of SoCs without significant performance degradation.
without replacing with the redundant one if the number of
faults in the entire array is within tolerable limits.
As mentioned earlier, SoCs predominantly contain memory
arrays, and memory testing is vital for higher system
reliability. Functional testing of the core and noncore
elements of SoCs is also essential for correct
functionality. Recent research literature has presented
concurrent on-line testing of single-/multi-core and
noncore components of SoCs without performance degradation
or system downtime [16-18].
The next section gives a brief outline of the Built-In
Self-Test methodology used for testing any system. The
Built-In Self-Test/Repair techniques and their
architectures for self-testing/repairing of embedded
memories are presented in Section III. The Concurrent
Autonomous Self-Test using Stored Patterns (CASP) and
Virtualization-Assisted Concurrent Autonomous Self-Test
(VAST) algorithms used for testing core and noncore
elements in SoCs are outlined in Section IV.
II. BUILT-IN SELF TEST ARCHITECTURE
The Built-In Self-Test (BIST) mechanism [1,10] used for
testing manufactured ICs enables a circuit to test itself
by incorporating an Automatic Test Pattern Generator
(ATPG) and an Output Response Analyzer (ORA) on-chip, at
the cost of a marginal increase in logic overhead and
silicon area. BIST techniques are classified into (i)
on-line BIST and (ii) off-line BIST. In on-line BIST,
testing is carried out while the circuit is in normal
operation. Although on-line testing can detect faults in
the circuit, it cannot be used for fault diagnosis. In
off-line BIST, the circuit is normally tested at system
boot-up and/or during the system reset period by executing
a self-test program with pre-defined test vectors stored
in non-volatile memory. Off-line BIST may also be carried
out periodically by suspending the Circuit-Under-Test
(CUT) from its functional mode of operation. Because
off-line testing is periodic, it does not guarantee the
detection of temporary faults and soft errors. However,
off-line BIST is normally preferred for its fault
diagnosis capability and the possibility of repair.
The generalized BIST architecture shown in Figure 1
consists of a Test Pattern Generator (TPG), the
Circuit-Under-Test (CUT), an Output Response Analyzer
(ORA) and a BIST controller. The TPG produces test vectors
for the CUT during off-line test. The ORA compares the CUT
output with the reference outputs to check whether the
circuit is faulty. The reference patterns may be stored in
non-volatile storage. To reduce the size of this storage,
the reference patterns and the corresponding CUT output
responses may be compressed and/or compacted. The BIST
controller produces the control and timing signals
required for proper functioning of the BIST architecture.
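
The TPG/CUT/ORA loop described above can be illustrated with a
small behavioural sketch in Python. Everything in it is an
assumption made for illustration: the 4-bit LFSR used as the TPG,
the toy adder used as the CUT, the injected stuck-at-1 fault, and
the 16-bit polynomial checksum that stands in for a hardware
signature register. It is not taken from [1,10].

```python
from typing import Callable, Iterable, Iterator


def lfsr_patterns(seed: int = 1, count: int = 15) -> Iterator[int]:
    """TPG: 4-bit Fibonacci LFSR (feedback from bits 3 and 2, period 15)."""
    state = seed & 0xF
    for _ in range(count):
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF


def compact(responses: Iterable[int]) -> int:
    """ORA compaction: fold all CUT responses into one 16-bit signature."""
    signature = 0
    for response in responses:
        signature = (signature * 31 + response) & 0xFFFF
    return signature


def golden_cut(x: int) -> int:
    """Fault-free CUT (hypothetical): adds the two 2-bit halves of x."""
    return (x & 0b11) + (x >> 2)


def faulty_cut(x: int) -> int:
    """Same CUT with output bit 0 stuck-at-1 (injected fault)."""
    return golden_cut(x) | 1


def run_bist(cut: Callable[[int], int], reference_signature: int) -> bool:
    """BIST controller: apply every pattern, then compare signatures."""
    return compact(cut(p) for p in lfsr_patterns()) == reference_signature


# Reference signature that would be kept in non-volatile storage.
REFERENCE = compact(golden_cut(p) for p in lfsr_patterns())

print("fault-free CUT passes:", run_bist(golden_cut, REFERENCE))
print("faulty CUT passes:   ", run_bist(faulty_cut, REFERENCE))
```

Storing a single compacted signature instead of every expected
response is what keeps the reference storage small; the price is a
small probability that a faulty response stream produces the same
signature (aliasing).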
Fig.1. Built-In Self Test Architecture
III. MEMORY BUILT-IN SELF TEST/REPAIR
(MBISTR) ARCHITECTURE
As mentioned earlier, a major portion of the silicon area
in SoCs is occupied by on-chip memories. The integration
of high-capacity memories on a single silicon chip gives
rise to various faults such as Stuck-at Faults (SAFs),
Transition Faults (TFs), Coupling Faults (CFs), Address
Decoding Faults (ADFs) and physical Neighborhood
Pattern-Sensitive Faults (NPSFs). These commonly occurring
faults in memory arrays can be effectively addressed using
Memory BIST (MBIST) techniques, which are briefly
described in the next sub-section. The subsequent
sub-section discusses the Memory Built-In Self Repair
(MBISR) technique, which allows the test and repair of
memory with the help of redundant rows or columns.
A. Memory Built-In Self Test (MBIST) Architecture
Memory devices consist of densely packed memory cells
arranged in two-dimensional (2-D) structures called memory
arrays, which form the core memory, along with row- and
column-address decoders and sense amplifiers. With the
memory BIST architecture shown in Figure 2, testing of the
entire memory can be implemented on-chip. Self-testing of
embedded memories [5,6] requires the following test
hardware alongside the Memory-Under-Test (MUT):
a. An Address generator for off-line BIST operation.
b. A MUX circuit feeding the memory during self-
test from the controller.
c. A Comparator for response checking.
d. MBIST Controller.
Fig.2. Memory Built-In Self Test (MBIST) Architecture
The address generator (a counter or an LFSR) produces
the addresses in either a pseudo-random or a predefined
sequence for testing the memory off-line. The MUX at the
input selects either the functional input address in
normal mode or the address produced by the address
generator in test mode. During BIST mode, a predefined
data pattern stored in non-volatile storage is written to
the address location produced by the address generator,
and the data is then read back from the same location. The
data read out is compared for equality with the expected
pattern stored in the non-volatile storage. The comparator
flags a fault at that address location if the data read
from the memory does not match the expected data. An
FSM-based design may be used to realize the MBIST
controller, which generates the control signals required
for MBIST operation. However, a microcode-based MBIST
controller [21] adds the flexibility to run various March
algorithms on the same BIST hardware by changing the
instructions stored in the microcode storage unit.
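
The write/read/compare datapath described above can be sketched
as a behavioural Python model. The class and function names
(`MemoryUnderTest`, `address_mux`, `mbist`), the 0x55/0xAA data
patterns, the counter-style address generator and the word-level
stuck-at fault model are all assumptions chosen for illustration;
they are not the hardware of [5,6] or [21].

```python
class MemoryUnderTest:
    """Word-addressable memory with optional injected faults (simulation only)."""

    def __init__(self, size, stuck_words=None):
        self.cells = [0] * size
        # address -> value that the whole word is stuck at (hypothetical model)
        self.stuck_words = dict(stuck_words or {})

    def write(self, addr, data):
        self.cells[addr] = self.stuck_words.get(addr, data)

    def read(self, addr):
        return self.stuck_words.get(addr, self.cells[addr])


def address_mux(test_mode, functional_addr, bist_addr):
    """Input MUX: BIST address in test mode, functional address otherwise."""
    return bist_addr if test_mode else functional_addr


def mbist(memory, size, patterns=(0x55, 0xAA)):
    """Write each pattern to every address, read it back, compare.

    Returns the sorted list of addresses where the comparator saw a mismatch.
    """
    failing = set()
    for pattern in patterns:
        for bist_addr in range(size):            # counter-based address generator
            addr = address_mux(True, functional_addr=0, bist_addr=bist_addr)
            memory.write(addr, pattern)
            if memory.read(addr) != pattern:     # comparator
                failing.add(addr)
    return sorted(failing)


print(mbist(MemoryUnderTest(16), 16))                 # fault-free memory
print(mbist(MemoryUnderTest(16, {5: 0x55}), 16))      # word 5 stuck at 0x55
```

Two complementary patterns are used because a word stuck at 0x55
passes the first pattern and is only exposed by the second.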
Because a memory is a regular 2-dimensional array,
Algorithmic Test Sequences (ATS) may be adopted for
functional testing of memories. March-based algorithms
[4, 11] are used extensively for Built-In Self-Test of
memory arrays. A March-based algorithm involves a finite
sequence of write and read operations, called March
elements. A sequence of March operations is applied to the
memory cells to check whether each cell is fault-free.
During a March operation, the memory cells may be
addressed in either ascending or descending order. Table 1
shows the general notations used for memory addressing and
memory read/write operations in March algorithms.
TABLE I. DESCRIPTION OF NOTATIONS
Notation Description
R0/1 The response 0/1 from a cell during memory read operation.
W0/1 The data 0/1 written into a cell during memory write operation.
↕ Addressing with any order
↑ Addressing with increasing order
↓ Addressing with decreasing order
N Number of address locations/memory cells
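
To make the notation concrete, the following Python sketch runs
the MATS+ sequence {↕W0, ↑(R0,W1), ↓(R1,W0)} from Table 2 on a
simulated bit-oriented memory. The encoding of a March test as a
list of (order, operations) pairs, the helper names `make_memory`
and `run_march`, and the single-cell stuck-at fault model are
assumptions made for this example only.

```python
# Each step: (addressing order, operations applied to every cell in turn).
# "any" stands for the ↕ notation; this sketch simply uses ascending order for it.
MATS_PLUS = [
    ("any", ["W0"]),
    ("up", ["R0", "W1"]),
    ("down", ["R1", "W0"]),
]


def make_memory(n, stuck_at=None):
    """Return (read, write) functions for an n-cell memory with injected faults.

    stuck_at maps a cell address to the bit value it is stuck at.
    """
    stuck_at = dict(stuck_at or {})
    cells = [0] * n

    def write(addr, value):
        cells[addr] = stuck_at.get(addr, value)

    def read(addr):
        return stuck_at.get(addr, cells[addr])

    return read, write


def run_march(march, n, read, write):
    """Apply a March test; return the sorted addresses whose reads mismatched."""
    failing = set()
    for order, operations in march:
        addresses = range(n - 1, -1, -1) if order == "down" else range(n)
        for addr in addresses:
            for op in operations:
                expected = int(op[1])            # "R0" -> 0, "W1" -> 1
                if op[0] == "W":
                    write(addr, expected)
                elif read(addr) != expected:
                    failing.add(addr)
    return sorted(failing)


read, write = make_memory(8, stuck_at={3: 1, 6: 0})
print(run_march(MATS_PLUS, 8, read, write))
```

In this run, cell 3 (stuck at 1) fails the R0 of the ascending
step, and cell 6 (stuck at 0) fails the R1 of the descending step.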
Previous literature has proposed various memory
self-test algorithms, as listed in Table 2. These March
algorithms [4-7] differ in the number of March elements,
the test sequences and the faults they target in the
memory. In general, the number and order of memory
read/write operations depend on the targeted faults. To
illustrate this, consider two March algorithms, March A
and March X, applied to a 1K×8 memory array (N = 1,024
address locations). The March A algorithm (15N) requires
15,360 read/write operations and, according to Table 2,
targets stuck-at faults, address decoding faults and
transition faults in the memory. In contrast, the March X
algorithm (6N) requires only 6,144 read/write operations,
but Table 2 lists only coupling faults as its target.
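
The two operation counts can be reproduced directly from the test
sequences in Table 2. The short Python check below assumes that
each read or write acts on one whole 8-bit word, so that N = 1,024
for the 1K×8 array; the list encoding of the sequences and the
function names are illustrative.

```python
# Operations per address in each step, copied from the Table 2 sequences.
MARCH_A = [
    ["W0"],
    ["R0", "W1", "W0", "W1"],
    ["R1", "W0", "W1"],
    ["R1", "W0", "W1", "W0"],
    ["R0", "W1", "W0"],
]
MARCH_X = [["W0"], ["R0", "W1"], ["R1", "W0"], ["R0"]]


def operations_per_address(march):
    """Number of read/write operations applied to each address (the k in kN)."""
    return sum(len(step) for step in march)


def total_operations(march, n_addresses):
    """Total read/write operations for a memory with n_addresses locations."""
    return operations_per_address(march) * n_addresses


N = 1024  # 1K words of 8 bits, one operation per word access (assumption)
print("March A:", total_operations(MARCH_A, N))   # 15N
print("March X:", total_operations(MARCH_X, N))   # 6N
```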
B. Memory Built-In Self Repair (MBISR) Architecture
MBIST can detect the presence of faults in the memory
array so that a faulty memory block may be replaced by
spare units available within the system-on-chip. MBISR
[3,8,9,14,15,27,28,30] supports fault diagnosis and
self-repairs the Memory-Under-Test when it is found to be
faulty. Any of the self-test mechanisms discussed in the
previous sub-section may be used to detect faults in the
memory array. The MBISR architecture shown in Figure 3
consists of a Built-In Self-Test (BIST) module, redundancy
logic, a spare location allocator and an MBISR controller.
The Built-In Self-Test of the memory array is performed
using one of the efficient March algorithms. Once one or
more memory locations are found to be faulty, the faulty
addresses are stored in the redundancy logic. The
redundancy can be provided either by spare rows or columns
[22, 23] or by a block of main memory that is not used as
normal memory during BISR operation. The spare location
allocator maps the input address to one of the address
locations in the redundant area. During a memory write
operation, the data is written simultaneously to both the
user memory and the mapped address location of the
redundant array. The multiplexer placed at the output
selects the correct data from the main memory when the
addressed location is fault-free, and from the redundant
array when it is faulty. The BISR controller generates the
control signals for proper operation of the BISR
circuitry. The generalized BISR procedure is illustrated
by the flow chart shown in Figure 4.
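
The test, remap and multiplexed-read behaviour described above can
be sketched in Python as follows. This is a simplified behavioural
model under stated assumptions: repair is done per word rather
than per row or column, a defective word is modelled as storing
the bitwise inverse of the written byte, and the names
`RepairableMemory`, `self_test_and_repair` and `remap` are
hypothetical.

```python
class RepairableMemory:
    """Main array plus a small spare array with address remapping."""

    def __init__(self, size, spares, faulty_words=()):
        self.main = [0] * size
        self.spare = [0] * spares
        self.faulty_words = set(faulty_words)   # injected defects (simulation only)
        self.remap = {}                         # faulty address -> spare index

    def _raw_write(self, addr, data):
        # A defective word is modelled as storing the inverted byte.
        self.main[addr] = data ^ 0xFF if addr in self.faulty_words else data

    def self_test_and_repair(self, patterns=(0x55, 0xAA)):
        """BIST pass with spare allocation; False if the spares run out."""
        for addr in range(len(self.main)):
            for pattern in patterns:
                self._raw_write(addr, pattern)
                if self.main[addr] != pattern and addr not in self.remap:
                    if len(self.remap) == len(self.spare):
                        return False            # unrepairable: no spare left
                    self.remap[addr] = len(self.remap)
        return True

    def write(self, addr, data):
        self._raw_write(addr, data)             # main array is always written
        if addr in self.remap:                  # redundant copy for repaired words
            self.spare[self.remap[addr]] = data

    def read(self, addr):
        # Output multiplexer: spare data for repaired addresses, main otherwise.
        if addr in self.remap:
            return self.spare[self.remap[addr]]
        return self.main[addr]


mem = RepairableMemory(size=16, spares=2, faulty_words={3, 9})
print("repaired:", mem.self_test_and_repair(), "remap:", mem.remap)
mem.write(3, 0x5A)
print("read back from repaired word:", hex(mem.read(3)))
```

The `False` return path shows why the number of spares bounds the
number of faulty locations that can be tolerated.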
Fig.3. Memory Built-In Self Repair (MBISR) Architecture
The repair capability and the complexity of BISR depend
on the size of the redundant memory array. A larger
redundant array increases the number of faulty memory
locations that can be tolerated, but it also increases
the area and complexity.
TABLE II. COMPARISON OF VARIOUS MARCH ALGORITHMS
S. No. | March Algorithm | No. of march elements | No. of march steps | Test Sequence | Targeted Faults
1 | ATS | 4N | 3 | {↕W0, ↕(R0,W1), ↕R1} | SAFs, ADFs
2 | MATS | 4N | 3 | {↕W0, ↕(R0,W1), ↕R1} | SAFs, ADFs
3 | MATS+ | 5N | 3 | {↕W0, ↑(R0,W1), ↓(R1,W0)} | SAFs, ADFs
4 | MATS++ | 6N | 3 | {↕W0, ↑(R0,W1), ↓(R1,W0,R0)} | SAFs, ADFs, TFs, CFs
5 | March A | 15N | 5 | {↕W0, ↑(R0,W1,W0,W1), ↑(R1,W0,W1), ↓(R1,W0,W1,W0), ↓(R0,W1,W0)} | SAFs, ADFs, TFs
6 | March B | 17N | 5 | {↕W0, ↑(R0,W1,R1,W0,R0,W1), ↑(R1,W0,W1), ↓(R1,W0,W1,W0), ↓(R0,W1,W0)} | SAFs, ADFs, TFs, CFs
7 | March C | 11N | 7 | {↕W0, ↑(R0,W1), ↑(R1,W0), ↕R0, ↓(R0,W1), ↓(R1,W0), ↕R0} | SAFs, ADFs, TFs, Some CFs
8 | March X | 6N | 4 | {↕W0, ↑(R0,W1), ↓(R1,W0), ↕R0} | CFs
9 | March Y | 8N | 4 | {↕W0, ↑(R0,W1,R1), ↓(R1,W0,R0), ↕R0} | SAFs, ADFs, TFs, CFs
10 | March LA | 22N | 6 | {↕W0, ↑(R0,W1,W0,W1,R1), ↑(R1,W0,W1,W0,R0), ↓(R0,W1,W0,W1,R1), ↓(R1,W0,W1,W0,R0), ↓R0} | SAFs, ADFs, TFs, CFs
11 | March SR+ | 18N | 6 | {↓W0, ↑(R0,R0,W1,R1,R1,W0,R0), ↓R0, ↑W1, ↓(R1,R1,W0,R0,R0,W1,R1), ↑R1} | SAFs, ADFs, TFs, CFs
12 | March SS | 22N | 6 | {↕W0, ↑(R0,R0,W0,R0,W1), ↑(R1,R1,W1,R1,W0), ↓(R0,R0,W0,R0,W1), ↓(R1,R1,W1,R1,W0), ↕R0} | SAFs, ADFs, TFs, CFs
Fig.4. Flow chart indicating BISR operation
Fig.5. Conceptual Block Diagram of Shared BISR scheme
To reduce the silicon area overhead imposed by the BISR
logic, the BISR circuitry can be shared by multiple RAMs
of a homogeneous or heterogeneous nature to test and
repair them [24,25,26]. The BISR logic may be shared by
multiple RAMs either serially or in parallel. In the
serially shared BISR scheme [13] conceptualized in Figure
5, only one RAM module can be tested and repaired at a
time, which leads to long test and repair times for SoCs
with multiple memories. The parallel shared BISR scheme
supports the testing and repair of multiple RAMs
simultaneously, but at a higher area cost. A test wrapper
[29,31] provides a standardized interface between the MUT
and the MBIST controller so as to enable at-speed test
and repair of memories.
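
The test-time side of the serial/parallel trade-off can be
illustrated with a rough Python estimate. The model is a
deliberate simplification and rests on assumptions not made in the
cited works: one memory operation per clock cycle, an 11N March
test (March C in Table 2) for every RAM, and no repair or
controller overhead. The RAM sizes are hypothetical.

```python
def march_cycles(words, ops_per_word=11):
    """Cycles to run a kN March test on one RAM, one operation per cycle."""
    return words * ops_per_word


def serial_test_cycles(ram_sizes, ops_per_word=11):
    """Serially shared BISR: the RAMs are tested one after another."""
    return sum(march_cycles(words, ops_per_word) for words in ram_sizes)


def parallel_test_cycles(ram_sizes, ops_per_word=11):
    """Parallel shared BISR: all RAMs run together; the largest one dominates."""
    return max(march_cycles(words, ops_per_word) for words in ram_sizes)


rams = [1024, 4096, 512]   # hypothetical RAM depths in words
print("serial:  ", serial_test_cycles(rams))
print("parallel:", parallel_test_cycles(rams))
```

Under these assumptions the serial scheme costs the sum of the
individual test times, while the parallel scheme costs only the
longest one, which is the time saving that the extra area buys.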
IV. BIST ARCHITECTURES FOR TESTING SYSTEM-
ON-CHIPS/ MULTI-CORE SYSTEMS
The System-on-Chip architecture contains multi-/many-core
processors along with many noncore elements such as memory
controllers, I/O controllers, etc. The SoC architecture
also includes analog/mixed-signal modules such as ADCs and
DACs in order to interface the analog blocks with the data
path units. The increased complexity of SoCs makes the
testing process more complex and costly. The self-test
mechanisms that are applicable to memory testing may not
be suited to logic testing because of the non-regular
structure of the Logic-Under-Test. Moreover, SoCs require
concurrent on-line self-test without significant
performance degradation or system downtime, because a
module cannot be detached from its normal functionality at
an arbitrary time. Test compression and scheduling for
each component of the SoC has become a necessary and
challenging task for improving its reliability and
performance. The dynamic method of test compression and
scheduling [32] is more flexible and viable if the
structural information of the core is available. This kind
of test requirement is facilitated by proper resource
sharing and smart backups. Recent research has proposed
two on-line self-test methodologies for testing core as
well as noncore elements in many-/multi-core processors;
they are briefly presented in the next two subsections.
A. Concurrent Autonomous Self-Test using Stored Patterns
The CASP architecture allows a system to self-test on the
fly during its normal mode of operation. The CASP
algorithm [16,17] applied to SoCs facilitates on-line
self-testing of both processor cores and noncore
components, e.g., cache controllers, DRAM controllers, and
I/O controllers. On-line self-test of noncore components
is essential because they account for a significant
portion of the SoC area.
The CASP architecture improves fault coverage by storing
high-quality test vectors in off-chip non-volatile
storage. CASP provides cost-effective concurrent on-line
self-test through the following special hardware features:
(1) Resource Reallocation and Sharing (RRS), (2)
no-performance-impact testing, and (3) smart backups.
The RRS scheme identifies multiple instances of
components having “similar” functionality. During on-line
self-test, the workload of the instance-under-test can be
reallocated to another identical component, which handles
it in addition to its own workload. Self-testing of two
modules whose functionalities are independent of each
other may be carried out concurrently without any impact
on overall system performance. This technique is referred
to as no-performance-impact testing. An additional backup
module, called a smart backup, may be provided in the
system to back up multiple instances during self-test.
Moreover, the backup module need not operate concurrently
with the original module and can be turned off whenever
the original module is operating in normal mode. This
reduces the area and power penalty significantly compared
with conventional redundancy and strikes a fine balance
between performance and cost.
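
The workload reallocation step of RRS can be illustrated with a
small Python sketch. The policy shown, moving the whole workload
of the instance-under-test to the least-loaded identical peer, is
an assumption made for illustration and is not necessarily the
policy used in [16,17]; the function name, the instance names and
the workload figures are hypothetical.

```python
def reallocate_for_self_test(workloads, under_test):
    """Move the workload of `under_test` to its least-loaded identical peer.

    workloads maps each identical instance to its current load.
    Returns a new mapping; the input is left unchanged.
    """
    peers = [name for name in workloads if name != under_test]
    if not peers:
        # With no identical instance to share with, a smart backup would be needed.
        raise ValueError("no identical instance available for reallocation")
    target = min(peers, key=lambda name: workloads[name])
    updated = dict(workloads)
    updated[target] += updated[under_test]
    updated[under_test] = 0          # instance is now free to run its self-test
    return updated


before = {"dram_ctrl0": 40, "dram_ctrl1": 25, "dram_ctrl2": 30}
after = reallocate_for_self_test(before, "dram_ctrl0")
print(after)
```

The total workload is preserved; only its distribution changes
while one instance is under test.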
B. Virtualization-Assisted Concurrent Autonomous Self-
Test
VAST [18] facilitates concurrent self-testing of a
multi-/many-core system with built-in support for failure
prediction, failure detection and diagnosis, and
self-healing, which overcomes some reliability challenges
such as component aging and early-life failures [15]. The
VAST architecture requires two additional
hardware/software modules, a VAST controller and Virtual
Machine Monitors (VMMs), as shown in Figure 6.
The VAST controller, consisting of a processor core with
virtualization support, is responsible for test
scheduling, test instrumentation and initiation of system
recovery upon hard failure. Virtualization uses a software
abstraction layer, called a virtual machine monitor (VMM),
to facilitate on-line testing of multi-/many-core
processors. The testing of multiple cores may be achieved
in one of two ways: (i) migrate-and-test, in which the OS
is migrated from the core-under-test to a spare core, or
(ii) stop-and-test, in which the normal execution of the
core-under-test is suspended. VAST-supported self-test
policies offer extremely thorough on-line self-test of
SoCs with negligible performance impact.
Fig.6. VAST Architecture
V. CONCLUSIONS
Today’s silicon technology allows a system that once
required a number of VLSI chips, comprising one or more
processors and all associated hardware modules including
memory IPs, to be integrated in a single SoC. The Built-In
Self-Test mechanism enables the chip to test itself. An
SoC consists of memory blocks with regular 2-D structures
as well as logic areas with irregular structures.
Therefore, different approaches have been suggested for
testing the embedded memories and the logic elements (core
as well as noncore elements) of SoCs.
As the majority of SoC area is occupied by on-chip RAM
modules, the MBIST architecture enables the memory blocks
to be tested off-line on-chip, without depending on
external test equipment. March algorithms can target the
various faults in memory arrays through a number of test
sequences. Recently developed March algorithms use more
March elements to achieve higher fault coverage, at the
cost of increased complexity and latency of the BIST
circuitry. Microcode-based BIST circuitry adds the design
flexibility to run various March algorithms on the same
BIST hardware.
MBISR provides fault diagnosis of the embedded memories.
MBISR schemes support the repair of memory blocks with the
help of an integrated redundant area when the
memory-under-test is found to have faults. Practical
implementations in several research works have shown that
MBISR reduces overall performance degradation and system
downtime owing to its on-chip repair capability. The
practical implementation presented in [8] shows that a
4K × 32 SRAM with BISR circuitry in a 55 nm CMOS process
may incur an additional area overhead of 20% and can
operate at clock speeds up to 150 MHz. The increased logic
overhead of the built-in repair circuitry can be limited
by sharing a single BISR circuit among multiple RAMs
integrated in the SoC.