
DESIGN-TO-SILICON
WHITE PAPER
DIVIDE AND CONQUER:
HIERARCHICAL DFT FOR SOC DESIGNS
RICK FISETTE, MENTOR GRAPHICS
INTRODUCTION
Large System on Chip (SoC) designs present many challenges to all design disciplines, including design-for-
test (DFT). By taking a divide-and-conquer approach to test, significant savings in tool runtime and memory
consumption can be realized. This whitepaper describes the basic components of a hierarchical DFT
methodology, the benefits that it provides, and the tool automation that is available through Mentor’s Tessent
tool suite.
WHAT IS HIERARCHICAL DFT?
For large SoC devices, the front-end and physical design practices are typically performed at a core level.
Whether it’s called a core, block, tile, macro, or module, these terms all refer to the level of hierarchy at which design
tasks are completed. These completed cores are then integrated into the SoC. Hierarchical DFT refers to the
practice of implementing all DFT with respect to these same core hierarchical boundaries. The test patterns
for these cores are then applied individually or in groups from the SoC level.
With hierarchical DFT, once a core design is complete, its DFT is complete as well, and it includes
a set of patterns that can be used to test the core regardless of how it gets integrated into an SoC. Cores can
be tested individually, in groups, or all together; whatever best suits the test plan and available pin resources
in the SoC. Interconnect between cores and chip-level glue logic are then tested separately and the coverage
for all test modes is combined into a single comprehensive coverage report.
WHY ADOPT A HIERARCHICAL DFT METHODOLOGY?
A hierarchical DFT methodology solves many issues that are often encountered with the insertion of DFT
structures and running ATPG for large SoCs. Some of the most common and compelling problems that can be
mitigated with hierarchical DFT include the following:
LONG ATPG RUNTIMES
As netlist sizes grow, so does the runtime for scan ATPG. It is not unusual for pattern generation to take many
hours or even many days depending on the fault model and design size.
LARGE MEMORY FOOTPRINT
The memory required for loading an entire SoC design into the workstation for ATPG can be tens of gigabytes, if
not more than 100 GB. This severely limits how many machines can be used, if indeed any machines have the
required amount of memory. Even if such machines are available, there is often competition with other
design disciplines (e.g., physical design/verification) for these resources.
DFT IN THE CRITICAL PATH TO TAPEOUT
Traditional DFT methodologies require that the full-chip netlist be finalized before production test patterns
can be generated. This requirement places DFT squarely in the critical path to tapeout. To further complicate
this situation, any late (even minor) changes made prior to tapeout to address functional bugs will mean
throwing away any existing patterns and restarting that process, potentially delaying tapeout.
LIMITED CHIP PINS FOR DFT
It is common for very large SoCs to have relatively few chip-level pins available for DFT purposes, especially
with core-based designs where the number of cores can far exceed the number of chip-level pins.
One potential solution is to concatenate chains from one core to another, but this can result in very long shift
cycles and create dependencies between cores that make the cores harder to reuse. It also still requires that
the full-chip netlist be completed before patterns can be generated. The same can be said for any approach
that puts compression logic at the chip level in order to drive multiple cores with many scan chains. This
arrangement also has a negative impact on compression effectiveness because of the resulting very high chain-to-
channel ratio.
OVERVIEW OF THE HIERARCHICAL DFT FLOW
The best way to address the challenges of testing large SoCs is to take a divide and conquer approach that
cuts the task down into smaller, more manageable pieces. The following high-level description highlights the
key DFT tasks required for hierarchical DFT and how they fit into the overall flow.
CORE-LEVEL WRAPPER CHAIN INSERTION
The core is the least common denominator in a hierarchical flow and is the lowest hierarchical level at which
test patterns are applied. What makes testing an individual core possible, regardless of how it’s integrated into
a higher-level design, is a wrapper chain. Much like a boundary scan chain, the core wrapper chain provides
guaranteed control for all inputs of a core and guaranteed observation of the outputs. As long as you can
access the wrapper chain you can test that core without compromising test coverage.
The wrapper chain is a key concept and is the foundation upon which the hierarchical DFT methodology is
built. When testing the contents of the core, otherwise known as Internal mode, the wrapper chain is
configured to launch values into the core and capture responses coming out, as illustrated in Figure 1.
Figure 1: Wrapper chain configuration for Internal test mode.
The External mode of the core reconfigures the wrapper chain to launch values from the outputs to test chip-
level glue logic and interconnect while inputs capture the responses, as shown in Figure 2.
Figure 2: Wrapper chain configuration for External test mode.
The wrapper chain therefore delineates between the logic tested in Internal mode and the logic tested in
External mode. In addition to the wrapper chains, it is also necessary to insert all scan chains, compression
logic, and test control logic to make it “DFT complete.”
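To make that delineation concrete, here is a minimal Python sketch (purely illustrative, not part of any Tessent flow) in which each gate is assigned to the test mode that targets it, based on whether it sits inside the core's wrapper chain. The Gate and TestMode names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TestMode(Enum):
    INTERNAL = auto()  # wrapper launches into the core; core flops capture
    EXTERNAL = auto()  # wrapper launches outward; input wrapper cells capture


@dataclass(frozen=True)
class Gate:
    name: str
    inside_wrapper: bool  # True if the gate sits inside the core's wrapper chain


def mode_covering(gate: Gate) -> TestMode:
    """The wrapper chain delineates which test mode targets a given gate."""
    return TestMode.INTERNAL if gate.inside_wrapper else TestMode.EXTERNAL


if __name__ == "__main__":
    gates = [Gate("core_alu_u1", True), Gate("glue_mux_u7", False)]
    for g in gates:
        print(f"{g.name}: tested in {mode_covering(g).name} mode")
```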
CORE-LEVEL RETARGETABLE PATTERN GENERATION
The benefit of running ATPG at the core level for Internal mode is significantly reduced runtime and memory
footprint compared to running ATPG at the chip level. Runtime savings are design dependent, but it is not
unusual to see a 5x-10x reduction by running ATPG at the core level. Memory footprint savings of a similar
magnitude are also possible. Once the core is “DFT complete,” it is possible to generate a set of scan patterns
at the core level that can subsequently be retargeted to the chip level. The key assumption is that the core is
wrapped. Any other core-based DFT methodology that does not include wrapper chains cannot produce
retargetable patterns without seriously compromising test coverage at the core boundaries and chip-level
glue logic. Once core-level patterns have been generated, they should also be verified at the core-level much
the same as functional and physical verification are done. The outputs from this step in the flow are a set of
retargetable patterns and a fault list containing all the coverage information for that pattern set.
GRAYBOX GENERATION FOR CORE
Graybox models are intended to reduce the
memory footprint for External mode by as much as
10x or more. As previously described, External
mode only targets logic outside of a core’s wrapper
chain. Some of that logic still resides within the
boundary of the core. Since none of the logic inside
the wrapper is tested in External mode there’s no
need to include it in the core netlist for that mode.
The graybox netlist of the core removes all of the
logic tested in Internal mode and only includes
what’s needed for External mode, as shown in
Figure 3. This is another key hierarchical DFT
concept and makes it possible to test an entire SoC
design without ever having to load the full chip
netlist at any time.
Figure 3: The graybox netlist of a core only includes the logic needed for External mode test.
The amount of core netlist reduction made possible by using a graybox netlist varies widely based on the
design. When measured by the number of design instances in the graybox netlist compared to the full netlist,
it is typical to retain less than 10% of the instances, i.e., a reduction of 10X or more. Table 1 shows a few
examples of graybox netlist reduction.
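As a purely illustrative arithmetic check (the instance counts below are hypothetical, not taken from Table 1), the reduction factor follows directly from the instance counts:

```python
# Hypothetical instance counts illustrating the graybox reduction described above;
# actual ratios are design dependent.
full_netlist_instances = 1_850_000   # complete core netlist (illustrative)
graybox_instances = 160_000          # wrapper chains + boundary logic only (illustrative)

fraction_kept = graybox_instances / full_netlist_instances
reduction_factor = full_netlist_instances / graybox_instances

print(f"Graybox keeps {fraction_kept:.1%} of instances "
      f"(~{reduction_factor:.0f}X smaller)")  # ~8.6% kept, ~12X reduction
```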
CHIP-LEVEL RETARGETING OF SCAN PATTERNS
Retargeting previously generated core level patterns to the chip level is where the biggest ATPG runtime
savings are realized. Large SoC designs may require days to generate patterns from the chip level, but
retargeting patterns reduces the chip-level effort to minutes. There are many variations of how a chip-level
architecture can configure cores to be tested in Internal mode. It could be one core at a time, all cores
together or a group of cores at a time. For each grouping of cores to be tested together, you load in either a
graybox or blackbox netlist of those cores. The full netlist is not necessary because the retargeting starts at
the core boundary and works its way out to chip-level pins. First, it performs the mapping of Internal mode
scan patterns from the core pins up to the chip-level pins. In addition, it must also merge the pattern sets of
multiple cores so that they can be simultaneously applied. Retargeting reduces the memory footprint of the
loaded design and eliminates the need to regenerate patterns at the chip level to test the cores. A single set of
retargeted patterns is saved for each Internal mode configuration required. The example in Figure 4 shows
how Internal mode 1 groups Cores 1 and 2, whose patterns are retargeted and merged together, while
Cores 3 and 4 are retargeted and merged together in Internal mode 2.
Figure 4: Internal test modes at the chip level.
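The following Python sketch illustrates, under simplifying assumptions, the two retargeting steps described above: mapping scan data from core pins up to chip-level pins (accounting for inversions and added pipeline stages) and merging the pattern sets of cores grouped into one Internal mode. The PinMapping and CorePatternSet structures are hypothetical models, not Tessent data formats.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PinMapping:
    """How a core-level scan pin reaches a chip-level pin (illustrative model)."""
    chip_pin: str
    inverted: bool = False       # an inversion in the path flips the pattern data
    pipeline_stages: int = 0     # added pipeline stages shift the data in time


@dataclass
class CorePatternSet:
    core: str
    # one entry per pattern: {core scan pin -> bit string shifted through that pin}
    patterns: List[Dict[str, str]]


def retarget(core_set: CorePatternSet,
             pin_map: Dict[str, PinMapping]) -> List[Dict[str, str]]:
    """Map one core's scan data from core pins up to chip-level pins."""
    chip_patterns = []
    for pat in core_set.patterns:
        chip_pat = {}
        for core_pin, bits in pat.items():
            m = pin_map[core_pin]
            data = bits.translate(str.maketrans("01", "10")) if m.inverted else bits
            # Pipeline stages would realign the data in time; modeled here as padding.
            chip_pat[m.chip_pin] = "0" * m.pipeline_stages + data
        chip_patterns.append(chip_pat)
    return chip_patterns


def merge(groups: List[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge retargeted pattern sets of cores tested together in one Internal mode."""
    merged = []
    longest = max(len(g) for g in groups)
    for i in range(longest):
        combined: Dict[str, str] = {}
        for g in groups:
            if i < len(g):              # shorter pattern sets simply stop contributing
                combined.update(g[i])   # cores use distinct chip pins, so no collisions
        merged.append(combined)
    return merged
```

In this simplified model, merging simply interleaves the per-core scan loads pattern by pattern; the real tool must also reconcile shift lengths, procedures, and clocking across the merged cores.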
EXTERNAL MODE PATTERN GENERATION
Once all cores have been tested in Internal mode, all that’s
left is to test the interconnect and glue logic between the
cores and calculate the final chip-level test coverage
number. First, the chip-level netlist is loaded along with the
graybox of each core. The External mode configuration
accesses all the wrapper chains of the cores as well as any
chip-level scan chains and ATPG is run, as shown in Figure 5.
Once these patterns are completed, all of the interconnect
and glue logic has been tested, which, on top of all the
Internal mode configurations, means the entire chip has
been tested. In order to calculate the final test coverage of
the entire chip, the fault lists saved from each core’s Internal
mode pattern set are merged into the External mode fault
list. The tool then calculates the combined coverage of the
Internal modes and External mode. The final pattern set that is delivered to the ATE consists of the patterns for External mode and as many retargeted Internal mode
pattern sets as there are Internal mode configurations.
Figure 5: External mode tests the interconnect and glue logic between cores.
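A simplified Python sketch of that final coverage roll-up might look like the following; it assumes, purely for illustration, that faults are identified by name, that core fault names are made chip-unique by prefixing the core instance path, and that all faults are weighted equally.

```python
from typing import Dict, Set


def combined_coverage(core_fault_lists: Dict[str, Set[str]],
                      core_detected: Dict[str, Set[str]],
                      external_faults: Set[str],
                      external_detected: Set[str]) -> float:
    """Merge per-core Internal mode fault lists into the External mode fault list
    and compute one chip-level test coverage number (simplified: all faults equal)."""
    all_faults = set(external_faults)
    all_detected = set(external_detected)
    for core, faults in core_fault_lists.items():
        # Core fault names are made unique by prefixing the core instance path.
        all_faults |= {f"{core}/{f}" for f in faults}
        all_detected |= {f"{core}/{f}" for f in core_detected[core]}
    return 100.0 * len(all_detected) / len(all_faults)


if __name__ == "__main__":
    cov = combined_coverage(
        core_fault_lists={"core1": {"u1/A", "u2/Z"}, "core2": {"u3/A"}},
        core_detected={"core1": {"u1/A", "u2/Z"}, "core2": {"u3/A"}},
        external_faults={"glue/u9/A", "glue/u9/Z"},
        external_detected={"glue/u9/A"},
    )
    print(f"chip-level test coverage: {cov:.1f}%")  # 4 of 5 faults -> 80.0%
```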
COMPONENTS OF A HIERARCHICAL DFT SOLUTION
Hierarchical DFT is not a single tool feature but rather a methodology that requires changes in how DFT is
inserted in cores, how patterns get generated and how those patterns get applied. The following sections
provide more details regarding the DFT tool automation in the Tessent tool suite that supports a hierarchical
methodology.
WRAPPER CHAINS/CORE ISOLATION
Wrapper chain identification and insertion is a fundamental step in preparing cores for hierarchical test. There
are two different types of wrapper cells supported: dedicated and shared. A dedicated wrapper cell is a new
cell that is added to the design for test purposes that provides the control and observation on I/Os that is
required for hierarchical DFT. However, dedicated wrapper cells are not usually an ideal solution because they
add area to the design and add delay to the functional path. This added area and delay problem has been a
barrier to wider adoption of hierarchical DFT.
Tessent Scan gets around this problem by
identifying and stitching shared wrapper
cells by default. A shared wrapper cell is an
existing functional register that can also
serve to isolate a primary input/output for
test purposes. It therefore is shared for
functional as well as DFT purposes. This is
the ideal solution for wrapping a core
because no additional logic is required for test purposes and there is no impact on the functional path.
Any input/output that is not immediately
registered at the core boundary requires
additional analysis to identify which existing
functional flops can be used as wrapper
cells. This identification can get complicated
quickly because the tool must account for
all fanouts to registration points as well as
internal feedback paths that drive logic
clouds interacting with the I/Os. All of these situations must be identified and handled by the wrapper insertion process. The resulting wrapper
chains and scan chains might look like Figure 6. All the logic inside these wrapper chains is tested as part of
Internal mode while any logic outside the wrapper chains will be tested in External mode. Notice also that
the input wrapper chains, core chains and output chains each have their own scan shift enable signal. That is
necessary in order to achieve at-speed test coverage at the boundary of the core.
Take for example a typical at-speed test
scenario (launch off capture) as shown
in Figure 7. To detect the fault at the
input of the buffer, you need a launch
flop and a capture flop, but you also
need another flop to provide a
transition value to the launch flop.
When operating at the boundary of a
wrapped core, you do not have a flop to
provide that transition value, as shown in Figure 8. Instead, this is handled by holding the input wrapper
chain in shift mode even during the capture cycle. This
allows the scan shift path to provide the transition
value instead of the D input of the launch flop, as
shown in Figure 9.
Figure 6: Shared wrapper chains.
Figure 7: At-speed testing inside the core.
Figure 8: At-speed testing at the core boundary.
At the same time, the values on the D
input of the core flops and the output
wrapper cells must be captured. Internal
test mode is defined by the fact that
input wrapper chains are constrained in
shift mode so they can launch into the
core while the rest of the registers in the
design can capture. Conversely, External
mode requires that the output wrapper
chains be held in shift mode in order to
launch at-speed to chip-level logic while
the other registers can capture. This
means you need separate control of the
input wrapper scan_enable and the
output wrapper scan_enable to successfully operate Internal and
External modes, respectively. The
identification of these various shared wrapper cell scenarios, as well as the insertion of the wrapper chain
control elements, is automatically handled by Tessent Scan.
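The table below is an illustrative Python model (not Tessent syntax) of the scan_enable values applied during the capture cycle in each mode, following the behavior described above; signal names such as input_wrapper_se are hypothetical.

```python
# Illustrative truth table for the three scan shift enables during the capture
# cycle of an at-speed pattern.  Holding a wrapper chain in shift lets its scan
# path supply the launch transition at the core boundary.
CAPTURE_CYCLE_SCAN_ENABLES = {
    "internal": {"input_wrapper_se": 1,   # held in shift: launches into the core
                 "core_se": 0,            # core flops capture
                 "output_wrapper_se": 0}, # output wrapper cells capture
    "external": {"input_wrapper_se": 0,   # input wrapper cells capture
                 "core_se": 0,            # core logic is not targeted in this mode
                 "output_wrapper_se": 1}, # held in shift: launches to chip-level logic
}


def scan_enables_during_capture(mode: str) -> dict:
    """Return the per-chain scan_enable values applied in the capture cycle."""
    return CAPTURE_CYCLE_SCAN_ENABLES[mode]


if __name__ == "__main__":
    for mode in ("internal", "external"):
        print(mode, scan_enables_during_capture(mode))
```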
GRAYBOX NETLIST GENERATION
The purpose of the graybox representation of a core is to reduce the memory footprint otherwise required by
DFT tools to load a full netlist. It can be used in place of the full core netlist in any situation in which only the
logic at the boundary of the core is needed. Typically, that means External mode but may also apply to scan
pattern retargeting situations. Figure 10 illustrates the comparison between a full core netlist and what is
retained for a graybox core.
What needs to be included in the graybox is primarily the wrapper chains and any combinational logic that
sits between the wrapper chains and the primary inputs/outputs of the core. This is very similar to an interface
logic model that might be used for static timing analysis (STA) in which the path between an I/O port and a
register must be analyzed. What would not be included in an STA model, however, are the internal feedback
paths that also drive clouds of logic interfacing with the I/Os. There may also be some control logic needed to put the core into
its test mode or possibly needed by other cores or chip-level logic. The typical reduction factor from the full
netlist size to the graybox is in the range of 10x to 20x, with some designs outside that range. This reduction is
dependent on how much logic must be traced through in order to identify shared wrapper cells. Tessent
TestKompress includes the ability to generate the graybox netlist.
Figure 9: At-speed testing at the boundary of a wrapped core.
Figure 10: Complete core netlist compared to graybox netlist.
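The sketch below gives a highly simplified, hypothetical model of graybox extraction in Python: wrapper cells and test control logic are kept, and combinational logic is traced from the core ports until the wrapper boundary is reached. A production tool works on real pin-to-pin netlist connectivity and traces in both directions; this is only meant to convey the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Instance:
    name: str
    is_wrapper_cell: bool = False
    is_test_control: bool = False
    fanout: List[str] = field(default_factory=list)  # names of driven instances


def graybox_keep_set(instances: Dict[str, Instance],
                     port_fanout: Dict[str, List[str]]) -> Set[str]:
    """Very simplified graybox sketch: keep wrapper cells, test control logic,
    and the combinational logic lying between a core port and a wrapper cell."""
    keep = {n for n, i in instances.items() if i.is_wrapper_cell or i.is_test_control}

    # Trace forward from each core port, retaining logic until a wrapper cell
    # (already in the keep set) stops the trace.
    frontier = [d for fanout in port_fanout.values() for d in fanout]
    while frontier:
        name = frontier.pop()
        if name not in instances or name in keep:
            continue                    # unknown net, or already kept (wrapper boundary)
        keep.add(name)
        frontier.extend(instances[name].fanout)
    return keep
```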
SCAN PATTERN RETARGETING
Mapping patterns from the boundary of a core to chip-level pins is only a small part of the overall retargeting
task. One must also take into account any inversions or added pipeline stages at the chip level and modify the
patterns accordingly. If the chip architecture is such that multiple non-identical cores can be accessed and
tested simultaneously, the pattern sets of those cores must be merged. A critical part of the retargeting
process is the design rule check (DRC) that verifies the chip’s setup conditions match the conditions during
core-level pattern generation. Without this check in place to flag setup problems early, it falls to the user to
debug the problem by simulating the faulty retargeted patterns in a chip-level simulation environment, a
much longer and more difficult debug process.
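The following Python sketch illustrates the spirit of that design rule check under the assumption (hypothetical) that the setup conditions are recorded as simple signal/value pairs with the core-level patterns: every condition the core patterns rely on must be reproduced by the chip-level setup.

```python
from typing import Dict, List


def check_retarget_setup(core_setup: Dict[str, str],
                         chip_setup: Dict[str, str]) -> List[str]:
    """Simplified sketch of a retargeting setup check: every condition recorded
    with the core-level patterns (constrained pins, clock definitions, etc.)
    must be reproduced by the chip-level setup."""
    violations = []
    for signal, expected in core_setup.items():
        actual = chip_setup.get(signal, "<unconstrained>")
        if actual != expected:
            violations.append(
                f"{signal}: core patterns require '{expected}' but chip setup gives '{actual}'")
    return violations


if __name__ == "__main__":
    core_conditions = {"core1/test_mode": "1", "core1/scan_clk": "pulse@100MHz"}
    chip_conditions = {"core1/test_mode": "1", "core1/scan_clk": "pulse@50MHz"}
    for v in check_retarget_setup(core_conditions, chip_conditions):
        print("setup violation:", v)
```

Catching such mismatches at retargeting time is what avoids the chip-level simulation debug loop described above.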
Clock control and the placement of that control logic in the design hierarchy is a major consideration when
implementing pattern retargeting. The ideal solution is to have clock control logic that is programmable by
scan data located inside the core. Because the programming of the capture clock on a per pattern basis is part
of the scan data, clocking information is completely self-contained in the pattern set and is not dependent on
external clock sources. This makes it possible to merge the pattern sets of multiple cores without creating
clocking conflicts. If, instead, clock sources are located outside the core (and are presumably shared by other cores),
you can only merge patterns for one clock at a time. Otherwise the cores being merged may require different
commonly sourced clocks for any given pattern, which would result in capture errors in one or both of the
cores. While still retargetable, the resulting patterns are less efficient than they would be if the clock control
were inside the core. Tessent TestKompress includes all of the pattern retargeting functionality described.
BENEFITS OF HIERARCHICAL DFT REALIZED IN SILICON
Any change in methodology means additional effort, or at least different effort, that must be justified. In the
case of hierarchical DFT, the most notable additional effort required is the wrapper insertion. A second
addition is a change in clock control implementation for retargeting, if it is not already located inside the
cores. This additional up-front effort, though, pays dividends throughout the design process and even on
the ATE.
Users who are employing this methodology with pattern retargeting are seeing ATPG runtimes reduced by 5x
or more. As important as that reduction is, reduced cumulative time for hierarchical ATPG does not accurately
quantify the complete benefit to runtime challenges. What’s more important is when in the design schedule
ATPG occurs. With retargeting, the patterns for all the cores can be generated far in advance of the completion of
the chip-level netlist. That means as soon as the chip netlist is complete, the core-level patterns already exist
and can be retargeted in just minutes instead of taking days to generate patterns from the chip level at a
critical point in the schedule. If there is a late ECO to one of the cores, then you only need to rerun ATPG for
that one core and then retarget. Generating scan patterns is no longer a gating item as you get close to
tapeout. The verification of those patterns is also considerably simplified because it is done primarily at the
core level at the time the core is completed.
The memory footprint reduction is highly design-dependent, but it’s not unusual to see a 10x reduction.
Whatever memory is required for your largest core represents the most memory you’ll need for the entire
chip. The chip-level netlist for External mode is typically very small because it comprises mostly graybox
models, which are usually 1-5% the size of the full core netlist. This memory reduction is advantageous in a
couple of ways. It opens up quite a few more multi-processor machines to work on ATPG
that would otherwise have all their memory consumed before all the CPUs can be used. This means you can
take better advantage of distributed and multi-threaded processing without running into memory limitations.
You also will no longer have to compete for the biggest machines with other design disciplines like physical
design and physical verification, which usually require lots of memory.
A less intuitive advantage to hierarchical DFT is that pattern count (and consequently test time) is often
reduced by as much as 2x. This can be attributed to the fact that the limited scan channel resources no longer
need to be divided across the entire chip. Instead, by breaking up the testing into smaller pieces, all of those
resources can be dedicated to testing the individual cores thereby improving efficiency.
Diagnosis also benefits from hierarchical DFT with pattern retargeting capabilities. Being able to map chip
failures back to the core level allows you to run diagnosis at the core level rather than the full chip. Just like
ATPG, the runtime is reduced dramatically.
CONCLUSION
With some up-front design effort and planning, the biggest challenges of testing large SoCs can be addressed
with a hierarchical DFT methodology. Implementation of the methodology is greatly assisted by the
automation now available in the Tessent tool suite for the most important design tasks.
