Implementing a Dual-Mode GPS Receiver
1. IMPLEMENTATION METHODOLOGY FOR DUAL-MODE GPS RECEIVER
Implementation Methodology for Dual-Mode GPS Receiver
David Tester1; J Young, C Atkinson, T Ryan2
GPS functionality has been synonymous with in-car navigation but has recently emerged as a must-have feature in recent cellphones such as the Apple iPhone, Blackberry Pearl and Nokia N95. These are examples of embedded positioning, capable of enabling features and functionality in a wide range of additional portable electronics such as digital cameras, watches and media players.

Air has developed a GPS receiver optimized for the requirements of embedded positioning. Capable of supporting today’s “killer” navigation application, the Air architecture is optimized for the more demanding requirements of embedded GPS and, critically, for the first time offers the capability of 24/7 continuous location awareness for mobile, battery powered, consumer devices.

This paper outlines design flow and implementation decisions utilized in the successful development of Air’s first generation airwave1 product, optimized for the first non-cellular embedded GPS application – geotagging – in the digital camera market.

Conventional wisdom dictates that complex system-level products demand both a “bleeding edge” process node (45nm, etc) and $25M+ of venture capital investment to bring initial silicon to market. The product described in this paper was implemented in 130nm CMOS technology and taken from concept to engineering sample silicon within 3 years with significantly less than $25M by the combination of an experienced team and robust methodology.

Successful realization of Air’s target power budget for airwave1 demanded a wide range of low power techniques spanning system and architecture level through RTL, gate and transistor levels.

Air partnered with Synopsys for both digital EDA tools and also physical IC design services. The resulting product enables GPS functionality for less than the standby current of a cellphone.

I. INTRODUCTION

Development of any low-power product demands optimization from system to transistor level. This paper outlines the recent experiences at Air in the development of a 130nm structured custom 41.6M transistor GPS receiver IC, with specific focus on the back-end physical silicon design.

Air is a pre-revenue venture capital funded fabless semiconductor company developing a family of embedded GPS receivers optimized for 24/7 operation in mobile devices.

Start-ups must identify new, emerging markets but also deliver disruptive products before competitors. As a result, anything and everything that can be performed in parallel must be done in parallel to support this target. This demands that back-end physical IC design starts before front-end activities have completed. Many of the decisions required for back-end work need to be taken (at risk) before all or even most of the required information is either available or frozen. The resulting inherent conflict between schedule and execution steers key back-end decisions.

Challenges associated with back-end implementation of complex system-level products, in the context of a large, well-resourced organization, are well documented. This paper outlines the start-up’s perspective on the same problem, but in the context of taking a product from concept to market in the minimum time with minimum investment and resources.

The conflict between exhaustive, conclusive analysis and getting a product to market is not for everyone! Many of the critical implementation decisions can only be made based on previous experiences and instinct.

Disruptive products are created through new innovative architecture decisions which exploit system optimizations not available to competitors. Products are not “made” through careful implementation of circuit level functionality but the same product opportunity can be “lost” through inappropriate implementation of that same circuit level functionality.

This paper will not present any architectural details for Air’s first generation dual-mode low power GPS receiver. Instead the focus is how the unique architecture was successfully mapped to silicon with specific focus on back-end IC design.

II. PRODUCT OVERVIEW AND ARCHITECTURE

airwave1 is a single die consumer GPS solution containing optimized GPS signal processing and integrated radio with embedded processor, memory and support peripherals with additional on-chip support analog functions.

Implemented in a 130nm CMOS process, the IC requires a GPS antenna, SAW filter, crystal and passive support components.

airwave1 provides independent GPS searching and tracking capabilities. Multiple instances of two implementations of the proprietary satellite tracking DSP are provided, comprising 125K and 100K gates, along with the 1.1M gate searching DSP. The embedded 32b microprocessor requires 47K gates with various support digital blocks utilizing an additional 61K gates. The integrated GPS radio and general purpose support analog are implemented as two independent analog macros.

Standard cell logic and the device pad ring are implemented with libraries from TSMC and ARM Physical IP whilst the power management functionality uses a mix of cells from both Air and ARM Physical IP. Memory macros were licensed from both Dolphin and ARM Physical IP.
1
David Tester (david.tester@air-semi.com) is co-founder and CTO of Air
2
Jon Young, Chris Atkinson and Tom Ryan are with Synopsys
III. DEVELOPMENT FLOW AND EDA TOOLS

airwave1 was constructed as a “structured custom” device within a conventional digital IC development flow that includes additional verification stages to ensure the product power budget was not violated, as discussed later in the paper.

Digital functionality is implemented with a combination of VHDL and Verilog, synthesized with Design Compiler, with RTL to gate level netlist (and later pre-P&R and post-P&R) verification performed with Formality. Static timing analysis was performed with PrimeTime. Device layout was with ICC. Clock gating functionality was inserted with PowerCompiler.

P&R blocks used macros developed by Air and characterized by Synopsys using Liberty NCX and NanoTime prior to use in a traditional DesignCompiler based logic synthesis flow.

RTL design and verification was performed by Air. Custom digital macro characterization was performed by Synopsys. Logic synthesis and formal verification was performed by Air. Digital and analog macro layout was performed by Air. Block layout and device layout was performed by Synopsys.

Pre-P&R static timing was performed by Air with post-P&R static timing being performed both by Air and Synopsys using post-layout parasitics extracted by Star-RCXT.

The radio and support analog are developed in a traditional design flow by an internal team and delivered into the ICC flow as GDS with a CDL netlist. Prior to integration these macros were verified as DRC and LVS clean both in tools from the analog design flow and with Hercules.

Final LVS and DRC verification for the complete device was performed by Synopsys with Hercules.

IV. LEAKAGE AND POWER MANAGEMENT ISSUES

Dynamic power consumption of a conventional GPS receiver is impacted by the balance between functionality implemented as hardware and that implemented as software.

Static power consumption increases with logic complexity and memory requirements. Each transistor potentially contributes to the total static (or leakage) power consumption for a device.

What are the options to reduce static power consumption?

Leakage from each transistor is defined by the bias conditions for that component. Within a custom IC flow this offers the opportunity of local power down transistors to force bias conditions on circuits that are not required. Additionally the size of transistors can be optimized. Minor variations in transistor sizes can often provide a significant reduction in leakage. Finally, the supply voltage to individual circuits can be removed when specific functionality is not required.

In the context of semi-custom IC design these techniques are not directly available. The opportunity to change transistor sizing for an existing standard cell library would violate the library license agreement and would demand characterization of the new library. Skills and design tools required to perform these activities are often not available. What options remain?

Custom digital cells were developed for airwave1 but not to directly address dynamic (or static) power consumption issues.

Static power is a function of the total number of logic gates powered by the digital supply rail. If further reduction in total gate count is not an option (or is already minimized) then an additional option is to break functionality into blocks and then remove power from individual blocks – power islands.

Replacement of a single digital supply voltage by multiple switched power domains can eliminate leakage current from major functional blocks when those parts of a system are idle. airwave1 comprises 44 independent digital supply domains.

Reducing total gate count in a design reduces total transistor gate area but often at the cost of increased development time. For a start-up, additional optimization to refine, rather than create, functionality can conflict with schedule requirements.

Embedded memory within an IC presents exactly the same static power consumption issues. Rather than optimize gate count the challenge becomes optimization of memory size.

Leakage current is also temperature dependent. Obtaining an acceptable leakage current at 25°C is often not a challenge. Reaching that acceptable leakage current at 85°C is complex.

V. UNDERSTANDING DYNAMIC POWER CONSUMPTION

Battery powered products, such as GPS receivers, must optimize the power consumed during normal operation.

1. What options to minimize dynamic power are available?
2. What is the minimum data processing rate required?
3. What clock frequency does the design operate with?
4. Are all clock edges required for processing?
5. Are “spare” clock cycles available in the system timing?
6. Do all subsystems operate at the same frequency?
7. Does all logic within a block operate at the same rate?
8. Can further clock edges be removed with clock gating?
9. Can clock gating be added at the RTL level? Capturing clock gating at this level ensures that maximum knowledge about the processing rates is captured - with potential schedule cost.

Additional optimization can be realized with tools such as Synopsys PowerCompiler to automatically identify flip-flops that could be gated either because the clock edge is not required or because the data does not change.

airwave1 development utilized a mixture of both techniques.

Investigation of power dissipated by typical flip-flop designs shows 10% to 20% of power consumed can relate directly to switching activity on flip-flop clock pins and internal buffers.

Efficiency of clock gating depends on where the gating cells are placed within the buffer chain used to build the clock tree.

Does a design really need both the Q and QN output pins for each latch or flip-flop? Provision of both pins increases both the area and power consumption of each cell.

Deep cones of logic between flip-flops can risk non-minimum switching activity when driving registers change state. Such logic can significantly increase total power consumption for adders and multipliers, for example.
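The static and dynamic power considerations of Sections IV and V can be sketched numerically. The example below is a minimal illustration only: it assumes the standard first-order CMOS switching-power relation (activity × C × Vdd² × f), which this paper does not state, and a uniform per-gate leakage figure. The function names, the capacitance, voltage and leakage values and the grouping of blocks into four switchable domains are all hypothetical; only the gate counts are borrowed from Section II, with a single tracking DSP instance shown.

```python
def dynamic_power_w(activity, cap_f, vdd_v, freq_hz):
    """First-order CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


def gated_clock_power_w(clock_cap_f, vdd_v, freq_hz, enable_fraction):
    """Clock-net power when the clock toggles only for a fraction of cycles.

    A free-running clock net switches every cycle (activity = 1); clock
    gating scales that by the fraction of cycles in which it is enabled.
    """
    if not 0.0 <= enable_fraction <= 1.0:
        raise ValueError("enable_fraction must lie in [0, 1]")
    return dynamic_power_w(enable_fraction, clock_cap_f, vdd_v, freq_hz)


def static_power_w(domains, leak_per_gate_w):
    """Leakage summed over powered domains; a switched-off island adds nothing.

    `domains` maps a name to a (gate_count, is_powered) pair.
    """
    powered_gates = sum(gates for gates, powered in domains.values() if powered)
    return powered_gates * leak_per_gate_w


# Illustrative values only (assumptions, not airwave1 measurements).
CLOCK_CAP_F = 10e-12      # assumed clock-pin plus buffer load
VDD_V = 1.2               # assumed core supply
FREQ_HZ = 16e6            # one of the clock rates quoted in the paper
LEAK_PER_GATE_W = 1e-9    # assumed average leakage per gate

ungated_w = gated_clock_power_w(CLOCK_CAP_F, VDD_V, FREQ_HZ, 1.0)
gated_w = gated_clock_power_w(CLOCK_CAP_F, VDD_V, FREQ_HZ, 0.25)

# Hypothetical domain grouping using gate counts from Section II.
all_on = {
    "search_dsp": (1_100_000, True),
    "tracking_dsp": (125_000, True),
    "cpu": (47_000, True),
    "support": (61_000, True),
}
search_off = dict(all_on, search_dsp=(1_100_000, False))

print(f"clock power: ungated {ungated_w * 1e6:.1f} uW, gated {gated_w * 1e6:.1f} uW")
print(f"leakage: all on {static_power_w(all_on, LEAK_PER_GATE_W) * 1e3:.3f} mW, "
      f"search DSP off {static_power_w(search_off, LEAK_PER_GATE_W) * 1e3:.3f} mW")
```

In this simplified model the clock enable fraction and the powered gate count each scale their term linearly, while the supply voltage enters the dynamic term quadratically. Real leakage also varies with temperature and cell type, which the sketch ignores.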
Reduction of digital power supply voltage, either on a global or block-by-block basis, can reduce the dynamic power consumed by core logic although the overall efficiency improvement depends on the voltage regulation architecture.

VI. PREDICTING DYNAMIC POWER CONSUMPTION

Predicting digital power consumption in the traditional synthesis based semi-custom flow is a challenge. Power is not modeled in the typical RTL based logic simulation flow. Switching activity, gate drive strength information and P&R parasitic capacitance are not modeled until the final stages of a typical design flow. Influence of these factors on decisions made during the architecture stage of implementation remains unknown until late (often too late) in the development process.

The “power aware” design flow used is shown in Figure 1.

Figure 1 – Power Aware Design Flow

Logic complexity and switching activity are minimized through careful system design and modeling (prior to RTL coding) and efficiency of the resulting implementation is peer reviewed throughout the development process.

Rather than perform circuit level simulations after the P&R process is completed to discover the power consumption, the post-layout parasitic capacitance is bounded at the start of the design process with a P&R constraint, enabling circuit level simulation of key blocks long before P&R has taken place.

Circuit level simulation of key digital blocks was performed pre-layout with estimated (and bounded) routing parasitics and post-layout with extracted parasitics in a Cadence analog flow with Spectre and UltraSim circuit simulators.

Evaluation of the resulting silicon shows actual power consumption for the most power critical digital blocks on airwave1 is within 10% of simulations.

VII. LOGIC SYNTHESIS, TIMING CONSTRAINTS AND LAYOUT

airwave1 includes multiple independent signal processing blocks for satellite detection and tracking operating at various clock rates of 96MHz, 64MHz, 32MHz and 16MHz. The maximum clock rate blocks contain logic with hundreds of paths containing over 92 levels of logic between flip-flops.

Analysis of post-layout performance suggests that high drive strength logic gates don’t offer an optimum tradeoff for power when routing is limited and parasitic capacitance minimized. Circuit level simulation for key logic paths within airwave1 confirms this gate drive strength vs routing parasitic tradeoff.

Synthesis strategies for the various DSP blocks in airwave1 are very different. Within the high data processing rate satellite searching DSP, timing closure is complex, demanding gate delays of less than 200ps once post-layout parasitic capacitance is included. Only X1, X2 and X4 drive strength cells were permitted both in the initial logic synthesis and in post-placement optimization.

In contrast, the low data processing rate satellite tracking DSP requires a clear minimum drive strength and minimum logic area rather than a timing driven synthesis strategy.

Circuit level simulations of the gate level netlist after logic synthesis but prior to layout were essential to confirm power consumption for each functional block remained within the overall power budget for the complete system.

VIII. CLOCK TREE ESTIMATION AND IMPLEMENTATION

Clock trees within a design directly impact power consumption. Typical clock trees constructed with automatic CTS tools will provide functional, but over designed, results.

Target requirements for clock skew and transition times affect power consumption of the clock tree built by CTS. Whilst P&R tools will typically remove all logic within a pre-layout clock tree, realistic targets for clock skew and transition times are essential in initial synthesis to create logic suitable for minimal optimization after construction of the clock trees.

Whilst the device was constructed with a hierarchical block-by-block approach, the total number of clock domains within the design exceeds 44 major clocks and 400 in total. Each domain contains multiple levels of clock gating, both manually inserted in RTL and inserted automatically with PowerCompiler.

Multiple iterations of P&R are mandatory to tune the synthesis model of clock uncertainty and transitions if clock trees containing strings of X20 buffers are to be avoided.

Estimation of clock tree power consumption remains a manual activity. Circuit level simulation of clock tree performance was essential to confirm block power budgets post-layout were met.

IX. ROUTING CONGESTION AND POST-LAYOUT CAPACITANCE

airwave1 is implemented in a CMOS 130nm 6LM UTM process. Impact on switching performance and power consumption of post-layout parasitic capacitance can be significant and reliable prediction of digital power consumption demands control of post-layout routing capacitance.

More metal layers generally give better utilization after P&R but what are the implications of “ultra thick” top layer metal?

Minimum metal pitch and spacing rules for UTM metal have the effect of making upper metal ineffective for detailed signal routing and suitable only for power supply distribution. It is
true that IR drop in DVDD and DVSS lines is significantly improved but the standard cell utilization degrades as a result.

X. POWER DOMAINS & POWER MANAGEMENT KIT

As previously described in section IV, airwave1 contains 44 independent digital power domains for fine control of leakage current during operation of the GPS receiver. This is illustrated in Figure 2 below.

Figure 2 – Voltage Domains in airwave1

All communication between blocks utilizes conventional voltage clamp cells from a vendor power management kit. All cells were manually inserted into the design at RTL level with corresponding synthesis don’t touch constraints in the flow.

Power domain control cells were automatically inserted by ICC using its built in capabilities from manually generated TCL commands with gate level netlist verification of the resulting design. As part of this process the number and size of the header cells needed by each voltage region had to be calculated and the impact on the chip die and floorplan understood.

The library cell used to create the switched voltage domains was internally developed at Air and exported into ICC.

There was no requirement for state retention flip-flops due to system level optimizations. Continuous power is required for on-chip RAM with impact on memory macro leakage current.3

XI. CUSTOM LOGIC CELLS OPTIMIZED FOR POWER

The high performance satellite searching DSP contains 105K flip-flops as part of the local datapath to provide a total of 264 discrete memory blocks. The ability of P&R tools to constrain the placement of 105,000 elements and build a structured array presented a risk for the physical design phase of airwave1.

Development of a custom macrocell not only eliminates the cell placement risk but also offers an opportunity to optimize the performance of these local memory functions compared to a traditional array built from multiple flip-flops by synthesis.

Knowledge of relative DFF placement, driver and load allows the performance of each flip-flop to be optimized with power for the specific use-case as the target constraint. In these rare circumstances gates with sub-optimal propagation delays and transition times can offer optimized power consumption.

The resulting macrocell offered 40% power improvement with an additional 25% area optimization compared to synthesis. Example switching performance is shown in Figure 3.

Synopsys provided cell characterization for the macro using Liberty NCX for the cell characterization and then NanoTime to generate the performance data for the cell array which was subsequently included in the standard logic synthesis, static timing and P&R flow.

Figure 3 – Switching Performance of Power Optimized Flip-Flop

XII. IMPACT OF POST-PLACEMENT LOGIC OPTIMIZATION

During the digital layout process there are various points where ICC can re-optimize logic to fix timing and design rule violations. Each optimization step offers the opportunity to transition a block from meeting to violating its power budget!

Typically logic synthesis for low power exploits carefully coded RTL with specific synthesis constraints to ensure exactly the desired logic is obtained. Constraints applied to ICC for optimization steps must match those used for initial synthesis.

XIII. PHYSICAL DESIGN ISSUES (IN A START-UP)

For any pre-revenue company, time-to-market is critical. In the race to bring a new product to market there is an inherent conflict between activities essential to create the optimum device floorplan and a time optimized development schedule.

For a start-up developing a complex system-level product against an aggressive time to market goal, this is where the fun begins and conflict between front-end and back-end design appears.

No top level netlist? No problem! Critical decisions on device floorplan, pad-ring, global signal routing and package need to be made before the major functional block design is complete,
3
http://www.air-semi.com/media/pdf/AIR__Dolphin_Integration_FINAL.pdf
before block level layout is complete and long before the final (or even the preliminary) top level netlist is available.

Figure 4 – airwave1 Floorplan

Can the analog macros be delivered days before tape-out?...

Hierarchical layout of a complex IC trades die area for risk. Whilst flat layout of a complex IC allows EDA tools to make unexpected placement and routing optimizations, predictable execution for the back-end phase of development is only possible with hierarchical floorplan and implementation.

Without a fully automated method for implementing the power down regions, the time to re-spin a floorplan (including header cells, power routing and voltage aware well TIE and filler cell placement) can be longer than expected. In this project, extensive TCL scripting was used to minimize the impact of changes and automate the process.

Throughout the digital design flow all block interfaces were maintained with minimal connectivity issues for optimized chip-level routing and timing, knowing that a block-by-block hierarchical approach to device layout would be essential and would enable physical IC design to start (at the block level) before the functional RTL development phase had completed.

Global routing for a complex system-level product in a six layer metal process where the top two layers of metal are more suited to power supply routing than signal routing is a major issue for a hierarchical based layout flow and demands careful floorplanning in advance.

The “signal” processed by a GPS radio is, quite literally, noise. Digital circuits are very good at generating not just wideband noise but also noise at very specific design related frequencies. Careful floorplanning is required to ensure all blocks capable of generating noise that would degrade the radio performance are suitably located on the die. Circuits capable of generating noise and circuits sensitive to noise must be shielded with guard rings. Coupling capacitance between global route signals must be minimized.

XIV. DEVICE PACKAGING OPTIONS AND PAD RING DESIGN

Whilst the internal evaluation package requires a 304 ball BGA for the 244 pad airwave1 engineering sample silicon, the device is offered to customers in a 68 lead QFN package. Careful design of the pad ring was essential to ensure the 176 evaluation only I/Os could be appropriately bonded in the customer QFN bond option.

An example module containing the engineering sample silicon with full GPS reference design is shown in Figure 5. Figure 6 is the first photo geotagged with airwave1 silicon.

XV. DESIGN TOOL FLOW

Digital design followed a conventional logic synthesis based flow using the Synopsys tools DesignCompiler, PrimeTime, PowerCompiler, Formality, ICC and Hercules.

Analog design followed a conventional Cadence flow using Composer, Artist, Spectre, SpectreRF, Virtuoso and Assura.

Licensed IP from all vendors and macros created by Air were all subject to QA verification for LVS and DRC with Hercules. Final full-chip LVS and DRC was performed in Hercules prior to release by Air of final GDSII to the foundry for manufacture.

Figure 5 – GPS module with engineering sample airwave1 silicon

Figure 6 – The first photograph geotagged with airwave1 silicon

XVI. THE FUTURE

The silicon described in this paper is the engineering sample release of the airwave1 product. Having demonstrated silicon that provided right-first-time functionality, the development team is active on development of the production version of airwave1.
XVII. CONCLUSION
Air successfully completed development of the 130nm CMOS
single-die GPS receiver on-time and on-budget in conjunction
with Synopsys Professional Services physical design group.
The resulting right-first-time complex mixed signal silicon has
been sampled to lead digital camera customers in Japan.4
ACKNOWLEDGMENT
Development of any complex semiconductor product is a
group activity involving (and often demanding) system, silicon
and software optimizations and tradeoffs. The receiver
described in this paper forms part of a system-level GPS
semiconductor product developed by the R&D development
team at Air Semiconductor. The authors wrote this paper but
the product results from the combined contributions of all
team members.
David Tester is the CTO and leads the architecture and
product development activities for embedded GPS
products at Air, having raised series-A venture capital
funding and co-founded the company in 2006.
Prior to co-founding Air, he spent 15 years in various
semiconductor development and management positions
based both in the UK and US with Dialog Semiconductor,
LSI Logic, Conexant, Symbionics and GEC Research.
He was listed in GPS World’s “50 Leaders to Watch” during 2008. Air was
awarded the Red Herring Europe 100 along with both the Electra and IET
start-up of the year awards in 2008.
His high volume, standard product, consumer IC background spans both
analog and digital silicon development – ranging from system level to
transistor level design. He has participated in the development of over 20
high volume consumer semiconductor products for the navigation, wireless
voice, wireless data, digital TV and PC graphics markets.
Mr Tester is a senior member of IEEE, ION and IET. He is registered as a
Chartered Engineer with both ECUK and FEANI. He holds nine US patents.
Tom Ryan (left), Jon Young (centre)
and Chris Atkinson (right) work for
Synopsys Global Technical Support
in Reading, UK.
Global Technical Support enables
customer adoption and deployment
of Synopsys’ technology and flows
to improve their design productivity
and tape out predictability.
4
http://www.air-semi.com/media/pdf/AIR_firstcustomersrelease_FINAL.pdf