1. Embedded Electronics for
Telecom DSP
Aldebaro Klautau
Embedded Systems Lab (LASSE) @ Federal Univ. of Pará (UFPA)
V International Workshop on Trends in Optical Technologies (WTON)
CPqD – Campinas – Brazil - May 19, 2016
2. Goal and Agenda
Goal: discuss options for prototyping new physical layers (PHY) of
DSP-based telecommunication systems
From the perspective of a digital signal processing
R&D group that (furiously) targets the highest
possible bit rates
No ASICs, but discrete components & development
boards
Agenda
Motivation: demand for increased bit rates
Options for prototyping: emphasis on DSP processor and FPGA
Examples of prototypes using the most from available hardware
3. Bit-rate hungry applications
Optical transmission with flexible transceivers
Software-defined radios and 5G
Architecture: Small cells and centralized-RAN
PHY: Spectrum aggregation, massive MIMO, mmWaves
Example of 4G traffic:
4 signals with BW = 20 MHz → ~3.7 Gbps
In newer versions of LTE, the number of antennas can be 16 or 32 → bit rate ≈ 15 Gbps or 30 Gbps
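The figures above can be reproduced with a small calculation. This is a sketch under assumptions that the slide does not state: a 30.72 MSa/s sampling rate per 20 MHz LTE carrier and 15 bits for each of the I and Q components, with no line-coding or control overhead. The function name `fronthaul_bit_rate` is hypothetical.

```python
def fronthaul_bit_rate(num_antennas, sample_rate_hz=30.72e6, bits_per_component=15):
    """Raw I/Q bit rate (bit/s) for num_antennas antenna streams.

    Assumed values: 30.72 MSa/s per 20 MHz carrier and 15 bits per I or Q
    component; overhead from line coding and control words is ignored.
    """
    components = 2  # I and Q
    return num_antennas * sample_rate_hz * components * bits_per_component


for n in (4, 16, 32):
    print(f"{n:2d} antennas: {fronthaul_bit_rate(n) / 1e9:.2f} Gbps")
```

Under these assumptions the rate scales linearly with the number of antennas, which is why 16 or 32 antennas push the link into the tens of Gbps.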
4. Electronic components and associated
development boards for prototyping
[Figure: prototype options: GPU, DSP, ASSP, ASIC (standard cells, full-custom IC), FPGA]
GPU: graphics processing unit
ASSP: application specific standard product
5. Complete DMT transceiver development
FFT-based Discrete Multi-Tone (DMT) bitloading supporting up to 10
bits per tone (1024-QAM)
[Figure: bits per tone]
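As an illustration of per-tone bitloading with a 10-bit cap, here is a minimal sketch. It uses the textbook SNR-gap approximation, which is an assumption: the slides do not say which bitloading algorithm was implemented. The gap of 9.8 dB, the SNR values, and the name `bit_loading` are all hypothetical.

```python
import math


def bit_loading(snr_db_per_tone, gap_db=9.8, max_bits=10):
    """Integer bits per tone from per-tone SNR (gap approximation).

    gap_db is an assumed SNR gap; max_bits=10 corresponds to 1024-QAM.
    """
    bits = []
    for snr_db in snr_db_per_tone:
        effective_snr = 10 ** ((snr_db - gap_db) / 10)  # linear SNR after the gap
        b = math.floor(math.log2(1 + effective_snr))
        bits.append(max(0, min(b, max_bits)))           # clamp to [0, max_bits]
    return bits


snr_per_tone_db = [45.0, 38.0, 30.0, 21.0, 12.0, 3.0]  # hypothetical measurements
print(bit_loading(snr_per_tone_db))
```

Tones with very high SNR saturate at 10 bits, while tones below the gap carry no bits at all.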
6. For DMT task: a DSP processor (SoC)
chosen as platform
4 cores
FFT coprocessors
Network coprocessor
Viterbi coprocessors
7. C language programming
Our main motivation: program in C language
Besides, free open-source routines are available, e.g. for Forward Error Correction (FEC)
But good performance required heavy optimization
Comparison of Reed-Solomon (RS) implementations, per codeword
8. Many routines to split among cores
Issues related to concurrency and parallelism
12. FPGA boards support several interfaces
and peripherals
Several FMC (FPGA mezzanine card) boards
PC interface: PCIe to FPGA (up to 30 Gbps)
Commonly present in FPGA evaluation boards
[Figure: example FMC cards: high-speed ADC/DAC cards, 8x SFP expansion card, general purpose]
13. Prototyping with FPGAs
HDL (VHDL, Verilog, etc.) is more difficult than C, and most engineers are exposed to “programmable” logic (digital electronics) but not to digital signal processing on FPGAs or to parallel programming
Go for “general-purpose” DSP chips?
Note that multicore alternatives also require strong skills in concurrent and parallel programming, and often a profound knowledge of the chip architecture
Changing the DSP chip manufacturer requires studying the new architecture, while FPGAs are more “generic”
FPGAs are a more natural step towards silicon / ASIC than DSP chips
14. ADC trends
Photonic ADCs
Undersampling: signals sampled below their Nyquist rates
Compressive sampling, e.g. Bayesian approach
[Figure: limits on ENOB (effective number of bits) due to jitter; ADCs up to 2007 and, in darker blue, ADCs later than 2007] [Khilo, 2012]
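The jitter limit on ENOB can be estimated with the standard textbook relations SNR = -20·log10(2π·f_in·σ_j) and ENOB = (SNR - 1.76)/6.02. The sketch below assumes jitter is the only impairment and a full-scale sinusoidal input; the function name and the example numbers are hypothetical and are not taken from [Khilo, 2012].

```python
import math


def jitter_limited_enob(f_in_hz, jitter_rms_s):
    """ENOB upper bound when aperture jitter is the only noise source."""
    snr_db = -20 * math.log10(2 * math.pi * f_in_hz * jitter_rms_s)
    return (snr_db - 1.76) / 6.02


# Hypothetical example: 10 GHz input tone, 100 fs rms jitter.
print(f"{jitter_limited_enob(10e9, 100e-15):.2f} bits")
```

Each tenfold increase in input frequency at constant jitter costs about 3.3 bits, which is the motivation for low-jitter photonic sampling.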
15. Some DAC performance numbers
Summary: DACs and AWGs (arbitrary waveform generators), together with ADCs and DSOs (digital storage oscilloscopes), now operate at ~100 GSa/s
Hence, the computing platform (DSP, FPGA, ASSP, etc.) may be the bottleneck!
DAC               Bits  BW (GHz)  Fs (GSa/s)  ENOB
Micram DAC-4      6     42        100         -
Micram DAC-3      6     23.8      72          4.5
Micram DACII      6     20        34          4
[Nagatani, 2011]  6     -         60          -
[Huang, 2014]     8     10        100         5.3
16. “Design gap” does not help those
aiming at bit rate records
“Gap”: FPGAs have enough capacity to accommodate most ASIC designs
But achieving symbol rates of tens of GBd is hard for a real-time transmitter implementation and often impossible for a receiver
[Trimberger, 2015]
17. Architectures for PHY testbeds and
demonstrations
Offline processing
Both transmitter (Tx) and receiver (Rx) processing are performed offline
Often FPGA-based
Transmitter: samples are pre-computed, stored in e.g. FPGA memory, and sent to the channel via a fast DAC
Receiver: a fast digital storage oscilloscope (DSO) digitizes the received signal
Real-time receiver processing
Often based on ASICs or ASSPs
Real-time transmitter processing
May use FPGA with internal PRBS generation to avoid “slow” interface to PC
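Internal PRBS generation is typically done with a linear-feedback shift register (LFSR). As a software illustration, the sketch below generates the common PRBS-7 pattern (polynomial x^7 + x^6 + 1). The choice of PRBS-7 is an assumption; the slides do not specify which pattern length is used.

```python
def prbs7(n_bits, seed=0x7F):
    """Return n_bits of a PRBS-7 sequence (x^7 + x^6 + 1), Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of register bits 7 and 6
        state = ((state << 1) | new_bit) & 0x7F      # shift and insert feedback bit
        out.append(new_bit)
    return out


bits = prbs7(254)
print(bits[:16])
```

The sequence repeats every 127 bits. On an FPGA the same recurrence is unrolled so that many bits are produced per clock cycle.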
18. State-of-the-art offline processing example
1.125 Tb/s 15-carrier super-channel
Two DACs at 32 GSa/s (oversampling of 4 samples/symbol)
DSO with 62.5 GSa/s using two interleaved 33 GSa/s ADCs
[Maher, 2016]
19. State-of-the-art Tx + Rx real-time processing example
[Eiselt, 2016] “First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of SSMF at 1550 nm”
ASIC chips
Extra info:
8 x 25.78125 GBaud signals, PAM-4, 100 km; λ = 1550 nm
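The aggregate line rate implied by these numbers follows from PAM-4 carrying 2 bits per symbol. This is a sketch of the arithmetic only; the reading that the excess over 400 Gb/s is coding and framing overhead is an assumption, not something stated on the slide.

```python
import math

lanes = 8
symbol_rate_baud = 25.78125e9
bits_per_symbol = math.log2(4)  # PAM-4: four amplitude levels

gross_line_rate = lanes * symbol_rate_baud * bits_per_symbol
print(f"Gross line rate: {gross_line_rate / 1e9:.1f} Gb/s")
```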
20. Real-time transmitter processing example
Implementation by Ilan Sousa (UFPA). Joint work with CPqD
IMOC 2015 Second Best Student Paper Award
Example of reaching the limit of the available hardware via DSP
Real-time fractional oversampling of high order modulation signals
with Nyquist pulse shaping
Issues:
Fractional sampling rate conversion: interpolate by L and decimate by M
FPGA clock is slow and parallelism is required
Need to minimize the number of multipliers
21. DAC with Fs = 25 GSa/s and FPGA with 156.25 MHz clock
Parallelism level: 160 (= 25 GSa/s / 156.25 MHz)
Hardware limitation required parallelism
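The parallelism level is simply the number of samples the FPGA must deliver per clock cycle. The sketch below also shows why multipliers become scarce. The filter length of 51 taps and the "one multiplier per tap per parallel output" cost model are assumptions for illustration, not figures from this slide.

```python
import math


def parallelism_level(fs_hz, f_clk_hz):
    """Number of samples that must be produced per FPGA clock cycle."""
    return math.ceil(fs_hz / f_clk_hz)


P = parallelism_level(25e9, 156.25e6)

n_taps = 51                    # hypothetical FIR length
naive_multipliers = P * n_taps # assumed cost: one multiplier per tap per output
print(f"P = {P}, naive multiplier count = {naive_multipliers}")
```

A naive parallel FIR therefore needs thousands of multipliers, which motivates structures that skip multiplications by known zeros.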
22. Real-time Nyquist pulse shaping
Input symbols at a given rate Rsym (e.g. 12.5 GBd) must be converted to samples at Fs (e.g. 25 GSa/s) to feed the DAC
Often the oversampling factor L = Fs/Rsym is an integer
Then “shaping” is equivalent to interpolation: upsampling followed by an FIR filter h[n] (the Nyquist pulse) with N coefficients
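The integer-factor case can be sketched in a few lines of NumPy. The pulse here is a Hamming-windowed sinc, chosen only because it is easy to write; the slides do not specify the Nyquist pulse used, and the names `nyquist_pulse` and `pulse_shape` are hypothetical.

```python
import numpy as np


def nyquist_pulse(L, span_symbols=8):
    """Windowed-sinc pulse with zero crossings every L samples (assumed design)."""
    n = np.arange(-span_symbols * L, span_symbols * L + 1)
    return np.sinc(n / L) * np.hamming(n.size)


def pulse_shape(symbols, L, h):
    """Upsample by L (zero insertion) and filter with the pulse h[n]."""
    up = np.zeros(len(symbols) * L, dtype=complex)
    up[::L] = symbols
    return np.convolve(up, h)


L = 2  # e.g. Fs = 25 GSa/s and Rsym = 12.5 GBd
rng = np.random.default_rng(0)
levels = np.array([-3, -1, 1, 3])
symbols = rng.choice(levels, 64) + 1j * rng.choice(levels, 64)  # 16-QAM

h = nyquist_pulse(L)
samples = pulse_shape(symbols, L, h)

# Sampling at the symbol instants recovers the symbols (no ISI).
delay = (len(h) - 1) // 2
recovered = samples[delay::L][:len(symbols)]
print(np.allclose(recovered, symbols))
```

Because the pulse is zero at every nonzero multiple of L, the symbol-spaced samples of the shaped signal equal the transmitted symbols.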
23. Fractional sampling rate conversion
(FSRC)
Fractional oversampling factor L/M
Example 1: L=3 and M=2 implies L/M=1.5 samples/sym and Fs=1.5 Rsym
Example 2: L=10 and M=9 implies L/M=1.11 samples/sym and Fs=1.11 Rsym
Gives flexibility for Nyquist pulse shaping with respect to relation
between symbol rate Rsym and sampling frequency Fs
[Figure: interpolator followed by decimator]
x[m'] → ↑L → q[m] → LPF (gain = L, ωc = π/L) → z[m] → LPF (gain = 1, ωc = π/M) → z'[m] → ↓M → y[n]
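The interpolator/decimator chain can be written directly as a reference model. This is a minimal sketch, assuming Hamming-windowed sinc lowpass filters (the slides give only gain and cutoff, not the design method); `lowpass` and `fsrc` are hypothetical names.

```python
import numpy as np


def lowpass(cutoff_rad, gain, half_len):
    """Hamming-windowed sinc lowpass; cutoff in rad/sample (assumed design)."""
    n = np.arange(-half_len, half_len + 1)
    h = (cutoff_rad / np.pi) * np.sinc(cutoff_rad * n / np.pi)
    return gain * h * np.hamming(n.size)


def fsrc(x, L, M, half_len=60):
    """Change the sampling rate by L/M: upsample, two lowpass filters, downsample."""
    q = np.zeros(len(x) * L)
    q[::L] = x                                                        # upsample by L
    z = np.convolve(q, lowpass(np.pi / L, L, half_len), mode="same")  # remove images
    z2 = np.convolve(z, lowpass(np.pi / M, 1, half_len), mode="same") # anti-aliasing
    return z2[::M]                                                    # downsample by M


L, M = 3, 2
f0 = 0.05                         # hypothetical tone, cycles per input sample
x = np.cos(2 * np.pi * f0 * np.arange(400))
y = fsrc(x, L, M)
print(len(x), "->", len(y))
```

The output contains L/M times as many samples as the input, and away from the edges it matches the same tone sampled at the new rate.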
24. Nyquist pulse shaping implementations
Resampling = interpolation + decimation
Interpolator followed by decimator:
x[m'] → ↑L → q[m] → LPF (gain = L, ωc = π/L) → z[m] → LPF (gain = 1, ωc = π/M) → z'[m] → ↓M → y[n]
Combine the filters:
x[m'] → ↑L → q[m] → LPF (gain = L, ωc = min{π/L, π/M}) → z[m] → ↓M → y[n]
Polyphase efficient implementation
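The idea behind the polyphase form is that each output y[n] only needs the filter taps that line up with nonzero upsampled inputs, and outputs discarded by the decimator are never computed. The sketch below compares a direct implementation with a polyphase one. It is a serial reference model under an assumed windowed-sinc design, not the parallel FPGA architecture of the talk; all function names are hypothetical.

```python
import math
import numpy as np


def design_lowpass(L, M, num_taps):
    """Hamming-windowed sinc with gain L and cutoff min(pi/L, pi/M) (assumed design)."""
    fc = 0.5 / max(L, M)                      # cutoff in cycles/sample
    n = np.arange(num_taps) - (num_taps - 1) / 2
    return L * 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)


def resample_direct(x, h, L, M):
    """Upsample by L, filter every high-rate sample, keep one out of M."""
    q = np.zeros(len(x) * L)
    q[::L] = x
    return np.convolve(q, h)[::M]


def resample_polyphase(x, h, L, M):
    """Compute only the kept outputs, using only taps that meet nonzero inputs."""
    n_out = math.ceil((len(x) * L + len(h) - 1) / M)
    y = np.zeros(n_out)
    for n in range(n_out):
        m = n * M                  # high-rate index of this output
        taps = h[m % L::L]         # polyphase branch: h[phase], h[phase + L], ...
        newest = m // L            # most recent input sample contributing
        for i, c in enumerate(taps):
            k = newest - i
            if 0 <= k < len(x):
                y[n] += c * x[k]
    return y


L, M, N = 3, 2, 61
h = design_lowpass(L, M, N)
x = np.random.default_rng(1).standard_normal(50)

y_direct = resample_direct(x, h, L, M)
y_poly = resample_polyphase(x, h, L, M)
print(np.allclose(y_direct, y_poly))

direct_cost = N * M              # multiplications per kept output, direct form
poly_cost = math.ceil(N / L)     # multiplications per output, polyphase form
print(direct_cost, poly_cost)
```

Both functions return the same samples, while the multiplication count per output drops by a factor close to L·M.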
25. Minimum number of multipliers and efficient use of memory
Example: L=3, M=5, parallelism P=15, V=5 stacked FSRCs
[Figure: proposed parallel FSRC]
26. Results with parallel FSRC
Decreases computational cost by a factor of LM (for example, with L=16 and M=15, LM=240, i.e. about 2 orders of magnitude)
FPGA resource usage for L=5, M=4, with filter lengths N=51 or 101, using V = 32 stacked FSRCs (XC5 and XC7 denote boards with Virtex-5 and Virtex-7, respectively)
[Figures: look-up table usage and multiplier usage]
27. Validation results
Constellations for back-to-back (B2B) – first set of tests 28.125 GBd
Sampling rate Fs = 30 GSa/s
Oversampling = 16/15 = 1.0667 samples per symbol
Symbol rate Rsym = 28.125 GBd
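The integers L and M follow from reducing Fs/Rsym to lowest terms. A minimal sketch using Python's `fractions` module, with rates given as integers in Hz and baud so the arithmetic is exact (`resampling_factors` is a hypothetical helper):

```python
from fractions import Fraction


def resampling_factors(fs_hz, rsym_baud):
    """Return (L, M) such that Fs / Rsym = L / M in lowest terms."""
    ratio = Fraction(fs_hz) / Fraction(rsym_baud)
    return ratio.numerator, ratio.denominator


L, M = resampling_factors(30_000_000_000, 28_125_000_000)
print(f"L = {L}, M = {M}, L/M = {L / M:.4f} samples per symbol")
```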
[Figure: constellations for X polarization and Y polarization]
28. Channelization for FDM over fiber
An example in which smart (polyphase) filtering is not enough:
31. Demux signal transformations via DSP
[Figure: demux block diagram with labels: carrier, real, complex, resample by D_p]
Adjacent channel: strong interference
32. Classical filtering result
Filter length may not be enough
Problem: the FPGA does not support real-time operation with more than 3k multipliers
[Figure: test setup: signal generator → DEMUX → analyzer]
33. Demux with improved filtering
[Figure: demux block diagram with improved filtering; labels: carrier, real, complex, resample by D_p]
34. Effect of improved filtering on received
signal
FIR filters with lengths 90, 150 and 200
Significant improvement regarding distortion, etc.
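One way to see why longer filters help against an adjacent channel is to measure the worst-case leakage just beyond the passband for each length. The sketch below assumes a Hamming-windowed sinc design and hypothetical band edges (`cutoff`, `stop_edge`); the filter design actually used in the talk is not stated.

```python
import numpy as np


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc lowpass; cutoff in cycles/sample (assumed design)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # unit gain at DC


def worst_leakage_db(h, stop_edge, n_fft=8192):
    """Largest magnitude response (dB) at or above stop_edge."""
    H = np.abs(np.fft.rfft(h, n_fft))
    f = np.fft.rfftfreq(n_fft)
    return 20 * np.log10(H[f >= stop_edge].max())


cutoff = 0.05      # hypothetical channel edge, cycles/sample
stop_edge = 0.058  # hypothetical start of the adjacent channel

leakage = {n: worst_leakage_db(lowpass_fir(n, cutoff), stop_edge)
           for n in (90, 150, 200)}
for n, db in leakage.items():
    print(f"N = {n:3d}: worst leakage {db:.1f} dB")
```

With closely spaced channels, the transition band of a short filter overlaps the neighbor, so each increase in length lowers the leakage at the cost of more multipliers.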
35. Conclusions
“Platform FPGAs” have been chosen for cutting-edge research testbeds due to their price and reconfigurability
There are wonderful EDA flows to simplify design for FPGAs (e.g. Matlab → VHDL → FPGA), but cutting-edge implementations often require a skilled developer who:
Can write custom and efficient VHDL code
Has a good understanding of the corresponding IPs
Is trained to exploit parallelism
Along with microelectronics and photonics, telecom algorithms will also evolve towards parallel implementations to cope with the increase in information processing rate
Benefit of increased degrees of freedom (e.g. spatial multiplexing in wireless and optical fibers)
Virtuous cycle: we develop better algorithms when evaluating their real-time implementation on hardware
Academia needs to update DSP courses!
36. Thanks!
Obrigado!
LASSE @ Espaço Inovação – Parque Ciência e Tecnologia Guamá
aldebaro@ufpa.br - www.lasse.ufpa.br
37. References
[Khilo, 2012] Photonic ADC: overcoming the bottleneck of electronic jitter
[Huang, 2014] An 8-bit 100-GS/s distributed DAC in 28-nm CMOS
[Wong, 2014] Quantifying the Gap Between FPGA and Custom CMOS to Aid Microarchitectural Design
[Trimberger, 2015] Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology
[Lyke, 2015] An Introduction to Reconfigurable Systems
[Shannon, 2015] Technology Scaling in FPGAs: Trends in Applications and Architectures
[Maher, 2016] Increasing the information rates of optical communications via coded modulation: a study of
transceiver performance
[Nagatani, 2011] A 60-GS/s 6-Bit DAC in 0.5-µm InP HBT Technology for Optical Communications Systems
[Eiselt, 2016] First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of
SSMF at 1550 nm
[Ilan, 2015] Parallel Polyphase Filtering for Pulse Shaping on High-Speed Optical Communication Systems
[Kuon, 2007] Measuring the Gap Between FPGAs and ASICs
[Jamieson, 2005] Mapping multiplexers onto hard multipliers in FPGAs