GPU-based Parallelization of
System Modeling

Stephan Pachnicke, 18.03.2013
Outline

• Motivation

• Numerical System Modeling

• GPU-Parallelization

• Comparison of Speedup and Accuracy

• Conclusion




© 2013 ADVA Optical Networking. All rights reserved.
Acknowledgments

The author would like to acknowledge the help and
contributions of


Adam Chachaj – Krone Messtechnik

Heinrich Müller – TU Dortmund

Peter Krummrich – TU Dortmund

Markus Roppelt – ADVA Optical Networking

Michael Eiselt – ADVA Optical Networking




Motivation




In Short: Computational Performance




[Figure: graphics processing unit (GPU) vs. CPU cluster]




Increase in GFlop/s




• GPU performance is growing even faster than predicted by Moore's
  law and is significantly higher than CPU performance

• GPUs are also attractive for general-purpose computing
  (e.g. complex numerical simulations)



Optical System Modeling

• Simulation of (long-haul) optical transmission systems requires a
  numerical solution of the nonlinear Schrödinger equation

→ High computational effort, because accurate simulation of nonlinear
  fiber effects requires small step sizes

• Precise estimation of the bit error ratio requires Monte-Carlo
  simulations for PMD and noise

→ Requires a high number of simulated bits
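
For reference, a common scalar textbook form of the nonlinear Schrödinger equation is sketched below. The symbols are assumptions that do not appear on the slide: A(z, T) is the slowly varying field envelope, \alpha the fiber loss, \beta_2 the group-velocity dispersion and \gamma the nonlinear coefficient. Polarization effects such as PMD are omitted. The second relation is the standard relative standard error of a Monte-Carlo BER estimate from N independent bits. It indicates why many bits are needed: at BER = 10^-3, a relative error of about 10% takes roughly 10^5 bits.

```latex
\[
\frac{\partial A}{\partial z}
  = -\frac{\alpha}{2} A
    - \frac{i \beta_2}{2} \frac{\partial^2 A}{\partial T^2}
    + i \gamma |A|^2 A
\]

\[
\frac{\sigma_{\widehat{\mathrm{BER}}}}{\mathrm{BER}}
  = \sqrt{\frac{1 - \mathrm{BER}}{N \cdot \mathrm{BER}}}
  \approx \frac{1}{\sqrt{N \cdot \mathrm{BER}}}
\]
```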




Split-Step Fourier Method (SSFM)
•   Splits the nonlinear Schrödinger equation into a linear and a nonlinear part
•   The linear and nonlinear parts are solved separately

•   The linear part is solved in the frequency domain and the nonlinear
    part in the time domain (acceptable for small step sizes)




[Figure: chain of FFT and IFFT blocks, with one split-step marked]
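
The split-step idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the function names (`fft`, `ifft`, `split_step`), the parameters (`dt`, `h`, `beta2`, `gamma`, `alpha`) and the simple first-order (non-symmetrized) splitting are all choices made here, and a real simulator would use an optimized FFT library rather than the small recursive FFT included to keep the example self-contained.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT without scaling; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT including the 1/N scaling."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def split_step(a, dt, h, beta2, gamma, alpha=0.0):
    """Advance the sampled field `a` by one step of length `h`."""
    n = len(a)
    # Angular frequency grid in FFT ordering.
    omega = [2.0 * math.pi * (k if k < n // 2 else k - n) / (n * dt)
             for k in range(n)]
    # Nonlinear part, applied in the time domain: power-dependent phase rotation.
    a = [v * cmath.exp(1j * gamma * abs(v) ** 2 * h) for v in a]
    # Linear part, applied in the frequency domain: dispersion and loss.
    spec = fft(a)
    spec = [s * cmath.exp((0.5j * beta2 * w * w - 0.5 * alpha) * h)
            for s, w in zip(spec, omega)]
    return ifft(spec)


# Illustrative parameters only: a Gaussian pulse propagated over 20 steps.
n, dt = 64, 0.25
pulse = [complex(math.exp(-((i - n / 2) * dt) ** 2), 0.0) for i in range(n)]
field = pulse
for _ in range(20):
    field = split_step(field, dt, h=0.01, beta2=-1.0, gamma=1.0)

energy_in = sum(abs(v) ** 2 for v in pulse)
energy_out = sum(abs(v) ** 2 for v in field)
print(energy_in, energy_out)
```

Each step costs one forward and one inverse FFT, which is why FFT speed and FFT accuracy dominate the overall simulation. With `alpha = 0` both sub-steps only rotate phases, so the pulse energy should be conserved up to rounding.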
Speedup Factor (GPU vs. CPU)

[Figure: speedup factor of GPU over CPU, with one curve for single precision (SP) and one for double precision (DP)]
Legend:
DP: Nvidia CUDA FFT
SP: FFT using pre-calculated twiddle factors




•   Single precision arithmetic has much higher performance on GPUs
    (because the main target market is computer gaming)

•   Longer block lengths allow better parallelization

→ Single precision implementation desirable

Accuracy (in single precision)

[Figure: error of different FFT implementations in single precision; the LUT-based FFT curve is labeled]
Legend:
CUFFT: Nvidia CUDA FFT
FFTW:  Fastest Fourier Transform in the West
IPP:   Intel Integrated Performance Primitives
LUT:   Precalculate trigonometric functions in DP




 • Total accuracy of SSFM dominated by FFT accuracy

 • Backward error grows linearly with increasing number of FFTs

 • CUDA FFT shows considerably higher error than other FFT
   implementations
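
The accumulation of FFT rounding error can be illustrated with a small experiment. The sketch below is an assumption-laden stand-in, not the measurement shown on the slide: single precision is emulated on the CPU by rounding every intermediate result with `struct`, and the names `to_single`, `fft_sp` and `round_trip_error` are hypothetical. It applies repeated FFT/IFFT pairs to a random signal and reports how far the result drifts from the input.

```python
import cmath
import math
import random
import struct


def to_single(z):
    """Round a complex double to the nearest single-precision complex value."""
    re, im = struct.unpack("ff", struct.pack("ff", z.real, z.imag))
    return complex(re, im)


def fft_sp(x, sign):
    """Radix-2 FFT with every intermediate result rounded to single precision.

    sign=-1 gives the forward transform, sign=+1 the unscaled inverse.
    """
    n = len(x)
    if n == 1:
        return [to_single(x[0])]
    even = fft_sp(x[0::2], sign)
    odd = fft_sp(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        w = to_single(cmath.exp(sign * 2j * math.pi * k / n))
        t = to_single(w * odd[k])
        out[k] = to_single(even[k] + t)
        out[k + n // 2] = to_single(even[k] - t)
    return out


def round_trip_error(signal, trips):
    """Relative RMS deviation from the input after `trips` FFT/IFFT pairs."""
    n = len(signal)
    y = list(signal)
    for _ in range(trips):
        y = fft_sp(y, -1)
        y = [to_single(v / n) for v in fft_sp(y, +1)]
    num = sum(abs(a - b) ** 2 for a, b in zip(y, signal))
    den = sum(abs(b) ** 2 for b in signal)
    return math.sqrt(num / den)


random.seed(0)
n = 64
signal = [to_single(complex(random.uniform(-1, 1), random.uniform(-1, 1)))
          for _ in range(n)]
errors = {trips: round_trip_error(signal, trips) for trips in (1, 10, 100)}
for trips, err in errors.items():
    print(trips, err)
```

In exact arithmetic every round trip would return the input unchanged, so any deviation printed here is accumulated rounding error. The exact growth rate in this emulation need not match the GPU results on the slide.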

Analysis: Accuracy

• Why is the accuracy of CUFFT in SP relatively low?

  → FFT accuracy depends crucially on the accuracy of the twiddle
    factors (trigonometric function values)

  → The hardware implementation of trigonometric functions in SP on GPUs
    is optimized for peak performance, not accuracy

• What can be done to increase accuracy in single precision?

  → Implement a Taylor series expansion (slow!)

  → Compute the trigonometric functions in DP on the CPU and store them in
    a look-up table on the GPU
    (especially suited to the split-step Fourier method with thousands
    of FFTs of similar length)

                                                         J. C. Schatzman, SIAM J. Scientific Comput. (1996).
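
The look-up table idea can be sketched on the CPU as follows. Everything here is an illustrative assumption rather than the GPU code behind the slides: single precision is emulated by rounding with `struct`, the helper names are hypothetical, and `coarsen` (rounding to a 2^-20 grid) is only a stand-in for a fast but less accurate sine/cosine evaluation. It does not model any particular GPU hardware. The table is computed once in double precision, rounded to single precision, and then reused by every FFT of the same length.

```python
import cmath
import math
import random
import struct


def to_single(z):
    """Round a complex double to the nearest single-precision complex value."""
    re, im = struct.unpack("ff", struct.pack("ff", z.real, z.imag))
    return complex(re, im)


def identity(z):
    """No rounding: keeps full double precision."""
    return z


def coarsen(z, bits=20):
    """Hypothetical stand-in for a fast, less accurate sine/cosine:
    round both components to a grid of spacing 2**-bits."""
    s = float(2 ** bits)
    return complex(round(z.real * s) / s, round(z.imag * s) / s)


def twiddle_table(n, rnd):
    """Twiddle factors exp(-2*pi*i*k/n), k = 0..n/2-1, evaluated in double
    precision and then passed through `rnd`."""
    return [rnd(cmath.exp(-2j * math.pi * k / n)) for k in range(n // 2)]


def fft(x, table, rnd, stride=1):
    """Radix-2 FFT that reads its twiddles from `table` and applies `rnd`
    after every complex operation."""
    n = len(x)
    if n == 1:
        return [rnd(x[0])]
    even = fft(x[0::2], table, rnd, stride * 2)
    odd = fft(x[1::2], table, rnd, stride * 2)
    out = [0j] * n
    for k in range(n // 2):
        t = rnd(table[k * stride] * odd[k])
        out[k] = rnd(even[k] + t)
        out[k + n // 2] = rnd(even[k] - t)
    return out


def rel_error(y, ref):
    """Relative RMS error of y with respect to ref."""
    num = sum(abs(a - b) ** 2 for a, b in zip(y, ref))
    den = sum(abs(b) ** 2 for b in ref)
    return math.sqrt(num / den)


random.seed(0)
n = 256
signal = [to_single(complex(random.uniform(-1, 1), random.uniform(-1, 1)))
          for _ in range(n)]

# Double-precision reference.
reference = fft(signal, twiddle_table(n, identity), identity)
# Single-precision arithmetic with a table precalculated in double precision.
lut = fft(signal, twiddle_table(n, to_single), to_single)
# Single-precision arithmetic with deliberately coarse twiddles.
coarse = fft(signal, twiddle_table(n, coarsen), to_single)

err_lut = rel_error(lut, reference)
err_coarse = rel_error(coarse, reference)
print(err_lut, err_coarse)
```

Because the table depends only on the FFT length, its one-time cost is amortized over the thousands of equally sized FFTs in a split-step simulation.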

Illustrative Example
[Figure: two panels, "CUDA FFT (SP)" and "LUT-based FFT (SP)", each comparing GPU and CPU results]




     •   The look-up table based FFT provides significantly increased accuracy in
         single-precision arithmetic
     •   The look-up table holds pre-calculated twiddle factor values

Source: S. Pachnicke, et al., OFC 2011.

System Analysis (SSFM Simulation)

[Figure: required OSNR deviation for BER = 10^-3 in dB; GPU simulation (in SP or DP) vs. CPU simulation (in DP); 11 x 112 Gb/s CP-QPSK]




 •   GPU double precision results are (almost) identical to CPU results

 •   The OSNR penalty of our single precision implementation remains below
     0.1 dB up to approx. 125,000 split-steps

Source: S. Pachnicke, IEEE ICTON, 2010.


Combined Simulation in SP & DP

• Calculate an approximate division of the parameter space into strata by
  fast simulations with single precision.
• The ellipses represent parameter combinations for which bit errors occur
  during transmission.
• Execute simulations with double precision accuracy sparsely in the
  different strata to assess the BER.

→ Combined simulation with single and double precision and automatic
  (algorithmic) choice of the number of single precision simulations

P. Serena, et al., IEEE JLT, 2009.
S. Pachnicke, et al., OFC 2011.
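
The two-stage idea can be sketched with a toy stratified estimator. This is a strongly simplified illustration under stated assumptions, not the published algorithm: `cheap_sim` and `accurate_sim` are hypothetical threshold models standing in for single- and double-precision simulation runs, only two strata are used (error predicted or not), and the number of accurate runs per stratum is fixed by hand instead of being chosen algorithmically.

```python
import random


def cheap_sim(x):
    """Stand-in for a fast single-precision run (slightly shifted threshold)."""
    return x > 0.902


def accurate_sim(x):
    """Stand-in for a slow double-precision run."""
    return x > 0.9


def stratified_estimate(params, n_accurate_per_stratum, rng):
    """Estimate the error probability with few accurate runs.

    Every parameter sample is classified by the cheap simulation. Each
    resulting stratum is then probed sparsely with the accurate simulation,
    and the per-stratum error rates are combined using the stratum weights.
    """
    strata = {False: [], True: []}
    for x in params:
        strata[cheap_sim(x)].append(x)

    estimate = 0.0
    for members in strata.values():
        if not members:
            continue
        weight = len(members) / len(params)
        picks = rng.sample(members, min(n_accurate_per_stratum, len(members)))
        rate = sum(accurate_sim(x) for x in picks) / len(picks)
        estimate += weight * rate
    return estimate


rng = random.Random(1)
params = [rng.random() for _ in range(20000)]

# At most 2 x 200 accurate runs instead of 20000.
est = stratified_estimate(params, 200, rng)
# Brute-force value for comparison, using the accurate model on every sample.
exact = sum(accurate_sim(x) for x in params) / len(params)
print(est, exact)
```

The cheap runs only decide where the accurate runs are spent, so a small bias in the cheap model shifts the strata but does not enter the final estimate directly.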

Discussion




Robustness of the algorithm has been checked by deliberately selecting a
high number of 880,000 split-steps.

 •   Results of combined (SP & DP) GPU simulations match well with results obtained
     from CPU simulations in DP
 •   A speedup of up to a factor of 180 compared to CPU is possible
 → Stratified Monte-Carlo sampling allows an algorithmic choice of the number of DP
   simulations required for a given accuracy

Source: S. Pachnicke, et al., OFC 2011.


Design Advantages
 •   GPU parallelization allows simulation of a long-distance system with 80 WDM
     channels on a PC in reasonable time

Source: C. Xia, D. van den Borne, OFC, 2011

 •   Result: The system performance can be estimated much more precisely than with
     CPU-based simulations (which typically model systems with only 10 WDM channels)




Conclusion

 • GPUs offer a much higher computational peak performance
   than CPUs

 • Full benefit of GPU power only in single precision

 • Single precision accuracy can be increased by pre-computing the
   trigonometric function values for the FFTs

 • Speedup in simulation time of more than a factor of 100 possible
   compared to CPU




Further Reading

 •   N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, J. Manferdelli, “High
     Performance Discrete Fourier Transforms on Graphics Processors”, Proc. of
     IEEE conference on Supercomputing (SC), article no. 2 (2008).

 •   S. Pachnicke, “Fiber-Optic Transmission Networks: Efficient Design and
     Dynamic Operation”, Springer (2011).

 •   J. C. Schatzman, “Accuracy of the Discrete Fourier Transform and the Fast
     Fourier Transform”, SIAM J. Scientific Comput. 17, 1150-1166 (1996).

 •   G. Falcao, V. Silva, L. Sousa, “How GPUs can outperform ASICs for fast LDPC
     decoding”, Proc. of ACM International Conference on Supercomputing
     (ICS), 390-399 (2009).

 •   J. A. Stratton, S. S. Stone, W.-M. W. Hwu, “MCUDA: An Efficient
     Implementation of CUDA Kernels for Multi-core CPUs”, Lecture Notes in
     Computer Science 5335, 16-30 (2008).

 •   R. R. Exposito, G. L. Taboada, S. Ramos, J. Tourino, R. Doallo, “General-
     purpose computation on GPUs for high performance cloud computing”, Wiley J.
     Concurrency and Computation 24 (2012).




Thank you

spachnicke@advaoptical.com


IMPORTANT NOTICE

The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the
content, material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly
prohibited.

The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or
representations of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any
liability for any loss or damages, including without limitation, direct, indirect, incidental, consequential and special damages,
alleged to have been caused by or in connection with using and/or relying on the information contained in this presentation.

Copyright © for the entire content of this presentation: ADVA Optical Networking.

OFC/NFOEC: GPU-based Parallelization of System Modeling

  • 1. GPU-based Parallelization of System Modeling Stephan Pachnicke, 18.03.2013
  • 2. Outline • Motivation • Numerical System Modeling • GPU-Parallelization • Comparison of Speedup and Accuracy • Conclusion © 2013 ADVA Optical Networking. All rights reserved.
  • 3. Acknowledgments The author would like to acknowledge the help and contributions of: Adam Chachaj – Krone Messtechnik; Heinrich Müller – TU Dortmund; Peter Krummrich – TU Dortmund; Markus Roppelt – ADVA Optical Networking; Michael Eiselt – ADVA Optical Networking
  • 4. Motivation
  • 5. In Short: Computational Performance [Figure: Graphics Processing Unit (GPU) vs. CPU Cluster]
  • 6. Increase in GFlop/s • GPU performance is growing even faster than predicted by Moore's law and is significantly higher than CPU performance • GPUs are also attractive for general-purpose computing (complex numerical simulations)
  • 7. Optical System Modeling • Simulation of (long-haul) optical transmission systems requires numerical solution of the nonlinear Schrödinger equation → High computational effort for small step-sizes due to accurate simulation of nonlinear fiber effects • Precise estimation of the bit error ratio with Monte-Carlo simulations for PMD and noise → Requires a high number of simulated bits
  • 8. Split-Step Fourier Method (SSFM) • Splits the nonlinear Schrödinger equation into linear and nonlinear parts • Separate solution of linear and nonlinear parts • Solution of the linear part in the frequency domain and of the nonlinear part in the time domain (acceptable for small step-sizes) [Figure: chain of FFT and IFFT operations along the fiber, with one split-step marked]
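The split-step scheme on slide 8 can be illustrated with a minimal sketch. It is not the implementation used in the talk: it assumes a lossless fiber, a first-order (non-symmetrized) splitting, and one common sign convention for the dispersion term. The names `fft`, `ifft`, `ssfm`, `beta2`, and `gamma` are chosen here for illustration, and a small pure-Python radix-2 FFT stands in for a GPU or library FFT.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with 1/n normalization."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def ssfm(field, dt, length, n_steps, beta2, gamma):
    """Propagate `field` over `length` using n_steps split steps.

    Assumptions (not from the slides): no attenuation, first-order
    splitting, dispersion operator exp(0.5j * beta2 * omega**2 * dz).
    """
    n = len(field)
    dz = length / n_steps
    # Angular frequency grid in FFT ordering.
    omega = [2 * math.pi * (k if k < n // 2 else k - n) / (n * dt)
             for k in range(n)]
    # Linear (dispersion) operator for one step, applied in the frequency domain.
    lin = [cmath.exp(0.5j * beta2 * w * w * dz) for w in omega]
    for _ in range(n_steps):
        spectrum = fft(field)
        field = ifft([s * h for s, h in zip(spectrum, lin)])
        # Nonlinear (Kerr) phase rotation, applied in the time domain.
        field = [a * cmath.exp(1j * gamma * abs(a) ** 2 * dz) for a in field]
    return field


# Usage: a Gaussian pulse on a 64-point grid, 10 split steps.
pulse = [complex(math.exp(-((i - 32) * 0.1) ** 2)) for i in range(64)]
out = ssfm(pulse, dt=0.1, length=1.0, n_steps=10, beta2=-1.0, gamma=2.0)
```

Each split step costs one FFT and one IFFT, which is why the FFT dominates both the run time and the numerical accuracy of the method when many steps are used.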
  • 9. Speedup Factor (GPU vs CPU) [Figure: speedup factor in single precision (SP) and double precision (DP). Legend: DP: Nvidia CUDA FFT; SP: FFT using pre-calculated twiddle factors] • Single precision arithmetic has much higher performance on GPU (because the main target group is computer gaming) • Longer block lengths allow better parallelization → Single precision implementation desirable
  • 10. Accuracy (in single precision) [Figure legend: CUFFT: Nvidia CUDA FFT; FFTW: Fastest Fourier Transform in the West; IPP: Intel Integrated Performance Primitives; LUT: LUT-based FFT (trigonometric functions precalculated in DP)] • Total accuracy of SSFM dominated by FFT accuracy • Backward error grows linearly with increasing number of FFTs • CUDA FFT shows considerably higher error than other FFT implementations
  • 11. Analysis: Accuracy. Why is the accuracy of CUFFT in SP relatively low? → FFT performance depends crucially on the accuracy of the "twiddle factors" (or trigonometric functions) → The HW implementation of trigonometric functions in SP on GPUs is optimized for peak performance, not accuracy. What can be done to increase accuracy in single precision? → Implementation of Taylor series expansion (slow!) → Compute trigonometric functions in DP on the CPU and store them in a look-up table on the GPU (especially suited to the split-step Fourier method with thousands of FFTs of similar length). Source: J. C. Schatzman, SIAM J. Scientific Comput. (1996).
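The look-up-table idea on slide 11 can be sketched as follows. This is only an illustration of the principle, not the CUDA code from the talk: the twiddle factors are computed once in double precision, rounded to single precision for storage, and then reused by every FFT of the same length. The helper names `to_single`, `make_twiddle_lut`, and `fft_lut` are hypothetical, and the butterfly arithmetic itself still runs in Python's double precision.

```python
import cmath
import math
import struct


def to_single(v):
    """Round a Python float (double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", v))[0]


def make_twiddle_lut(n):
    """Twiddle factors exp(-2*pi*i*k/n) for k < n/2.

    Computed in double precision, then stored as single precision values,
    mimicking a table prepared on the CPU and copied to the GPU.
    """
    lut = []
    for k in range(n // 2):
        w = cmath.exp(-2j * math.pi * k / n)
        lut.append(complex(to_single(w.real), to_single(w.imag)))
    return lut


def fft_lut(x, lut, stride=1):
    """Radix-2 FFT that reads twiddles from `lut` instead of calling sin/cos.

    `lut` must have been built for the top-level length; sub-transforms of
    half the size reuse every second entry, hence the doubling stride.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_lut(x[0::2], lut, stride * 2)
    odd = fft_lut(x[1::2], lut, stride * 2)
    out = [0j] * n
    for k in range(n // 2):
        t = lut[k * stride] * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Usage: build the table once, then reuse it for many transforms of length 16.
lut = make_twiddle_lut(16)
signal = [complex(math.sin(i), math.cos(2 * i)) for i in range(16)]
spectrum = fft_lut(signal, lut)
```

Because the split-step method runs thousands of FFTs of the same length, the one-time cost of building the table is negligible compared with the transforms that reuse it.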
  • 12. Illustrative Example [Figure: CUDA FFT (SP) and LUT-based FFT (SP); legend: GPU, CPU] • The look-up table based FFT provides significantly increased accuracy in single-precision arithmetic • The look-up table holds pre-calculated "twiddle factor" values. Source: S. Pachnicke, et al, OFC 2011.
  • 13. System Analysis (SSFM Simulation) [Figure: required OSNR deviation for BER = 10^-3 in dB; GPU simulation (in SP or DP) vs. CPU simulation (in DP); 11 x 112 Gb/s CP-QPSK] • GPU double precision results are (almost) identical to CPU results • The OSNR penalty of our single precision implementation remains below 0.1 dB up to approx. 125,000 split-steps. Source: S. Pachnicke, IEEE ICTON, 2010.
  • 14. Combined Simulation in SP & DP → Calculate an approximate division of the parameter space into strata by fast simulations with single precision. → The ellipses represent parameter combinations for which bit errors occur during transmission. → Execute simulations with double precision accuracy sparsely in the different strata to assess the BER. → Combined simulation with single and double precision and automatic (algorithmic) choice of the amount of single precision simulations. Sources: P. Serena, et al, IEEE JLT, 2009; S. Pachnicke, et al, OFC 2011.
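The two-stage procedure on slide 14 follows the general pattern of stratified sampling, which can be sketched generically as below. This is not the algorithm from the cited papers: `cheap_score` stands in for a fast single precision run, `accurate_eval` for a sparse double precision run, and `edges`, `per_stratum`, and the toy error model in the usage lines are assumptions made for illustration.

```python
import random


def stratified_estimate(samples, cheap_score, accurate_eval, edges,
                        per_stratum, rng):
    """Estimate the mean of accurate_eval over `samples`.

    The cheap score assigns each sample to a stratum; only `per_stratum`
    samples in each stratum are evaluated with the expensive function.
    """
    strata = [[] for _ in range(len(edges) + 1)]
    for s in samples:
        score = cheap_score(s)
        idx = sum(score >= e for e in edges)  # number of edges passed
        strata[idx].append(s)
    estimate = 0.0
    for members in strata:
        if not members:
            continue
        weight = len(members) / len(samples)
        picked = rng.sample(members, min(per_stratum, len(members)))
        mean = sum(accurate_eval(p) for p in picked) / len(picked)
        estimate += weight * mean
    return estimate


# Usage with a toy model: a sample causes an "error" when it exceeds 0.9.
rng = random.Random(1)
samples = [rng.random() for _ in range(10000)]
estimate = stratified_estimate(
    samples,
    cheap_score=lambda x: x,                             # fast, approximate run
    accurate_eval=lambda x: 1.0 if x >= 0.9 else 0.0,    # slow, accurate run
    edges=[0.9],
    per_stratum=20,
    rng=rng,
)
```

In this toy case the strata separate the error events perfectly, so 40 accurate evaluations reproduce the error fraction of all 10,000 samples. In a real simulation the strata are only approximate, and the number of accurate runs per stratum would be chosen to meet a target accuracy.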
  • 15. Discussion The robustness of the algorithm was checked by deliberately selecting a high number of split-steps (880,000) • Results of combined (SP & DP) GPU simulations match well with results obtained from CPU simulations in DP • Speedup of up to a factor of 180 possible compared to CPU → Stratified Monte-Carlo sampling allows an algorithmic choice of the number of DP simulations required for a given accuracy. Source: S. Pachnicke, et al, OFC 2011.
  • 16. Design Advantages • GPU parallelization allows simulation of a long-distance 80-channel WDM system on a PC in reasonable time (Source: C. Xia, D. van den Borne, OFC, 2011) • Result: The system performance can be estimated much more precisely than with CPU-based simulations (which typically model only 10-channel WDM systems)
  • 17. Conclusion • GPUs offer a much higher computational peak performance than CPUs • Full benefit of GPU power only in single precision • Increase in single precision accuracy possible by pre-computing trigonometric function values for FFTs • Speedup in simulation time of more than a factor of 100 possible compared to CPU
  • 18. Further Reading • N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, J. Manferdelli, “High Performance Discrete Fourier Transforms on Graphics Processors”, Proc. of IEEE conference on Supercomputing (SC), article no. 2 (2008). • S. Pachnicke, “Fiber-Optic Transmission Networks: Efficient Design and Dynamic Operation”, Springer (2011). • J. C. Schatzman, “Accuracy of the Discrete Fourier Transform and the Fast Fourier Transform”, SIAM J. Scientific Comput. 17, 1150-1166 (1996). • G. Falcao, V. Silva, L. Sousa, “How GPUs can outperform ASICs for fast LDPC decoding”, Proc. of ACM International Conference on Supercomputing (ICS), 390-399 (2009). • J. A. Stratton, S. S. Stone, W.-M. W. Hwu, “MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs”, Lecture Notes in Computer Science 5335, 16-30 (2008). • R. R. Exposito, G. L. Taboada, S. Ramos, J. Tourino, R. Doallo, “General-purpose computation on GPUs for high performance cloud computing”, Wiley J. Concurrency and Computation 24 (2012).
  • 19. Thank you. spachnicke@advaoptical.com. IMPORTANT NOTICE: The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the content, material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly prohibited. The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or representations of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any liability for any loss or damages, including without limitation, direct, indirect, incidental, consequential and special damages, alleged to have been caused by or in connection with using and/or relying on the information contained in this presentation. Copyright © for the entire content of this presentation: ADVA Optical Networking.