Hardware Acceleration of TEA and XTEA Algorithms on FPGA, GPU and Multi-Core Processors
Vivek Venugopal and Devu Manikantan Shila {venugov, manikad}@utrc.utc.com

Tiny Encryption Algorithm (TEA)

[Figure: one TEA round, split into half round 1 and half round 2. 32-bit inputs v0, v1, sum and key words k0, k1 (half round 1) and k2, k3 (half round 2); operations << 4, >> 5, + and XOR; a final +/- stage selected by encrypt/decrypt produces v0_new and v1_new.]

• TEA uses addition, XOR and shift operations on 32-bit words and has a very small code footprint.
• TEA has security weaknesses when run with a small number of rounds, notably in the avalanche effect observed for 6 rounds.

Extended Tiny Encryption Algorithm (XTEA)

[Figure: one XTEA round, split into half round 1 and half round 2. 32-bit inputs v0, v1, sum0, sum1 and selected key words kx, ky; operations << 4, >> 5, + and XOR; a final +/- stage selected by encrypt/decrypt produces v0_new and v1_new.]

• The Extended Tiny Encryption Algorithm (XTEA) was introduced after weaknesses were found in TEA for smaller numbers of rounds.
• In XTEA, the key schedule is modified so that the data and key are mixed in a different pattern in every round.
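
The TEA and XTEA rounds described above can be sketched in a few lines of Python. This is a minimal software sketch for illustration, not the authors' FPGA or GPU code. The function names, the default of 32 full rounds (each made of two half rounds), and the constant DELTA are assumptions taken from the commonly published reference routines, not from this poster. In the XTEA figure's terms, sum0 and sum1 correspond to the value of `s` before and after DELTA is added, and kx, ky to the key words selected by `s & 3` and `(s >> 11) & 3`.

```python
MASK = 0xFFFFFFFF    # all arithmetic is on 32-bit words
DELTA = 0x9E3779B9   # increment of `sum` in the published reference code (assumption here)


def _tea_mix(v, s, ka, kb):
    """((v << 4) + ka) XOR (v + sum) XOR ((v >> 5) + kb); caller masks to 32 bits."""
    return ((v << 4) + ka) ^ (v + s) ^ ((v >> 5) + kb)


def tea_encrypt(v0, v1, key, rounds=32):
    """Encrypt one 64-bit block (v0, v1) with four 32-bit key words."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(rounds):
        s = (s + DELTA) & MASK
        v0 = (v0 + _tea_mix(v1, s, k0, k1)) & MASK   # half round 1
        v1 = (v1 + _tea_mix(v0, s, k2, k3)) & MASK   # half round 2
    return v0, v1


def tea_decrypt(v0, v1, key, rounds=32):
    """Undo tea_encrypt by running the half rounds in reverse with subtraction."""
    k0, k1, k2, k3 = key
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - _tea_mix(v0, s, k2, k3)) & MASK
        v0 = (v0 - _tea_mix(v1, s, k0, k1)) & MASK
        s = (s - DELTA) & MASK
    return v0, v1


def _xtea_mix(v, s, k):
    """(((v << 4) XOR (v >> 5)) + v) XOR (sum + selected key word)."""
    return (((v << 4) ^ (v >> 5)) + v) ^ (s + k)


def xtea_encrypt(v0, v1, key, rounds=32):
    """Encrypt one 64-bit block; the key word used depends on the running sum."""
    s = 0
    for _ in range(rounds):
        v0 = (v0 + _xtea_mix(v1, s, key[s & 3])) & MASK          # uses sum0, kx
        s = (s + DELTA) & MASK
        v1 = (v1 + _xtea_mix(v0, s, key[(s >> 11) & 3])) & MASK  # uses sum1, ky
    return v0, v1


def xtea_decrypt(v0, v1, key, rounds=32):
    """Undo xtea_encrypt by reversing the order of the half rounds."""
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - _xtea_mix(v0, s, key[(s >> 11) & 3])) & MASK
        s = (s - DELTA) & MASK
        v0 = (v0 - _xtea_mix(v1, s, key[s & 3])) & MASK
    return v0, v1


if __name__ == "__main__":
    key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # arbitrary example key
    block = (0xDEADBEEF, 0x0BADF00D)                        # arbitrary example block
    assert tea_decrypt(*tea_encrypt(*block, key), key) == block
    assert xtea_decrypt(*xtea_encrypt(*block, key), key) == block
    print("round trip ok")
```

Masking once per half round is enough here because only the low 32 bits of each intermediate value affect the result. The inputs are expected to be 32-bit unsigned words.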
Introduction

[Figure: two application scenarios. Smart meter application: encrypted communication through a gateway to the Internet. Unmanned Autonomous Vehicle: planning computer and flight control and navigation computer. Platforms shown: GPU + ARM (NVIDIA CARMA) and FPGA + ARM (Xilinx Zynq).]

• In smart grids, sensitive information such as power consumption, price updates, or outage awareness is exchanged between the meters and the power utility company in real time over the Internet.
• Unmanned Autonomous Vehicles (UAVs) continuously exchange dynamic information regarding the urban environment with a gateway. The gateway also provides feedback regarding the optimization parameters that need to be fed into the UAV's path planning algorithm for mapping different routes to reach its destination safely.
• Cyber attacks on such critical and dynamic information can lead to severe losses of resources and finance.

Motivation

• All the information from/to these smart meters needs to be decrypted/encrypted at the gateway, which in turn can lead to very large response times. A larger response time implies poorer performance in terms of both throughput and latency.
• Continuous transmission of data from UAV regarding

Implementation platforms and Results

• Nvidia's Tesla C2070 high-end GPU, two hexa-core Intel Xeon processors, Nvidia's GeForce GT 650M notebook GPU consisting of 384 cores, and a quad-core Intel Core i7 CPU.
• Xilinx's Zynq-7000 SoC ZC702 evaluation board. The Zynq-7000 platform consists of a dual ARM Cortex A-9 processor clocked at 800 MHz and an Artix-7 FPGA as the programmable logic.

Streaming Multiprocessor (SMX) Architecture
Kepler GK110’s new SMX introduces several architectural innovations that make it not only the most powerful multiprocessor we’ve built, but also the most programmable and power efficient.

[Figure: Inside SMX. GT650M: 2 SMX with 192 cores each; each SMX has its own control logic.]
SMX: 192 single precision CUDA cores, 64 double precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).

[Figure: GPU Implementation. Flow: copy input data and keys to GPU memory; pre-compute sum values for each round and store in shared memory; calculate ciphers for blocks in parallel; copy ciphers back to CPU.]

[Figure: Zynq block diagram. Labels: Processing System; Dual ARM Cortex A-9 MPCore (800 MHz); Memory Interfaces (Flash, DRAM, SRAM); AXI Interconnect; GigE; USB; CAN; PCIe; Displays; partial labels "Custom" and "Fixed".]

[Figure: two panels, "Running on Zynq board" and "Running in ISIM".]

[Figure: Throughput (Mbps) comparison of TEA. x-axis: plaintext size (8 KB, 16 KB, 8 MB, 128 MB, 1 GB); y-axis: throughput in Mbps (0 to 8000); series: Intel Xeon X5650, Nvidia C2070, Intel Quad core i7, Nvidia GT650M, Zynq.]

[Figure: Throughput (Mbps) comparison of XTEA. x-axis: plaintext size (8 KB, 16 KB, 8 MB, 128 MB, 1 GB); y-axis: throughput in Mbps (0 to 8000); series: Intel Xeon X5650, Nvidia C2070, Intel Quad core i7, Nvidia GT650M, Zynq.]

Conclusion

• GPUs and FPGAs provide better throughput for both TEA and XTEA as compared to CPUs.
• FPGAs perform better for smaller plaintext sizes whereas GPUs are better for larger plaintext sizes.
• In terms of development time and cost, GPUs are better suited as embedded
                                                                                                          I2C                                                                       Peripheral
                                                                                                                        peripherals


 the evidence grid need to be encrypted fast.
                                                                                                                                                                                                                                      SelectIO
                                                                                                                                                                                                                                     Resources
                                                                                                                                                                                                                                                                              Processing                                                             Programmable
                                                                                                          SD                                                                                                                                                                   System                                                                    Logic


                                                                                                                                                                                                                                                                                                                                                                                              cryptography co-processors as compared to FPGAs.
                                                                                                                                                                                                                                                                                                           JTAG


 • FPGAs and GPUs can be used in gateways to speed
                                                                                                          UART
                                                                                                                         2x 12-bit
                                                                                                                                                     Custom          Programmable

                                                                                                                                                                                                                                                                                                                                                                                              • Future research efforts may address the use of Zynq platform as a complete, low-
                                                                                                          GPIO          MSPS ADC                                                                                                        Memory
                                                                                                                                                                         Logic

 up the TEA/XTEA encryption and decryption of bulk
 information for improved throughput and latency.
                                                                                                                                      Analog        Monitors         Analog
                                                                                                                                                                                                                                                                                                                                                                                              cost cryptographic co-processor for more complex cryptographic algorithms
                                                                                                                               Zynq Internal block diagram                                                                                                                                      Hardware in Loop setup
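To make the TEA/XTEA workload that a gateway would offload more concrete, here is a minimal software sketch of the two block ciphers. It is not the poster authors' FPGA or GPU code. The function names, the 32-cycle count, and the helper `encrypt_blocks` are assumptions made for illustration, and the round structure follows the commonly published descriptions of TEA and XTEA (addition, XOR and shifts on 32-bit words, with a 128-bit key held as four words).

```python
# Illustrative sketch only; names and the cycle count are assumptions.
MASK = 0xFFFFFFFF   # truncate every result to a 32-bit word
DELTA = 0x9E3779B9  # constant added to the running sum each cycle
CYCLES = 32         # assumed number of cycles (two half rounds each)


def tea_encrypt(v0, v1, k):
    """Encrypt one 64-bit block (v0, v1) with key k = (k0, k1, k2, k3)."""
    total = 0
    for _ in range(CYCLES):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k[0]) ^ (v1 + total) ^ ((v1 >> 5) + k[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + k[2]) ^ (v0 + total) ^ ((v0 >> 5) + k[3]))) & MASK
    return v0, v1


def tea_decrypt(v0, v1, k):
    """Invert tea_encrypt by running the half rounds in reverse order."""
    total = (DELTA * CYCLES) & MASK
    for _ in range(CYCLES):
        v1 = (v1 - (((v0 << 4) + k[2]) ^ (v0 + total) ^ ((v0 >> 5) + k[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + k[0]) ^ (v1 + total) ^ ((v1 >> 5) + k[1]))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1


def xtea_encrypt(v0, v1, k):
    """XTEA variant: the key word used in each half round depends on the sum."""
    total = 0
    for _ in range(CYCLES):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK
    return v0, v1


def xtea_decrypt(v0, v1, k):
    """Invert xtea_encrypt."""
    total = (DELTA * CYCLES) & MASK
    for _ in range(CYCLES):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK
    return v0, v1


def encrypt_blocks(blocks, k, cipher=tea_encrypt):
    """Encrypt a list of (v0, v1) blocks independently (hypothetical helper)."""
    return [cipher(v0, v1, k) for v0, v1 in blocks]
```

In this sketch every block is encrypted independently of the others, so the per-block calls inside `encrypt_blocks` are the part that an accelerator could process in parallel. A production design would also need to choose a mode of operation, which is outside the scope of this example.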




