Hardware acceleration of TEA and XTEA algorithms on FPGA, GPU and multi-core processors

Vivek Venugopal and Devu Manikantan Shila {venugov, manikad}@utrc.utc.com

[Figure: application scenarios. Labels: Gateway to Internet; Encrypted communication; Smart meter application, FPGA + ARM (Xilinx Zynq); Unmanned Autonomous Vehicle, Planning Computer, Flight Control and Navigation Computer; GPU + ARM (NVIDIA CARMA).]

Tiny Encryption Algorithm (TEA)
• TEA uses addition, XOR and shift operations on 32-bit words and has a very small code footprint.
• TEA has security holes and weaknesses when reduced to a small number of rounds, especially in the avalanche effect seen for 6 rounds.
[Figure: TEA round, drawn as half round 1 and half round 2 on the 32-bit words v0 and v1, built from << 4, >> 5, additions of key words k0 to k3 and of sum, XORs, and a final +/- (encrypt/decrypt) producing v0_new and v1_new.]

Extended Tiny Encryption Algorithm (XTEA)
• XTEA was introduced after weaknesses were found in TEA with a small number of rounds.
• In XTEA, the key schedule is modified so that data and key are mixed in a different pattern in each round.
[Figure: XTEA round, drawn as half round 1 and half round 2 on the 32-bit words v0 and v1, built from << 4, >> 5, XORs, additions of the word itself and of sum0/sum1 plus a selected key word kx/ky, and a final +/- (encrypt/decrypt) producing v0_new and v1_new.]
Introduction
• In smart grids, sensitive information such as power consumption, price updates or outage awareness is exchanged in real time over the Internet between the meters and the power utility company.
• Unmanned Autonomous Vehicles (UAVs) continuously exchange dynamic information about the urban environment with a gateway. The gateway also provides feedback on the optimization parameters that need to be fed into the UAV's path planning algorithm, which maps different routes for the UAV to reach its destination safely.
• Cyber attacks on such critical and dynamic information can lead to severe losses of resources and finance.

Motivation
• All information to and from these smart meters needs to be encrypted/decrypted at the gateway, which in turn can lead to very large response times. A larger response time implies poorer performance in terms of both throughput and latency.
• The data that the UAV continuously transmits about the evidence grid needs to be encrypted quickly.
• FPGAs and GPUs can be used in gateways to speed up TEA/XTEA encryption and decryption of bulk information, improving throughput and latency.

Implementation platforms and Results
• Nvidia's Tesla C2070 high-end GPU with two hexa-core Intel Xeon X5650 processors.
• Nvidia's GeForce GT 650M notebook GPU with 384 cores and a quad-core Intel Core i7 CPU.
• Xilinx's Zynq-7000 SoC ZC702 evaluation board. The Zynq-7000 platform consists of a dual ARM Cortex-A9 processor clocked at 800 MHz and an Artix-7 FPGA as the programmable logic.
[Figure: Throughput (Mbps) comparison of TEA and Throughput (Mbps) comparison of XTEA. x-axis: plaintext size (8 KB, 16 KB, 8 MB, 128 MB, 1 GB); y-axis: throughput in Mbps (0 to 8000); series: Intel Xeon X5650, Nvidia C2070, Intel Quad core i7, Nvidia GT650M, Zynq.]
[Figure: GPU Implementation: (1) copy input data and keys to GPU memory, (2) pre-compute sum values for each round and store them in shared memory, (3) calculate ciphers for blocks in parallel, (4) copy ciphers back to CPU. Inside SMX: the GT650M has 2 SMX with 192 cores each; each Streaming Multiprocessor (SMX) has 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU) and 32 load/store units (LD/ST).]
[Figure: Zynq internal block diagram, with the processing system (dual ARM Cortex-A9 MPCore at 800 MHz, memory interfaces, fixed I/O peripherals such as GigE, USB, CAN, I2C, SD, UART and GPIO) and the programmable logic (custom logic, memory and peripherals, SelectIO resources, PCIe, 2x 12-bit MSPS ADC, analog monitors), linked by the AXI interconnect.]
[Figure: Hardware-in-the-loop setup, running on the Zynq board and running in ISIM.]

Conclusion
• GPUs and FPGAs provide better throughput than CPUs for both TEA and XTEA.
• FPGAs perform better for smaller plaintext sizes, whereas GPUs are better for larger plaintext sizes.
• In terms of development time and cost, GPUs are better suited than FPGAs as embedded cryptography co-processors.
• Future research may address the use of the Zynq platform as a complete, low-cost cryptographic co-processor for more complex cryptographic algorithms.

