Shinya Morino, Sr. Solution Architect, NVIDIA, 2/14/2020
QGATE 0.3:
QUANTUM CIRCUIT SIMULATOR
NVIDIA AI TECHNOLOGY CENTER (NVAITC)
Catalyse AI transformation through Research-Centric Integrated Engagements
Locations: Singapore (AP HQ), Taiwan, China, Australia, Hong Kong, Luxembourg, Thailand, London, Indonesia
Established Aug 2015 in Singapore
Collaboration footprint: Singapore, ASEAN, Taiwan, China, Hong Kong, Australia, Europe.
QUANTUM COMPUTING
Qubit (quantum bit):
- The basic unit of quantum computers.
- A qubit is represented as a linear superposition of two basis states, |0> and |1>:
  $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1$
- Measurement yields either |0> or |1>, with observation probabilities $|\alpha|^2$ and $|\beta|^2$, respectively.
Qubit (Bloch sphere): $\alpha = \cos\frac{\theta}{2}, \quad \beta = e^{i\phi}\sin\frac{\theta}{2}$
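A minimal Python sketch (not part of Qgate) of the Bloch-sphere parameterization and the measurement rule above: it builds α and β from θ and φ and returns the observation probabilities |α|² and |β|². The function names are hypothetical.

```python
import cmath
import math
from typing import Tuple


def bloch_amplitudes(theta: float, phi: float) -> Tuple[complex, complex]:
    """Return (alpha, beta) for |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2))
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta


def measurement_probabilities(alpha: complex, beta: complex) -> Tuple[float, float]:
    """Observation probabilities of |0> and |1>: |alpha|^2 and |beta|^2."""
    p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
    if not math.isclose(p0 + p1, 1.0):
        raise ValueError("state is not normalized")
    return p0, p1


if __name__ == "__main__":
    # theta = pi/2 gives an equal superposition; phi only changes the relative phase.
    a, b = bloch_amplitudes(math.pi / 2, math.pi / 4)
    print(measurement_probabilities(a, b))  # both values are (numerically) 0.5
```

The phase φ does not affect the probabilities; it only matters once later gates interfere the two amplitudes.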
QUANTUM COMPUTING
Quantum circuits consist of qubits and quantum logic gates.
- With N qubits, 2^N states can be represented (if the qubits are entangled).
- Each quantum state corresponds to one complex number (its amplitude).
Ex. With 53 qubits, 2^53 (about 10 peta) states can be represented.
Quantum states are controlled by quantum logic gates.
- Applying one gate can change all 2^N amplitudes at the same time.
- Developing quantum circuits is programming for quantum computing.
[Figure: a quantum circuit with H gates]
QUANTUM CIRCUIT SIMULATION
State vector
- Quantum states are expanded into a vector of complex numbers.
- The vector size is 2^N for an N-qubit circuit.
- Each bit of the index corresponds to one qubit.
Quantum states and state vector (figure):
index of state vector (bits correspond to qubits q0, q1, …, qN-1) → quantum state (complex number)
|0…00> → s_0
|0…01> → s_1
|0…10> → s_2
⋮
|1…10> → s_{2^N-2}
|1…11> → s_{2^N-1}
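The index-to-qubit mapping can be sketched as follows. This assumes, hypothetically, that qubit q_k corresponds to bit k of the index (q0 as the least-significant bit); the slide does not fix the bit order, and Qgate may use a different convention. `prob_qubit_one` shows one way to get a single-qubit probability from the state vector: sum |s_i|² over indices whose bit for that qubit is 1.

```python
from typing import List


def basis_label(index: int, n_qubits: int) -> str:
    """Ket for a state-vector index, written |q_{N-1} ... q_1 q_0> (q0 = lowest bit)."""
    return "|" + format(index, f"0{n_qubits}b") + ">"


def qubit_value(index: int, qubit: int) -> int:
    """Value (0 or 1) of `qubit` in the basis state with this index."""
    return (index >> qubit) & 1


def prob_qubit_one(state: List[complex], qubit: int) -> float:
    """Probability of observing |1> on `qubit`: sum of |s_i|^2 where bit `qubit` of i is 1."""
    return sum(abs(s) ** 2 for i, s in enumerate(state) if qubit_value(i, qubit))


if __name__ == "__main__":
    n = 3
    for i in range(2 ** n):
        print(i, basis_label(i, n))  # 0 |000>, 1 |001>, ..., 7 |111>
    state = [0.6, 0.0, 0.0, 0.8]  # 0.6|00> + 0.8|11>
    print(prob_qubit_one(state, 0))  # about 0.64
```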
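QUANTUM CIRCUIT SIMULATION
Quantum Logic Gate
- A gate is represented as a 2x2 unitary matrix:
  $U = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix}$
- Applying a gate to a state vector updates each pair of amplitudes whose indices differ only in the target-qubit bit:
  $\begin{pmatrix} s_{i+1,|\ldots 0 \ldots\rangle} \\ s_{i+1,|\ldots 1 \ldots\rangle} \end{pmatrix} = U \begin{pmatrix} s_{i,|\ldots 0 \ldots\rangle} \\ s_{i,|\ldots 1 \ldots\rangle} \end{pmatrix}$
Controlled gate (control, target)
- The gate is applied only when the control qubit is |1>:
  $U_{\mathrm{c}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & u_{00} & u_{01} \\ 0 & 0 & u_{10} & u_{11} \end{pmatrix}$
  $\begin{pmatrix} s_{i+1,|\ldots 1 \ldots 0 \ldots\rangle} \\ s_{i+1,|\ldots 1 \ldots 1 \ldots\rangle} \end{pmatrix} = U \begin{pmatrix} s_{i,|\ldots 1 \ldots 0 \ldots\rangle} \\ s_{i,|\ldots 1 \ldots 1 \ldots\rangle} \end{pmatrix}$
- Controlled gates can entangle qubits.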
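The pairwise update above is the core of a naive state-vector simulator. A pure-Python sketch, assuming (hypothetically) that qubit k is bit k of the index; `apply_gate` is an illustrative name, not Qgate's API. Passing `control` restricts the update to pairs whose control bit is 1, which is equivalent to the 4x4 controlled-U matrix.

```python
import math
from typing import List, Optional, Sequence

Matrix2 = Sequence[Sequence[complex]]


def apply_gate(state: List[complex], u: Matrix2, target: int,
               control: Optional[int] = None) -> None:
    """Apply the 2x2 unitary `u` to qubit `target` of `state`, in place."""
    t = 1 << target
    for i in range(len(state)):
        if i & t:
            continue  # handle each (|..0..>, |..1..>) pair once, from its 0 member
        if control is not None and not (i >> control) & 1:
            continue  # control qubit is |0>: leave the pair unchanged
        j = i | t
        s0, s1 = state[i], state[j]
        state[i] = u[0][0] * s0 + u[0][1] * s1
        state[j] = u[1][0] * s0 + u[1][1] * s1


H = [[1 / math.sqrt(2), 1 / math.sqrt(2)], [1 / math.sqrt(2), -1 / math.sqrt(2)]]
X = [[0, 1], [1, 0]]

if __name__ == "__main__":
    psi = [1 + 0j, 0j, 0j, 0j]                # |00>
    apply_gate(psi, H, target=0)              # H on q0
    apply_gate(psi, X, target=1, control=0)   # CNOT: control q0, target q1
    print(psi)  # Bell state: amplitude 1/sqrt(2) on |00> and |11>
```

Each call makes one pass over the whole state vector, so the work per gate grows as 2^N.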
PROBLEM DEFINITION
A quantum circuit simulator is an essential tool for developing quantum circuits, but there are limitations.
It’s said that the "number of qubits" is the limitation, because simulation requires a vast amount of memory, proportional to 2^N.
But the actual issue as of today is: "Simulation is very slow."
Debugging and verifying quantum circuits takes a long time.
QUANTUM CIRCUIT EXAMPLES
Circuit | # qubits | # gates | State vector capacity | Est. simulation time, Python*1 (CPU, 1 core) | Est. simulation time, CPU*2 (multi-core)
Quantum Volume*3 (width 32, depth 32) | 32 | 5,120 | 64 GB | 2 days | 3 hours
iQFT*4 (ex: 32 qubits) | 32 | 560 | 64 GB | 3 hours | 13 min
Modulo operation (5^n mod 12) | 27 | 5,449 | 2 GB | 45 min | 3 min
*1: Simulation with 1 CPU core. *2: Assuming 55 GB/sec of CPU memory bandwidth with a naïve simulation algorithm.
*3: https://github.com/Qiskit/openqasm/blob/master/benchmarks/quantum_volume/quantum_volume_n32_d32.qasm
*4: iQFT: inverse quantum Fourier transform.
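The state-vector capacity column follows from storing 2^N amplitudes of one complex number each. A sketch assuming complex128 (16 bytes) for FP64 and complex64 (8 bytes) for FP32; with these sizes the table's "GB" values correspond to binary gibibytes.

```python
def state_vector_bytes(n_qubits: int, precision: str = "fp64") -> int:
    """Memory for 2^n complex amplitudes (fp64 -> complex128, fp32 -> complex64)."""
    bytes_per_amplitude = {"fp64": 16, "fp32": 8}[precision]
    return (1 << n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    examples = [("Quantum Volume", 32), ("iQFT", 32), ("Modulo operation", 27)]
    for name, n in examples:
        print(f"{name}: {state_vector_bytes(n) / 2**30:.0f} GiB")  # 64, 64, 2
```

Every additional qubit doubles the requirement, which is why memory capacity sets the upper bound on circuit width.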
QUANTUM CIRCUIT SIMULATOR
QGATE
QGATE DESIGN CONCEPT
1. Easy development of quantum circuits, with fast simulations for experiments
   - Rich built-in gate set for developing circuits quickly
   - Utilizing modern computing devices for performance
2. Single node, multi GPU (multiple devices)
   - Utilizing a big server with a huge amount of memory.
   - Focusing on performance; no inter-node communication.
3. Works as a backend for other SDKs
   - Simulations can be accelerated in Blueqat and various other SDKs.
1. EASY DEVELOPMENT OF QUANTUM CIRCUITS
Rich built-in gate set
- Multi-qubit-controlled gates, such as the Toffoli gate, are included in the built-in gate set.
- Adjoints are available for all gates.
All qubits are fully connected.
IBM’s OpenQASM gate set is also supported.
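A pure-Python sketch of both features, assuming (hypothetically) that qubit k is bit k of the state-vector index; it illustrates the idea and is not Qgate's implementation. A multi-controlled gate acts only on indices where every control bit is 1, and the adjoint of a gate is its conjugate transpose.

```python
from typing import List, Sequence

Matrix2 = List[List[complex]]


def adjoint(u: Matrix2) -> Matrix2:
    """Conjugate transpose: adjoint(u)[r][c] == conj(u[c][r])."""
    return [[complex(u[c][r]).conjugate() for c in range(2)] for r in range(2)]


def apply_multi_controlled(state: List[complex], u: Matrix2, target: int,
                           controls: Sequence[int]) -> None:
    """Apply u to `target` in place, only on indices where every control bit is 1."""
    t = 1 << target
    mask = 0
    for c in controls:
        mask |= 1 << c
    for i in range(len(state)):
        if i & t or (i & mask) != mask:
            continue  # visit each pair once, and only when all controls are |1>
        j = i | t
        s0, s1 = state[i], state[j]
        state[i] = u[0][0] * s0 + u[0][1] * s1
        state[j] = u[1][0] * s0 + u[1][1] * s1


X = [[0, 1], [1, 0]]
S = [[1, 0], [0, 1j]]

if __name__ == "__main__":
    psi = [0j] * 8
    psi[0b011] = 1  # q0 = q1 = 1, q2 = 0
    apply_multi_controlled(psi, X, target=2, controls=[0, 1])  # Toffoli
    print(psi.index(1))  # 7, i.e. |111>
    print(adjoint(S))    # conjugate transpose of S, i.e. diag(1, -i)
```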
BUILT-IN OPERATORS
Quantum logic gate | Symbol
Identity | I
Hadamard gate | H
Pauli gates and their rotations | X, Y, Z, Rx(theta), Ry(theta), Rz(theta)
Exponential of identity and Pauli gates | Exp(I, X, Y, Z)
Global phase | Exp(theta)
Phase shift gates | P(theta), T, S
Measurement, probability | Measure(qubit), Prob(qubit)
Extensions
OpenQASM’s U gates | U3, U2, U1
Multi-qubit measurement | Measure(pauli gates)
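For reference, standard matrix forms of several of these gates as 2x2 nested lists. They follow common textbook and OpenQASM 2 conventions (an assumption); Qgate's definitions may differ, for example by a global phase. The checks confirm H·H = I, T·T = S, U1(λ) = P(λ), and U3(θ, -π/2, π/2) = Rx(θ).

```python
import cmath
import math
from typing import List

Matrix2 = List[List[complex]]

H = [[1 / math.sqrt(2), 1 / math.sqrt(2)], [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def rx(theta: float) -> Matrix2:
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(theta: float) -> Matrix2:
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def rz(theta: float) -> Matrix2:
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]


def p(theta: float) -> Matrix2:
    """Phase shift: diag(1, e^{i theta})."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]


T, S = p(math.pi / 4), p(math.pi / 2)


def u3(theta: float, phi: float, lam: float) -> Matrix2:
    """OpenQASM 2 style U3; U1(lam) = U3(0, 0, lam), U2(phi, lam) = U3(pi/2, phi, lam)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]


def matmul(a: Matrix2, b: Matrix2) -> Matrix2:
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def close(a: Matrix2, b: Matrix2, tol: float = 1e-12) -> bool:
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(2) for c in range(2))


if __name__ == "__main__":
    print(close(matmul(H, H), [[1, 0], [0, 1]]))  # H is self-inverse
    print(close(matmul(T, T), S))                 # T^2 = S
    print(close(u3(0.3, -math.pi / 2, math.pi / 2), rx(0.3)))
```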
UTILIZING MODERN COMPUTING DEVICES FOR PERFORMANCE
GPU: Tesla V100 (SXM2)
- 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS
- 32 GB HBM2 @ 900 GB/s | 300 GB/s NVLink
CPU: a CPU runtime is also implemented (utilizing multiple cores in one CPU socket).
TARGET HARDWARE
Requirements:
- Quantum circuit simulations need a huge amount of memory.
- Performance is important as well.
NVIDIA DGX-2
- 512 GB of GPU memory in 16 Tesla V100 GPUs.
- By using NVLink, the memory of all GPUs is in one address space.
DGX-2
16 NVIDIA high-end GPUs + NVLink2
- All GPUs share a single address space.
- All-to-all connections by NVLink (300 GB/sec, bidirectional).
- 512 GB of ultra-fast memory is available.
  FP32: 35 qubits; FP64: 34 qubits.
NVIDIA DGX STATION
At a Glance
GPUs | 4x NVIDIA® Tesla® V100
TFLOPS (GPU FP16) | 500
GPU Memory | 32 GB per GPU
NVIDIA Tensor Cores | 2,560 (total)
NVIDIA CUDA Cores | 20,480 (total)
CPU | Intel Xeon E5-2698 v4 2.2 GHz (20-core)
System Memory | 256 GB LRDIMM DDR4
Storage | Data: 3 x 1.92 TB SSD RAID 0; OS: 1 x 1.92 TB SSD
Network | Dual 10 Gb LAN
Display | 3x DisplayPort, 4K resolution
Acoustics | < 35 dB
Maximum Power Requirements | 1500 W
Operating Temperature Range | 10 to 30 °C
Software | Ubuntu Desktop Linux OS; DGX Recommended GPU Driver; CUDA Toolkit
DGX STATION NVLINK NETWORK TOPOLOGY
For Efficient Application Scaling
NVIDIA NVLink Bridge
- Four NVIDIA Tesla V100 accelerators
- Each Tesla V100 GPU in DGX Station has four NVLink connection points, each providing a point-to-point connection to another GPU at a peak bandwidth of 25 GB/s
- Optimized for:
  - The bandwidth achievable for a variety of point-to-point and collective communications primitives
  - The flexibility of the topology
  - Performance with a subset of the GPUs
GPU REQUIREMENT
Qgate runs on a single GPU and scales to multiple GPUs in a single node.
- Works with Kepler GPUs (compute capability 3.5) or later; Maxwell GPUs (compute capability 5.0) or later are recommended.
Multi-GPU requirements
- NVLink: all-to-all NVLink connections between GPUs are required. NVLink is strongly recommended for performance.
- PCIe: all GPUs should be connected to the same PCIe root complex.
CPU
- Running with 1 CPU socket is supported; NUMA is not taken into account.
PERFORMANCE MEASUREMENT
Baseline, Single GPU Performance
Quantum circuit for measurement
- 10 Hadamard gates are placed on each qubit.
- FP64 is used.
[Figure: circuit with 10 Hadamard gates on each qubit]
Device | Implementation
CPU (1 core)*1 | Single thread on CPU
CPU (multi-core) | Multi-threaded*2 on CPU (40 threads, 20 physical cores)
GPU | GPU / CUDA
*1: CPU (1 core) models a Python-based simulator, which is sometimes implemented using a single CPU core.
*2: Implemented using the C++ STL thread class.
SUMMARY
Performance Baseline (30 qubits, Single GPU)
Device | # gates applied per sec. | Memory bandwidth | Acc. (vs. CPU 1 core) | Acc. (vs. CPU multi-core)
CPU (1 core) | 0.11 | 3.7 GB/sec | 1 | -
CPU (multi-core) | 1.59 | 54.8 GB/sec | 14.9x | 1
GPU | 23.5 | 806 GB/sec | 220x | 14.7x
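The memory-bandwidth column is consistent with each gate reading and writing the whole 30-qubit FP64 state vector once (our inference: 2^30 amplitudes × 16 bytes × 2 ≈ 34.4 GB per gate). A quick check of that arithmetic:

```python
def effective_bandwidth_gbs(gates_per_sec: float, n_qubits: int,
                            bytes_per_amplitude: int = 16) -> float:
    """GB/s implied if every gate reads and writes the whole state vector once."""
    bytes_per_gate = 2 * (1 << n_qubits) * bytes_per_amplitude  # read + write
    return gates_per_sec * bytes_per_gate / 1e9


if __name__ == "__main__":
    for device, rate in [("CPU (1 core)", 0.11), ("CPU (multi-core)", 1.59), ("GPU", 23.5)]:
        print(f"{device}: {effective_bandwidth_gbs(rate, 30):.1f} GB/s")
```

The small differences from the table presumably come from rounding of the gate rates.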
PROCESSING PIPELINE (0.3 RELEASE)
PROCESSING PIPELINE
Built with Python and Native Extensions
Input (intermediate repr.) → gate cancellation → operator reordering → dynamic qubit grouping → qubit reordering → parallelization on computing devices → Output (state vector)
- Gate cancellation: removing gates that cancel each other out.
- Operator reordering: reordering operators (gates and measurements) to maximize the effect of dynamic qubit grouping.
- Dynamic qubit grouping: dynamically grouping qubits to reduce the number of variables required to represent quantum states.
- Qubit reordering: reordering qubits to reduce data transfer between devices.
- Parallelization on computing devices: CPU (multi-core) and GPU (CUDA).
The stages are split between Python and a native extension (runtime), and between quantum-computing-specific and device-specific processing.
SOFTWARE DIAGRAM
Frontend
- qgate.script: circuit definition in Python
- qgate.openqasm: importing OpenQASM files
- Plugins: Blueqat plugin, other plugins …
OM (object model): analyses and optimizations for quantum circuits
- qgate.model: quantum circuit object model, built-in gate definitions
Backend
- qgate.simulator: simulator
- qgate.simulator.qubits: state vector, complex numbers, probabilities
- qgate.simulator.runtime: accelerating numerical operations
  - pyruntime: Python, reference
  - cpuruntime: CPU, multi-core
  - cudaruntime: CUDA, GPU
GATE CANCELLATION
Quantum Circuit Optimization
Products of some gate pairs cancel out:
$I = X \cdot X = Y \cdot Y = Z \cdot Z = H \cdot H$
[Figure: pairs of X gates that cancel out in a circuit containing arbitrary unitary gates U]
Ex: Modulo arithmetic* (5^x mod 12, 27 qubits)
Item | Value
Before cancellation | 5449 gates
After cancellation | 3885 gates
Remaining ratio | 71.3 % (about 28.7 % of the gates are removed)
This also works for pairs of Y, Z, and H gates, whose squares are the identity.
*This circuit was developed by Kato-san at MDR.
Ref: V. Vedral, A. Barenco, A. Ekert, https://arxiv.org/abs/quant-ph/9511018v1
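A minimal sketch of adjacent-pair cancellation, assuming a hypothetical circuit representation of `(name, qubits)` tuples. A gate cancels the most recent surviving gate only when that gate is identical, self-inverse, and no other gate has touched any of its qubits since; cancelling can expose further pairs (X H H X reduces to nothing). This is an illustration, not Qgate's algorithm.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Gate = Tuple[str, Tuple[int, ...]]  # hypothetical (name, qubits) representation
SELF_INVERSE = {"X", "Y", "Z", "H", "CX"}  # gates with G * G = I


def cancel_gates(circuit: Sequence[Gate]) -> List[Gate]:
    kept: List[Optional[Gate]] = []
    last: Dict[int, List[int]] = {}  # qubit -> stack of indices into `kept`
    for gate in circuit:
        name, qubits = gate
        tops = {last[q][-1] if last.get(q) else None for q in qubits}
        if len(tops) == 1:
            (k,) = tops
            # Cancel only if the latest surviving gate on all these qubits is the same gate.
            if k is not None and kept[k] == gate and name in SELF_INVERSE:
                kept[k] = None
                for q in qubits:
                    last[q].pop()
                continue
        kept.append(gate)
        for q in qubits:
            last.setdefault(q, []).append(len(kept) - 1)
    return [g for g in kept if g is not None]


if __name__ == "__main__":
    circuit = [("X", (0,)), ("H", (0,)), ("H", (1,)), ("H", (0,)), ("X", (0,)), ("CX", (0, 1))]
    print(cancel_gates(circuit))  # [('H', (1,)), ('CX', (0, 1))]
```

Requiring the same qubit tuple keeps CX(0, 1) and CX(1, 0) apart, since they are different gates.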
DYNAMIC QUBIT GROUPING
If qubits are not entangled,
- the state vector can be factorized, and
- the number of variables is reduced.
Ex. 3-qubit state vector in which 1 qubit is not entangled:
$s_0|000\rangle + s_1|001\rangle + \cdots + s_7|111\rangle = \left(s_{00}|0\rangle + s_{01}|1\rangle\right) \otimes \left(s_{10}|00\rangle + s_{11}|01\rangle + s_{12}|10\rangle + s_{13}|11\rangle\right)$
Size: 8 → 6 (= 2 + 4), i.e., a 1-qubit vector and a 2-qubit vector.
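The factorization can be checked with a tensor (Kronecker) product. In this sketch the first factor supplies the high-order index bit, and the amplitudes are illustrative values chosen only so that each factor is normalized.

```python
import math
from typing import List, Sequence


def kron(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Tensor product a (x) b: index i * len(b) + j holds a[i] * b[j]."""
    return [x * y for x in a for y in b]


if __name__ == "__main__":
    one_qubit = [0.6, 0.8]             # s00|0> + s01|1>
    two_qubits = [0.5, 0.5, 0.5, 0.5]  # s10|00> + s11|01> + s12|10> + s13|11>
    full = kron(one_qubit, two_qubits)
    print(len(one_qubit) + len(two_qubits), "stored values ->", len(full), "amplitudes")  # 6 -> 8
    print(math.isclose(sum(x * x for x in full), 1.0))  # the product state stays normalized
```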
DYNAMIC QUBIT GROUPING
30-qubit case: if the qubits are divided into 10- and 20-qubit groups,
30-qubit state vector ($s_0|0\ldots00\rangle, \ldots, s_{2^{30}-1}|1\ldots11\rangle$, size $2^{30}$) = 10-qubit state vector (size $2^{10}$) ⊗ 20-qubit state vector (size $2^{20}$)
Size: $2^{30}$ → $2^{20} + 2^{10}$ (about 0.1 %)
EX. INVERSE QUANTUM FOURIER TRANSFORM
[Figure: 5-qubit iQFT circuit with H gates and controlled rotations R1 to R4]
# Variables as the circuit proceeds: 10 (2×5) → 10 (2^2 + 2×3) → 12 (2^3 + 2×2) → 18 (2^4 + 2) → 32 (2^5)
Qubits are grouped when a controlled gate is applied.
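The variable counts on the slide can be reproduced with a union-find structure in which each controlled gate merges the groups of its qubits, and a group of k qubits needs 2^k variables. The merge order below (each of q1 to q4 joining q0's group in turn) is a hypothetical stand-in for the circuit in the figure.

```python
from typing import List, Sequence


class QubitGroups:
    """Union-find over qubits: a controlled gate merges the groups of its qubits."""

    def __init__(self, n_qubits: int) -> None:
        self.parent = list(range(n_qubits))
        self.size = [1] * n_qubits

    def find(self, q: int) -> int:
        while self.parent[q] != q:
            self.parent[q] = self.parent[self.parent[q]]  # path halving
            q = self.parent[q]
        return q

    def merge(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def group_sizes(self) -> List[int]:
        return sorted(self.size[q] for q in range(len(self.parent)) if self.find(q) == q)


def num_variables(group_sizes: Sequence[int]) -> int:
    """A group of k qubits needs a 2^k-entry state vector."""
    return sum(2 ** k for k in group_sizes)


if __name__ == "__main__":
    groups = QubitGroups(5)
    counts = [num_variables(groups.group_sizes())]
    for q in (1, 2, 3, 4):  # hypothetical: controlled gates join q1..q4 to q0's group
        groups.merge(q, 0)
        counts.append(num_variables(groups.group_sizes()))
    print(counts)  # [10, 10, 12, 18, 32]
```

The same `num_variables` gives 2^20 + 2^10 for a 30-qubit register split into 10- and 20-qubit groups, under 0.1 % of 2^30.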
EFFECTS OF DYNAMIC QUBIT GROUPING
Calculation amount is reduced by applying qubit grouping.
[Figure: iQFT, numerical estimation. Calculation amount (log axis) without and with qubit grouping, and the ratio of calculation amount (qubit grouping enabled/disabled), plotted against # qubits (0 to 32). The ratio is 12.1 % at 30 qubits.]
CALCULATION AMOUNT COMPARISON
In the range where # qubits is small,
- processing overheads are observed.
In the range where # qubits is large,
- computation time is long enough that the overhead is relatively small, and
- estimation and measurement match.
Observed overhead
- Time for analyzing the quantum circuit.
- Managing grouped state vectors.
[Figure: reduction ratio vs. # qubits (8 to 32) for CUDA, CPU (multi-core), and the theoretical estimate; processing overheads are observed for small circuits, and performance improves as expected for large circuits.]
OPERATOR REORDERING
Maximizing the effects of dynamic qubit grouping
- Reordering operators so that they are applied to smaller qubit groups.
- Reducing the amount of calculation.
[Figure: gates U0 to U4 before and after reordering]
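One way to realize this idea, shown as a hypothetical greedy heuristic rather than Qgate's actual scheduler: only gates on disjoint qubits are swapped, so the reordered circuit is equivalent, and at each step the ready gate whose qubit group would be smallest is applied first. The cost model charges 2^k amplitude updates for a gate acting on a k-qubit group; the `(name, qubits)` representation is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Gate = Tuple[str, Tuple[int, ...]]  # hypothetical (name, qubits) representation


def _find(parent: Dict[int, int], q: int) -> int:
    """Root of q's group (qubits never seen before are their own group)."""
    while parent.setdefault(q, q) != q:
        q = parent[q]
    return q


def _merged_size(parent: Dict[int, int], size: Dict[int, int], qubits: Sequence[int]) -> int:
    """Size of the group a gate on `qubits` would act on, without merging."""
    return sum(size.get(r, 1) for r in {_find(parent, q) for q in qubits})


def _merge(parent: Dict[int, int], size: Dict[int, int], qubits: Sequence[int]) -> int:
    """Merge the groups of `qubits` and return the size of the resulting group."""
    roots = sorted({_find(parent, q) for q in qubits})
    total = sum(size.get(r, 1) for r in roots)
    for r in roots[1:]:
        parent[r] = roots[0]
    size[roots[0]] = total
    return total


def cost(circuit: Sequence[Gate]) -> int:
    """Amplitude updates if a gate on a k-qubit group costs 2^k."""
    parent: Dict[int, int] = {}
    size: Dict[int, int] = {}
    return sum(2 ** _merge(parent, size, qubits) for _, qubits in circuit)


def reorder(circuit: Sequence[Gate]) -> List[Gate]:
    """Greedy reordering that only swaps gates acting on disjoint qubits."""
    parent: Dict[int, int] = {}
    size: Dict[int, int] = {}
    pending = list(range(len(circuit)))
    order: List[Gate] = []
    while pending:
        # A gate is ready when no earlier pending gate shares a qubit with it.
        ready = [i for k, i in enumerate(pending)
                 if not any(set(circuit[i][1]) & set(circuit[j][1]) for j in pending[:k])]
        best = min(ready, key=lambda i: (_merged_size(parent, size, circuit[i][1]), i))
        pending.remove(best)
        _merge(parent, size, circuit[best][1])
        order.append(circuit[best])
    return order


if __name__ == "__main__":
    circuit = [("CX", (0, 1)), ("CX", (1, 2)), ("H", (0,))]
    better = reorder(circuit)
    print(better)                       # H on q0 moves ahead of CX(1, 2)
    print(cost(circuit), cost(better))  # 20 16
```

In the small example, applying H to q0 before CX(1, 2) pulls q2 into the group lowers the cost from 20 to 16.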
BENCHMARK
Phase Estimation: one of the most important algorithms of quantum computing
- Shor’s algorithm: used for the order-finding problem (https://en.wikipedia.org/wiki/Shor%27s_algorithm)
- Quantum chemistry: used for obtaining matrix eigenvalues
PHASE ESTIMATION
Without Operator Reordering
[Figure: phase estimation circuit: controlled U^16, U^8, U^4, U^2, U followed by an iQFT of H gates and controlled rotations R1 to R4]
PHASE ESTIMATION
Operators are Reordered
[Figure: the same phase estimation circuit (controlled U^16, U^8, U^4, U^2, U and iQFT) with operators reordered]
PHASE ESTIMATION
Benchmark: 30-qubit circuit, 493 gates, FP64
- Measuring the global phase of one qubit.
- 29 qubits are used for measurements.
- Running on a single Tesla V100 (32 GB).
[Figure: 29 measurement qubits with controlled phases exp(i 2^{n-1}θ), exp(i 2^{n-2}θ), …, exp(iθ), followed by iQFT]
AN EXAMPLE OF CALCULATION RESULTS
1024 shots of sampling. The initial value is 0.1.
Raw sampling results (estimated value, count):
(0.09999997168779373, 1)
(0.09999998286366463, 1)
(0.09999998472630978, 1)
(0.09999999031424522, 1)
(0.09999999217689037, 1)
(0.09999999403953552, 4)
(0.09999999590218067, 4)
(0.09999999776482582, 26)
(0.09999999962747097, 900)
(0.10000000149011612, 57)
(0.10000000335276127, 17)
(0.10000000521540642, 7)
(0.10000000707805157, 1)
(0.10000000894069672, 1)
(0.10000001080334187, 1)
(0.10000001639127731, 1)
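The raw results can be compared with the textbook phase-estimation outcome distribution (our addition; the slide does not state it). With n = 29 measurement qubits, the closest binary fraction to 0.1 is 53687091 / 2^29 = 0.09999999962747097, the most frequent sample above, and adjacent samples differ by 2^-29. The sketch below evaluates the outcome probabilities; the function names are hypothetical.

```python
import math


def nearest_estimate(theta: float, n_bits: int) -> float:
    """Closest n-bit binary fraction k / 2^n to a phase theta in [0, 1)."""
    k = round(theta * 2 ** n_bits) % 2 ** n_bits
    return k / 2 ** n_bits


def outcome_probability(theta: float, n_bits: int, k: int) -> float:
    """Textbook probability of reading k from n-bit phase estimation of theta."""
    x = theta * 2 ** n_bits - k  # offset from k, in units of 2^-n
    denom = 2 ** n_bits * math.sin(math.pi * x / 2 ** n_bits)
    if denom == 0.0:
        return 1.0  # theta is exactly k / 2^n
    return (math.sin(math.pi * x) / denom) ** 2


if __name__ == "__main__":
    n, theta, shots = 29, 0.1, 1024
    k0 = round(theta * 2 ** n)
    for k in (k0 - 1, k0, k0 + 1):
        p = outcome_probability(theta, n, k)
        print(f"{k / 2 ** n}: p = {p:.4f}, about {shots * p:.0f} of {shots} shots")
```

For the nearest fraction the formula gives p ≈ 0.875, about 896 of 1024 shots, in line with the 900 observed.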
PHASE ESTIMATION
Operator Reordering, Single GPU
30-qubit circuit, 493 gates, FP64
- Measuring the global phase of one qubit.
- 29 qubits are used for measurements.
Runtime / optimization | Elapsed time [s] | Acceleration
CPU / no optimization | 213 | 1
CPU / optimized | 24.7 | 8.6x
CUDA / no optimization | 13.7 | 15.5x
CUDA / optimized | 1.86 | 114x
[Figure: 29 measurement qubits with controlled phases exp(i 2^{n-1}θ), exp(i 2^{n-2}θ), …, exp(iθ), followed by iQFT]
MULTI GPU + NVLINK
IDEAL MULTI GPU PERFORMANCE
Performance Baseline (30 qubits, Single GPU)
Device | # gates applied per sec. | Memory bandwidth | Acc. (vs. CPU 1 core) | Acc. (vs. CPU multi-core)
CPU (1 core) | 0.11 | 3.7 GB/sec | 1 | -
CPU (multi-core) | 1.59 | 54.8 GB/sec | 14.9x | 1
GPU | 23.5 | 806 GB/sec | 220x | 14.7x
Ideal with 4 GPUs (DGX Station): 58.8x = 14.7x × 4 GPUs
BOTTLENECK: DATA TRANSFER
Ex. DGX Station
NVLink is fast, but slower than GPU memory.
[Figure: DGX Station, 4 GPUs with 900 GB/s local memory each, connected by NVLink at 50 GB/s or 100 GB/s]
Bandwidth
- GPU memory: 900 GB/s
- NVLink (1 link, bidirectional): 50 GB/s
QUBIT REORDERING
Multi GPU, Reducing Data Transfers
Ex) qubits q0 to q5
- Applying gates to q0 ~ q3 is done within each GPU.
- When q4 or q5 is among the target qubits, data transfers between GPUs happen for each gate application.
Ref: Thomas Häner, Damian S. Steiger, "0.5 Petabyte Simulation of a 45-Qubit Quantum Circuit", https://arxiv.org/abs/1704.01127
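A sketch of the locality rule, assuming (hypothetically) that the state vector is split evenly across a power-of-two number of GPUs by its high-order index bits, with qubit k as index bit k. With 6 qubits on 4 GPUs, q0 ~ q3 are local and q4, q5 need inter-GPU transfers, as in the example.

```python
def n_global_qubits(n_devices: int) -> int:
    """Number of high-order index bits that select a device (power-of-two devices)."""
    g = n_devices.bit_length() - 1
    if n_devices != 1 << g:
        raise ValueError("number of devices must be a power of two")
    return g


def device_of(index: int, n_qubits: int, n_devices: int) -> int:
    """Device holding the amplitude at `index` when high bits select the device."""
    return index >> (n_qubits - n_global_qubits(n_devices))


def is_local(qubit: int, n_qubits: int, n_devices: int) -> bool:
    """True if a gate on `qubit` only pairs amplitudes that live on the same device."""
    return qubit < n_qubits - n_global_qubits(n_devices)


if __name__ == "__main__":
    for q in range(6):
        print(f"q{q}:", "local" if is_local(q, 6, 4) else "needs inter-GPU transfer")
```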
QUBIT REORDERING
Multi GPU, Reducing Data Transfers
Reordering qubits
- Swapping q0 ~ q2 and q3 ~ q5.
- All required inter-device communication is done while reordering the qubits.
- All gates are applied within each GPU.
Ex) q0 q1 q2 q3 q4 q5 → reordering qubits (data transfers between GPUs happen only here) → q3 q4 q5 q0 q1 q2
Gates are applied in each GPU both before and after the reordering.
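Qubit reordering amounts to permuting the bits of every index. A single-process sketch, assuming (hypothetically) that qubit k is index bit k; in a multi-GPU runtime this permutation is the step where data moves between devices. `SWAP_LOW_HIGH` encodes the example's swap of q0 ~ q2 with q3 ~ q5.

```python
from typing import List, Sequence


def permute_qubits(state: Sequence[complex], perm: Sequence[int]) -> List[complex]:
    """Return a new state vector in which old qubit b is stored at bit perm[b]."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        j = 0
        for old, new in enumerate(perm):
            if (i >> old) & 1:
                j |= 1 << new
        out[j] = amp
    return out


# q3 ~ q5 move to the low (device-local) bits, q0 ~ q2 to the high bits.
SWAP_LOW_HIGH = [3, 4, 5, 0, 1, 2]

if __name__ == "__main__":
    psi = [0j] * 64
    psi[0b000001] = 1 + 0j  # q0 = 1, all other qubits 0
    moved = permute_qubits(psi, SWAP_LOW_HIGH)
    print(moved.index(1))   # 8: old q0 is now bit 3
```

Because this swap is its own inverse, applying it a second time restores the original layout.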
BENCHMARK
Quantum Volume (n=32, d=32), FP64, DGX Station (4 GPUs)
https://github.com/Qiskit/openqasm/blob/master/benchmarks/quantum_volume/quantum_volume_n32_d32.qasm
32-qubit circuit, 5120 gates, FP64
Hardware: NVIDIA DGX Station. CPU: Xeon E5-2698 v4 2.2 GHz, GPU: Tesla V100 x 4
Runtime | Optimization | Elapsed time | Acc.
CPU | No optimization | 3.1 hours | -
CUDA, 4 Tesla V100 | No optimization | 370 sec | 29.7x
CUDA, 4 Tesla V100 | + Qubit reordering* | 318 sec | 56.7x
CUDA, 4 Tesla V100 | + Qubit grouping + Operator reordering | 176 sec | 62.5x
*: Qubits are reordered 10 times during execution of the whole circuit.
BENCHMARK
Phase estimation, 32-qubit circuit
32-qubit circuit, 558 gates, FP64
Hardware: NVIDIA DGX Station. CPU: Xeon E5-2698 v4 2.2 GHz, GPU: Tesla V100 x 4
Runtime | Optimization | Elapsed time | Acc.
CPU | No optimization | 774 sec | -
CUDA, 4 Tesla V100 | No optimization | 18.4 sec | 42x
CUDA, 4 Tesla V100 | + Qubit reordering* | 15.4 sec | 50x
CUDA, 4 Tesla V100 | + Qubit grouping + Operator reordering | 3.2 sec | 242x
*: Qubits are reordered 8 times during execution of the whole circuit.
PLANS FOR THE NEXT VERSION
• Supporting hypercube-mesh topology.
• Fully utilizing 8 GPUs on servers such as DGX-1 and the AWS p3dn.24xlarge instance.
• Enabling 33-qubit circuits (float64) to run.
• Accelerating GPU kernels.
• Qgate 0.3 implements naïve GPU kernels to apply gates; they are not optimized yet.
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Recently uploaded (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

QGATE 0.3: QUANTUM CIRCUIT SIMULATOR

  • 1. Shinya Morino, Sr. Solution Architect, NVIDIA, 2/14/2020 QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
  • 2. NVIDIA AI TECHNOLOGY CENTER (NVAITC): Catalyse AI transformation through Research-Centric Integrated Engagements. Established Aug 2015 in Singapore. Locations: Singapore (AP HQ), Taiwan, China, Australia, Hong Kong, Luxembourg, Thailand, London, Indonesia. Collaboration footprint: Singapore, ASEAN, Taiwan, China, Hong Kong, Australia, Europe.
  • 3. QUANTUM COMPUTING. Qubit (quantum bit): the basic unit of quantum computers. A qubit is represented as a linear superposition of two basis states, $|0\rangle$ and $|1\rangle$: $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, with $|\alpha|^2 + |\beta|^2 = 1$. Measurement observes either $|0\rangle$ or $|1\rangle$, with probabilities $|\alpha|^2$ and $|\beta|^2$ respectively. On the Bloch sphere, the amplitudes are $\alpha = \cos\frac{\theta}{2}$ and $\beta = e^{i\phi}\sin\frac{\theta}{2}$.
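The Bloch-sphere parametrization can be checked numerically. This is a minimal plain-Python sketch; the function names are illustrative and are not part of Qgate. It builds $\alpha$ and $\beta$ from $\theta$ and $\phi$ and confirms that the two measurement probabilities sum to 1.

```python
import cmath
import math
from typing import Tuple


def amplitudes_from_bloch(theta: float, phi: float) -> Tuple[complex, complex]:
    """Return (alpha, beta) for |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta


def measurement_probabilities(alpha: complex, beta: complex) -> Tuple[float, float]:
    """Probabilities of observing |0> and |1>: |alpha|^2 and |beta|^2."""
    return abs(alpha) ** 2, abs(beta) ** 2


# theta = pi/2 puts the state on the equator of the Bloch sphere: an equal superposition.
alpha, beta = amplitudes_from_bloch(math.pi / 2, math.pi / 4)
p0, p1 = measurement_probabilities(alpha, beta)
print(f"P(0) = {p0:.3f}, P(1) = {p1:.3f}")  # P(0) = 0.500, P(1) = 0.500
```

The phase $\phi$ does not change the probabilities. It only becomes visible once later gates interfere the two amplitudes.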
  • 4. QUANTUM COMPUTING. Quantum circuits consist of qubits and quantum logic gates. With N qubits, $2^N$ states can be represented (if the qubits are entangled), and each quantum state corresponds to one complex number. For example, 53 qubits represent $2^{53}$ (about $9 \times 10^{15}$, roughly 10 peta) states. Quantum states are controlled with quantum logic gates: applying one gate can change all $2^N$ state amplitudes at the same time. Developing quantum circuits is how quantum computers are programmed. [Figure: example quantum circuit with Hadamard (H) gates]
  • 5. QUANTUM CIRCUIT SIMULATION. State vector: quantum states are expanded into a vector of complex numbers. The vector size is $2^N$ for an N-qubit circuit, and each bit of the index corresponds to one qubit. [Figure: quantum states and state vector. Entries $s_0, s_1, s_2, \ldots, s_{2^N-2}, s_{2^N-1}$ (each a complex number) correspond to the basis states $|0\ldots00\rangle, |0\ldots01\rangle, |0\ldots10\rangle, \ldots, |1\ldots10\rangle, |1\ldots11\rangle$; the bits of the index map to the qubits $q_0, q_1, \ldots, q_{N-1}$.]
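The index-to-qubit mapping can be made concrete in a few lines. This sketch assumes that qubit $q_k$ is bit $k$ of the index, with $q_0$ as the least significant bit. The slide does not state the bit order, so treat that as an assumption. The helper names are hypothetical.

```python
def basis_label(index: int, n_qubits: int) -> str:
    """Ket label of a state-vector index; leftmost character is q_{N-1}, rightmost is q_0."""
    return "|" + format(index, f"0{n_qubits}b") + ">"


def qubit_value(index: int, qubit: int) -> int:
    """Value (0 or 1) of `qubit` in the basis state stored at `index` (assumes q_k = bit k)."""
    return (index >> qubit) & 1


n = 3
for i in range(2 ** n):
    bits = [qubit_value(i, q) for q in range(n)]
    print(f"s_{i}: {basis_label(i, n)}  (q0, q1, q2) = {bits}")
```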
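  • 6. QUANTUM CIRCUIT SIMULATION. Quantum logic gate: a gate is represented as a $2 \times 2$ unitary matrix $U = \begin{pmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{pmatrix}$. Applying a gate to a state vector updates each pair of amplitudes whose indices differ only in the target-qubit bit: $\begin{pmatrix} s_{i+1,|\ldots0\ldots\rangle} \\ s_{i+1,|\ldots1\ldots\rangle} \end{pmatrix} = U \begin{pmatrix} s_{i,|\ldots0\ldots\rangle} \\ s_{i,|\ldots1\ldots\rangle} \end{pmatrix}$, where $i$ counts gate applications. Controlled gate: the gate acts on the target qubit only when the control qubit is $|1\rangle$, which corresponds to the matrix $\begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&u_{00}&u_{01} \\ 0&0&u_{10}&u_{11} \end{pmatrix}$. Only amplitudes whose control bit is 1 are updated: $\begin{pmatrix} s_{i+1,|\ldots1\ldots0\ldots\rangle} \\ s_{i+1,|\ldots1\ldots1\ldots\rangle} \end{pmatrix} = U \begin{pmatrix} s_{i,|\ldots1\ldots0\ldots\rangle} \\ s_{i,|\ldots1\ldots1\ldots\rangle} \end{pmatrix}$. Controlled gates can entangle qubits.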
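The pairwise update rule translates directly into a naïve state-vector kernel. This is a minimal pure-Python sketch, not Qgate's implementation. It assumes qubit $k$ is bit $k$ of the index. It applies a $2 \times 2$ unitary to one target qubit and optionally skips amplitudes whose control bit is 0.

```python
import math
from typing import List, Optional, Tuple

State = List[complex]
Matrix = Tuple[Tuple[complex, complex], Tuple[complex, complex]]


def apply_gate(state: State, u: Matrix, target: int,
               control: Optional[int] = None) -> State:
    """Apply the 2x2 unitary `u` to qubit `target`, optionally controlled by `control`.

    Assumes qubit k maps to bit k of the state-vector index.
    """
    out = list(state)
    t = 1 << target
    for i in range(len(state)):
        if i & t:
            continue  # handle each (|..0..>, |..1..>) pair once, from its |..0..> member
        if control is not None and not (i >> control) & 1:
            continue  # control qubit is |0>: both amplitudes stay unchanged
        j = i | t  # partner index with the target bit set
        s0, s1 = state[i], state[j]
        out[i] = u[0][0] * s0 + u[0][1] * s1
        out[j] = u[1][0] * s0 + u[1][1] * s1
    return out


R = 1 / math.sqrt(2)
H: Matrix = ((R, R), (R, -R))
X: Matrix = ((0, 1), (1, 0))

# Bell state: H on q0, then a controlled-X with control q0 and target q1.
state: State = [1, 0, 0, 0]
state = apply_gate(state, H, target=0)
state = apply_gate(state, X, target=1, control=0)
print([round(abs(a) ** 2, 3) for a in state])  # [0.5, 0.0, 0.0, 0.5]
```

Each gate touches every amplitude once, so the cost of one gate is proportional to $2^N$. That is why memory bandwidth dominates simulation time.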
  • 7. PROBLEM DEFINITION. A quantum circuit simulator is an essential tool for developing quantum circuits, but it has limitations. It is often said that the number of qubits is the limit, because simulation requires a vast amount of memory, proportional to $2^N$. As of today, however, the actual issue is that simulation is very slow, so debugging and verifying quantum circuits takes a long time.
  • 8. QUANTUM CIRCUIT EXAMPLES (estimated simulation time in Python*1 on 1 CPU core, and on a multi-core CPU*2):
    - Quantum Volume*3 (width 32, depth 32): 32 qubits, 5,120 gates, state vector 64 GB; Python (1 core) 2 days, CPU (multi-core) 3 hours
    - iQFT*4 (example: 32 qubits): 32 qubits, 560 gates, state vector 64 GB; Python (1 core) 3 hours, CPU (multi-core) 13 min
    - Modulo operation ($5^n \bmod 12$): 27 qubits, 5,449 gates, state vector 2 GB; Python (1 core) 45 min, CPU (multi-core) 3 min
    *1: Simulation with 1 CPU core. *2: Assuming 55 GB/s of CPU memory bandwidth with a naïve simulation algorithm. *3: https://github.com/Qiskit/openqasm/blob/master/benchmarks/quantum_volume/quantum_volume_n32_d32.qasm *4: iQFT: inverse quantum Fourier transform.
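The state-vector capacities in this table follow from $2^N$ amplitudes of 16 bytes each (double-precision complex). This sketch reads the slide's "GB" as binary gigabytes (GiB), which is an assumption that happens to reproduce the 64 GB and 2 GB figures exactly.

```python
GIB = 2 ** 30  # binary gigabyte


def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for 2**n_qubits complex amplitudes.

    16 bytes = double-precision complex (FP64); 8 bytes = single-precision complex (FP32).
    """
    return (2 ** n_qubits) * bytes_per_amplitude


for circuit, n in [("Quantum Volume / iQFT", 32), ("Modulo operation", 27)]:
    print(f"{circuit}: {n} qubits -> {state_vector_bytes(n) / GIB:.0f} GiB (FP64)")
```

Each additional qubit doubles the requirement, so halving the precision (FP32) buys exactly one more qubit for the same memory.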
  • 10. QGATE DESIGN CONCEPT
    1. Easy development of quantum circuits, with fast simulations for experiments: a rich built-in gate set for developing circuits quickly, and modern computing devices for performance.
    2. Single node, multiple GPUs (multiple devices): runs on a big server with a huge amount of memory, focusing on performance, with no inter-node communication.
    3. Works as a backend for other SDKs: simulations from Blueqat and various other SDKs can be accelerated.
  • 11. 1. EASY DEVELOPMENT OF QUANTUM CIRCUITS. Rich built-in gate set: multi-qubit-controlled gates, such as the Toffoli gate, are included, and the adjoint is available for all gates. All qubits are fully connected. IBM's OpenQASM gate set is also supported.
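Multi-qubit-controlled gates and adjoints both fit the same pairwise update. This sketch is illustrative and is not Qgate's API. A bit mask requires every control qubit to be 1, and the adjoint is the conjugate transpose of the $2 \times 2$ matrix. As before, qubit $k$ is assumed to be bit $k$ of the index.

```python
from typing import List, Sequence, Tuple

Matrix = Tuple[Tuple[complex, complex], Tuple[complex, complex]]


def adjoint(u: Matrix) -> Matrix:
    """Conjugate transpose (U dagger) of a 2x2 gate matrix."""
    return ((u[0][0].conjugate(), u[1][0].conjugate()),
            (u[0][1].conjugate(), u[1][1].conjugate()))


def apply_multi_controlled(state: List[complex], u: Matrix, target: int,
                           controls: Sequence[int] = ()) -> List[complex]:
    """Apply `u` to `target` only where every qubit in `controls` is |1> (qubit k = bit k)."""
    if target in controls:
        raise ValueError("target must not be a control qubit")
    mask = 0
    for c in controls:
        mask |= 1 << c
    t = 1 << target
    out = list(state)
    for i in range(len(state)):
        if i & t or (i & mask) != mask:
            continue  # visit each pair once, and only where all controls are 1
        j = i | t
        s0, s1 = state[i], state[j]
        out[i] = u[0][0] * s0 + u[0][1] * s1
        out[j] = u[1][0] * s0 + u[1][1] * s1
    return out


X: Matrix = ((0, 1), (1, 0))
S: Matrix = ((1, 0), (0, 1j))

# Toffoli: X on q2 controlled by q0 and q1 maps |011> (index 3) to |111> (index 7).
basis = [0j] * 8
basis[3] = 1
toffoli_out = apply_multi_controlled(basis, X, target=2, controls=[0, 1])
print(toffoli_out.index(1))  # 7

# Adjoint: applying S and then S-dagger restores the original state.
psi = [0.6, 0.8j]
restored = apply_multi_controlled(apply_multi_controlled(psi, S, 0), adjoint(S), 0)
print(restored)
```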
  • 12. BUILT-IN OPERATORS (quantum logic gate: symbol)
    - Identity: I
    - Hadamard gate: H
    - Pauli gates and their rotations: X, Y, Z, Rx(theta), Ry(theta), Rz(theta)
    - Exponential of identity and Pauli gates: Exp(I, X, Y, Z)
    - Global phase: Exp(theta)
    - Phase shift gates: P(theta), T, S
    - Measurement, probability: Measure(qubit), Prob(qubit)
    Extensions:
    - OpenQASM's U gates: U3, U2, U1
    - Multi-qubit measurement: Measure(pauli gates)
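Several of these gates are related to each other. This sketch writes a subset as $2 \times 2$ matrices so the relations can be checked. It uses textbook conventions: $R_z(\theta) = e^{-i\theta Z/2}$ and $P(\theta) = \mathrm{diag}(1, e^{i\theta})$. For U3, U2 and U1 it uses the OpenQASM 2.0 definitions. Qgate's internal definitions may differ from these by a global phase, so treat the matrices as assumptions.

```python
import cmath
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def close(a, b, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


R = 1 / math.sqrt(2)
I = ((1, 0), (0, 1))
H = ((R, R), (R, -R))
X = ((0, 1), (1, 0))
Y = ((0, -1j), (1j, 0))
Z = ((1, 0), (0, -1))


def P(theta):
    """Phase shift diag(1, e^{i theta})."""
    return ((1, 0), (0, cmath.exp(1j * theta)))


S, T = P(math.pi / 2), P(math.pi / 4)


def Rz(theta):
    """Rotation exp(-i theta Z / 2)."""
    return ((cmath.exp(-0.5j * theta), 0), (0, cmath.exp(0.5j * theta)))


def U3(theta, phi, lam):
    """OpenQASM 2.0 single-qubit gate U(theta, phi, lambda)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, -cmath.exp(1j * lam) * s),
            (cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c))


def U2(phi, lam):
    return U3(math.pi / 2, phi, lam)


def U1(lam):
    return U3(0.0, 0.0, lam)


# T*T = S, S*S = Z, and U2(0, pi) = H.
print(close(mat_mul(T, T), S), close(mat_mul(S, S), Z), close(U2(0.0, math.pi), H))
```

$R_z(\theta)$ and $P(\theta)$ differ only by the global phase $e^{-i\theta/2}$. A global phase is unobservable on its own, but it matters once the gate is controlled.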
  • 13. UTILIZING MODERN COMPUTING DEVICES FOR PERFORMANCE. GPU: Tesla V100 (SXM2), 7.8 TFLOPS FP64 / 15.7 TFLOPS FP32, 32 GB HBM2 at 900 GB/s, 300 GB/s NVLink. CPU: a CPU runtime is also implemented, using multiple cores in one CPU socket.
  • 14. TARGET HARDWARE. Requirements: quantum circuit simulations need a huge amount of memory, and performance is important as well. NVIDIA DGX-2: 512 GB of GPU memory across 16 Tesla V100 GPUs; with NVLink, the memory of all GPUs is in one address space.
  • 15. DGX-2: 16 NVIDIA high-end GPUs + NVLink2. All GPUs share a single address space, with all-to-all NVLink connections (300 GB/s, bidirectional). 512 GB of ultra-fast memory is available, enough for 35 qubits in FP32 or 34 qubits in FP64.
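The 35-qubit (FP32) and 34-qubit (FP64) limits can be reproduced by finding the largest $N$ whose state vector fits. The sketch requires the state vector to be strictly smaller than total GPU memory. That strict inequality is an assumption standing in for memory the driver and runtime reserve on each GPU, so the full capacity is never available to the simulation.

```python
GIB = 2 ** 30


def max_qubits(total_bytes: int, bytes_per_amplitude: int) -> int:
    """Largest N whose 2**N-amplitude state vector is strictly smaller than total_bytes.

    The strict inequality is an assumption: it stands in for memory that the driver
    and runtime reserve on each GPU.
    """
    n = 0
    while (2 ** (n + 1)) * bytes_per_amplitude < total_bytes:
        n += 1
    return n


dgx2 = 16 * 32 * GIB  # 16 Tesla V100, 32 GB each
print("FP32:", max_qubits(dgx2, 8), "qubits")   # complex64, 8 bytes per amplitude
print("FP64:", max_qubits(dgx2, 16), "qubits")  # complex128, 16 bytes per amplitude
```

Under the same rule, a 4-GPU DGX Station (128 GB) holds 32 FP64 qubits, and an 8-GPU server (256 GB) holds 33.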
  • 16. NVIDIA DGX STATION, at a glance:
    - GPUs: 4x NVIDIA® Tesla® V100
    - TFLOPS (GPU FP16): 500
    - GPU memory: 32 GB per GPU
    - NVIDIA Tensor Cores: 2,560 (total)
    - NVIDIA CUDA Cores: 20,480 (total)
    - CPU: Intel Xeon E5-2698 v4 2.2 GHz (20-core)
    - System memory: 256 GB LRDIMM DDR4
    - Storage: data 3 x 1.92 TB SSD RAID 0; OS 1 x 1.92 TB SSD
    - Network: dual 10 Gb LAN
    - Display: 3x DisplayPort, 4K resolution
    - Acoustics: < 35 dB
    - Maximum power requirements: 1500 W
    - Operating temperature range: 10 to 30 °C
    - Software: Ubuntu Desktop Linux OS, DGX Recommended GPU Driver, CUDA Toolkit
  • 17. DGX STATION NVLINK NETWORK TOPOLOGY, for efficient application scaling (NVIDIA NVLink Bridge):
    - Four NVIDIA Tesla V100 accelerators.
    - Each Tesla V100 GPU in DGX Station has four NVLink connection points, each providing a point-to-point connection to another GPU at a peak bandwidth of 25 GB/s.
    - Optimized for: the bandwidth achievable for a variety of point-to-point and collective communication primitives; the flexibility of the topology; performance with a subset of the GPUs.
  • 18. GPU REQUIREMENT. Qgate runs on a single GPU and scales to multiple GPUs in a single node.
    - GPU: works with Kepler GPUs (compute capability 3.5) or later; Maxwell GPUs (compute capability 5.0) or later are recommended.
    - Multi-GPU, NVLink: all-to-all NVLink connections between GPUs are required. NVLink is strongly recommended for performance.
    - Multi-GPU, PCIe: all GPUs should be connected to the same PCIe root complex.
    - CPU: running on 1 CPU socket is supported; NUMA is not taken into account.
  • 19. PERFORMANCE MEASUREMENT (baseline, single GPU). Quantum circuit for measurement: 10 Hadamard gates are placed on each qubit, in FP64. [Figure: circuit with 10 H gates on every qubit] Devices:
    - CPU (1 core)*1: single thread on CPU
    - CPU (multi-core): multi-threaded*2 on CPU (40 threads, 20 physical cores)
    - GPU: GPU / CUDA
    *1: CPU (1 core) models a Python-based simulator, which is sometimes implemented using 1 CPU core. *2: Implemented with the C++ standard library's thread class (std::thread).
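Because $H \cdot H = I$, an even number of Hadamard gates per qubit returns the state to $|0\ldots0\rangle$. That makes this measurement circuit a convenient correctness check as well as a load generator. The pure-Python sketch below checks correctness only and does not measure speed. It uses 4 qubits instead of 30, and qubit $k$ is assumed to be bit $k$ of the index.

```python
import math
from typing import List, Tuple


def apply_h(state: List[complex], target: int) -> List[complex]:
    """Hadamard on qubit `target` (qubit k = bit k of the index)."""
    r = 1 / math.sqrt(2)
    t = 1 << target
    out = list(state)
    for i in range(len(state)):
        if (i & t) == 0:
            j = i | t
            out[i] = r * (state[i] + state[j])
            out[j] = r * (state[i] - state[j])
    return out


def run_benchmark_circuit(n_qubits: int, h_per_qubit: int = 10) -> Tuple[List[complex], int]:
    """Apply `h_per_qubit` Hadamard gates to every qubit, starting from |0...0>."""
    state = [0j] * (2 ** n_qubits)
    state[0] = 1 + 0j
    applied = 0
    for q in range(n_qubits):
        for _ in range(h_per_qubit):
            state = apply_h(state, q)
            applied += 1
    return state, applied


final, n_gates = run_benchmark_circuit(4)
print(n_gates, abs(final[0]) ** 2)
```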
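  • 20. SUMMARY: performance baseline (30 qubits, single GPU)
    - CPU (1 core): 0.11 gates applied per second, 3.7 GB/s memory bandwidth; acceleration 1 (baseline)
    - CPU (multi-core): 1.59 gates/s, 54.8 GB/s; 14.9x vs CPU (1 core), 1 (baseline) vs CPU (multi-core)
    - 1 GPU: 23.5 gates/s, 806 GB/s; 220x vs CPU (1 core), 14.7x vs CPU (multi-core)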
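The bandwidth column is consistent with a simple model. Each gate reads and writes the whole 30-qubit FP64 state vector once, which is $2 \times 2^{30} \times 16$ bytes, about 34.4 GB per gate. The model is an assumption inferred from the numbers, not a statement of how the slide computed them. It reproduces the table to within about 2%.

```python
def effective_bandwidth_gb_s(gates_per_sec: float, n_qubits: int,
                             bytes_per_amplitude: int = 16) -> float:
    """Effective memory bandwidth, assuming each gate reads and writes the whole state vector once."""
    bytes_per_gate = 2 * (2 ** n_qubits) * bytes_per_amplitude  # one read pass + one write pass
    return gates_per_sec * bytes_per_gate / 1e9


for device, rate in [("CPU (1 core)", 0.11), ("CPU (multi-core)", 1.59), ("1 GPU", 23.5)]:
    print(f"{device:16s} {effective_bandwidth_gb_s(rate, 30):7.1f} GB/s")
```

Under this model, the 1-GPU figure is close to the V100's 900 GB/s HBM2 peak. This suggests the naïve kernel is already memory-bound.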
  • 22. PROCESSING PIPELINE, built with Python and native extensions. Input (intermediate representation) → stages below → output (state vector):
    - Gate cancellation (quantum computing specific): removing gates that cancel each other.
    - Operator reordering (quantum computing specific): reordering operators (gates and measurements) to maximize the effect of dynamic qubit grouping.
    - Dynamic qubit grouping (quantum computing specific): dynamically grouping qubits, reducing the number of variables required to represent quantum states.
    - Qubit reordering (device specific): reordering qubits to reduce data transfer between devices.
    - Runtime (device specific): parallelization on computing devices, CPU (multi-core) and GPU (CUDA).
  • 23. SOFTWARE DIAGRAM
    - Frontend: qgate.script (circuit definition in Python), qgate.openqasm (importing OpenQASM files), and plugins (Blueqat plugin, other plugins …).
    - OM (object model), qgate.model: quantum circuit object model and built-in gate definitions; analyses and optimizations for quantum circuits.
    - Simulator, qgate.simulator: qgate.simulator.qubits holds the state vector (complex numbers, probabilities).
    - Backend runtime, qgate.simulator.runtime: accelerating numerical operations, with pyruntime (Python, reference), cpuruntime (CPU, multi-core), and cudaruntime (CUDA, GPU).
  • 24. GATE CANCELLATION (quantum circuit optimization). Products of some gate pairs cancel out: $I = X \cdot X = Y \cdot Y = Z \cdot Z = H \cdot H$. [Figure: circuits in which pairs of X gates, interleaved with arbitrary unitary gates U, cancel out.] Example: modulo arithmetic* ($5^x \bmod 12$, 27 qubits): 5,449 gates before cancellation and 3,885 gates after, so 71.3 % of the gates remain. Cancellation also works for pairs of Y, Z, and H gates, whose squares are the identity. *This circuit was developed by Kato-san in MDR. Ref: V. Vedral, A. Barenco, A. Ekert, https://arxiv.org/abs/quant-ph/9511018v1
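One simple way to implement this pass is a per-qubit stack. This is a minimal sketch and not necessarily how Qgate does it. A new self-inverse gate cancels against the most recent gate on its qubits if that gate is identical, with nothing in between on those qubits. Cascades such as X H H X collapse completely. General $U \cdot U^\dagger$ pairs are out of scope here, and the gate tuples are a hypothetical representation.

```python
from collections import defaultdict
from typing import List, Tuple

Gate = Tuple[str, Tuple[int, ...]]  # (gate name, qubits it acts on)
SELF_INVERSE = {"X", "Y", "Z", "H", "CX"}  # gates G with G * G = I


def cancel_gates(circuit: List[Gate]) -> List[Gate]:
    """Remove pairs of identical self-inverse gates with nothing in between on their qubits."""
    out: list = []  # kept gates; cancelled entries become None
    last = defaultdict(list)  # qubit -> stack of indices into `out` touching that qubit
    for gate in circuit:
        name, qubits = gate
        tops = {last[q][-1] if last[q] else None for q in qubits}
        if name in SELF_INVERSE and len(tops) == 1:
            k = tops.pop()
            if k is not None and out[k] == gate:
                out[k] = None  # gate * gate = I: drop both
                for q in qubits:
                    last[q].pop()
                continue
        out.append(gate)
        for q in qubits:
            last[q].append(len(out) - 1)
    return [g for g in out if g is not None]


circuit = [("H", (0,)), ("X", (1,)), ("H", (0,)), ("CX", (0, 1)),
           ("Z", (1,)), ("Z", (1,)), ("CX", (0, 1))]
print(cancel_gates(circuit))  # [('X', (1,))]
print(f"{3885 / 5449:.1%} of the modulo-arithmetic gates remain")
```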
  • 25. DYNAMIC QUBIT GROUPING. If qubits are not entangled, the state vector can be factorized, reducing the number of variables. A 3-qubit state vector $s_0|000\rangle + s_1|001\rangle + \cdots + s_7|111\rangle$ has size 8. If 1 qubit is not entangled with the others, it factorizes as $(s_{00}|0\rangle + s_{01}|1\rangle) \otimes (s_{10}|00\rangle + s_{11}|01\rangle + s_{12}|10\rangle + s_{13}|11\rangle)$: a 1-qubit group and a 2-qubit group, size 6 (= 2 + 4).
  • 26. DYNAMIC QUBIT GROUPING, 30-qubit case. A 30-qubit state vector $s_0|0\ldots00\rangle, \ldots, s_{2^{30}-1}|1\ldots11\rangle$ has size $2^{30}$. If the qubits are divided into 10- and 20-qubit groups, the factorized form needs only $2^{20} + 2^{10}$ variables (about 0.1 %).
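The factorization and the variable counts can be checked directly. In this sketch, the tensor product of a 1-qubit and a 2-qubit vector rebuilds the 8-entry state vector from 6 stored numbers. The same count for 10- and 20-qubit groups gives the roughly 0.1 % figure. The amplitudes are arbitrary normalized example values.

```python
from typing import List, Sequence


def kron(a: Sequence[complex], b: Sequence[complex]) -> List[complex]:
    """Tensor product of two state vectors; `a` supplies the higher-order index bits."""
    return [x * y for x in a for y in b]


def n_variables(group_sizes: Sequence[int]) -> int:
    """Amplitudes stored when the qubits are split into independent groups."""
    return sum(2 ** g for g in group_sizes)


one_qubit = [0.6, 0.8]               # s00|0> + s01|1>
two_qubits = [0.5, 0.5, 0.5, 0.5]    # s10|00> + s11|01> + s12|10> + s13|11>
full = kron(one_qubit, two_qubits)   # 3-qubit state vector s0 ... s7
print(len(full), n_variables([1, 2]))            # 8 6
print(n_variables([10, 20]) / 2 ** 30)           # fraction of the full 30-qubit vector
```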
  • 27. EX. INVERSE QUANTUM FOURIER TRANSFORM. Qubits are grouped when a controlled gate is applied. [Figure: 5-qubit iQFT circuit with H gates and controlled R1 to R4 gates.] Number of variables at each stage: 10 ($2 \times 5$), 10 ($2^2 + 2 \times 3$), 12 ($2^3 + 2 \times 2$), 18 ($2^4 + 2$), 32 ($2^5$).
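Group tracking can be sketched with a union-find structure. Each controlled gate merges the groups of its control and target qubits, and the variable count is $\sum 2^{\text{group size}}$. The merge sequence below is a hypothetical stand-in for the iQFT's controlled-R gates. It only merges groups and never splits them. It reproduces the slide's sequence of 10, 10, 12, 18, 32.

```python
from typing import List


class QubitGroups:
    """Union-find over qubits; each set is one independently stored qubit group."""

    def __init__(self, n_qubits: int) -> None:
        self.parent = list(range(n_qubits))

    def find(self, q: int) -> int:
        while self.parent[q] != q:
            self.parent[q] = self.parent[self.parent[q]]  # path halving
            q = self.parent[q]
        return q

    def merge(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def sizes(self) -> List[int]:
        counts: dict = {}
        for q in range(len(self.parent)):
            root = self.find(q)
            counts[root] = counts.get(root, 0) + 1
        return sorted(counts.values(), reverse=True)

    def n_variables(self) -> int:
        return sum(2 ** s for s in self.sizes())


groups = QubitGroups(5)
history = [groups.n_variables()]
for control, target in [(1, 0), (2, 0), (3, 0), (4, 0)]:  # hypothetical controlled gates
    groups.merge(control, target)
    history.append(groups.n_variables())
print(history)  # [10, 10, 12, 18, 32]
```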
  • 28. EFFECTS OF DYNAMIC QUBIT GROUPING. The calculation amount is reduced by applying qubit grouping. [Figure: iQFT, numerical estimation. Calculation amount (log axis) with and without qubit grouping, and the ratio of calculation amount (grouping enabled / disabled), versus the number of qubits (0 to 32).] The ratio is 12.1 % at 30 qubits.
  • 29. CALCULATION AMOUNT COMPARISON. When the number of qubits is small, processing overheads are observed: time for analyzing the quantum circuit and for managing grouped state vectors. When the number of qubits is large, computation time is long enough that the overhead is relatively small, and estimation and measurement match. [Figure: reduction ratio versus number of qubits (8 to 32) for CUDA, CPU (multi-core), and the theoretical estimate; overheads at small qubit counts, performance improved as expected at large qubit counts.]
  • 30. OPERATOR REORDERING. Maximizing the effect of dynamic qubit grouping: operators are reordered so that they are applied within a smaller qubit group, reducing the amount of calculation. [Figure: gates U0 to U4 before and after reordering.]
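The idea can be illustrated with a toy cost model in which each operator costs $2^{\text{size of the group it acts on}}$. Operators on disjoint qubits commute, so a single-qubit gate can be moved ahead of a controlled gate that would otherwise merge its qubit into a larger group. Both the greedy "hoist single-qubit gates" rule and the cost model are hypothetical illustrations, not Qgate's actual reordering algorithm.

```python
from typing import List, Tuple

Op = Tuple[str, Tuple[int, ...]]  # (operator name, qubits)


def grouping_cost(ops: List[Op], n_qubits: int) -> int:
    """Hypothetical cost model: each op costs 2**(size of the qubit group it acts on).

    Multi-qubit ops merge the groups of their qubits before being applied.
    """
    parent = list(range(n_qubits))
    size = [1] * n_qubits

    def find(q: int) -> int:
        while parent[q] != q:
            q = parent[q]
        return q

    cost = 0
    for _, qubits in ops:
        roots = {find(q) for q in qubits}
        root = roots.pop()
        for r in roots:  # merge the remaining groups into `root`
            parent[r] = root
            size[root] += size[r]
        cost += 2 ** size[root]
    return cost


def hoist_single_qubit_ops(ops: List[Op]) -> List[Op]:
    """Move each single-qubit op earlier past ops on disjoint qubits (they commute)."""
    result: List[Op] = []
    for op in ops:
        result.append(op)
        if len(op[1]) != 1:
            continue
        j = len(result) - 1
        while j > 0 and not set(result[j - 1][1]) & set(op[1]):
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result


ops = [("CX", (0, 1)), ("CX", (1, 2)), ("H", (0,))]
reordered = hoist_single_qubit_ops(ops)
print(reordered)  # [('CX', (0, 1)), ('H', (0,)), ('CX', (1, 2))]
print(grouping_cost(ops, 3), grouping_cost(reordered, 3))  # 20 16
```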
  • 31. BENCHMARK: PHASE ESTIMATION, one of the most important algorithms of quantum computing. It is used in Shor's algorithm for the order-finding problem (https://en.wikipedia.org/wiki/Shor%27s_algorithm) and in quantum chemistry for obtaining matrix eigenvalues.
  • 32. PHASE ESTIMATION, without operator reordering. [Figure: phase-estimation circuit with controlled $U, U^2, U^4, U^8, U^{16}$ gates followed by an iQFT of H and controlled R1 to R4 gates.]
  • 33. PHASE ESTIMATION, operators are reordered. [Figure: the same phase-estimation circuit (controlled $U, U^2, U^4, U^8, U^{16}$ and iQFT) after operator reordering.]
  • 34. PHASE ESTIMATION benchmark: 30-qubit circuit, 493 gates, FP64. Measures the global phase of one qubit, using 29 qubits for the measurement, on a single Tesla V100 (32 GB). [Figure: 29 measurement qubits control the phases $e^{i 2^{n-1}\theta}, e^{i 2^{n-2}\theta}, \ldots, e^{i\theta}$, followed by an iQFT.]
  • 35. AN EXAMPLE OF CALCULATION RESULTS. 1024 shots of sampling; the initial value is 0.1. Raw sampling results (value, count):
    (0.09999997168779373, 1) (0.09999998286366463, 1) (0.09999998472630978, 1) (0.09999999031424522, 1)
    (0.09999999217689037, 1) (0.09999999403953552, 4) (0.09999999590218067, 4) (0.09999999776482582, 26)
    (0.09999999962747097, 900) (0.10000000149011612, 57) (0.10000000335276127, 17) (0.10000000521540642, 7)
    (0.10000000707805157, 1) (0.10000000894069672, 1) (0.10000001080334187, 1) (0.10000001639127731, 1)
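These samples can be sanity-checked against textbook phase estimation. Every value is an integer multiple of $2^{-29}$, matching the 29 measurement qubits. The most frequent outcome is the 29-bit fraction nearest to 0.1. Its observed frequency of 900/1024 is close to the standard prediction $\sin^2(\pi\delta)/(\pi\delta)^2$, where $\delta$ is how far the true phase lies from that grid point in units of one step. Treating the outputs as $k/2^{29}$ and using that formula are assumptions based on the standard algorithm.

```python
import math

# (phase value, count) pairs copied from the raw sampling results above
samples = [
    (0.09999997168779373, 1), (0.09999998286366463, 1), (0.09999998472630978, 1),
    (0.09999999031424522, 1), (0.09999999217689037, 1), (0.09999999403953552, 4),
    (0.09999999590218067, 4), (0.09999999776482582, 26), (0.09999999962747097, 900),
    (0.10000000149011612, 57), (0.10000000335276127, 17), (0.10000000521540642, 7),
    (0.10000000707805157, 1), (0.10000000894069672, 1), (0.10000001080334187, 1),
    (0.10000001639127731, 1),
]
N_BITS = 29  # measurement qubits

shots = sum(count for _, count in samples)
best_value, best_count = max(samples, key=lambda s: s[1])
k_best = round(best_value * 2 ** N_BITS)  # measured integer register value

# Textbook estimate for the most likely outcome when the true phase lies a
# fraction delta of one grid step away from the nearest grid point.
delta = 0.1 * 2 ** N_BITS - k_best
p_predicted = (math.sin(math.pi * delta) / (math.pi * delta)) ** 2

print(shots, k_best, best_count / shots, round(p_predicted, 3))
```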
  • 36. PHASE ESTIMATION, operator reordering, single GPU: 30-qubit circuit, 493 gates, FP64. Measures the global phase of one qubit, using 29 qubits for the measurement.
    - CPU / no optimization: 213 s elapsed, 1x
    - CPU / optimized: 24.7 s, 8.6x
    - CUDA / no optimization: 13.7 s, 15.5x
    - CUDA / optimized: 1.86 s, 114x
    [Figure: the same phase-estimation circuit with 29 measurement qubits and iQFT.]
  • 37. MULTI GPU + NVLINK
  • 38. IDEAL MULTI GPU PERFORMANCE, from the performance baseline (30 qubits, single GPU):
    - CPU (1 core): 0.11 gates/s, 3.7 GB/s; 1 (baseline)
    - CPU (multi-core): 1.59 gates/s, 54.8 GB/s; 14.9x vs CPU (1 core), 1 (baseline) vs CPU (multi-core)
    - 1 GPU: 23.5 gates/s, 806 GB/s; 220x vs CPU (1 core), 14.7x vs CPU (multi-core)
    - Ideal with 4 GPUs (DGX Station): 58.8x (= 14.7x × 4) vs CPU (multi-core)
  • 39. BOTTLENECK: DATA TRANSFER (example: DGX Station). NVLink is fast, but slower than GPU memory: each GPU's memory runs at 900 GB/s, while one NVLink link (bidirectional) provides 50 GB/s. [Figure: DGX Station GPUs connected by NVLink, with 50 GB/s and 100 GB/s links between GPU pairs.]
  • 40. QUBIT REORDERING (multi-GPU, reducing data transfers). Example with qubits q0 to q5: gates applied to q0 ~ q3 are processed within each GPU, but when q4 or q5 is a target qubit, data transfers between GPUs happen for each gate application. Ref: Thomas Häner, Damian S. Steiger, "0.5 Petabyte Simulation of a 45-Qubit Quantum Circuit", https://arxiv.org/abs/1704.01127
  • 41. QUBIT REORDERING (multi-GPU, reducing data transfers). Reordering qubits: q0 ~ q2 and q3 ~ q5 are swapped, so the order q0 q1 q2 q3 q4 q5 becomes q3 q4 q5 q0 q1 q2. All required inter-device communication happens during the reordering; afterwards, all gates are applied within each GPU.
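The layout bookkeeping behind this can be sketched as follows. This is a minimal illustration with hypothetical helper names. It assumes a 6-qubit state vector split over 4 GPUs, so the top 2 index bits select the GPU, as in the example above. Gates whose target sits at one of those "global" bit positions need inter-GPU transfers. After the swap, the same gates are local, and the one-off cost is the amplitudes that change GPU during the reordering itself.

```python
from typing import List


def transfers_needed(targets: List[int], layout: List[int], n_local: int) -> int:
    """Gates whose target qubit sits at a global bit position (>= n_local).

    layout[q] is the bit position of logical qubit q in the distributed state
    vector; bits at or above n_local select the GPU that owns an amplitude.
    """
    return sum(1 for q in targets if layout[q] >= n_local)


def relocate(index: int, old_layout: List[int], new_layout: List[int]) -> int:
    """New position of the amplitude at `index` after changing the qubit layout."""
    new_index = 0
    for q, old_pos in enumerate(old_layout):
        if (index >> old_pos) & 1:
            new_index |= 1 << new_layout[q]
    return new_index


N_QUBITS, N_LOCAL = 6, 4              # 2**6 amplitudes on 4 GPUs: 2 global bits
identity = list(range(N_QUBITS))       # q0..q5 at bit positions 0..5
swapped = [3, 4, 5, 0, 1, 2]           # q0~q2 <-> q3~q5: q3, q4, q5 now in bits 0..2
gate_targets = [3, 4, 5, 4, 5]         # hypothetical segment acting on q3..q5

print(transfers_needed(gate_targets, identity, N_LOCAL))  # 4
print(transfers_needed(gate_targets, swapped, N_LOCAL))   # 0

# One-off cost: amplitudes that change GPU during the reordering itself.
moved = sum(1 for i in range(2 ** N_QUBITS)
            if i >> N_LOCAL != relocate(i, identity, swapped) >> N_LOCAL)
print(moved, "of", 2 ** N_QUBITS, "amplitudes move between GPUs")
```

The trade-off is one bulk exchange during reordering instead of a transfer on every gate that touches a global qubit. This pays off when many consecutive gates act on the qubits that are moved into local positions.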
  • 42. BENCHMARK: Quantum Volume (n=32, d=32), FP64, DGX Station (4 GPUs). Circuit: https://github.com/Qiskit/openqasm/blob/master/benchmarks/quantum_volume/quantum_volume_n32_d32.qasm (32 qubits, 5120 gates). Hardware: NVIDIA DGX Station; CPU: Xeon E5-2698 v4 2.2 GHz; GPU: Tesla V100 x 4.
    - CPU, no optimization: 3.1 hours
    - CUDA (4 Tesla V100), no optimization: 370 s, 29.7x
    - + qubit reordering*: 318 s, 56.7x
    - + qubit grouping + operator reordering: 176 s, 62.5x
    *: Qubits are reordered 10 times during execution of the whole circuit.
  • 43. BENCHMARK: phase estimation, 32-qubit circuit, 558 gates, FP64. Hardware: NVIDIA DGX Station; CPU: Xeon E5-2698 v4 2.2 GHz; GPU: Tesla V100 x 4.
    - CPU, no optimization: 774 s
    - CUDA (4 Tesla V100), no optimization: 18.4 s, 42x
    - + qubit reordering*: 15.4 s, 50x
    - + qubit grouping + operator reordering: 3.2 s, 242x
    *: Qubits are reordered 8 times during execution of the whole circuit.
  • 44. PLANS FOR THE NEXT VERSION
    - Supporting hypercube-mesh topology, to fully utilize 8 GPUs on servers such as DGX-1 and the AWS p3dn.24xlarge instance, enabling 33-qubit circuits in float64.
    - Accelerating GPU kernels: Qgate 0.3 implements naïve GPU kernels for applying gates, which are not optimized yet.