The document discusses quantum algorithms for pattern matching in genomic sequences. It begins with an overview of the presentation topics, including classical approaches to genomic sequence analysis, sub-sequence index search, and using a quantum accelerator. It then provides background on quantum computing concepts like Grover's algorithm and discusses how it could be applied to sub-sequence search through a conditional oracle and OpenQL kernels. The document considers the potential for quantum algorithms to evolve genomic analysis, including through unitary decomposition and using ancilla qubits.
Quantum algorithms for pattern matching in genomic sequences - 2018-06-22
1. Quantum Algorithms
for pattern-matching
in genomic sequences
22-06-2018
Aritra Sarkar
M.Sc. Thesis Project
Quantum Computer Architecture Lab, QuTech
Department of Quantum & Computer Engineering
Faculty of Electrical Engineering, Mathematics and Computer Sciences
Delft University of Technology
2. 2
Presentation overview
● Code of life
● WGS pipeline
● What it is (not)?
● Genomical big data
● Classical approaches
● Sub-sequence index search
● Quantum accelerator
● Searching solutions
● Quantum 101
● Grover search
● Q-search (de-)motivation
● Conditional oracle
● OpenQL kernels
● Q phone directory
● Q associative memory
● Evolution
● Unitary decomposition
● Borrowing ancilla
● Testing
● IDE with circuit designer
● iBAM
● QiBAM
● Algorithm complexity
● Related applications
● Looking |𝑏𝑎𝑐𝑘⟩+|𝑎ℎ𝑒𝑎𝑑⟩
Why? What?
How Quantumly?
Existing ways
Thesis contribution
Into the future
3. 3
Code of life
high sequence similarity usually implies significant functional or structural similarity
Genetic Similarity %
Other humans 99.9
Chimpanzees 98.6
Mouse 92
Cats 90
Cows 85
Dogs 84
Zebra-fish 73
Chicken 65
Banana 60
Honey bee 44
Grapes 24
Yeast 18
E. Coli 7
+ 97% Biological Dark
Matter
Expression
Replication
Metabolism
Reproduction
5. 5
What it is (not)?
Quantum
Biology
- “if evolution is smart enough to create a creature who understands QM, it must be using it for itself”
advantages from naturally occurring QM phenomena, not necessarily for computational purposes
e.g. photosynthesis, navigation in birds, neurons firing, … (sense of smell, emotions, past life, etc. … it keeps getting weirder)
Quantum
Genomics
Quantum-mechanical
Sequencing
Quantum-accelerated
Analysis
Sequencing
Gen2
NGS
Gen3
SMS
Gen1
Illumina
Roche 454
~100 bp
parallelism
high yield
Sanger
~1000 bp
Pacific Biosciences
Oxford Nanopore
~10000 bp
Overlap
Layout
Consensus
Pairwise
alignment
de Bruijn
k-mer
Analysis
Sorting
Deduplication
Variant
Calling
Reconstruction
De novo
Ab initio
(reference-based)
alignment/mapping
(reference-free)
assembly
Exact Heuristic
Approximate
Optimal
9. 9
…
for each short read in sample:
    do:
        find index in reference genome
        assess answer
    while (result not satisfactory)
    save short read matched index
reconstruct sequenced genome
…
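The loop above can be sketched classically as follows. This is only an illustration under stated assumptions: the helper names `align_reads` and `reconstruct` and the three short reads are hypothetical, the index search is a plain exact match, and the "assess answer / retry" step is left out.

```python
def align_reads(reference, reads):
    """Map each short read to its first exact-match index in the reference (-1 if absent)."""
    return {read: reference.find(read) for read in reads}


def reconstruct(length, alignment):
    """Lay the aligned reads onto a blank sequence; uncovered positions stay 'N'."""
    genome = ["N"] * length
    for read, index in alignment.items():
        if index < 0:
            continue  # unmatched read: a real pipeline would retry with a looser criterion
        genome[index:index + len(read)] = read
    return "".join(genome)


reference = "GTAGATCAGA"           # reference genome used in the later example slide
reads = ["GTAG", "GATC", "CAGA"]   # hypothetical short reads
alignment = align_reads(reference, reads)
print(alignment)
print(reconstruct(len(reference), alignment))
```

The "find index in reference genome" step is the part a quantum accelerator would target; everything else stays on the classical host.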
Quantum accelerator
QASM
Simulator
multi-qubit regime target algorithm
current techs. have ~50 physical qubits
current Q Processor designs are not well scalable
exponentially difficult to simulate qubits
large planar topology yet to be implemented
full connectivity to specific topology can be compiled
number of gates related to total decoherence of result
gate fidelity guarantee with QEC codes
universal set to allow full domain exploration
Unlimited
Qubits
Unlimited
Gates
space complexity is a
critical design parameter
~ 50 bound for
feasible QX simulation
full connectivity
(complete graph)
time complexity is a
critical design parameter
Gate Fidelity = 1 (no errors)
available gates
(σX/Y/Z, H, CX, CZ, Rθ, Toffoli)
14. 14
• Compile once, run many
• Oracle independent of search pattern
Q-search (de-)motivation
[Figure: circuit |0⟩⊗n → Initialise → Oracle → A.A. → index; labels: P, Depends on T, Grover iterations]
P. Mateus, Quantum Pattern Matching, Instituto Superior Técnico, Aug. 2005, arXiv:quant-ph/0508237 v1, pp. 1-5.
17. 17
• Σ ≔ {A, C, G, T} = {0, 1, 2, 3} = {00, 01, 10, 11}
– A = |Σ| = 4
• SR = GAT = 100011
– m = |SR| = 3
• RG = GTAGATCAGA = 10110010001101001000
– N = |RG| = 10
• Evolve data register to Hamming distance
• Amplify zero Hamming Distance
– Fixed oracle
• Measure Tag qubits to get match index
Q phone directory
TAG DATA DIST
000 101100 001111
001 110010 010001
010 001000 101011
011 100011 000000
100 001101 101110
101 110100 010111
110 010010 110001
111 001000 101011
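The table above can be reproduced classically. The sketch below assumes the DIST column is the bitwise XOR of DATA with the encoded search pattern, which matches every row shown; the function names `encode` and `phone_directory` are illustrative and not taken from the thesis code.

```python
def encode(seq):
    """Encode a DNA string with the 2-bit alphabet A=00, C=01, G=10, T=11."""
    code = {"A": "00", "C": "01", "G": "10", "T": "11"}
    return "".join(code[base] for base in seq)


def phone_directory(reference, pattern):
    """Return (tag, data, dist) bit strings for every sliding window of the reference."""
    m = len(pattern)
    n_tags = len(reference) - m + 1
    tag_bits = max(1, (n_tags - 1).bit_length())
    pattern_bits = int(encode(pattern), 2)
    rows = []
    for i in range(n_tags):
        data = int(encode(reference[i:i + m]), 2)
        dist = data ^ pattern_bits  # all-zero exactly when the window equals the pattern
        rows.append((format(i, f"0{tag_bits}b"),
                     format(data, f"0{2 * m}b"),
                     format(dist, f"0{2 * m}b")))
    return rows


rows = phone_directory("GTAGATCAGA", "GAT")
matches = [tag for tag, _, dist in rows if set(dist) == {"0"}]
for row in rows:
    print(*row)
print("match tags:", matches)
```

A fixed oracle only has to recognise the all-zero DIST value, whatever the pattern; here that singles out tag 011.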
[Figure: circuit diagrams over the Q_tag and Q_data registers, including three X gates; reference string 10110010001101001000]
18. 18
Q associative memory
D. Ventura et al., Quantum Associative Memory, Information Sciences 123, 2000, pp. 273-296.
[Figure: circuit |0⟩⊗n → Store T (depends on T) → Recall P (depends on partial/approx. P) → P]
Q Machine Learning
Q Pattern Recognition
Q Searching
19. 19
Evolution
• Grover Search: one solution; full, uniform database; known Oracle for solution in database; optimal iterations
• Tight bounds on quantum searching: multiple (un)known solutions; full, uniform database; known Oracle for solution in database; optimal iterations
• … arbitrary initial amplitude distribution: multiple known solutions; arbitrary database; known Oracle for solution in database; optimal iterations
• Quantum Pattern Matching: multiple unknown solutions; sliding index database; alphabet based Oracles; optimal iterations
• … Quantum Bioinformatics: one solution; sub-string phonebook; 0 Hamming Distance Oracle; optimal iterations
• Quantum Associative Memory: multiple known solutions; arbitrary database; known Oracle for solution in database; higher Pmax iteration
• … associative memory with distributed queries: multiple known solutions; arbitrary database; Binomial Oracle; optimal iterations
• … improved distributed queries: multiple unknown solutions; arbitrary database; Binomial Oracle; higher Pmax iteration
Gen 1 (tested): QUS
Gen 2 (tested): QPM
Gen 3 (tested): QNN
Q Walk / Graph Search, Q Unstructured Search, Q Structured Search, HSP (abelian/dihedral)
33. 33
Quantum kernels
• Arbitrary Boolean function
• If state of an index to be marked
– Take Boolean value of Index
– Apply CPhase on all s’ qubits
– X Control on qubits with value = 0
• 111000000
• Sequential copy and increment
• Grover Gate on all s’*M qubits
• Inversion about Mean
• Amplitude Amplification
Initialize Make/Call Oracles Inversion about mean
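The three stages (initialise, oracle, inversion about mean) can be illustrated with a small classical state-vector simulation. This is a sketch in plain Python, not an OpenQL kernel: the function name `grover`, the 3-qubit register and the marked index 011 are assumptions chosen for illustration, and the iteration count uses the standard ⌊(π/4)·√(N/M)⌋ estimate.

```python
import math


def grover(n_qubits, marked, iterations=None):
    """Simulate Grover search amplitudes for a set of marked basis-state indices."""
    size = 2 ** n_qubits
    amps = [1 / math.sqrt(size)] * size  # uniform superposition (H on every qubit)
    if iterations is None:
        iterations = math.floor(math.pi / 4 * math.sqrt(size / len(marked)))
    for _ in range(iterations):
        # Oracle: flip the phase of every marked index
        for idx in marked:
            amps[idx] = -amps[idx]
        # Inversion about the mean: a -> 2 * mean - a
        mean = sum(amps) / size
        amps = [2 * mean - a for a in amps]
    return amps


amps = grover(3, {0b011})
probs = [a * a for a in amps]
best = max(range(len(probs)), key=probs.__getitem__)
print(format(best, "03b"), round(probs[best], 3))
```

On hardware the oracle's phase flip would be built from a multi-controlled phase gate with X gates on the qubits whose index bit is 0, as listed on the slide; the simulation skips that decomposition.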
34. 34
DNA strings
• 4 Oracles: A, T, G, C
• Qubit complexity linear in Alphabet size
• Oracles will (typically) be less complex for higher alphabet sizes
– Mark ~ 1/4th states instead of ~1/2
• Algorithm is more robust for higher Alphabet sizes
– Less possibility of 1 character dominating > 50% of the string
• Finding “13” in “22013230”
– 10 qubits
– 37 h, 38 x, 9 cnot, 33 toffoli gates
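To make the per-symbol oracles concrete, the sketch below computes classically which positions each oracle would mark for the example string, assuming each oracle marks exactly the positions holding its symbol. The names `symbol_masks` and `match_positions` are hypothetical, and no quantum circuit is simulated.

```python
def symbol_masks(text, alphabet):
    """For each symbol, list which text positions an oracle for that symbol would mark."""
    return {sym: [ch == sym for ch in text] for sym in alphabet}


def match_positions(text, pattern, masks):
    """Start indices where every pattern symbol agrees with its (shifted) symbol mask."""
    last = len(text) - len(pattern)
    return [i for i in range(last + 1)
            if all(masks[sym][i + j] for j, sym in enumerate(pattern))]


text, pattern = "22013230", "13"
masks = symbol_masks(text, "0123")
fractions = {sym: sum(mask) / len(text) for sym, mask in masks.items()}
print(fractions)                              # share of positions marked per symbol
print(match_positions(text, pattern, masks))  # start index of "13"
```

With four symbols each oracle marks roughly a quarter of the positions on average, which is the "~1/4th instead of ~1/2" point above; a binary alphabet would mark about half.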
35. 35
• States
– (−0.366457, +0.000000) 0000000110
– (+0.349626, +0.000000) next largest states
• Circuit
– 10 qubits
– 69 H + 117 C0X + 280 C2X
Results
Hollenberg, L.C., 2000. Fast quantum search algorithms in protein sequence comparisons: Quantum bioinformatics. Physical Review E, 62(5), p. 7532.
[Figure: circuit |0⟩⊗n → Initialise QPD (depends on T) → Evolve to Hamming distances (depends on P) → Oracle → A.A. → index; label: Grover iterations]