2. RelationalAI
Tier 1 firms in Silicon Valley, NYC, and Seattle
170+ people
120+ computer scientists
2,000+ publications with 100K+ citations and 37 awards and counting
Founded in 2017, Cloud-native
Just out of stealth with enterprise customers
Self-service availability in 2023
Leadership with 6 successful exits
Active board of directors includes Bob Muglia, former CEO of Snowflake
Team with broad expertise, from databases, languages, and algorithms to machine learning and operations research
$122M in Funding
Berkeley HQ but 100% Remote
50+ PhDs
5. What is RelationalAI?
Easier Modeling
Faster Queries
Larger Scale
A Cloud Native Database
6. What is RelationalAI?
A Cloud Native Database
Narrow Tables
No Nulls
No Duplicates
More Indexes
One per Column
Free Composites
More Joins
Worst Case Optimal
Semantic Optimizer
It knows Math
Recursion
Generic Queries
Limitless Language
Stepwise Definitions
Embedded Logic
Demand Driven
Incremental Computation
Immutable Data
Time Travel
Infinitely Scalable
On the Cloud
7. Key Differences
1. Removed Nulls
2. Removed Duplicates
3. Key-Value or Key-Key Tables
4. Dynamic Composite Indexes
5. More Joins = Faster (not Slower)
6. Semantic Query Optimizer knows more Math than you
7. Knows Recursion
8. Rel Query Language:
a. Lego Blocks
b. Type System
c. Declarative, Logical, Relational
9. Data and Model are Together
a. 10-100x less code
b. Higher Quality
10. Compute in Increments
11. Compute on Demand
12. Immutable Data
a. No Read Locks
b. Time-Travel
13. Cloud Native
a. Infinite Compute
b. Infinite Storage
12. What’s wrong with Null?
• (a OR NOT(a)) != True
SELECT *
FROM parts
WHERE (price <= 99) OR (price > 99)
SELECT *
FROM parts
WHERE (price <= 99) OR (price > 99) OR price IS NULL
• Aggregation requires special cases
SELECT AVG(height)
FROM parts
• Outer Joins are not commutative: a x b != b x a
SELECT orders.id, parts.id
FROM orders LEFT OUTER JOIN
parts ON parts.id = orders.part_id
SELECT orders.id, parts.id
FROM parts LEFT OUTER JOIN
orders ON parts.id = orders.part_id
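The three problems above can be reproduced in a minimal sketch using Python's built-in sqlite3 module. The rows and their values are made up for illustration; only the table and column names come from the queries on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, price REAL, height REAL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, part_id INTEGER)")
# Hypothetical data: part 3 has unknown (NULL) price and height,
# and order 101 points at a part that does not exist.
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)",
                 [(1, 50.0, 2.0), (2, 150.0, 4.0), (3, None, None)])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(100, 1), (101, 99)])

def count_where(condition):
    """Count the parts rows that satisfy a WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM parts WHERE {condition}").fetchone()[0]

total_parts = conn.execute("SELECT COUNT(*) FROM parts").fetchone()[0]
# "a OR NOT a" silently drops the row whose price is NULL.
tautology_rows = count_where("(price <= 99) OR (price > 99)")
with_null_rows = count_where("(price <= 99) OR (price > 99) OR price IS NULL")
# AVG skips NULLs, so it averages two heights, not three.
avg_height = conn.execute("SELECT AVG(height) FROM parts").fetchone()[0]

# Swapping the operands of a LEFT OUTER JOIN changes the result.
orders_left = set(conn.execute(
    "SELECT orders.id, parts.id FROM orders LEFT OUTER JOIN parts "
    "ON parts.id = orders.part_id"))
parts_left = set(conn.execute(
    "SELECT orders.id, parts.id FROM parts LEFT OUTER JOIN orders "
    "ON parts.id = orders.part_id"))
```

With this data the "always true" filter returns two of the three parts, and the two outer joins return different row sets.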
14. Why do we have to lose the bags?
Queries that use only ANDs (no ORs)
are called “conjunctive queries”
Conjunctive Queries under Set
Semantics are MUCH Easier to
Optimize
Lots of Math can be defined by Sets
Logicomix
A graphic novel about the search for the foundations of mathematics
Sets: {1,2,3}, {3,4,8}
Bags: {1,2,2,3}, {3, 3, 5, 5}
Sets have Unique Values
Bags allow Duplicate Values
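One concrete reason set semantics helps an optimizer: joining a relation with itself on the same key is redundant for sets, so the optimizer may drop the extra join, but for bags the multiplicities multiply and the rewrite is no longer valid. A small sketch, where the helper names and sample data are assumptions for illustration:

```python
from collections import Counter

def join_sets(r, s):
    """Natural join of two unary relations under set semantics."""
    return r & s

def join_bags(r, s):
    """Natural join of two unary relations under bag semantics.

    Each matching value appears (count in r) * (count in s) times.
    """
    return Counter({value: r[value] * s[value] for value in r if value in s})

set_r = {1, 2, 3}
bag_r = Counter([1, 2, 2, 3])

set_self_join = join_sets(set_r, set_r)   # same relation back
bag_self_join = join_bags(bag_r, bag_r)   # the value 2 now appears four times
```

Under set semantics the self-join can be removed without changing the answer; under bag semantics it cannot.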
25. Problem with Joins
All of NoSQL is because of this
Table 1 IDs: 0, 1, 3, 4, 5, 6, 7, 8, 9, 11
Table 2 IDs: 0, 2, 6, 7, 8, 9
Table 3 IDs: 2, 4, 5, 8, 10
Intermediate Results (Table 1 and Table 2): 0, 6, 7, 8, 9
Results (Table 1, Table 2, Table 3): 8
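The slide's numbers can be replayed with a short sketch of a pairwise join plan. Python sets stand in for the ID columns; the variable names are chosen for this illustration.

```python
table1 = [0, 1, 3, 4, 5, 6, 7, 8, 9, 11]
table2 = [0, 2, 6, 7, 8, 9]
table3 = [2, 4, 5, 8, 10]

# A binary join plan first materializes Table 1 joined with Table 2 ...
intermediate = sorted(set(table1) & set(table2))
# ... and only then joins that intermediate result with Table 3.
final = sorted(set(intermediate) & set(table3))

# Rows produced along the way that never reach the final answer.
wasted_rows = len(intermediate) - len(final)
```

Five intermediate rows are built to produce a single result row, so four of them are wasted work.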
26. Worst Case Optimal Joins
• Worst-Case Optimal Join Algorithms: Techniques, Results, and
Open Problems. Ngo. (Gems of PODS 2018)
• Worst-Case Optimal Join Algorithms: Techniques, Results, and
Open Problems. Ngo, Porat, Re, Rudra. (Journal of the ACM 2018)
• What do Shannon-type inequalities, submodular width, and
disjunctive datalog have to do with one another? Abo Khamis, Ngo,
Suciu, (PODS 2017 - Invited to Journal of ACM)
• Computing Join Queries with Functional Dependencies. Abo
Khamis, Ngo, Suciu. (PODS 2017)
• Joins via Geometric Resolutions: Worst-case and Beyond. Abo
Khamis, Ngo, Re, Rudra. (PODS 2015, Invited to TODS 2015)
• Beyond Worst-Case Analysis for Joins with Minesweeper. Abo
Khamis, Ngo, Re, Rudra. (PODS 2014)
• Leapfrog Triejoin: A Simple Worst-Case Optimal Join Algorithm.
Veldhuizen (ICDT 2014 - Best Newcomer)
• Skew Strikes Back: New Developments in the Theory of Join
Algorithms. Ngo, Re, Rudra. (Invited to SIGMOD Record 2013)
• Worst Case Optimal Join Algorithms. Ngo, Porat, Re, Rudra. (PODS
2012 – Best Paper)
28. More than 3 Tables
Worst-Case Optimal Joins take advantage of sorted keys and gaps in the data to
eliminate intermediate results, speed up queries and get rid of the Join problem.
[Figure: seeks over the sorted keys of the Brand, Category, Retailer, and Rating tables: 1) seek c, 2) seek c, 3) seek f, 4) seek g, 5) seek m, 6) seek m, 7) seek m]
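The seek-and-skip idea can be sketched for a single join column, in the spirit of the Leapfrog Triejoin paper listed on the previous slide. This is a simplified illustration, not RelationalAI's implementation: it assumes each table is a sorted, duplicate-free list of keys, and the function name and the reuse of Table 1 to Table 3 from the earlier slide are choices made for this sketch.

```python
from bisect import bisect_left

def leapfrog_intersect(columns):
    """Intersect sorted, duplicate-free key lists without intermediate results.

    Each column "seeks" to the first key >= the current candidate, skipping
    every key in the gap. A key is emitted once all columns agree on it.
    """
    if not columns or any(len(column) == 0 for column in columns):
        return []
    positions = [0] * len(columns)
    candidate = max(column[0] for column in columns)
    agreed = 0   # how many columns in a row currently sit on the candidate
    index = 0
    matches = []
    while True:
        column = columns[index]
        # Seek: binary search forward from the column's current position.
        position = bisect_left(column, candidate, positions[index])
        if position == len(column):
            return matches          # one column is exhausted, so we are done
        positions[index] = position
        if column[position] == candidate:
            agreed += 1
            if agreed >= len(columns):
                matches.append(candidate)
                # Move past the match and use the next key as the candidate.
                positions[index] = position + 1
                if positions[index] == len(column):
                    return matches
                candidate = column[positions[index]]
                agreed = 1
        else:
            # This column jumped over the candidate: its key is the new target.
            candidate = column[position]
            agreed = 1
        index = (index + 1) % len(columns)

table1 = [0, 1, 3, 4, 5, 6, 7, 8, 9, 11]
table2 = [0, 2, 6, 7, 8, 9]
table3 = [2, 4, 5, 8, 10]
matches = leapfrog_intersect([table1, table2, table3])
```

All three tables are consulted together, so the keys 0, 6, 7, and 9 that matched only Table 1 and Table 2 are never materialized.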
34. Math
1 + (2 + 3) = (1 + 2) + 3
3 + 4 = 4 + 3
3 + 0 = 3
1 + (-1) = 0
2 x (3 x 4) = (2 x 3) x 4
2 x 5 = 5 x 2
2 x 1 = 2
2 x 0.5 = 1
2 x (3 + 4) = (2 x 3) + (2 x 4)
(3 + 4) x 2 = (3 x 2) + (4 x 2)
35. Math
a + (b + c) = (a + b) + c
a + b = b + a
a + 0 = a
a + (-a) = 0
a x (b x c) = (a x b) x c
a x b = b x a
a x 1 = a
a x a^-1 = 1, a != 0
a x (b + c) = (a x b) + (a x c)
(a + b) x c = (a x c) + (b x c)
36. Math
Addition:
Associativity: a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c
Commutativity: a ⊕ b = b ⊕ a
Identity: a ⊕ 0̄ = a
Inverse: a ⊕ (-a) = 0̄
Multiplication:
Associativity: a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c
Commutativity: a ⊗ b = b ⊗ a
Identity: a ⊗ 1̄ = a
Inverse: a ⊗ a^-1 = 1̄
Distribution of Multiplication over Addition:
a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
(a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c)
37. Example One
Query: find the count of the combined rows a, b, c in tables R, S and T
def result = count[a,b,c: R(a) and S(b) and T(c)]
Mathematical Representation:
The original expression n^3 is much slower than the optimized 3n.
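A sketch of why the rewrite is valid: counting the combined rows is a sum of ones over every (a, b, c) combination, and by distributivity that sum factors into the product of the three table sizes. The sample relations below are made up for illustration.

```python
from itertools import product

# Hypothetical unary relations R, S and T.
R = {1, 2, 3}
S = {10, 20, 30, 40}
T = {"x", "y"}

# Original form: enumerate every combination (about n^3 steps for tables of size n).
naive_count = sum(1 for _ in product(R, S, T))

# Optimized form: scan each table once (about 3n steps) and multiply the sizes.
factored_count = len(R) * len(S) * len(T)
```

Both expressions give the same count, but only the first one has to visit every combination.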
42. Example Two
Query: find the minimum sum of the combined rows a, b, c in tables R, S and T
def result = min[a,b,c,v: v= R[a] + S[b] + T[c]]
Optimized Query:
def result = min[R] + min[S] + min[T]
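The same factoring works here because addition distributes over min: min(x + y, x + z) = x + min(y, z). A small sketch with made-up key-to-value tables shows the two forms agree.

```python
from itertools import product

# Hypothetical relations mapping a key to a numeric value.
R = {"a1": 5, "a2": 3}
S = {"b1": 7, "b2": 8, "b3": 2}
T = {"c1": 4, "c2": 9}

# Original form: try every (a, b, c) combination and keep the smallest sum.
naive_min = min(R[a] + S[b] + T[c] for a, b, c in product(R, S, T))

# Optimized form: take each table's minimum once and add them.
optimized_min = min(R.values()) + min(S.values()) + min(T.values())
```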
45. Shortest Path from A to F
Edge weights: A-B 1, B-C 2, C-D 3, B-D 6, D-F 5, A-E 9, E-F 4
AEF = 9 + 4 = 13
ABDF = 1 + 6 + 5 = 12
ABCDF = 1 + 2 + 3 + 5 = 11
min{13,12,11} = 11
46. Maximum Reliability from A to F
Edge weights: A-B 0.9, B-C 0.9, C-D 1.0, B-D 0.2, D-F 0.7, A-E 0.4, E-F 0.8
AEF = 0.4 x 0.8 = 0.32
ABDF = 0.9 x 0.2 x 0.7 = 0.126
ABCDF = 0.9 x 0.9 x 1.0 x 0.7 = 0.567
max{0.32,0.126,0.567} = 0.567
47. Words from A to F
Edge labels: A-B T, B-C I, C-D M, B-D H, D-F E, A-E A, E-F T
AEF = A · T = AT
ABDF = T · H · E = THE
ABCDF = T · I · M · E = TIME
union{at, the, time} = at the time
48. Math
min { (9 + 4), (1 + 6 + 5), ( 1 + 2 + 3 + 5 ) }
max { (0.4 x 0.8), (0.9 x 0.2 x 0.7), (0.9 x 0.9 x 1.0 x 0.7) }
union { (A · T), (T · H · E), (T · I · M · E) }
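The three expressions share one shape: an "addition" that combines alternative paths and a "multiplication" that extends a path by one edge. A sketch of a single routine evaluated under three different operator pairs, assuming the graph on the previous slides is the directed acyclic graph with edges A-B, B-C, C-D, B-D, D-F, A-E, E-F (the class and function names are invented for this example):

```python
import math
import operator
from typing import Any, Callable, NamedTuple

class Semiring(NamedTuple):
    plus: Callable[[Any, Any], Any]    # combines alternative paths
    times: Callable[[Any, Any], Any]   # extends a path by one edge
    zero: Any                          # identity of plus ("no path yet")
    one: Any                           # identity of times ("empty path")

EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"),
         ("D", "F"), ("A", "E"), ("E", "F")]
TOPOLOGICAL_ORDER = ["A", "B", "C", "E", "D", "F"]

def combine_paths(semiring, weights, source="A", target="F"):
    """Combine all source-to-target paths in one pass over the graph."""
    value = {node: semiring.zero for node in TOPOLOGICAL_ORDER}
    value[source] = semiring.one
    for node in TOPOLOGICAL_ORDER:
        for edge in EDGES:
            if edge[0] == node:
                extended = semiring.times(value[node], weights[edge])
                value[edge[1]] = semiring.plus(value[edge[1]], extended)
    return value[target]

def label(values):
    """Attach one value to each edge, in the order of EDGES."""
    return dict(zip(EDGES, values))

shortest = combine_paths(
    Semiring(min, operator.add, math.inf, 0),
    label([1, 2, 3, 6, 5, 9, 4]))

most_reliable = combine_paths(
    Semiring(max, operator.mul, 0.0, 1.0),
    label([0.9, 0.9, 1.0, 0.2, 0.7, 0.4, 0.8]))

words = combine_paths(
    Semiring(operator.or_,
             lambda xs, ys: frozenset(x + y for x in xs for y in ys),
             frozenset(), frozenset({""})),
    label([frozenset({letter}) for letter in "TIMHEAT"]))
```

Only the operators and identities change between the three calls; the traversal is written once.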
50. Example Three
Query: count the number of 3-hop paths per node in a graph
def path3(a, b, c, d) = edge(a,b) and edge(b,c) and edge(c,d)
def result[a] = count[path3[a]]
Optimized Query:
def path1[c] = count[edge[c]]
def path2[b] = sum[path1[c] for c in edge[b]]
def result[a] = sum[path2[b] for b in edge[a]]
[Figure: a 3-hop path through nodes A, B, C, D]
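A Python sketch of the same rewrite, using a small made-up directed graph. One difference from the Rel version is assumed here: nodes with no 3-hop paths appear with a count of 0 instead of being absent from the result.

```python
# Hypothetical directed edges.
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "a"), ("d", "a")}

successors = {}
for source, target in edges:
    successors.setdefault(source, set()).add(target)
nodes = {node for edge in edges for node in edge}

def out(node):
    """Successors of a node (empty set if it has no outgoing edges)."""
    return successors.get(node, set())

# Original form: enumerate every (a, b, c, d) path and count per start node.
naive = {a: sum(1 for b in out(a) for c in out(b) for d in out(c)) for a in nodes}

# Optimized form: push the count inward, one hop at a time.
path1 = {c: len(out(c)) for c in nodes}                      # 1-hop paths from c
path2 = {b: sum(path1[c] for c in out(b)) for b in nodes}    # 2-hop paths from b
result = {a: sum(path2[b] for b in out(a)) for a in nodes}   # 3-hop paths from a
```

The optimized form never materializes a 4-column path relation; each step touches every edge once.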
53. Semantic Optimizer
Compute Discrete Fourier Transform in Fast Fourier Transform-time
Junction Tree Algorithm for inference in Probabilistic Graphical Models
Message passing, belief propagation
Viterbi Algorithm, forward/backward for Hidden Markov Models most probable
paths
Counting sub-graph patterns (motifs)
Yannakakis Algorithm for acyclic conjunctive queries in Polynomial Time
Fractional hypertree-width time algorithm for Constraint Satisfaction Problems
Best known results for Conjunctive Queries and Quantified Conjunctive
Queries
54. Math
This optimizer produces much better code than the average developer
because it knows a ton more math than the average developer.
Maryam Mirzakhani
Terence Tao
Ramanujan
Katherine Goble
Good Will Hunting
71. How do you get to the Moon?
We don’t have to reinvent fire to get to Mars
Why does your new SQL query start with a blank page?
72. Entities
// Cliques can be empty
entity Clique nil = true
// To be in a clique, a member must be added to the clique.
// A new member must be connected to the existing members.
entity Clique add_to_clique(member, members) = connected(member, members)
// new_member only needs to be a node if there are no other members
def connected(new_member, members) = node(new_member) and nil(members)
// If there are other members in the clique then the new_member must be connected to all of them
def connected(new_member, members) =
neighbor(new_member, last_member)
and new_member > last_member
and connected(new_member, other_members)
and add_to_clique(last_member, other_members, members) from (last_member, other_members)
// Note: We can walk add_to_clique backwards!
77. Flight Model and Queries
Model
def cancelled(f) = flight(f) and flight_cancelled(f, "Y")
def diverted(f) = flight(f) and flight_diverted(f, "Y")
def arrived(f) = flight(f) and not (cancelled(f) or diverted(f))
def arrival_delay[f] = maximum[^Minute[0], arr_delay[f]]
ic forall(f in flight: cancelled(f) xor diverted(f) xor arrived(f))
ic forall(f in cancelled: not exists flight_time[f])
Use
count[cancelled]
count[f: cancelled(f) and operated_by(f, c)] for c in Carrier
ratio[cancelled, (f: destination(f,x))] for x in Airport
mean[arrival_delay]
mean[arrival_delay[f] for f where operated_by(f, c)] for c in Carrier
78. Reasoning with the Data
Manage both your data and your business
logic together.
No more rewriting your logic procedurally
in languages like Java, C#, Python, Rust,
PL/SQL, T-SQL, etc.
10-100x Less Code
Much Higher Quality
80. Betweenness Centrality
One of many graph centrality measures that are
useful for assessing the importance of a node.
High Level Definition: Number of times a node appears on
shortest paths within a network
Why it’s Useful: Identify which nodes control information
flow between different areas of the graph; also called
“Bridge Nodes”
Business Use-Cases:
Communication Analysis: Identify important people
who communicate across different groups
Retail Purchase Analysis: Which products introduce
customers to new categories
81. Betweenness Centrality
Brandes Algorithm is applied as follows:
1. For each pair of nodes, compute all
shortest paths and capture nodes (less
endpoints) on said path(s)
2. For each pair of nodes, assign each node
along path a value of one if there is only
one shortest path, or the fractional
contribution (1/n) if n shortest paths
3. Sum the value from step 2 for each node;
this is the Betweenness Centrality
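The three steps above can be sketched directly in Python. This follows the pairwise definition as listed on the slide rather than the dependency-accumulation shortcut of Brandes' paper, and it assumes an undirected, unweighted graph; the function names and the sample graph are made up for illustration.

```python
from collections import deque
from itertools import combinations

def bfs_counts(graph, source):
    """Return (distance, number of shortest paths) from source to each node."""
    distance = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                count[v] = 0
                queue.append(v)
            if distance[v] == distance[u] + 1:
                count[v] += count[u]
    return distance, count

def betweenness(graph):
    """Sum, over node pairs, the fraction of shortest paths through each node."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):          # step 1: each pair of nodes
        dist_s, count_s = info[s]
        dist_t, count_t = info[t]
        if t not in dist_s:
            continue                             # s and t are not connected
        for v in graph:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                # step 2: fraction of the s-t shortest paths that pass through v
                score[v] += count_s[v] * count_t[v] / count_s[t]
    return score                                 # step 3: summed per node

# Hypothetical graph: a diamond a-b-d, a-c-d with a tail d-e.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
centrality = betweenness(graph)
```

Node d scores highest because every path to e passes through it, which is the "bridge node" behavior described on the earlier slide.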
82. Betweenness Centrality
// Shortest path between s and t when they are the same is 0.
def shortest_path[s, t] = Min[
v, w:
(shortest_path(s, t, w) and v = 1) or
(w = shortest_path[s,v] +1 and E(v, t))
]
// When s and t are the same, there is only one shortest path between
// them, namely the one with length 0.
def nb_shortest(s, t, n) = V(s) and V(t) and s = t and n = 1
// When s and t are *not* the same, it is the sum of the number of
// shortest paths between s and v for all the v's adjacent to t and
// on the shortest path between s and t.
def nb_shortest(s, t, n) =
s != t and
n = sum[v, m:
shortest_path[s, v] + 1 = shortest_path[s, t] and E(v, t) and
nb_shortest(s, v, m)
]
// sum over all t's such that there is an edge between v and t,
// and v is on the shortest path between s and t
def C[s, v] = sum[t, r:
E(v, t) and shortest_path[s, t] = shortest_path[s, v] + 1 and
(
a = C[s, t] or
not C(s, t, _) and a = 0.0
) and
r = (nb_shortest[s, v] / nb_shortest[s, t]) * (1 + a)
] from a
// Note that below we divide by 2 because we are double
// counting every edge.
def betweenness_centrality_brandes[v] =
sum[s, p : s != v and C[s, v] = p]/2
84. Betweenness Centrality Recomputation
Incremental updates to data and recomputation of
Betweenness Centrality take only a few seconds,
whereas other systems need to recompute the entire graph.
87.
Eager maintenance is bad
Lazy maintenance is bad: detecting dirty computations is too
expensive when an output is queried.
Best: eager invalidation, lazy evaluation
[Figure: three diagrams of Inputs and Outputs]
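A minimal sketch of "eager invalidation, lazy evaluation" on a tiny dependency graph. The Cell class and its methods are invented for this illustration and say nothing about how RelationalAI implements it: writing an input only marks downstream results dirty, and a result is recomputed only when someone reads it.

```python
class Cell:
    """A value that is either a base input or derived from other cells."""

    def __init__(self, compute=None, inputs=(), value=None):
        self.compute = compute
        self.inputs = list(inputs)
        self.dependents = []
        self.value = value
        self.dirty = compute is not None   # derived cells start uncomputed
        self.evaluations = 0
        for cell in self.inputs:
            cell.dependents.append(self)

    def set(self, value):
        """Update a base input and eagerly invalidate everything downstream."""
        self.value = value
        for dependent in self.dependents:
            dependent.invalidate()

    def invalidate(self):
        if self.dirty:
            return                         # downstream is already marked dirty
        self.dirty = True
        for dependent in self.dependents:
            dependent.invalidate()

    def get(self):
        """Lazily recompute only when the cell is read while dirty."""
        if self.dirty:
            self.value = self.compute(*(cell.get() for cell in self.inputs))
            self.evaluations += 1
            self.dirty = False
        return self.value

a = Cell(value=1)
b = Cell(value=2)
total = Cell(lambda x, y: x + y, [a, b])
doubled = Cell(lambda t: 2 * t, [total])

first_read = doubled.get()
a.set(10)                  # marks total and doubled dirty, computes nothing
a.set(20)                  # already dirty, so this is just a flag check
evaluations_before_read = total.evaluations
second_read = doubled.get()
third_read = doubled.get()  # clean, so served from the stored value
```

Two writes cost two cheap invalidations and a single recomputation at read time, and reading never has to search for dirty inputs.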
90. Data Storage and Memory Management
Scalable, durable object storage
Ephemeral SSD cache
RAM cache (buffer pool)
[Figure: arrows between the storage tiers labeled fetch, evict, and commit]
91. RAI databases are immutable, including the catalog
demo
key/value store with CAS
rel A
rel B
rel C
...
92. RAI databases are immutable, including the catalog
demo
key/value store with CAS
transaction
updates C
rel A
rel B
rel C
rel C'
...
93. RAI databases are immutable, including the catalog
demo
demo-2022-03-25
key/value store with CAS
transaction
updates C
rel A
rel B
rel C
rel C'
...
94. Time Travel
● Run queries on your data as it “was”
in the past.
● Low cost time travel
● No need for a Flux Capacitor
96. RAI databases are immutable, including the catalog
demo-2022-03-25
demo
key/value store with CAS
rel A
rel B
rel C'
transaction
...
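The slide sequence above can be sketched as an immutable catalog whose root pointer lives in a key/value store with compare-and-swap (CAS). Everything named here (KeyValueStore, commit, the relation contents) is a hypothetical stand-in for illustration, not RelationalAI's API.

```python
import threading
from types import MappingProxyType

class KeyValueStore:
    """In-memory stand-in for a key/value store with compare-and-swap."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Set key to new only if it still points at expected."""
        with self._lock:
            if self._data.get(key) is not expected:
                return False
            self._data[key] = new
            return True

def commit(store, database, update):
    """Publish a new catalog version; retry if another writer got there first."""
    while True:
        old_catalog = store.get(database)
        merged = dict(old_catalog)
        merged.update(update(old_catalog))   # only the changed relations are new
        new_catalog = MappingProxyType(merged)
        if store.compare_and_swap(database, old_catalog, new_catalog):
            return new_catalog

store = KeyValueStore()
initial = MappingProxyType({
    "A": frozenset({1}),
    "B": frozenset({2}),
    "C": frozenset({3}),
})
store.compare_and_swap("demo", None, initial)

# Cloning copies a pointer to the current catalog, not the data.
store.compare_and_swap("demo-2022-03-25", None, store.get("demo"))

# A transaction updates C: it writes C' and swaps the root pointer of "demo".
commit(store, "demo", lambda catalog: {"C": catalog["C"] | {4}})

current = store.get("demo")
snapshot = store.get("demo-2022-03-25")
```

The clone still sees the old C while sharing the untouched relations A and B with the live database, which is what makes time travel and what-if analysis cheap.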
97. Immutable Data
Strict Serializable Isolation
No Locks on Read Workloads
No Limits on Transaction
Duration
Cloning copies pointers not data
Low cost Time Travel/What If
Analysis
No Logs for Transactions
100. So what is RelationalAI?
It is your next database
Narrow Tables
No Nulls
No Duplicates
More Indexes
One per Column
Free Composites
More Joins
Worst Case Optimal
Semantic Optimizer
It knows Math
Recursion
Generic Queries
Limitless Language
Stepwise Definitions
Embedded Logic
Demand Driven
Incremental Computation
Immutable Data
Time Travel
Infinitely Scalable
On the Cloud
101. So what is RelationalAI?
Easier Modeling
Faster Queries
Larger Scale
It is your next database