RelationalAI, Inc. All rights reserved 2022
Introduction
August 2022
RelationalAI
• $122M in Funding: Tier 1 firms in Silicon Valley, NYC, and Seattle
• 170+ people: 120+ computer scientists, 50+ PhDs
• 2,000+ publications with 100K+ citations and 37 awards and counting
• Founded in 2017, Cloud-native: just out of stealth with enterprise customers; self-service availability in 2023
• Leadership with 6 successful exits: active board of directors includes Bob Muglia, former CEO of Snowflake
• Team with broad expertise: databases, languages, and algorithms to machine learning and operations research
• Berkeley HQ but 100% Remote
What is Relational AI?
A Cloud Native Database: Easier Modeling, Faster Queries, Larger Scale
• Narrow Tables: No Nulls, No Duplicates
• More Indexes: One per Column, Free Composites
• More Joins: Worst Case Optimal
• Semantic Optimizer: It knows Math
• Recursion: Generic Queries
• Limitless Language: Stepwise Definitions, Embedded Logic
• Demand Driven: Incremental Computation
• Immutable Data: Time Travel
• Infinitely Scalable: On the Cloud
Key Differences
1. Removed Nulls
2. Removed Duplicates
3. Key-Value or Key-Key Tables
4. Dynamic Composite Indexes
5. More Joins = Faster (not Slower)
6. Semantic Query Optimizer knows more Math than you
7. Knows Recursion
8. Rel Query Language:
a. Lego Blocks
b. Type System
c. Declarative, Logical, Relational
9. Data and Model are Together
a. 10-100x less code
b. Higher Quality
10. Compute in Increments
11. Compute on Demand
12. Immutable Data
a. No Read Locks
b. Time-Travel
13. Cloud Native
a. Infinite Compute
b. Infinite Storage
Fact: Most people have never tasted real Wasabi
Fact: Most people have never used a Relational Database
Vision Reality
Relational Databases
Ted Codd - Inventor of the Relational Model
What’s wrong with Null?
• (a OR NOT(a)) != True when a is Null
• Aggregation requires special
cases
• Outer Joins are not commutative
a x b != b x a
SELECT *
FROM parts
WHERE (price <= 99) OR (price > 99)
SELECT *
FROM parts
WHERE (price <= 99) OR (price > 99) OR price IS NULL
SELECT AVG(height)
FROM parts
SELECT orders.id, parts.id
FROM orders LEFT OUTER JOIN
parts ON parts.id = orders.part_id
SELECT orders.id, parts.id
FROM parts LEFT OUTER JOIN
orders ON parts.id = orders.part_id
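The Null problems above can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The `parts` rows here are invented for illustration; only the query shapes come from the slides.

```python
import sqlite3

# Hypothetical sample data: part 3 has no price and no height.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, price REAL, height REAL)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [(1, 50.0, 10.0), (2, 150.0, 20.0), (3, None, None)],
)

def count(where):
    """Count the rows of parts that satisfy a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM parts WHERE {where}").fetchone()[0]

total = conn.execute("SELECT COUNT(*) FROM parts").fetchone()[0]
# Looks like a tautology, but the Null-priced row is silently dropped.
tautology = count("(price <= 99) OR (price > 99)")
# The special case needed to get every row back.
with_null_check = count("(price <= 99) OR (price > 99) OR price IS NULL")
# AVG skips Nulls, so the average is over two rows, not three.
avg_height = conn.execute("SELECT AVG(height) FROM parts").fetchone()[0]

print(total, tautology, with_null_check, avg_height)
```

The "tautology" returns fewer rows than the table holds, which is the three-valued logic issue the slide points at.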
Why do we have to lose the bags?
Queries that use only ANDs (no ORs) are called “conjunctive queries”
Conjunctive Queries under Set Semantics are MUCH Easier to Optimize
Lots of Math can be defined by Sets
Logicomix
A graphic novel about the search for the foundations of mathematics
Sets: {1,2,3}, {3,4,8}
Bags: {1,2,2,3}, {3, 3, 5, 5}
Sets have Unique Values
Bags allow Duplicate Values
How?
Narrow Tables
Idea One
Graph Normal Form
Key-Value and Key-Key
3rd Normal Form
Graph Normal Form
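One way to picture the move from a wide table to narrow key-value tables is the sketch below. The `wide_rows` records and the `to_narrow` helper are hypothetical and not part of the RelationalAI product; they only illustrate that missing values become absent tuples instead of Nulls, and that sets rule out duplicates.

```python
# Hypothetical wide rows, with None standing in for SQL Null.
wide_rows = [
    {"id": 1, "name": "bolt", "price": 0.25, "color": None},
    {"id": 2, "name": "nut", "price": None, "color": "silver"},
]

def to_narrow(rows, key="id"):
    """Split wide rows into one (key, value) relation per column."""
    relations = {}
    for row in rows:
        for column, value in row.items():
            if column == key or value is None:
                continue  # a missing value is simply not stored
            relations.setdefault(column, set()).add((row[key], value))
    return relations

narrow = to_narrow(wide_rows)
for column, tuples in sorted(narrow.items()):
    print(column, sorted(tuples))
```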
More Indexes
Idea Two
One Index per Column Order
Composite Index Explosion
Dual Index Narrow Tables => Dynamically Generated Composite Indexes
More Joins
Idea Three
Terrible Problem with Joins
All of NoSQL is because of this
Table 1 IDs: 0, 1, 3, 4, 5, 6, 7, 8, 9, 11
Table 2 IDs: 0, 2, 6, 7, 8, 9
Table 3 IDs: 2, 4, 5, 8, 10
Intermediate Results (Table 1 and Table 2): 0, 6, 7, 8, 9
Results (Table 1, Table 2, Table 3): 8
Worst Case Optimal Joins
• Worst-Case Optimal Join Algorithms: Techniques, Results, and
Open Problems. Ngo. (Gems of PODS 2018)
• Worst-Case Optimal Join Algorithms: Techniques, Results, and
Open Problems. Ngo, Porat, Re, Rudra. (Journal of the ACM 2018)
• What do Shannon-type inequalities, submodular width, and
disjunctive datalog have to do with one another? Abo Khamis, Ngo,
Suciu, (PODS 2017 - Invited to Journal of ACM)
• Computing Join Queries with Functional Dependencies. Abo
Khamis, Ngo, Suciu. (PODS 2017)
• Joins via Geometric Resolutions: Worst-case and Beyond. Abo
Khamis, Ngo, Re, Rudra. (PODS 2015, Invited to TODS 2015)
• Beyond Worst-Case Analysis for Joins with Minesweeper. Abo
Khamis, Ngo, Re, Rudra. (PODS 2014)
• Leapfrog Triejoin: A Simple Worst-Case Optimal Join Algorithm.
Veldhuizen (ICDT 2014 - Best Newcomer)
• Skew Strikes Back: New Developments in the Theory of Join
Algorithms. Ngo, Re, Rudra. (Invited to SIGMOD Record 2013)
• Worst Case Optimal Join Algorithms. Ngo, Porat, Re, Rudra. (PODS
2012 – Best Paper)
Worst Case Optimal Join
Table 1 IDs: 0, 1, 3, 4, 5, 6, 7, 8, 9, 11
Table 2 IDs: 0, 2, 6, 7, 8, 9
Table 3 IDs: 2, 4, 5, 8, 10
Table 1 | Table 2 | Table 3 | Action
0 | 0 | 2 | Table 1: Seek 2
3 | 0 | 2 | Table 2: Seek 3
3 | 6 | 2 | Table 3: Seek 6
3 | 6 | 8 | Table 1: Seek 8
8 | 6 | 8 | Table 2: Seek 8
8 | 8 | 8 | Emit, Table 3: Next
8 | 8 | 10 | Table 1: Seek 10
11 | 8 | 10 | Table 2: Seek 11, END
Results (Table 1, Table 2, Table 3): 8
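The seek-and-emit trace can be sketched as a leapfrog-style intersection of sorted ID lists. This is a simplified illustration for single-column tables, written under the assumption that each table is a sorted list of unique IDs; it is not the engine's implementation, and `leapfrog_intersect` is a name chosen here.

```python
from bisect import bisect_left

def leapfrog_intersect(tables):
    """Intersect sorted lists of unique IDs without building intermediate results."""
    if any(not table for table in tables):
        return []
    positions = [0] * len(tables)
    # Visit tables in order of their first key, smallest first.
    order = sorted(range(len(tables)), key=lambda i: tables[i][0])
    largest = tables[order[-1]][0]
    results = []
    turn = 0
    while True:
        i = order[turn]
        key = tables[i][positions[i]]
        if key == largest:
            # Every table is positioned on the same key: emit it, then move on.
            results.append(key)
            positions[i] += 1
        else:
            # Seek: jump straight to the first key >= the largest key seen.
            positions[i] = bisect_left(tables[i], largest, positions[i])
        if positions[i] >= len(tables[i]):
            return results
        largest = tables[i][positions[i]]
        turn = (turn + 1) % len(tables)

table1 = [0, 1, 3, 4, 5, 6, 7, 8, 9, 11]
table2 = [0, 2, 6, 7, 8, 9]
table3 = [2, 4, 5, 8, 10]
joined = leapfrog_intersect([table1, table2, table3])
print(joined)
```

On the three tables from the slide this performs the same sequence of seeks as the trace and never materializes the five-row intermediate result of Table 1 and Table 2.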
More than 3 Tables
Worst-Case Optimal Joins take advantage of sorted keys and gaps in the data to
eliminate intermediate results, speed up queries and get rid of the Join problem.
[Figure: leapfrog seeks across the Brand, Category, Retailer, and Rating indexes over keys a through p. Steps: 1) seek c, 2) seek c, 3) seek f, 4) seek g, 5) seek m, 6) seek m, 7) seek m]
Semantic Optimizer
Idea Four
Traditional Query Optimizers
• Predicate pushdown (push selection through join)
• Projection pushdown (push projection through join)
• Aggregation pushdown
• Their “pull-up” counterparts
• Split conjunctive predicates (split AND statements)
• Replace cartesian products (use inner joins with predicates)
• (Un)Nesting Sub-Queries
• Etc.
Semantic Optimizer
[Diagram: a Rel model and Knowledge feed the Semantic Optimizer, which considers equivalent Rel models and produces an optimized Rel model; Data goes in and an Answer comes out]
Math
1 + (2 + 3) = (1 + 2) + 3
3 + 4 = 4 + 3
3 + 0 = 3
1 + (-1) = 0
2 x (3 x 4) = (2 x 3) x 4
2 x 5 = 5 x 2
2 x 1 = 2
2 x 0.5 = 1
2 x (3 + 4) = (2 x 3) + (2 x 4)
(3 + 4) x 2 = (3 x 2) + (4 x 2)
Math
a + (b + c) = (a + b) + c
a + b = b + a
a + 0 = a
a + (-a) = 0
a x (b x c) = (a x b) x c
a x b = b x a
a x 1 = a
a x a^-1 = 1, a != 0
a x (b + c) = (a x b) + (a x c)
(a + b) x c = (a x c) + (b x c)
Math
Addition:
Associativity: a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c
Commutativity: a ⊕ b = b ⊕ a
Identity: a ⊕ ō = a
Inverse: a ⊕ (-a) = ō
Multiplication:
Associativity: a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c
Commutativity: a ⊗ b = b ⊗ a
Identity: a ⊗ ī = a
Inverse: a ⊗ a^-1 = ī
Distribution of Multiplication over Addition:
a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
(a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c)
Example One
Query: find the count of the combined rows a, b, c in tables R, S and T
def result = count[a,b,c: R(a) and S(b) and T(c)]
Mathematical Representation:
count = sum over a, b, c of R(a) x S(b) x T(c)
= (sum over a of R(a)) x (sum over b of S(b)) x (sum over c of T(c))
where R(a), S(b), and T(c) are 1 when the row exists and 0 otherwise.
The original expression n^3 is much slower than the optimized 3n.
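A small sketch of why the rewrite in Example One is valid, using invented sets for R, S, and T: enumerating every combination and multiplying the three sizes give the same count, but the second touches each table only once.

```python
from itertools import product

# Hypothetical contents of the three unary relations.
R = {1, 2, 3}
S = {10, 20}
T = {"w", "x", "y", "z"}

# Naive plan: enumerate every (a, b, c) combination, |R| * |S| * |T| steps.
naive_count = sum(1 for _ in product(R, S, T))

# Factored plan: distributivity lets the sum split into three independent
# counts, roughly |R| + |S| + |T| steps.
factored_count = len(R) * len(S) * len(T)

print(naive_count, factored_count)
```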
Example Two
Query: find the minimum sum of the combined rows a, b, c in tables R, S and T
def result = min[a,b,c,v: v= R[a] + S[b] + T[c]]
Mathematical Representation:
min over a, b, c of (R[a] + S[b] + T[c]) = (min over a of R[a]) + (min over b of S[b]) + (min over c of T[c])
Optimized Query:
def result = min[R] + min[S] + min[T]
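The same rewrite for Example Two can be checked with a short sketch. The key/value contents of R, S, and T are made up; the point is only that addition distributes over min, so the minimum of all sums equals the sum of the minimums.

```python
from itertools import product

# Hypothetical key -> value relations.
R = {"a1": 5, "a2": 3, "a3": 8}
S = {"b1": 7, "b2": 2}
T = {"c1": 4, "c2": 6, "c3": 1}

# Naive plan: form every sum R[a] + S[b] + T[c], then take the minimum.
naive_min = min(R[a] + S[b] + T[c] for a, b, c in product(R, S, T))

# Optimized plan: one pass over each relation.
optimized_min = min(R.values()) + min(S.values()) + min(T.values())

print(naive_min, optimized_min)
```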
Shortest Path from A to F
Edges: A-B 1, B-C 2, C-D 3, B-D 6, D-F 5, A-E 9, E-F 4
AEF = 9 + 4 = 13
ABDF = 1 + 6 + 5 = 12
ABCDF = 1 + 2 + 3 + 5 = 11
min{13,12,11} = 11
Maximum Reliability from A to F
Edges: A-B 0.9, B-C 0.9, C-D 1.0, B-D 0.2, D-F 0.7, A-E 0.4, E-F 0.8
AEF = 0.4 x 0.8 = 0.32
ABDF = 0.9 x 0.2 x 0.7 = 0.126
ABCDF = 0.9 x 0.9 x 1.0 x 0.7 = 0.567
max{0.32,0.126,0.567} = 0.567
Words from A to F
Edges: A-B T, B-C I, C-D M, B-D H, D-F E, A-E A, E-F T
AEF = A · T = AT
ABDF = T · H · E = THE
ABCDF = T · I · M · E = TIME
union{at, the, time} = at the time
Math
min { (9 + 4), (1 + 6 + 5), ( 1 + 2 + 3 + 5 ) }
max { (0.4 x 0.8), (0.9 x 0.2 x 0.7), (0.9 x 0.9 x 1.0 x 0.7) }
union { (A · T), (T · H · E), (T · I · M · E) }
Math
⊕ { (9 ⊗ 4), (1 ⊗ 6 ⊗ 5), ( 1 ⊗ 2 ⊗ 3 ⊗ 5 ) }
⊕ { (0.4 ⊗ 0.8), (0.9 ⊗ 0.2 ⊗ 0.7), (0.9 ⊗ 0.9 ⊗ 1.0 ⊗ 0.7) }
⊕ { (A ⊗ T), (T ⊗ H ⊗ E), (T ⊗ I ⊗ M ⊗ E) }
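The three graph problems share one shape: combine edge labels along a path with ⊗, then combine the paths with ⊕. The sketch below makes that explicit. The edge list is read off the path computations in the slides; the `paths` and `aggregate` helpers are illustrative names, and enumerating paths like this is for clarity, not efficiency.

```python
from functools import reduce
import operator

# The A-to-F graph shared by all three slides.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"),
         ("D", "F"), ("A", "E"), ("E", "F")]

def paths(src, dst):
    """Yield every path from src to dst as a list of edges (graph is acyclic)."""
    if src == dst:
        yield []
        return
    for edge in EDGES:
        if edge[0] == src:
            for rest in paths(edge[1], dst):
                yield [edge] + rest

def aggregate(labels, plus, times):
    """Apply times along each path, then plus across all A-to-F paths."""
    per_path = [reduce(times, (labels[e] for e in path)) for path in paths("A", "F")]
    return reduce(plus, per_path)

distance = dict(zip(EDGES, [1, 2, 3, 6, 5, 9, 4]))
reliability = dict(zip(EDGES, [0.9, 0.9, 1.0, 0.2, 0.7, 0.4, 0.8]))
letters = dict(zip(EDGES, [{"T"}, {"I"}, {"M"}, {"H"}, {"E"}, {"A"}, {"T"}]))

shortest = aggregate(distance, min, operator.add)
most_reliable = aggregate(reliability, max, operator.mul)
words = aggregate(letters, operator.or_,
                  lambda xs, ys: {x + y for x in xs for y in ys})

print(shortest, round(most_reliable, 3), sorted(words))
```

Only the pair of operators changes between the three calls, which is what lets an optimizer reason about all of them with the same algebraic laws.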
Example Three
Query: count the number of 3-hop paths per node in a graph
def path3(a, b, c, d) = edge(a,b) and edge(b,c) and edge(c,d)
def result[a] = count[path3[a]]
Mathematical Representation:
result[a] = sum over b, c, d of edge(a,b) x edge(b,c) x edge(c,d)
= sum over b of edge(a,b) x (sum over c of edge(b,c) x (sum over d of edge(c,d)))
where edge(x,y) is 1 when the edge exists and 0 otherwise.
Optimized Query:
def path1[c] = count[edge[c]]
def path2[b] = sum[path1[c] for c in edge[b]]
def result[a] = sum[path2[b] for b in edge[a]]
A B C D
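A sketch of Example Three in plain Python, on an invented edge set: the naive plan enumerates every 3-hop path, while the factored plan pushes the counts inward so each edge is visited a constant number of times per level. The function names mirror `path1`, `path2`, and `result` from the optimized query, but this is only an illustration of the rewrite.

```python
from collections import defaultdict

# Hypothetical directed graph.
edge = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "a")}
nodes = {n for pair in edge for n in pair}

successors = defaultdict(list)
for src, dst in edge:
    successors[src].append(dst)

# Naive plan: enumerate every (b, c, d) reachable from a in exactly three hops.
naive = {
    a: sum(1
           for (x, b) in edge if x == a
           for (y, c) in edge if y == b
           for (z, d) in edge if z == c)
    for a in nodes
}

# Factored plan: count 1-hop paths, then reuse them for 2-hop and 3-hop counts.
path1 = {c: len(successors[c]) for c in nodes}
path2 = {b: sum(path1[c] for c in successors[b]) for b in nodes}
result = {a: sum(path2[b] for b in successors[a]) for a in nodes}

print(sorted(result.items()))
```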
Semantic Optimizer
Compute Discrete Fourier Transform in Fast Fourier Transform-time
Junction Tree Algorithm for inference in Probabilistic Graphical Models
Message passing, belief propagation
Viterbi Algorithm, forward/backward for Hidden Markov Models most probable
paths
Counting sub-graph patterns (motifs)
Yannakakis Algorithm for acyclic conjunctive queries in Polynomial Time
Fractional hypertree-width time algorithm for Constraint Satisfaction Problems
Best known results for Conjunctive Queries and Quantified Conjunctive
Queries
Math
This optimizer produces much better code than the average developer
because it knows a ton more math than the average developer.
Maryam Mirzakhani
Terence Tao
Ramanujan
Katherine Goble
Good Will Hunting
Recursion
Idea Five
Recursion
def reachable = edge; reachable.edge
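The one-line recursive definition reads as "reachable is edge, or reachable followed by edge". A rough Python equivalent is a fixpoint loop; the semi-naive style below, which extends only the pairs found in the previous round, is an assumption made for this sketch and not a description of how Rel evaluates it.

```python
def reachable(edge):
    """Transitive closure of a set of (source, target) pairs."""
    reach = set(edge)   # base case: every edge is reachable
    delta = set(edge)   # pairs discovered in the previous round
    while delta:
        # Recursive case: extend newly found pairs by one more edge.
        new = {(x, z) for (x, y) in delta for (y2, z) in edge if y == y2} - reach
        reach |= new
        delta = new
    return reach

chain = {(1, 2), (2, 3), (3, 4)}
closure = reachable(chain)
print(sorted(closure))
```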
Loopy Lattice
How many Shortest Paths are there from the top left node to the bottom right node?
• 1x1 lattice: 2 Paths
• 2x2 lattice: 6 Paths
How many Shortest Paths are there?
14x14 = 11 minutes
15x15 = 10 Hours
20x20 = Nope
How many Shortest Paths are there?
20x20 = 10 Minutes (137 Billion paths)
How many Shortest Paths are there?
20 x 20 in 0.41 Seconds (137 Billion paths)
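The gap between "Nope" and a fraction of a second comes from counting paths instead of enumerating them. A minimal sketch, assuming moves go only right or down on an n x n lattice: each node's count is the sum of the counts of the node above and the node to its left.

```python
from math import comb

def count_shortest_paths(n):
    """Count right/down paths across an n x n lattice, one row at a time."""
    counts = [1] * (n + 1)          # top row: one way to reach each node
    for _ in range(n):              # each following row of nodes
        for col in range(1, n + 1):
            counts[col] += counts[col - 1]   # from above + from the left
    return counts[n]

for size in (1, 2, 20):
    print(size, count_shortest_paths(size))
```

The result matches the closed form C(2n, n), which gives about 137 billion for the 20x20 case.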
@function @transient
def :_intermediate#0(other_node#1, path_length#0, _t#0) =
reduce[(_x#0, _y#0, _z#0) : :rel_primitive_add(_x#0, _y#0, _z#0),
(x#8, paths_of_length#1) :
:number_of_paths_of_length(other_node#1, x#8,
paths_of_length#1) and
:rel_primitive_add(1, x#8, path_length#0),
(_no_init#0) : false](_t#0)
@function @transient
def :_intermediate#1(node_number#0, path_length#0, path_count#0) =
reduce[(_x#1, _y#1, _z#1) : :rel_primitive_add(_x#1, _y#1, _z#1),
(other_node#1, _t#0) :
:edge(other_node#1, node_number#0) and
:_intermediate#0(other_node#1, path_length#0, _t#0),
(_no_init#1) : false](path_count#0)
def :number_of_paths_of_length(node_number#0, path_length#0,
path_count#0) =
:_base_case#0(node_number#0, path_length#0, path_count#0) or
:_intermediate#1(node_number#0, path_length#0, path_count#0)
Naive recursion, iteration 1
Evaluating `_intermediate#0`:
(1, 1) => (1,)
Evaluating `_intermediate#1`:
(2, 1) => (1,)
(4, 1) => (1,)
Evaluating `number_of_paths_of_length`:
(1, 0, 1)
(2, 1, 1)
(4, 1, 1)
Naive recursion, iteration 2
Evaluating `_intermediate#0`:
(1, 1) => (1,)
(2, 2) => (1,)
(4, 2) => (1,)
Evaluating `_intermediate#1`:
(2, 1) => (1,)
(3, 2) => (1,)
(4, 1) => (1,)
(5, 2) => (2,)
(7, 2) => (1,)
Evaluating `number_of_paths_of_length`:
(1, 0, 1)
(2, 1, 1)
(3, 2, 1)
(4, 1, 1)
(5, 2, 2)
(7, 2, 1)
Naive recursion, iteration 3
Evaluating `_intermediate#0`:
(1, 1) => (1,)
(2, 2) => (1,)
(3, 3) => (1,)
(4, 2) => (1,)
(5, 3) => (2,)
(7, 3) => (1,)
Evaluating `_intermediate#1`:
(2, 1) => (1,)
(3, 2) => (1,)
(4, 1) => (1,)
(5, 2) => (2,)
(6, 3) => (3,)
(7, 2) => (1,)
(8, 3) => (3,)
Evaluating `number_of_paths_of_length`:
(1, 0, 1)
(2, 1, 1)
(3, 2, 1)
(4, 1, 1)
(5, 2, 2)
(6, 3, 3)
(7, 2, 1)
(8, 3, 3)
Naive recursion, iteration 4
Evaluating `_intermediate#0`:
(1, 1) => (1,)
(2, 2) => (1,)
(3, 3) => (1,)
(4, 2) => (1,)
(5, 3) => (2,)
(6, 4) => (3,)
(7, 3) => (1,)
(8, 4) => (3,)
Evaluating `_intermediate#1`:
(2, 1) => (1,)
(3, 2) => (1,)
(4, 1) => (1,)
(5, 2) => (2,)
(6, 3) => (3,)
(7, 2) => (1,)
(8, 3) => (3,)
(9, 4) => (6,)
Evaluating `number_of_paths_of_length`:
(1, 0, 1)
(2, 1, 1)
(3, 2, 1)
(4, 1, 1)
(5, 2, 2)
(6, 3, 3)
(7, 2, 1)
(8, 3, 3)
(9, 4, 6)
Stepwise Language
Idea Six
Graph Analytics
module graph_analytics[G]
with G use node, edge
def neighbor(x, y) = edge(x, y) or edge(y, x)
def outdegree[x] = count[edge[x]]
def degree[x] = count[neighbor[x]]
def cn[x, y] = count[intersect[neighbor[x], neighbor[y]]] // Count of Common Neighbors
def reachable = edge; reachable.edge // Recursive!
def reachable_undirected = neighbor; reachable_undirected.neighbor // Recursive!
def scc[x] = min[v: reachable(x, v) and reachable(v, x)] // Strongly Connected Component
def wcc[x] = min[reachable_undirected[x]] // Weakly Connected Component
def cosine_sim[x, y] = cn[x, y] / sqrt[degree[x] * degree[y]]
def jaccard_sim[x, y] = cn[x, y] / (count[neighbor[x]] + count[neighbor[y]] - cn[x, y])
…
end
Graph Analytics
Dependencies
module graph_analytics[G]
with G use node, edge
def neighbor(x, y) = edge(x, y) or edge(y, x)
def outdegree[x] = count[edge[x]]
def degree[x] = count[neighbor[x]]
def cn[x, y] = count[intersect[neighbor[x], neighbor[y]]]
def reachable = edge; reachable.edge
def reachable_undirected = neighbor; reachable_undirected.neighbor
def scc[x] = min[v: reachable(x, v) and reachable(v, x)]
def wcc[x] = min[reachable_undirected[x]]
def cosine_sim[x, y] = cn[x, y] / sqrt[degree[x] * degree[y]]
def jaccard_sim[x, y] = cn[x, y] / (count[neighbor[x]] + count[neighbor[y]] - cn[x, y])
…
end
How do you get to the Moon?
We don’t have to reinvent fire to get to Mars
Why does your new SQL query start with a blank page?
Entities
// Cliques can be empty
entity Clique nil = true
// In order to be in a clique you must be added to the clique.
// It requires a new member to be connected to existing members
entity Clique add_to_clique(member, members) = connected(member, members)
// new_member must just be a node if there are no other members
def connected(new_member, members) = node(new_member) and nil(members)
// If there are other members in the clique then the new_member must be connected to all of them
def connected(new_member, members) =
neighbor(new_member, last_member)
and new_member > last_member
and connected(new_member, other_members)
and add_to_clique(last_member, other_members, members) from (last_member, other_members)
// Note: We can walk add_to_clique backwards!
Embedded Logic
Idea Seven
Real World Data
[Annotated flight data table. Questions raised by the raw columns: Miles? Kilometers? Period? Minutes? Hours? Primary key? Possible values? Are these exclusive? Include helicopters? Not a delay. Risk of messing up aggregates.]
A better Flight Model
Flight Model and Queries
Model
def cancelled(f) = flight(f) and flight_cancelled(f, "Y")
def diverted(f) = flight(f) and flight_diverted(f, "Y")
def arrived(f) = flight(f) and not (cancelled(f) or diverted(f))
def arrival_delay[f] = maximum[^Minute[0], arr_delay[f]]
ic forall(f in flight: cancelled(f) xor diverted(f) xor arrived(f))
ic forall(f in cancelled: not exists flight_time[f])
Use
count[cancelled]
count[f: cancelled(f) and operated_by(f, c)] for c in Carrier
ratio[cancelled, (f: destination(f,x))] for x in Airport
mean[arrival_delay]
mean[arrival_delay[f] for f where operated_by(f, c)] for c in Carrier
Reasoning with the Data
Manage both your data and your business
logic together.
No more rewriting your logic procedurally
in languages like Java, C#, Python, Rust,
PL/SQL, T-SQL, etc.
10-100x Less Code
Much Higher Quality
Incremental Computation
Idea Eight
Betweenness Centrality
One of many graph centrality measures that are
useful for assessing the importance of a node.
High Level Definition: Number of times a node appears on
shortest paths within a network
Why it’s Useful: Identify which nodes control information
flow between different areas of the graph; also called
“Bridge Nodes”
Business Use-Cases:
Communication Analysis: Identify important people
who communicate across different groups
Retail Purchase Analysis: Which products introduce
customers to new categories
Betweenness Centrality
Brandes Algorithm is applied as follows:
1. For each pair of nodes, compute all
shortest paths and capture nodes (less
endpoints) on said path(s)
2. For each pair of nodes, assign each node
along path a value of one if there is only
one shortest path, or the fractional
contribution (1/n) if n shortest paths
3. Sum the value from step 2 for each node;
this is the Betweenness Centrality
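For comparison with the declarative version, here is a compact procedural sketch of Brandes-style betweenness centrality for an unweighted, undirected graph. It assumes the graph is given as an adjacency dictionary; the `betweenness` function and the sample graphs are illustrative and not taken from the deck.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality for an undirected, unweighted graph."""
    centrality = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing credit fractionally.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}

path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
path_scores = betweenness(path_graph)
square_scores = betweenness(square)
print(path_scores, square_scores)
```

In the square, each node sits on one of the two shortest paths between its neighbors, so it receives the fractional contribution of 0.5 described in step 2.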
Betweenness Centrality
// Shortest path between s and t when they are the same is 0.
def shortest_path[s, t] = Min[
v, w:
(shortest_path(s, t, w) and v = 1) or
(w = shortest_path[s,v] +1 and E(v, t))
]
// When s and t are the same, there is only one shortest path between
// them, namely the one with length 0.
def nb_shortest(s, t, n) = V(s) and V(t) and s = t and n = 1
// When s and t are *not* the same, it is the sum of the number of
// shortest paths between s and v for all the v's adjacent to t and
// on the shortest path between s and t.
def nb_shortest(s, t, n) =
s != t and
n = sum[v, m:
shortest_path[s, v] + 1 = shortest_path[s, t] and E(v, t) and
nb_shortest(s, v, m)
]
// sum over all t's such that there is an edge between v and t,
// and v is on the shortest path between s and t
def C[s, v] = sum[t, r:
E(v, t) and shortest_path[s, t] = shortest_path[s, v] + 1 and
(
a = C[s, t] or
not C(s, t, _) and a = 0.0
) and
r = (nb_shortest[s, v] / nb_shortest[s, t]) * (1 + a)
] from a
// Note that below we divide by 2 because we are double
// counting every edge.
def betweenness_centrality_brandes[v] =
sum[s, p : s != v and C[s, v] = p]/2
Betweenness Centrality Recomputation
After incremental updates to data, recomputation of
Betweenness Centrality takes only a few seconds,
whereas the entire graph needs to be recomputed in other systems.
Algorithm Change Recomputation
Incremental updates to code are also recomputed
incrementally, whereas the entire algorithm needs
to be recomputed in other systems.
Demand Driven
Idea Nine
Eager maintenance is bad
Lazy maintenance is bad: detecting dirty computations is too expensive when an output is queried.
Best: Eager invalidation, lazy evaluation
[Diagrams: three Inputs to Outputs dependency graphs]
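"Eager invalidation, lazy evaluation" can be sketched with a tiny dependency graph: writing an input immediately marks everything downstream as dirty, but nothing is recomputed until an output is actually read. The `Cell` class below is a hypothetical illustration, not RelationalAI's mechanism.

```python
class Cell:
    """An input value or a derived value in a small dependency graph."""

    def __init__(self, compute=None, inputs=(), value=None):
        self.compute = compute
        self.inputs = list(inputs)
        self.dependents = []
        self.value = value
        self.dirty = compute is not None   # derived cells start uncomputed
        self.evaluations = 0
        for cell in self.inputs:
            cell.dependents.append(self)

    def set(self, value):
        """Change an input and eagerly invalidate everything downstream."""
        self.value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        for dep in self.dependents:
            if not dep.dirty:              # already-dirty subtrees are skipped
                dep.dirty = True
                dep._invalidate_dependents()

    def get(self):
        """Lazily recompute only when the value is demanded and dirty."""
        if self.dirty:
            self.value = self.compute(*(c.get() for c in self.inputs))
            self.evaluations += 1
            self.dirty = False
        return self.value

a = Cell(value=1)
b = Cell(value=2)
total = Cell(lambda x, y: x + y, [a, b])
doubled = Cell(lambda t: t * 2, [total])

first = doubled.get()      # computes total and doubled once each
a.set(10)
a.set(20)                  # two writes, still no recomputation
evaluations_before_read = total.evaluations
second = doubled.get()     # one recomputation covers both writes
print(first, second, total.evaluations)
```

Reading an output never has to search for stale inputs, because the dirty flag was already set when the input changed.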
Immutable Data
Idea Ten
Data Storage and Memory Management
[Diagram: three storage tiers: RAM cache (buffer pool), ephemeral SSD cache, and scalable, durable object storage, connected by arrows labeled fetch, evict, and commit]
RAI databases are immutable, including the catalog
[Diagram, three frames over a key/value store with CAS: (1) database "demo" with rel A, rel B, rel C; (2) a transaction updates C, adding rel C' next to rel C; (3) "demo" and "demo-2022-03-25" shown together with rel A, rel B, rel C, rel C']
Time Travel
● Run queries on your data as it “was” in the past.
● Low cost time travel
● No need for a Flux Capacitor
[Diagram, two frames over the key/value store with CAS: "demo-2022-03-25" and "demo" with rel A, rel B, rel C'; the second frame adds a new transaction]
Immutable Data
Strict Serializable Isolation
No Locks on Read Workloads
No Limits on Transaction Duration
Cloning copies pointers not data
Low cost Time Travel/What If Analysis
No Logs for Transactions
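A toy model of an immutable catalog behind a compare-and-swap (CAS) root may help make these properties concrete. Everything here (`Store`, `update_relation`, the relation contents) is invented for illustration, and the CAS is a single-threaded stand-in for what a real key/value store would do atomically.

```python
from types import MappingProxyType

class Store:
    """Maps database names to immutable catalogs (relation name -> tuples)."""

    def __init__(self):
        self._roots = {}

    def create(self, name, relations):
        self._roots[name] = MappingProxyType(dict(relations))

    def get(self, name):
        return self._roots[name]

    def clone(self, source, target):
        # Cloning copies a pointer to the catalog, not the data.
        self._roots[target] = self._roots[source]

    def compare_and_swap(self, name, expected, new):
        # Commit only if nobody else replaced the root in the meantime.
        if self._roots[name] is not expected:
            return False
        self._roots[name] = new
        return True

def update_relation(store, name, relation, tuples):
    """Build a new catalog version that shares all untouched relations."""
    old = store.get(name)
    changed = dict(old)
    changed[relation] = frozenset(tuples)
    return store.compare_and_swap(name, old, MappingProxyType(changed))

store = Store()
store.create("demo", {"A": frozenset({1}), "B": frozenset({2}), "C": frozenset({3})})
store.clone("demo", "demo-2022-03-25")

stale = store.get("demo")
committed = update_relation(store, "demo", "C", {3, 4})
rejected = store.compare_and_swap("demo", stale, stale)  # stale root loses

print(sorted(store.get("demo")["C"]), sorted(store.get("demo-2022-03-25")["C"]))
```

Readers holding the old catalog keep seeing a consistent snapshot without locks, and the dated clone keeps answering queries as of that version.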
Infinitely Scalable
Idea Eleven
Cloud Native
Infinite Storage
Infinite Compute
Fully Managed
Data Sharing
Versioning
Pay per Use
So what is Relational AI?
It is your next database: Easier Modeling, Faster Queries, Larger Scale
• Narrow Tables: No Nulls, No Duplicates
• More Indexes: One per Column, Free Composites
• More Joins: Worst Case Optimal
• Semantic Optimizer: It knows Math
• Recursion: Generic Queries
• Limitless Language: Stepwise Definitions, Embedded Logic
• Demand Driven: Incremental Computation
• Immutable Data: Time Travel
• Infinitely Scalable: On the Cloud

Contenu connexe

Tendances

Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 

Tendances (20)

Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
Vector database
Vector databaseVector database
Vector database
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
The PostgreSQL Query Planner
The PostgreSQL Query PlannerThe PostgreSQL Query Planner
The PostgreSQL Query Planner
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance Smackdown
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 

Similaire à Developer Intro Deck-PowerPoint - Download for Speaker Notes

CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
J Singh
 

Similaire à Developer Intro Deck-PowerPoint - Download for Speaker Notes (20)

Outrageous Ideas for Graph Databases
Outrageous Ideas for Graph DatabasesOutrageous Ideas for Graph Databases
Outrageous Ideas for Graph Databases
 
Adobe
AdobeAdobe
Adobe
 
Qp cdsi18-math
Qp cdsi18-mathQp cdsi18-math
Qp cdsi18-math
 
Introduction to MATLAB
Introduction to MATLABIntroduction to MATLAB
Introduction to MATLAB
 
VARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHES
VARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHESVARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHES
VARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHES
 
What to do when one size does not fit all?!
What to do when one size does not fit all?!What to do when one size does not fit all?!
What to do when one size does not fit all?!
 
Introduction to Data structure & Algorithms - Sethuonline.com | Sathyabama Un...
Introduction to Data structure & Algorithms - Sethuonline.com | Sathyabama Un...Introduction to Data structure & Algorithms - Sethuonline.com | Sathyabama Un...
Introduction to Data structure & Algorithms - Sethuonline.com | Sathyabama Un...
 
3rd Semester Computer Science and Engineering (ACU) Question papers
3rd Semester Computer Science and Engineering  (ACU) Question papers3rd Semester Computer Science and Engineering  (ACU) Question papers
3rd Semester Computer Science and Engineering (ACU) Question papers
 
Chapter0
Chapter0Chapter0
Chapter0
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commitsGetting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commits
 
AP PGECET Computer Science 2016 question paper
AP PGECET Computer Science 2016 question paperAP PGECET Computer Science 2016 question paper
AP PGECET Computer Science 2016 question paper
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
R for you
R for youR for you
R for you
 
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
 
K means clustering
K means clusteringK means clustering
K means clustering
 
03 boolean algebra
03 boolean algebra03 boolean algebra
03 boolean algebra
 
Topik 1
Topik 1Topik 1
Topik 1
 
Dat 305 dat305 dat 305 education for service uopstudy.com
Dat 305 dat305 dat 305 education for service   uopstudy.comDat 305 dat305 dat 305 education for service   uopstudy.com
Dat 305 dat305 dat 305 education for service uopstudy.com
 
Algebra formulas
Algebra formulasAlgebra formulas
Algebra formulas
 

Developer Intro Deck-PowerPoint - Download for Speaker Notes

  • 1. RelationalAI, Inc. All rights reserved 2022 Introduction August 2022
  • 2. RelationalAI: Founded in 2017, cloud-native; just out of stealth with enterprise customers; self-service availability in 2023. $122M in funding; Tier 1 firms in Silicon Valley, NYC, and Seattle. 170+ people; 120+ computer scientists; 50+ PhDs; 2,000+ publications with 100K+ citations and 37 awards and counting. Berkeley HQ but 100% remote. Leadership with 6 successful exits; active board of directors includes Bob Muglia, former CEO of Snowflake. Team with broad expertise: databases, languages, and algorithms to machine learning and operations research.
  • 3. What is Relational AI?
  • 4.
  • 5. What is Relational AI? A Cloud Native Database: Easier Modeling, Faster Queries, Larger Scale
  • 6. What is Relational AI? A Cloud Native Database. Narrow Tables: No Nulls, No Duplicates. More Indexes: One per Column, Free Composites. More Joins: Worst Case Optimal. Semantic Optimizer: It knows Math. Recursion: Generic Queries. Limitless Language: Stepwise Definitions, Embedded Logic. Demand Driven: Incremental Computation. Immutable Data: Time Travel. Infinitely Scalable: On the Cloud.
  • 7. Key Differences: 1. Removed Nulls 2. Removed Duplicates 3. Key-Value or Key-Key Tables 4. Dynamic Composite Indexes 5. More Joins = Faster (not Slower) 6. Semantic Query Optimizer knows more Math than you 7. Knows Recursion 8. Rel Query Language: a. Lego Blocks b. Type System c. Declarative, Logical, Relational 9. Data and Model are Together: a. 10-100x less code b. Higher Quality 10. Compute in Increments 11. Compute on Demand 12. Immutable Data: a. No Read Locks b. Time-Travel 13. Cloud Native: a. Infinite Compute b. Infinite Storage
  • 8. Fact: Most people have never tasted real Wasabi
  • 9. Fact: Most people have never used a Relational Database
  • 11. Ted Codd - Inventor of the Relational Model
  • 12. What’s wrong with Null? • (a OR NOT(a)) != True: SELECT * FROM parts WHERE (price <= 99) OR (price > 99) misses rows with a null price, so it has to become SELECT * FROM parts WHERE (price <= 99) OR (price > 99) OR isNull(price) • Aggregation requires special cases: SELECT AVG(height) FROM parts • Outer Joins are not commutative (a x b != b x a): SELECT orders.id, parts.id FROM orders LEFT OUTER JOIN parts ON parts.id = orders.part_id versus SELECT orders.id, parts.id FROM parts LEFT OUTER JOIN orders ON parts.id = orders.part_id
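The first two Null problems on this slide can be reproduced with a few rows. The sketch below uses Python's built-in `sqlite3` module; the `parts` rows are invented sample data, and SQLite spells the null test as `price IS NULL` rather than the `isNull(price)` shown on the slide.

```python
import sqlite3

# In-memory database with three invented parts; part 3 has no price or height.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, price REAL, height REAL)")
conn.executemany(
    "INSERT INTO parts (id, price, height) VALUES (?, ?, ?)",
    [(1, 50.0, 10.0), (2, 150.0, 20.0), (3, None, None)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


total_rows = scalar("SELECT COUNT(*) FROM parts")

# "price <= 99 OR price > 99" looks like a tautology, but NULL compares as
# unknown on both sides, so the NULL-priced row is filtered out.
either_side = scalar(
    "SELECT COUNT(*) FROM parts WHERE (price <= 99) OR (price > 99)"
)

# The explicit NULL test is needed to get every row back.
with_null_check = scalar(
    "SELECT COUNT(*) FROM parts WHERE (price <= 99) OR (price > 99) OR price IS NULL"
)

# AVG silently skips NULLs: it averages two heights, not three rows.
avg_height = scalar("SELECT AVG(height) FROM parts")

print(total_rows, either_side, with_null_check, avg_height)
```

The "tautology" returns two of the three rows, and the average is taken over two values, which is the special-casing the slide refers to.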
  • 13. Ted Codd - Inventor of the Relational Model
  • 14. Why do we have to lose the bags? Sets have unique values; bags allow duplicate values. Sets: {1,2,3}, {3,4,8}. Bags: {1,2,2,3}, {3,3,5,5}. Queries that use only ANDs (no ORs) are called “conjunctive queries”. Conjunctive queries under set semantics are MUCH easier to optimize. Lots of math can be defined by sets (see Logicomix, a graphic novel about the search for the foundations of mathematics).
  • 15. Ted Codd - Inventor of the Relational Model
  • 16. How?
  • 17. Narrow Tables Idea One
  • 21. More Indexes Idea Two
  • 22. One Index per Column Order
  • 23. Composite Index Explosion. Dual Index Narrow Tables => Dynamically Generated Composite Indexes
  • 24. More Joins Idea Three Terrible
  • 25. Problem with Joins (all of NoSQL is because of this). Table 1 IDs: 0, 1, 3, 4, 5, 6, 7, 8, 9, 11. Table 2 IDs: 0, 2, 6, 7, 8, 9. Table 3 IDs: 2, 4, 5, 8, 10. Intermediate results (Table 1 and Table 2): 0, 6, 7, 8, 9. Results (Table 1, Table 2, Table 3): 8.
  • 26. Worst Case Optimal Joins • Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems. Ngo. (Gems of PODS 2018) • Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems. Ngo, Porat, Re, Rudra. (Journal of the ACM 2018) • What do Shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? Abo Khamis, Ngo, Suciu, (PODS 2017 - Invited to Journal of ACM) • Computing Join Queries with Functional Dependencies. Abo Khamis, Ngo, Suciu. (PODS 2017) • Joins via Geometric Resolutions: Worst-case and Beyond. Abo Khamis, Ngo, Re, Rudra. (PODS 2015, Invited to TODS 2015) • Beyond Worst-Case Analysis for Joins with Minesweeper. Abo Khamis, Ngo, Re, Rudra. (PODS 2014) • Leapfrog Triejoin: A Simple Worst-Case Optimal Join Algorithm. Veldhuizen (ICDT 2014 - Best Newcomer) • Skew Strikes Back: New Developments in the Theory of Join Algorithms. Ngo, Re, Rudra. (Invited to SIGMOD Record 2013) • Worst Case Optimal Join Algorithms. Ngo, Porat, Re, Rudra. (PODS 2012 – Best Paper)
  • 27. Worst Case Optimal Join. Table 1 IDs: 0, 1, 3, 4, 5, 6, 7, 8, 9, 11. Table 2 IDs: 0, 2, 6, 7, 8, 9. Table 3 IDs: 2, 4, 5, 8, 10. Current IDs (Table 1, Table 2, Table 3) and the action taken: (0, 0, 2) Table 1: Seek 2; (3, 0, 2) Table 2: Seek 3; (3, 6, 2) Table 3: Seek 6; (3, 6, 8) Table 1: Seek 8; (8, 6, 8) Table 2: Seek 8; (8, 8, 8) Emit, Table 3: Next; (8, 8, 10) Table 1: Seek 10; (11, 8, 10) Table 2: Seek 11; END. Results (Table 1, Table 2, Table 3): 8.
  • 28. More than 3 Tables: Worst-Case Optimal Joins take advantage of sorted keys and gaps in the data to eliminate intermediate results, speed up queries and get rid of the Join problem. Figure: seeks across the sorted Brand, Category, Retailer, and Rating indexes, in order: 1) seek c, 2) seek c, 3) seek f, 4) seek g, 5) seek m, 6) seek m, 7) seek m.
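The seek-and-emit trace on this slide can be approximated in a few lines of Python. This is a simplified sketch, not the engine's implementation: it assumes each table is a sorted, duplicate-free list of IDs, uses binary search (`bisect_left`) as the "seek", and moves every cursor to the current maximum on each pass instead of rotating through the tables one at a time. The function name `leapfrog_intersect` is made up for the example.

```python
from bisect import bisect_left


def leapfrog_intersect(*sorted_lists):
    """Intersect sorted, duplicate-free lists without building pairwise
    intermediate results: every cursor jumps to the largest current key."""
    if not sorted_lists or any(len(xs) == 0 for xs in sorted_lists):
        return []
    positions = [0] * len(sorted_lists)
    results = []
    while True:
        keys = [xs[p] for xs, p in zip(sorted_lists, positions)]
        target = max(keys)
        if min(keys) == target:
            # All cursors agree: emit the key, then step the last cursor.
            results.append(target)
            positions[-1] += 1
            if positions[-1] == len(sorted_lists[-1]):
                return results
            continue
        for i, xs in enumerate(sorted_lists):
            # Seek: skip straight to the first key >= target.
            positions[i] = bisect_left(xs, target, positions[i])
            if positions[i] == len(xs):
                return results  # one list is exhausted, so no more matches


table1 = [0, 1, 3, 4, 5, 6, 7, 8, 9, 11]
table2 = [0, 2, 6, 7, 8, 9]
table3 = [2, 4, 5, 8, 10]
joined = leapfrog_intersect(table1, table2, table3)
print(joined)
```

On the slide's three tables the only ID emitted is 8, and the five-row Table 1/Table 2 intermediate result from the previous slide is never materialized.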
  • 29.
  • 30. Semantic Optimizer Idea Four
  • 31. Traditional Query Optimizers • Predicate pushdown (push selection through join) • Projection pushdown (push projection through join) • Aggregation pushdown • Their “pull-up” counterparts • Split conjunctive predicates (split AND statements) • Replace cartesian products (use inner joins with predicates) • (Un)Nesting Sub-Queries • Etc.
  • 32. Semantic Optimizer. Diagram labels: Data, Answer, Rel model, Equivalent Rel models, Knowledge, Semantic Optimizer, Optimized Rel model
  • 33.
  • 34. Math: 1 + (2 + 3) = (1 + 2) + 3; 3 + 4 = 4 + 3; 3 + 0 = 3; 1 + (-1) = 0; 2 x (3 x 4) = (2 x 3) x 4; 2 x 5 = 5 x 2; 2 x 1 = 2; 2 x 0.5 = 1; 2 x (3 + 4) = (2 x 3) + (2 x 4); (3 + 4) x 2 = (3 x 2) + (4 x 2)
  • 35. Math: a + (b + c) = (a + b) + c; a + b = b + a; a + 0 = a; a + (-a) = 0; a x (b x c) = (a x b) x c; a x b = b x a; a x 1 = a; a x a^-1 = 1 (a != 0); a x (b + c) = (a x b) + (a x c); (a + b) x c = (a x c) + (b x c)
  • 36. Math. Addition: Associativity: a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c; Commutativity: a ⊕ b = b ⊕ a; Identity: a ⊕ ō = a; Inverse: a ⊕ (-a) = ō. Multiplication: Associativity: a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c; Commutativity: a ⊗ b = b ⊗ a; Identity: a ⊗ ī = a; Inverse: a ⊗ a^-1 = ī. Distribution of Multiplication over Addition: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c); (a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c)
  • 37. Example One Query: find the count of the combined rows a, b, c in tables R, S and T def result = count[a,b,c: R(a) and S(b) and T(c)] Mathematical Representation:
  • 38. Math
  • 39. Example One Query: find the count of the combined rows a, b, c in tables R, S and T
  • 40. Example One Query: find the count of the combined rows a, b, c in tables R, S and T
  • 41. Example One Query: find the count of the combined rows a, b, c in tables R, S and T def result = count[a,b,c: R(a) and S(b) and T(c)] Mathematical Representation: evaluating the original expression takes on the order of n^3 steps, which is much slower than the roughly 3n steps of the optimized form.
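A small Python sketch of the rewrite behind Example One, assuming R, S, and T are plain sets (the sample values are invented). Counting every combination enumerates |R| x |S| x |T| tuples, while the distributive law lets the count be computed from three independent passes.

```python
from itertools import product

# Invented sample relations.
R = {1, 2, 3}
S = {10, 20}
T = {"x", "y", "z", "w"}

# Naive: enumerate every (a, b, c) combination, on the order of n^3 work.
naive_count = sum(1 for _ in product(R, S, T))

# Optimized: count each relation once (about 3n work) and multiply.
optimized_count = len(R) * len(S) * len(T)

print(naive_count, optimized_count)
```

Both expressions give the same answer; only the amount of work differs.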
  • 42. Example Two Query: find the minimum sum of the combined rows a, b, c in tables R, S and T def result = min[a,b,c,v: v= R[a] + S[b] + T[c]] Mathematical Representation:
  • 43. Math
  • 44. Example Two Query: find the minimum sum of the combined rows a, b, c in tables R, S and T def result = min[a,b,c,v: v= R[a] + S[b] + T[c]] Optimized Query: def result = min[R] + min[S] + min[T]
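The same idea applies to Example Two because addition distributes over `min`. The sketch below assumes R, S, and T are dictionaries from keys to numeric values; the data is invented for illustration.

```python
from itertools import product

# Invented sample relations: key -> value.
R = {"a1": 5, "a2": 3}
S = {"b1": 7, "b2": 2}
T = {"c1": 4, "c2": 9}

# Naive: form every sum R[a] + S[b] + T[c], then take the minimum.
naive_min = min(R[a] + S[b] + T[c] for a, b, c in product(R, S, T))

# Optimized: min[R] + min[S] + min[T], one pass per relation.
optimized_min = min(R.values()) + min(S.values()) + min(T.values())

print(naive_min, optimized_min)
```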
  • 45. Shortest Path from A to F. Graph edges: A-B = 1, B-C = 2, C-D = 3, B-D = 6, D-F = 5, A-E = 9, E-F = 4. AEF = 9 + 4 = 13; ABDF = 1 + 6 + 5 = 12; ABCDF = 1 + 2 + 3 + 5 = 11; min{13, 12, 11} = 11
  • 46. Maximum Reliability from A to F. Graph edges: A-B = 0.9, B-C = 0.9, C-D = 1.0, B-D = 0.2, D-F = 0.7, A-E = 0.4, E-F = 0.8. AEF = 0.4 x 0.8 = 0.32; ABDF = 0.9 x 0.2 x 0.7 = 0.126; ABCDF = 0.9 x 0.9 x 1.0 x 0.7 = 0.567; max{0.32, 0.126, 0.567} = 0.567
  • 47. Words from A to F. Graph edges: A-B = T, B-C = I, C-D = M, B-D = H, D-F = E, A-E = A, E-F = T. AEF = A · T = AT; ABDF = T · H · E = THE; ABCDF = T · I · M · E = TIME; union{at, the, time} = at the time
  • 48. Math: min { (9 + 4), (1 + 6 + 5), (1 + 2 + 3 + 5) }; max { (0.4 x 0.8), (0.9 x 0.2 x 0.7), (0.9 x 0.9 x 1.0 x 0.7) }; union { (A · T), (T · H · E), (T · I · M · E) }
  • 49. Math: ⊕ { (9 ⊗ 4), (1 ⊗ 6 ⊗ 5), (1 ⊗ 2 ⊗ 3 ⊗ 5) }; ⊕ { (0.4 ⊗ 0.8), (0.9 ⊗ 0.2 ⊗ 0.7), (0.9 ⊗ 0.9 ⊗ 1.0 ⊗ 0.7) }; ⊕ { (A ⊗ T), (T ⊗ H ⊗ E), (T ⊗ I ⊗ M ⊗ E) }
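Slides 45 to 49 make the point that shortest path, maximum reliability, and word building are one computation with different ⊕ and ⊗ operators. A hedged Python sketch: the function `evaluate_paths` and the edge dictionaries are illustrative names, and the edge labels are read off the path sums on the slides (for example AEF = 9 + 4 gives A-E = 9 and E-F = 4).

```python
from functools import reduce
import operator

# The three A-to-F paths listed on the slides.
PATHS = [["A", "E", "F"], ["A", "B", "D", "F"], ["A", "B", "C", "D", "F"]]

EDGES = [("A", "E"), ("E", "F"), ("A", "B"), ("B", "D"), ("D", "F"), ("B", "C"), ("C", "D")]
LENGTH = dict(zip(EDGES, [9, 4, 1, 6, 5, 2, 3]))
RELIABILITY = dict(zip(EDGES, [0.4, 0.8, 0.9, 0.2, 0.7, 0.9, 1.0]))
LETTER = dict(zip(EDGES, ["A", "T", "T", "H", "E", "I", "M"]))


def evaluate_paths(paths, weights, combine, aggregate):
    """combine plays the role of ⊗ along a path; aggregate plays ⊕ across paths."""
    per_path = (
        reduce(combine, (weights[(u, v)] for u, v in zip(path, path[1:])))
        for path in paths
    )
    return aggregate(per_path)


shortest = evaluate_paths(PATHS, LENGTH, operator.add, min)
most_reliable = evaluate_paths(PATHS, RELIABILITY, operator.mul, max)
words = evaluate_paths(PATHS, LETTER, operator.add, set)  # string concat, set union

print(shortest, most_reliable, sorted(words))
```

Only the two operators change between the three calls, which is the property a semiring-aware optimizer can exploit. The reliability result is approximately 0.567, subject to floating-point rounding.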
  • 50. Example Three Query: count the number of 3-hop paths per node in a graph def path3(a, b, c, d) = edge(a,b) and edge(b,c) and edge(c,d) def result[a] = count[path3[a]] Mathematical Representation: A B C D
  • 51. Math Query: count the number of 3-hop paths per node in a graph A B C D
  • 52. Example Three Query: count the number of 3-hop paths per node in a graph (path diagram: A, B, C, D)
      def path3(a, b, c, d) = edge(a,b) and edge(b,c) and edge(c,d)
      def result[a] = count[path3[a]]
    Optimized Query:
      def path1[c] = count[edge[c]]
      def path2[b] = sum[path1[c] for c in edge[b]]
      def result[a] = sum[path2[b] for b in edge[a]]
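The optimized form of Example Three pushes the count inward, one hop at a time, instead of materializing every (a, b, c, d) tuple. A Python sketch on an invented edge list follows; nodes with no 3-hop paths are dropped from both results, on the assumption that an empty count yields no row.

```python
from collections import defaultdict

# Invented directed edges.
edges = {(1, 2), (2, 3), (3, 4), (3, 5), (1, 3)}

out = defaultdict(set)  # out[a] = nodes reachable from a in one hop
for a, b in edges:
    out[a].add(b)


def count_3hop_naive():
    """Enumerate every path a -> b -> c -> d and count per start node."""
    result = defaultdict(int)
    for a, b in edges:
        for c in out.get(b, ()):
            for _d in out.get(c, ()):
                result[a] += 1
    return dict(result)


def count_3hop_optimized():
    """Aggregate per hop: path1 counts 1-hop paths, path2 counts 2-hop paths."""
    path1 = {c: len(targets) for c, targets in out.items()}
    path2 = {b: sum(path1.get(c, 0) for c in targets) for b, targets in out.items()}
    result = {a: sum(path2.get(b, 0) for b in targets) for a, targets in out.items()}
    return {a: n for a, n in result.items() if n > 0}


naive_paths = count_3hop_naive()
optimized_paths = count_3hop_optimized()
print(naive_paths, optimized_paths)
```

On this graph the only 3-hop paths are 1, 2, 3, 4 and 1, 2, 3, 5, so both versions report two paths starting at node 1.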
  • 53. Semantic Optimizer Compute Discrete Fourier Transform in Fast Fourier Transform-time Junction Tree Algorithm for inference in Probabilistic Graphical Models Message passing, belief propagation Viterbi Algorithm, forward/backward for Hidden Markov Models most probable paths Counting sub-graph patterns (motifs) Yannakakis Algorithm for acyclic conjunctive queries in Polynomial Time Fractional hypertree-width time algorithm for Constraint Satisfaction Problems Best known results for Conjunctive Queries and Quantified Conjunctive Queries
  • 54. Math This optimizer produces much better code than the average developer because it knows a ton more math than the average developer. Maryam Mirzakhani Terence Tao Ramanujan Katherine Goble Good Will Hunting
  • 55.
  • 56. Recursion Idea Five
  • 57. Recursion def reachable = edge; reachable.edge
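The one-line recursive definition reads as "reachable is edge, or reachable followed by one more edge". A rough Python equivalent, assuming `edges` is a set of (source, target) pairs, iterates until no new pairs appear (a naive fixpoint, not how the engine necessarily evaluates it):

```python
def reachable(edges):
    """Transitive closure by repeated extension until a fixpoint is reached."""
    closure = set(edges)  # base case: reachable = edge
    while True:
        # recursive case: reachable.edge (extend each known pair by one edge)
        extended = {(a, c) for (a, b) in closure for (b2, c) in edges if b == b2}
        new_pairs = extended - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


chain = {(1, 2), (2, 3), (3, 4)}
closure = reachable(chain)
print(sorted(closure))
```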
  • 59. Loopy Lattice: How many Shortest Paths are there from the top left node to the bottom right node? Examples: 1x1 lattice = 2 Paths; 2x2 lattice = 6 Paths
  • 60. How many Shortest Paths are there? 14x14 = 11 minutes; 15x15 = 10 hours; 20x20 = Nope
  • 61. How many Shortest Paths are there? 20x20 = 10 minutes (137 billion paths)
  • 62. How many Shortest Paths are there? 20x20 in 0.41 seconds (137 billion paths)
  • 63. Definitions:
      @function @transient def :_intermediate#0(other_node#1, path_length#0, _t#0) = reduce[(_x#0, _y#0, _z#0) : :rel_primitive_add(_x#0, _y#0, _z#0), (x#8, paths_of_length#1) : :number_of_paths_of_length(other_node#1, x#8, paths_of_length#1) and :rel_primitive_add(1, x#8, path_length#0), (_no_init#0) : false](_t#0)
      @function @transient def :_intermediate#1(node_number#0, path_length#0, path_count#0) = reduce[(_x#1, _y#1, _z#1) : :rel_primitive_add(_x#1, _y#1, _z#1), (other_node#1, _t#0) : :edge(other_node#1, node_number#0) and :_intermediate#0(other_node#1, path_length#0, _t#0), (_no_init#1) : false](path_count#0)
      def :number_of_paths_of_length(node_number#0, path_length#0, path_count#0) = :_base_case#0(node_number#0, path_length#0, path_count#0) or :_intermediate#1(node_number#0, path_length#0, path_count#0)
    Naive recursion, iteration 1
      Evaluating `_intermediate#0`: (1, 1) => (1,)
      Evaluating `_intermediate#1`: (2, 1) => (1,); (4, 1) => (1,)
      Evaluating `number_of_paths_of_length`: (1, 0, 1); (2, 1, 1); (4, 1, 1)
  • 64. (Definitions repeated from slide 63.) Naive recursion, iteration 2
      Evaluating `_intermediate#0`: (1, 1) => (1,); (2, 2) => (1,); (4, 2) => (1,)
      Evaluating `_intermediate#1`: (2, 1) => (1,); (3, 2) => (1,); (4, 1) => (1,); (5, 2) => (2,); (7, 2) => (1,)
      Evaluating `number_of_paths_of_length`: (1, 0, 1); (2, 1, 1); (3, 2, 1); (4, 1, 1); (5, 2, 2); (7, 2, 1)
  • 65. (Definitions repeated from slide 63.) Naive recursion, iteration 3
      Evaluating `_intermediate#0`: (1, 1) => (1,); (2, 2) => (1,); (3, 3) => (1,); (4, 2) => (1,); (5, 3) => (2,); (7, 3) => (1,)
      Evaluating `_intermediate#1`: (2, 1) => (1,); (3, 2) => (1,); (4, 1) => (1,); (5, 2) => (2,); (6, 3) => (3,); (7, 2) => (1,); (8, 3) => (3,)
      Evaluating `number_of_paths_of_length`: (1, 0, 1); (2, 1, 1); (3, 2, 1); (4, 1, 1); (5, 2, 2); (6, 3, 3); (7, 2, 1); (8, 3, 3)
  • 66. (Definitions repeated from slide 63.) Naive recursion, iteration 4
      Evaluating `_intermediate#0`: (1, 1) => (1,); (2, 2) => (1,); (3, 3) => (1,); (4, 2) => (1,); (5, 3) => (2,); (6, 4) => (3,); (7, 3) => (1,); (8, 4) => (3,)
      Evaluating `_intermediate#1`: (2, 1) => (1,); (3, 2) => (1,); (4, 1) => (1,); (5, 2) => (2,); (6, 3) => (3,); (7, 2) => (1,); (8, 3) => (3,); (9, 4) => (6,)
      Evaluating `number_of_paths_of_length`: (1, 0, 1); (2, 1, 1); (3, 2, 1); (4, 1, 1); (5, 2, 2); (6, 3, 3); (7, 2, 1); (8, 3, 3); (9, 4, 6)
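The trace above sums, for each node, the path counts of its predecessors; it ends with 6 paths reaching node 9 of a 3x3 node grid (a 2x2 lattice). The Python sketch below applies the same recurrence directly, assuming edges only point right and down so that every path is a shortest path. The function name is made up, and the closed form C(2n, n) is used as a cross-check.

```python
from math import comb


def count_lattice_paths(n):
    """Number of shortest paths from the top-left to the bottom-right node of
    an n x n lattice, i.e. a grid of (n + 1) x (n + 1) nodes."""
    paths = [[0] * (n + 1) for _ in range(n + 1)]
    paths[0][0] = 1  # base case: one path of length 0 at the start node
    for row in range(n + 1):
        for col in range(n + 1):
            if row > 0:
                paths[row][col] += paths[row - 1][col]  # arrive from above
            if col > 0:
                paths[row][col] += paths[row][col - 1]  # arrive from the left
    return paths[n][n]


for size in (1, 2, 20):
    print(size, count_lattice_paths(size), comb(2 * size, size))
```

The 20x20 answer is 137,846,528,820, the "137 Billion" quoted on slides 61 and 62. Summing counts per node avoids enumerating the individual paths.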
  • 67.
  • 68. Stepwise Language Idea Six
  • 69. Graph Analytics
      module graph_analytics[G]
        with G use node, edge
        def neighbor(x, y) = edge(x, y) or edge(y, x)
        def outdegree[x] = count[edge[x]]
        def degree[x] = count[neighbor[x]]
        def cn[x, y] = count[intersect[neighbor[x], neighbor[y]]] // Count of Common Neighbors
        def reachable = edge; reachable.edge // Recursive!
        def reachable_undirected = neighbor; reachable_undirected.neighbor // Recursive!
        def scc[x] = min[v: reachable(x, v) and reachable(v, x)] // Strongly Connected Component
        def wcc[x] = min[reachable_undirected[x]] // Weakly Connected Component
        def cosine_sim[x, y] = cn[x, y] / sqrt[degree[x] * degree[y]]
        def jaccard_sim[x, y] = cn[x, y] / (count[neighbor[x]] + count[neighbor[y]] - cn[x, y])
        …
      end
  • 70. Graph Analytics Dependencies
      module graph_analytics[G]
        with G use node, edge
        def neighbor(x, y) = edge(x, y) or edge(y, x)
        def outdegree[x] = count[edge[x]]
        def degree[x] = count[neighbor[x]]
        def cn[x, y] = count[intersect[neighbor[x], neighbor[y]]]
        def reachable = edge; reachable.edge
        def reachable_undirected = neighbor; reachable_undirected.neighbor
        def scc[x] = min[v: reachable(x, v) and reachable(v, x)]
        def wcc[x] = min[reachable_undirected[x]]
        def cosine_sim[x, y] = cn[x, y] / sqrt[degree[x] * degree[y]]
        def jaccard_sim[x, y] = cn[x, y] / (count[neighbor[x]] + count[neighbor[y]] - cn[x, y])
        …
      end
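For readers who do not know Rel, here is a hedged Python rendering of four of the stepwise definitions (`neighbor`, `degree`, `cn`, and the two similarity measures). The function names and sample edges are illustrative. Each definition builds on the previous ones, which is the "stepwise" point of the slide.

```python
from collections import defaultdict
from math import sqrt


def build_neighbors(edges):
    """neighbor(x, y) = edge(x, y) or edge(y, x)"""
    neighbors = defaultdict(set)
    for x, y in edges:
        neighbors[x].add(y)
        neighbors[y].add(x)
    return neighbors


def degree(neighbors, x):
    return len(neighbors[x])


def common_neighbors(neighbors, x, y):
    """cn[x, y]: size of the intersection of the two neighbor sets."""
    return len(neighbors[x] & neighbors[y])


def cosine_sim(neighbors, x, y):
    return common_neighbors(neighbors, x, y) / sqrt(degree(neighbors, x) * degree(neighbors, y))


def jaccard_sim(neighbors, x, y):
    shared = common_neighbors(neighbors, x, y)
    # The parentheses around the denominator matter: it is the size of the union.
    return shared / (degree(neighbors, x) + degree(neighbors, y) - shared)


sample_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
nbrs = build_neighbors(sample_edges)
print(common_neighbors(nbrs, 1, 2), cosine_sim(nbrs, 1, 2), jaccard_sim(nbrs, 1, 2))
```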
  • 71. How do you get to the Moon? We don’t have to reinvent fire to get to Mars. Why does your new SQL query start with a blank page?
  • 72. Entities
      // Cliques can be empty
      entity Clique nil = true
      // In order to be in a clique you must be added to the clique.
      // It requires a new member to be connected to existing members
      entity Clique add_to_clique(member, members) = connected(member, members)
      // new_member must just be a node if there are no other members
      def connected(new_member, members) = node(new_member) and nil(members)
      // If there are other members in the clique then the new_member must be connected to all of them
      def connected(new_member, members) =
        neighbor(new_member, last_member) and new_member > last_member
        and connected(new_member, other_members)
        and add_to_clique(last_member, other_members, members)
        from (last_member, other_members)
      // Note: We can walk add_to_clique backwards!
  • 73.
  • 74. Embedded Logic Idea Seven
  • 75. Real World Data. Questions raised by the raw table: Miles? Kilometers? Period? Minutes? Hours? Hours? Minutes? Primary key? Possible values? Are these exclusive? Include helicopters? Notes: Risk of messing up aggregates. Not a delay.
  • 77. Flight Model and Queries
    Model:
      def cancelled(f) = flight(f) and flight_cancelled(f, "Y")
      def diverted(f) = flight(f) and flight_diverted(f, "Y")
      def arrived(f) = flight(f) and not (cancelled(f) or diverted(f))
      def arrival_delay[f] = maximum[^Minute[0], arr_delay[f]]
      ic forall(f in flight: cancelled(f) xor diverted(f) xor arrived(f))
      ic forall(f in cancelled: not exists flight_time[f])
    Use:
      count[cancelled]
      count[f: cancelled(f) and operated_by(f, c)] for c in Carrier
      ratio[cancelled, (f: destination(f,x))] for x in Airport
      mean[arrival_delay]
      mean[arrival_delay[f] for f where operated_by(f, c)] for c in Carrier
  • 78. Reasoning with the Data: Manage both your data and your business logic together. No more rewriting your logic procedurally in languages like Java, C#, Python, Rust, PL/SQL, T-SQL, etc. 10-100x less code. Much higher quality.
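To make the model concrete, here is a rough Python analogue over four invented flights. It derives the same three statuses, clamps early arrivals to a zero delay, and checks a status constraint. Two assumptions to flag: the check below requires exactly one status per flight, which is slightly stricter than a chained `xor`, and the field names are made up.

```python
from statistics import mean

# Invented raw rows; "Y"/"N" flags mirror the raw data the slide's model cleans up.
flights = {
    "F1": {"cancelled": "Y", "diverted": "N", "arr_delay": None},
    "F2": {"cancelled": "N", "diverted": "N", "arr_delay": -5},
    "F3": {"cancelled": "N", "diverted": "N", "arr_delay": 30},
    "F4": {"cancelled": "N", "diverted": "Y", "arr_delay": None},
}

cancelled = {f for f, row in flights.items() if row["cancelled"] == "Y"}
diverted = {f for f, row in flights.items() if row["diverted"] == "Y"}
arrived = set(flights) - cancelled - diverted

# arrival_delay: an early arrival (negative value) is not a delay, so clamp at 0.
arrival_delay = {
    f: max(0, flights[f]["arr_delay"])
    for f in arrived
    if flights[f]["arr_delay"] is not None
}


def check_one_status_per_flight():
    """Integrity constraint: every flight has exactly one of the three statuses."""
    for f in flights:
        statuses = (f in cancelled) + (f in diverted) + (f in arrived)
        if statuses != 1:
            raise ValueError(f"flight {f} has {statuses} statuses")


check_one_status_per_flight()
cancelled_count = len(cancelled)
mean_delay = mean(arrival_delay.values())
print(cancelled_count, mean_delay)
```

Because the clamping lives in the model, every query that uses `arrival_delay` inherits it instead of re-implementing it.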
  • 79. Incremental Computation Idea Eight
  • 80. Betweenness Centrality: one of many graph centrality measures that are useful for assessing the importance of a node. High Level Definition: the number of times a node appears on shortest paths within a network. Why it’s Useful: identifies which nodes control information flow between different areas of the graph; these are also called “Bridge Nodes”. Business Use-Cases: Communication Analysis: identify important people who communicate across different groups. Retail Purchase Analysis: which products introduce customers to new categories.
  • 81. Betweenness Centrality. Brandes Algorithm is applied as follows: 1. For each pair of nodes, compute all shortest paths and capture the nodes (excluding endpoints) on those paths. 2. For each pair of nodes, assign each node along a path a value of one if there is only one shortest path, or the fractional contribution (1/n) if there are n shortest paths. 3. Sum the values from step 2 for each node; this is the Betweenness Centrality.
  • 82. Betweenness Centrality
      // Shortest path between s and t when they are the same is 0.
      def shortest_path[s, t] = Min[ v, w: (shortest_path(s, t, w) and v = 1) or (w = shortest_path[s,v] +1 and E(v, t)) ]
      // When s and t are the same, there is only one shortest path between
      // them, namely the one with length 0.
      def nb_shortest(s, t, n) = V(s) and V(t) and s = t and n = 1
      // When s and t are *not* the same, it is the sum of the number of
      // shortest paths between s and v for all the v's adjacent to t and
      // on the shortest path between s and t.
      def nb_shortest(s, t, n) = s != t and n = sum[v, m: shortest_path[s, v] + 1 = shortest_path[s, t] and E(v, t) and nb_shortest(s, v, m) ]
      // sum over all t's such that there is an edge between v and t,
      // and v is on the shortest path between s and t
      def C[s, v] = sum[t, r: E(v, t) and shortest_path[s, t] = shortest_path[s, v] + 1 and ( a = C[s, t] or not C(s, t, _) and a = 0.0 ) and r = (nb_shortest[s, v] / nb_shortest[s, t]) * (1 + a) ] from a
      // Note that below we divide by 2 because we are double counting every edge.
      def betweenness_centrality_brandes[v] = sum[s, p : s != v and C[s, v] = p]/2
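The three steps on slide 81 can be sketched directly in Python for a small undirected, unweighted graph. This is the pairwise definition, not Brandes' faster dependency-accumulation scheme: it runs a BFS from every node to get distances and shortest-path counts, then credits each intermediate node with its share of every pair's shortest paths. `adj` is assumed to map each node to its set of neighbors.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    nodes = sorted(adj)
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:  # each unordered pair once
            if t not in dist_s:
                continue  # s and t are not connected
            dist_t, sigma_t = info[t]
            for v in nodes:
                if v == s or v == t or v not in dist_s:
                    continue
                if dist_s[v] + dist_t[v] == dist_s[t]:  # v lies on a shortest s-t path
                    score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


path_graph = {1: {2}, 2: {1, 3}, 3: {2}}
diamond = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
print(betweenness(path_graph))
print(betweenness(diamond))
```

In the diamond graph every opposite pair has two shortest paths, so each node receives the fractional contribution 1/2 described in step 2.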
  • 83.
  • 84. Betweenness Centrality Recomputation: after incremental updates to the data, recomputing Betweenness Centrality takes only a few seconds, whereas other systems need to recompute over the entire graph.
  • 85. Algorithm Change Recomputation: changes to the code are also recomputed incrementally, whereas other systems need to rerun the entire algorithm.
  • 86. Demand Driven Idea Nine
  • 87. Eager maintenance is bad. Lazy maintenance is bad: detecting dirty computations is too expensive when an output is queried. Best: eager invalidation, lazy evaluation. (Three diagrams, each showing Inputs and Outputs.)
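A toy Python sketch of "eager invalidation, lazy evaluation". The `Input` and `Derived` classes are invented for illustration and say nothing about how the real engine is built. Writing an input only flips dirty flags downstream (cheap), and nothing is recomputed until an output is actually read.

```python
class Input:
    def __init__(self, value):
        self._value = value
        self.dependents = []

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for dependent in self.dependents:
            dependent.invalidate()  # eager: mark downstream results dirty now


class Derived:
    def __init__(self, compute, *sources):
        self.compute = compute
        self.sources = sources
        self.dependents = []
        self.dirty = True
        self._cache = None
        self.evaluations = 0  # counts how often compute actually ran
        for source in sources:
            source.dependents.append(self)

    def invalidate(self):
        if not self.dirty:  # already-dirty subgraphs need no further walk
            self.dirty = True
            for dependent in self.dependents:
                dependent.invalidate()

    def get(self):
        if self.dirty:  # lazy: recompute only when someone asks
            self._cache = self.compute(*(s.get() for s in self.sources))
            self.dirty = False
            self.evaluations += 1
        return self._cache


a, b = Input(2), Input(3)
total = Derived(lambda x, y: x + y, a, b)
doubled = Derived(lambda t: t * 2, total)

first = doubled.get()            # computes total, then doubled
a.set(5)
a.set(7)                         # two writes, still no recomputation
evals_before_read = total.evaluations
second = doubled.get()           # one recomputation covers both writes
print(first, second, evals_before_read, total.evaluations)
```

Reading an output never has to search for stale inputs because the dirty flag is already set, and back-to-back writes do not trigger work that nobody asked for.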
  • 88.
  • 89. Immutable Data Idea Ten
  • 90. Data Storage and Memory Management: scalable, durable object storage; ephemeral SSD cache; RAM cache (buffer pool). Diagram arrows: fetch, evict, commit.
  • 91. RAI databases are immutable, including the catalog. Diagram: database “demo”; key/value store with CAS; rel A, rel B, rel C, ...
  • 92. RAI databases are immutable, including the catalog. Diagram: database “demo”; key/value store with CAS; transaction updates C; rel A, rel B, rel C, rel C', ...
  • 93. RAI databases are immutable, including the catalog. Diagram: databases “demo” and “demo-2022-03-25”; key/value store with CAS; transaction updates C; rel A, rel B, rel C, rel C', ...
  • 94. Time Travel ● Run queries on your data as it “was” in the past. ● Low cost time travel ● No need for a Flux Capacitor
  • 95. RAI databases are immutable, including the catalog. Diagram: databases “demo-2022-03-25” and “demo”; key/value store with CAS; rel A, rel B, rel C', ...
  • 96. RAI databases are immutable, including the catalog. Diagram: databases “demo-2022-03-25” and “demo”; key/value store with CAS; transaction; rel A, rel B, rel C', ...
  • 97. Immutable Data: Strict Serializable Isolation; No Locks on Read Workloads; No Limits on Transaction Duration; Cloning copies pointers not data; Low cost Time Travel/What If Analysis; No Logs for Transactions
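The diagrams on slides 91 to 96 can be mimicked with a small Python sketch. Everything here is hypothetical (the `KeyValueStore` class, `commit_update`, `clone`); it only illustrates the idea that a database name points at an immutable catalog, a transaction publishes a new catalog with one compare-and-swap, and a clone copies a pointer rather than data.

```python
from types import MappingProxyType


class KeyValueStore:
    """Maps database names to immutable catalog versions."""

    def __init__(self):
        self._roots = {}

    def read(self, name):
        return self._roots.get(name)

    def compare_and_swap(self, name, expected, new):
        # Succeeds only if nobody replaced the root since we read it.
        if self._roots.get(name) is not expected:
            return False
        self._roots[name] = new
        return True


def commit_update(store, db_name, relation, tuples):
    """Publish a new catalog version in which one relation is replaced."""
    while True:
        current = store.read(db_name)
        updated = dict(current or {})          # copies pointers to relations only
        updated[relation] = frozenset(tuples)  # the new, immutable relation
        new_root = MappingProxyType(updated)   # read-only catalog view
        if store.compare_and_swap(db_name, current, new_root):
            return new_root


def clone(store, source, target):
    """Cloning copies the root pointer, not the data."""
    return store.compare_and_swap(target, None, store.read(source))


store = KeyValueStore()
commit_update(store, "demo", "A", {(1,)})
commit_update(store, "demo", "B", {(2,)})
commit_update(store, "demo", "C", {(3,)})
clone(store, "demo", "demo-2022-03-25")
commit_update(store, "demo", "C", {(3,), (4,)})  # the transaction updates C

snapshot = store.read("demo-2022-03-25")
latest = store.read("demo")
print(sorted(snapshot["C"]), sorted(latest["C"]), snapshot["A"] is latest["A"])
```

The snapshot still sees the old rel C while "demo" sees rel C', and rel A is the very same object in both versions. Under these assumptions, that sharing is what makes time travel and what-if analysis cheap, and readers never need a lock because nothing they can see is ever modified.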
  • 98. Infinitely Scalable Idea Eleven
  • 99. Cloud Native: Infinite Storage; Infinite Compute; Fully Managed; Data Sharing; Versioning; Pay per Use
  • 100. So what is Relational AI? It is your next database. Narrow Tables: No Nulls, No Duplicates. More Indexes: One per Column, Free Composites. More Joins: Worst Case Optimal. Semantic Optimizer: It knows Math. Recursion: Generic Queries. Limitless Language: Stepwise Definitions, Embedded Logic. Demand Driven: Incremental Computation. Immutable Data: Time Travel. Infinitely Scalable: On the Cloud.
  • 101. So what is Relational AI? It is your next database: Easier Modeling, Faster Queries, Larger Scale