The document discusses analyzing the performance of algorithmic SQL and PL/SQL. It begins with an agenda that includes topics on algorithms at different levels in SQL and PL/SQL, network analysis problems, solving network path problems via pure SQL, timing two PL/SQL network analysis algorithms, Oracle profiling tools, and tuning SQL queries and a PL/SQL procedure. The document then dives into each topic, providing examples of applying various algorithms like joins, grouping, analytics, pattern matching, and recursive queries in SQL to solve problems, as well as examples of PL/SQL algorithms and using Oracle tools to analyze performance.
1. Analysing Performance of Algorithmic SQL and PL/SQL
Brendan Furey, September 2022
A Programmer Writes… (Brendan's Blog)
Ireland Oracle User Group, September 5-6, 2022
Brendan Furey, 2022 Analysing Performance of Algorithmic SQL and PL/SQL 1
2. whoami
Freelance Oracle developer and blogger
Keen interest in programming concepts
Started career as a Fortran programmer at British Gas
Dublin-based Europhile
30 years Oracle experience, currently working in Finance
3. Agenda
Algorithms and SQL (9 slides)
On algorithms at different levels in SQL and PL/SQL
Network Analysis Problems (4 slides)
On shortest path and subnetwork grouping problems
Network Paths by SQL (7 slides)
Solving all- and shortest- path problems via pure SQL
Two Algorithms with Code Timing (7 slides)
Two PL/SQL network analysis algorithms with code timing and performance analysis
Oracle Standard Profilers (2 slides)
Results from two standard Oracle profiling tools for the Subnetwork Grouper procedure
Tuning 1 - SQL for Isolated Nodes (5 slides)
Recap of join methods and types, then queries with antijoin structures and hints
Tuning 2 - SQL for Isolated Links (8 slides)
Disastrous ‘Bitmap Or’ expansion, good & bad antijoin plans and efficient group counting query
Tuning 3 - SQL for Root Node Selector (4 slides)
Code timing several methods for root node selection
Tuning – Results (2 slides)
Code timing results for one dataset and before and after results for Subnetwork Grouper for all
Conclusion (1 slide)
A few recommendations split between SQL and PL/SQL
4. Algorithms and SQL
Algorithms and SQL (9 slides)
On algorithms at different levels in SQL and PL/SQL
5. The Algorithm (extracts from Computer Hope web page)
Algorithm - Computer Hope
Derived from the name of the mathematician Muhammed ibn-Musa Al-Khowarizmi, an algorithm is a solution to a problem that meets the following criteria.
A list of instructions, procedures, or formula that solves a problem
Can be proven
Something that always finishes and works
When was the first algorithm?
Because a cooking recipe could be considered an algorithm, the first algorithm could go back as far as written language
However, many find Euclid's algorithm for finding the greatest common divisor to be the first algorithm. This algorithm was first described in 300 B.C.
Ada Lovelace is credited as being the first computer programmer and the first person to develop an algorithm for a machine
6. Algorithms and SQL 1 - Built-In Algorithms and Subquery Sequence
Declarative Language (paraphrase from Britannica.com)
Declarative languages are programming languages in which a program specifies what is to be done rather than how to do it
SQL as a declarative language?
SQL is often described as a declarative (or non-procedural) language
But it’s a bit more complicated than that, especially when performance is important…
Built-In Algorithms
Oracle provides built-in algorithms for joining tables and other rowsets, and grouping
Oracle provides additional specific built-in algorithms for processing an input rowset …
Analytics allows aggregation over partition key within rowset windows
Match Recognize allows patterns to be reported across rows
These algorithms are configured declaratively within an SQL subquery
Also, we have more general algorithms
Recursive subquery factors allow for recursive algorithms
Model clause allows for iteration over cells within a spreadsheet-like array
Subquery Sequence
Build queries in a sequence of subquery steps
7. Algorithms and SQL 2 - Joins and Grouping
SELECT d.department_name, Avg(e.salary) avg_sal
FROM departments d
JOIN employees e
ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY d.department_name
Simple Query with Joins and Grouping
A simple query joins data sources, and may group by a key…
with aggregate functions on non-key columns
Oracle CBO has multiple algorithms for joining and for aggregation
Hash Join – using full table scans for larger data sets
Nested Loops – using indexes for smaller data sets
CBO chooses algorithm based on table statistics
We can override with hints:
USE_HASH(e)
USE_NL(e)
DEPARTMENT_NAME AVG_SAL
---------------- -------
Accounting 10,154
Administration 4,400
Executive 19,333
Finance 8,601
Human Resources 6,500
IT 5,760
Marketing 9,500
Public Relations 10,000
Purchasing 4,150
Sales 8,956
Shipping 3,476
Example: Average salary grouped by department
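To make the two join algorithms named on this slide more concrete, here is a minimal Python sketch of the general techniques. It is not Oracle's implementation. The sample rows, the `emp_index` dictionary standing in for an index, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (department_id, department_name) and (department_id, salary)
departments = [(10, "Administration"), (20, "Marketing"), (60, "IT")]
employees = [(10, 4400), (20, 13000), (20, 6000), (60, 9000), (60, 6000), (60, 4800)]

def nested_loops_join(outer, inner_index):
    """For each outer row, look up matching inner rows through an index."""
    for dept_id, dept_name in outer:
        for salary in inner_index.get(dept_id, []):
            yield dept_name, salary

def hash_join(build, probe):
    """Build a hash table on one input, then scan the other input once."""
    table = {dept_id: dept_name for dept_id, dept_name in build}
    for dept_id, salary in probe:
        if dept_id in table:
            yield table[dept_id], salary

def avg_by_key(pairs):
    """Hash-style GROUP BY: accumulate sum and count per key."""
    acc = defaultdict(lambda: [0, 0])
    for key, value in pairs:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: total / n for key, (total, n) in sorted(acc.items())}

# Stand-in for an index on employees.department_id
emp_index = defaultdict(list)
for dept_id, salary in employees:
    emp_index[dept_id].append(salary)

nl_result = avg_by_key(nested_loops_join(departments, emp_index))
hj_result = avg_by_key(hash_join(departments, employees))
```

Both routes return the same averages. The difference is in the work done: the nested-loops version pays one lookup per outer row, while the hash version pays one pass to build the table and one pass to probe it.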
8. Algorithms and SQL 3 - Analytics
Analytics allows aggregation over partition key within rowset windows
WITH rowset AS (
SELECT d.department_name, e.hire_date, e.last_name, e.salary
FROM departments d
JOIN employees e ON e.department_id = d.department_id
)
SELECT department_name, hire_date, last_name, salary,
Sum(salary) OVER (PARTITION BY department_name
ORDER BY hire_date) rsum_sal,
salary - Lag(salary) OVER (PARTITION BY department_name
ORDER BY hire_date) sal_incr
FROM rowset
ORDER BY department_name, hire_date
DEPARTMENT_NAME HIRE_DATE LAST_NAME SALARY RSUM_SAL SAL_INCR
---------------- --------- ------------ ------- -------- --------
Accounting 07-JUN-02 Gietz 8,300 20,308
Accounting 07-JUN-02 Higgins 12,008 20,308 3,708
Administration 17-SEP-03 Whalen 4,400 4,400
.
Sales 04-JAN-08 Johnson 6,200 261,300 -800
Sales 24-JAN-08 Marvins 7,200 268,500 1,000
Sales 29-JAN-08 Zlotkey 10,500 279,000 3,300
.
Example: Running sum of salaries and salary increase by department
Can have multiple independent expressions
Aggregate functions on fields (or expressions), apply over the partition
Row set is unaltered, and does not have to be a separate subquery
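Range specifies a window based on the Order By expression
The window is often left to default; in the example the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows with the same hire date share a running sum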
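The behaviour of the two analytic expressions can be sketched in Python for a single partition. This is an illustration of the semantics only. The first two rows reuse the Accounting values from the example output; the third row (`NewHire`) and the function names are hypothetical.

```python
from itertools import groupby

# Rows for one partition, already ordered by hire_date: (hire_date, last_name, salary)
rows = [
    ("2002-06-07", "Gietz", 8300),
    ("2002-06-07", "Higgins", 12008),
    ("2003-01-15", "NewHire", 5000),   # hypothetical extra row
]

def running_sum_range(rows):
    """Sum(salary) OVER (ORDER BY hire_date) with the default RANGE window:
    rows sharing an ORDER BY value are peers and get the same running sum."""
    out, total = [], 0
    for _, peers in groupby(rows, key=lambda r: r[0]):
        peers = list(peers)
        total += sum(salary for _, _, salary in peers)
        out.extend((name, salary, total) for _, name, salary in peers)
    return out

def lag_diff(rows):
    """salary - Lag(salary): difference from the previous row, None for the first row."""
    prev, out = None, []
    for _, name, salary in rows:
        out.append((name, None if prev is None else salary - prev))
        prev = salary
    return out
```

With these rows, Gietz and Higgins both get a running sum of 20,308 because they are peers under the default window, which matches the example output.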
9. Algorithms and SQL 4 - Pattern Matching
Match Recognize allows patterns to be reported across rows
WITH rowset AS (
SELECT dep.department_name, emp.hire_date, emp.last_name, emp.salary
FROM departments dep
JOIN employees emp ON emp.department_id = dep.department_id)
SELECT * FROM rowset
MATCH_RECOGNIZE (
PARTITION BY department_name
ORDER BY hire_date
MEASURES last_name AS last_name, salary AS salary
ONE ROW PER MATCH AFTER MATCH SKIP TO NEXT ROW
PATTERN ( up{2} )
DEFINE up AS up.salary > PREV(up.salary))
DEPARTMENT_NAME LAST_NAME SALARY
---------------- ------------ -------
Sales Bloom 10,000
Sales Zlotkey 10,500
Shipping OConnell 2,600
Shipping Mourgos 5,800
Shipping Grant 2,600
Shipping Geoni 2,800
Example: Two consecutive salary increases
The Partition By allows for independent patterns across keys
Order By defines row sequence
Measures specifies fields (or expressions) to output
Specify behaviour in relation to matches
Pattern expresses sequences of values across rows
Using a regex-like syntax
Referencing variables from the Define section
In example up{2} ~ 2 adjacent instances of salary increase
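As a rough illustration of what the pattern `up{2}` asks for, the Python sketch below scans one partition for two adjacent salary increases and reports the last row of each match. Overlapping matches are allowed, in the spirit of AFTER MATCH SKIP TO NEXT ROW. The sample data and function name are hypothetical, and this is not how Oracle evaluates MATCH_RECOGNIZE internally.

```python
def two_consecutive_increases(rows):
    """rows: (last_name, salary) for one partition, in hire_date order.
    Returns the last row of every run of two adjacent increases."""
    matches = []
    for i in range(2, len(rows)):
        first_up = rows[i - 1][1] > rows[i - 2][1]    # row i-1 is 'up' against row i-2
        second_up = rows[i][1] > rows[i - 1][1]       # row i is 'up' against row i-1
        if first_up and second_up:
            matches.append(rows[i])
    return matches

# Hypothetical salaries in hire-date order
sample = [("A", 3000), ("B", 3200), ("C", 3500), ("D", 3400),
          ("E", 3600), ("F", 3900), ("G", 4000)]
found = two_consecutive_increases(sample)
```

Here `found` holds rows C, F and G: F and G come from overlapping matches (E, F) and (F, G).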
10. Algorithms and SQL 5 - Recursive Subqueries
Recursive subquery has anchor branch in union with
…recursive branch that reads from subquery itself
Partitioning via where clause
DEPARTMENT_NAME LAST_NAME MULT R_PROD
--------------- --------- ------ --------
Accounting Gietz 1.83 1.83
Accounting Higgins 2.2008 4.027464
Administration Whalen 1.44 1.44
Executive De Haan 2.7 2.7
Executive King 3.4 9.18
Executive Kochhar 2.7 24.786
.
Example: Running Products
WITH multipliers AS (
SELECT d.department_name, e.last_name, (1 + e.salary/10000) mult,
Row_Number() OVER (PARTITION BY d.department_name
ORDER BY e.last_name) rn
FROM departments d
JOIN employees e ON e.department_id = d.department_id
), rsf (department_name, last_name, rn, mult, running_prod) AS (
SELECT department_name, last_name, rn, mult, mult running_prod
FROM multipliers
WHERE rn = 1
UNION ALL
SELECT m.department_name, m.last_name, m.rn, m.mult,
r.running_prod * m.mult
FROM rsf r
JOIN multipliers m ON m.rn = r.rn + 1
AND m.department_name = r.department_name)
SELECT department_name, last_name, mult, running_prod FROM rsf
ORDER BY department_name, last_name
Performs well for hierarchies, less well for looped structures (as we’ll see later)
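The anchor and recursive branches can be mimicked in Python as an iteration over a "frontier" of rows produced by the previous step. This is a sketch of the evaluation idea only. The multipliers are taken from the example output; the data layout and function name are hypothetical.

```python
# Multipliers per department, ordered by last_name (values from the example output)
multipliers = {
    "Accounting": [("Gietz", 1.83), ("Higgins", 2.2008)],
    "Executive": [("De Haan", 2.7), ("King", 3.4), ("Kochhar", 2.7)],
}

def running_products(multipliers):
    # Anchor branch: the rn = 1 row of each department
    frontier = [(dept, 0, rows[0][0], rows[0][1]) for dept, rows in multipliers.items()]
    result = list(frontier)
    # Recursive branch: join the previous iteration's rows to the row at rn + 1
    while frontier:
        next_frontier = []
        for dept, idx, _, prod in frontier:
            rows = multipliers[dept]
            if idx + 1 < len(rows):
                name, mult = rows[idx + 1]
                next_frontier.append((dept, idx + 1, name, prod * mult))
        result.extend(next_frontier)
        frontier = next_frontier
    return sorted((dept, name, prod) for dept, _, name, prod in result)

products = running_products(multipliers)
```

Each pass of the `while` loop corresponds to one iteration of the recursive branch, and the loop stops when an iteration produces no rows.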
11. Algorithms and SQL 6 - Model Clause
Example: Running Products
WITH multipliers AS (
SELECT d.department_name, e.last_name, (1 + e.salary/10000) mult
FROM departments d
JOIN employees e ON e.department_id = d.department_id
)
SELECT department_name, last_name, mult, running_prod
FROM multipliers
MODEL
PARTITION BY (department_name)
DIMENSION BY (Row_Number() OVER (PARTITION BY department_name
ORDER BY last_name) rn)
MEASURES (last_name, mult, mult running_prod)
RULES (running_prod[rn > 1] = mult[CV()] * running_prod[CV() - 1])
ORDER BY department_name, last_name
Model clause does not have the best reputation for performance
Rarely seen in the wild…
Model clause reads records from a rowset, then allows rules to reference the rows and columns as array cells
Partition By allows for independent patterns across keys
Dimension By defines the indexing over rows, and can use analytic functions
Measures specifies fields (or expressions) to output
Rules may update or insert rows, and optionally iterate
Order By defines output order
12. Algorithms and SQL 7 - Subquery Sequence
Subqueries can reference not only tables and views, but…
Previous subqueries
Database functions, returning scalar or tabular outputs
This allows us to build queries in a sequence of subquery steps
This can be seen as a higher level algorithm in itself… specifying procedurally rather than declaratively at a higher level: the how not just the what
But CBO can override and rewrite the structure
Subqueries and Performance
CBO’s query transformation can improve performance or worsen it
Hints can often improve performance here, such as
Materialize – evaluate the subquery and save the resulting rowset
No_Query_Transformation – don’t transform the query
Sometimes helps to split a complex query that CBO is transforming badly, eg
Insert subquery output into a temporary table
Put subquery into a pipelined function
We can also manually transform, eg change Not Exists into explicit antijoins, as we’ll see later
13. Algorithms and SQL 8 – General Principles
Process in batches, or sets, where possible
A process often has a startup cost plus a cost per row, so spread the startup cost
Also, different algorithms may be more efficient for processing a set of rows or 1 row
Avoid cursor loops when the rowset can be processed in a single query
Prune early, avoid continued processing of rows that will later be eliminated, if possible
SQL Algorithms
Use SQL algorithms that meet a specific requirement, within pure SQL
Join and group rowsets
Analytic functions for aggregation over a partition key within a rowset window
Match Recognize for pattern matching across rows
Recursive subqueries for traversing hierarchies
PL/SQL Algorithms
Use PL/SQL where there is no efficient pure SQL algorithm, as in some network analysis problems
But ensure SQL is used effectively within the PL/SQL algorithm
PL/SQL can also be used to break a complex query into smaller sections via pipelined function/temp table
Only do this when CBO performs badly
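The point about spreading the startup cost can be put as a small cost model. The Python sketch below assumes every batch pays a fixed startup cost once plus a cost per row; the cost figures are hypothetical and in arbitrary units.

```python
import math

def total_cost(n_rows, batch_size, startup_cost, cost_per_row):
    """Total cost when each batch pays the startup cost once."""
    n_batches = math.ceil(n_rows / batch_size)
    return n_batches * startup_cost + n_rows * cost_per_row

# Hypothetical costs: startup 5 units, 1 unit per row, 10,000 rows
row_by_row = total_cost(10_000, 1, startup_cost=5, cost_per_row=1)
single_set = total_cost(10_000, 10_000, startup_cost=5, cost_per_row=1)
```

Under these assumed figures, processing row by row costs 60,000 units against 10,005 for a single set, because the startup cost is paid 10,000 times rather than once.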
14. Network Analysis Problems
Network Analysis Problems (4 slides)
On shortest path and subnetwork grouping problems
15. 3 Subnetworks – Demo Network
Network Analysis Problems
Undirected network
Find all paths from root
Find shortest paths from root
Group all nodes by subnetwork
19. Network Paths by SQL
Network Paths by SQL (7 slides)
Solving all- and shortest- path problems via pure SQL
20. SQL for All Paths
Get Execution Plan using Marker
WITH paths (node_id, lev) AS (
SELECT &root_id_var, 0
FROM DUAL
UNION ALL
SELECT CASE WHEN lnk.node_id_fr = pth.node_id THEN lnk.node_id_to ELSE lnk.node_id_fr END,
pth.lev + 1
FROM paths pth
JOIN links lnk
ON (lnk.node_id_fr = pth.node_id OR lnk.node_id_to = pth.node_id)
) SEARCH DEPTH FIRST BY node_id SET line_no
CYCLE node_id SET cycle TO '*' DEFAULT ' '
SELECT /*+ gather_plan_statistics XPLAN_ALL_PATHS */
n.node_name,
Substr(LPad ('.', 1 + 2 * p.lev, '.') || p.node_id, 2) node,
p.lev
FROM paths p
JOIN nodes n
ON n.id = p.node_id
WHERE cycle = ' '
ORDER BY p.line_no
Recursive subquery
CYCLE clause on node_id
Hint gather_plan_statistics with marker string
Exclude cycle rows from output
EXEC Utils.W(Utils.Get_XPlan(p_sql_marker => 'XPLAN_ALL_PATHS'));
For tree networks each node has only one path from the root, and the SQL is efficient
Also efficient for small looped networks
For larger looped networks, finding all paths is resource-intensive
This also holds for non-pure-SQL methods: the problem is intrinsically hard
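To see why looped networks are so much more expensive than trees, the Python sketch below enumerates every simple path from a root in an undirected network. Refusing to revisit a node already on the current path plays the role of the CYCLE clause. The two small networks and the function name are hypothetical.

```python
def all_paths(links, root):
    """Enumerate every simple path from root in an undirected network."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths = []
    stack = [(root,)]
    while stack:
        path = stack.pop()
        paths.append(path)
        for nxt in sorted(neighbours.get(path[-1], ()), reverse=True):
            if nxt not in path:          # do not loop back onto the current path
                stack.append(path + (nxt,))
    return paths

tree = [(1, 2), (1, 3), (3, 4)]                                   # 4 nodes, no loops
complete4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]      # 4 nodes, fully linked

tree_paths = all_paths(tree, 1)
looped_paths = all_paths(complete4, 1)
```

The tree gives one path per node (4 in total), while the fully linked network with the same 4 nodes already gives 16, and the count grows rapidly as nodes and links are added.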
21. SQL for Shortest Paths - One Recursive Subquery
WITH paths (node_id, rnk, lev) AS (
SELECT &root_id_var, 1, 0
FROM DUAL
UNION ALL
SELECT CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to
ELSE l.node_id_fr END,
Rank () OVER (PARTITION BY CASE WHEN l.node_id_fr = p.node_id
THEN l.node_id_to
ELSE l.node_id_fr END
ORDER BY p.node_id),
p.lev + 1
FROM paths p
JOIN links l
ON p.node_id IN (l.node_id_fr, l.node_id_to)
WHERE p.rnk = 1
) SEARCH DEPTH FIRST BY node_id SET line_no
CYCLE node_id SET lp TO '*' DEFAULT ' '
, node_min_levs AS (
SELECT node_id,
Min (lev) KEEP (DENSE_RANK FIRST ORDER BY lev) lev,
Min (line_no) KEEP (DENSE_RANK FIRST ORDER BY lev) line_no
FROM paths
GROUP BY node_id
)
SELECT n.node_name,
Substr(LPad ('.', 1 + 2 * m.lev, '.') || m.node_id, 2) node,
m.lev lev
FROM node_min_levs m
JOIN nodes n
ON n.id = m.node_id
ORDER BY m.line_no
Extra field, rnk = rank of record for a given node at each iteration, based on the prior node id
At each iteration only the record of rank 1 is joined to new links, avoiding duplication
Subquery, node_min_levs, selects the preferred record of minimum length for each node
22. SQL for Shortest Paths - One Recursive Subquery - Performance
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 161 |00:00:07.90 | 5032K| | |
| 1 | SORT ORDER BY | | 1 | 381G| 161 |00:00:07.90 | 5032K| 18432 | 18432 |
|* 2 | HASH JOIN | | 1 | 381G| 161 |00:00:07.90 | 5032K| 1449K| 1449K|
| 3 | TABLE ACCESS FULL | NODES | 1 | 161 | 161 |00:00:00.01 | 7 | | |
| 4 | VIEW | | 1 | 381G| 161 |00:00:07.90 | 5032K| | |
| 5 | SORT GROUP BY | | 1 | 381G| 161 |00:00:07.90 | 5032K| 31744 | 31744 |
| 6 | VIEW | | 1 | 381G| 220K|00:00:07.81 | 5032K| | |
| 7 | UNION ALL (RECURSIVE WITH) DEPTH FIRST| | 1 | | 220K|00:00:07.77 | 5032K| 19M| 1646K|
| 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | |
| 9 | WINDOW SORT | | 79 | 381G| 220K|00:00:00.72 | 57440 | 478K| 448K|
| 10 | NESTED LOOPS | | 79 | 381G| 220K|00:00:00.52 | 57440 | | |
| 11 | RECURSIVE WITH PUMP | | 79 | | 3590 |00:00:00.01 | 0 | | |
|* 12 | TABLE ACCESS FULL | LINKS | 3590 | 45 | 220K|00:00:00.70 | 57440 | | |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"."ID"="M"."NODE_ID")
12 - filter(("P"."NODE_ID"="L"."NODE_ID_FR" OR "P"."NODE_ID"="L"."NODE_ID_TO"))
Execution Plan (Extract) – Bacon/small (161 node / 3,342 link network)
SQL solution can obtain the shortest paths efficiently for tree and smaller looped networks
In larger looped networks the number of paths overall can become extremely large
Recursive subquery discards all but one path to a given node at a given iteration…
But has no access to other paths reached at earlier iterations
And so may persist with longer paths that will be discarded in the later ranking subquery
One approach to mitigating is to do a truncated search to obtain some bounds for later query…
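The limitation described above can be illustrated with a simplified Python model. It is a sketch under stated assumptions, not a translation of the query: it keeps one record per node within each iteration (lowest prior node id wins) and checks only the record's own path for cycles, so it cannot see that a node was already reached at an earlier iteration. The sample network and function names are hypothetical.

```python
def neighbours_of(links):
    nbrs = {}
    for a, b in links:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs

def ranked_recursion(links, root):
    """Level-by-level search with per-iteration ranking but no memory of
    records produced at earlier iterations. Returns (node, lev) records."""
    nbrs = neighbours_of(links)
    frontier = {root: (root,)}            # node -> path kept for this iteration
    records = [(root, 0)]
    lev = 0
    while frontier:
        lev += 1
        nxt = {}
        for node in sorted(frontier):     # lowest prior node id wins the ranking
            path = frontier[node]
            for nb in sorted(nbrs.get(node, ())):
                if nb not in path and nb not in nxt:   # cycle check on own path only
                    nxt[nb] = path + (nb,)
        records.extend((nb, lev) for nb in nxt)
        frontier = nxt
    return records

def min_levels(records):
    """Later ranking step: keep the minimum level per node."""
    best = {}
    for node, lev in records:
        best[node] = min(lev, best.get(node, lev))
    return best

links = [(1, 2), (1, 3), (2, 3), (3, 4)]   # hypothetical network with one loop
records = ranked_recursion(links, 1)
shortest = min_levels(records)
```

For this 4-node network the model produces 7 records, 3 of which are longer routes to nodes already reached and are thrown away by the final minimum step. On large looped networks that wasted work dominates.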
23. SQL for Shortest Paths – Two Recursive Subqueries, Part 1
WITH paths_truncated (node_id, lev, rn) AS (
SELECT &root_id_var, 0, 1
FROM DUAL
UNION ALL
SELECT CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to
ELSE l.node_id_fr END,
p.lev + 1,
Row_Number () OVER (PARTITION BY CASE WHEN l.node_id_fr = p.node_id
THEN l.node_id_to
ELSE l.node_id_fr END
ORDER BY p.node_id)
FROM paths_truncated p
JOIN links l
ON p.node_id IN (l.node_id_fr, l.node_id_to)
WHERE p.rn = 1
AND p.lev < &LEVMAX)
CYCLE node_id SET lp TO '*' DEFAULT ' '
, approx_best_paths AS (
SELECT node_id,
Max (lev) KEEP (DENSE_RANK FIRST ORDER BY lev) lev
FROM paths_truncated
GROUP BY node_id)
paths_truncated (recursive subquery)
Same subquery as in 1-recursion, except…
Truncate recursion at iteration &LEVMAX (I tried 5 and 10)
approx_best_paths
Gets minimum lev by node_id from paths_truncated…
Any paths to node_id longer in second recursion than found here can be discarded
24. SQL for Shortest Paths – Two Recursive Subqueries, Part 2
), paths (node_id, lev, rn) AS (
SELECT &root_id_var, 0, 1
FROM DUAL
UNION ALL
SELECT CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to
ELSE l.node_id_fr END,
p.lev + 1,
Row_Number () OVER (PARTITION BY CASE WHEN l.node_id_fr = p.node_id
THEN l.node_id_to
ELSE l.node_id_fr END
ORDER BY p.node_id)
FROM paths p
JOIN links l
ON p.node_id IN (l.node_id_fr, l.node_id_to)
LEFT JOIN approx_best_paths b
ON b.node_id = CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to
ELSE l.node_id_fr END
WHERE p.rn = 1
AND p.lev < Nvl (b.lev, 1000000)
) SEARCH DEPTH FIRST BY node_id SET line_no CYCLE node_id SET lp TO '*' DEFAULT ' '
, node_min_levs AS (
SELECT node_id,
Min (lev) KEEP (DENSE_RANK FIRST ORDER BY lev) lev,
Min (line_no) KEEP (DENSE_RANK FIRST ORDER BY lev) line_no
FROM paths GROUP BY node_id)
SELECT n.node_name,
Substr(LPad ('.', 1 + 2 * m.lev, '.') || m.node_id, 2) node,
m.lev lev
FROM node_min_levs m
JOIN nodes n
ON n.id = m.node_id
ORDER BY m.line_no
paths (recursive subquery)
Same subquery as in 1-recursion, except…
Outer-join approx_best_paths…
Discard path if longer than found in previous subquery
node_min_levs, main section
node_min_levs gets the minimum lev by node_id
Along with the line_no, to order by in main section
26. SQL for Shortest Paths - Performance - Results
One-recursive subquery ran for hours on top250 before being aborted, ok for small datasets
Two-recursive subqueries completed top250 in 796/1,663s for Truncate at 5/10
2-RS obtains a partial, approximative solution to enable early truncation of the paths
The use of a hard-coded iteration limit in the first subquery has obvious limitations
If it’s too low, the first subquery will provide too little information to optimize the second
If it’s too large then the approximative subquery itself will have too much work to do
We’ll see that using SQL within a PL/SQL algorithm will give better results…
Dataset        #Nodes (all)    #Links  #Nodes (sub)  Maxlev  #Secs (1-RS)  Truncate at  #Secs (2-RS)
-------------  ------------  --------  ------------  ------  ------------  -----------  ------------
three_subnets            14        13            11       3          0.01            3          0.02
foreign_keys            289       319            47       5          0.01            5          0.01
brightkite           58,228   214,078        56,739      10            NA            5           559
bacon/small             161     3,342           161       5             8            5           0.1
bacon/top250         12,466   583,993        11,803       7       Aborted            5           796
bacon/top250         12,466   583,993        11,803       7       Aborted           10         1,663
27. Two Algorithms with Code Timing
Two Algorithms with Code Timing (7 slides)
Two PL/SQL network analysis algorithms with code timing and performance analysis
28. Two Algorithms
Min Pathfinder Algorithm
• Truncate the solution table, min_tree_links, and insert the root node at level 0
• Loop while records are inserted
  • Insert a new node record at the next level:
    • for every link that is connected to a node at the current level:
      • that does not exist in the table for any prior level
      • and does not appear at the next level for any other link with a higher ranked path
  • Commit
  • Increment level and inserts counter
  • Exit when no records inserted
• Return the number of records inserted
Subnetwork Grouper Algorithm
• Truncate the solution table, node_roots
• Loop while a new root node is found
  • Select a new root node id from nodes not in node_roots
  • Exit loop when none found
  • Call Ins_Min_Tree_Links to populate the solution table, min_tree_links, for the new root node
  • Insert all nodes in min_tree_links into node_roots against the new root node
Code timing will show tuning opportunities in the initial implementation
Shortest paths are inserted at each iteration, and all inserted are visible to the future iterations
This avoids the inefficiency inherent in the pure SQL solutions
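The Min Pathfinder steps can be sketched in Python as a level-by-level search in which everything already inserted is visible to later iterations. This is an illustration of the algorithm's shape, with a dictionary standing in for the min_tree_links table. The sample network and function name are hypothetical.

```python
def min_pathfinder(links, root):
    """Level-by-level shortest-path tree: returns node -> (prior_node, lev)."""
    nbrs = {}
    for a, b in links:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    min_tree = {root: (None, 0)}          # stands in for the min_tree_links table
    current, lev = [root], 0
    while current:
        inserted = {}
        for node in current:
            for nb in nbrs.get(node, ()):
                if nb in min_tree:
                    continue              # already reached at a prior level
                # keep the lowest prior node id for each new node
                if nb not in inserted or node < inserted[nb]:
                    inserted[nb] = node
        lev += 1
        for nb, prior in inserted.items():
            min_tree[nb] = (prior, lev)
        current = list(inserted)
    return min_tree

links = [(1, 2), (1, 3), (2, 3), (3, 4)]  # hypothetical network with one loop
tree = min_pathfinder(links, 1)
```

Each node is inserted exactly once, at its minimum level, so no longer routes are carried forward. The loop ends on the first iteration that inserts nothing.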
29. Code Timing - Ins_Min_Tree_Links
FUNCTION Ins_Min_Tree_Links(
p_root_node_id PLS_INTEGER)
RETURN PLS_INTEGER IS
l_lev PLS_INTEGER := 0;
l_ins PLS_INTEGER;
l_ins_tot PLS_INTEGER := 0;
l_ts_id PLS_INTEGER := Timer_Set.Construct('Ins_Min_Tree_Links: ' || p_root_node_id);
BEGIN
EXECUTE IMMEDIATE 'TRUNCATE TABLE min_tree_links';
INSERT INTO min_tree_links VALUES (p_root_node_id, '', 0);
LOOP
INSERT INTO min_tree_links
SELECT CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to
ELSE lnk.node_id_fr END,
Min (mlp_cur.node_id),
l_lev + 1
FROM min_tree_links mlp_cur
JOIN links lnk
ON (lnk.node_id_fr = mlp_cur.node_id OR lnk.node_id_to = mlp_cur.node_id)
LEFT JOIN min_tree_links mlp_pri
ON mlp_pri.node_id = CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to
ELSE lnk.node_id_fr END
WHERE mlp_pri.node_id IS NULL
AND mlp_cur.lev = l_lev
GROUP BY CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to
ELSE lnk.node_id_fr END;
l_ins := SQL%ROWCOUNT;
COMMIT;
l_ins_tot := l_ins_tot + l_ins;
Timer_Set.Increment_Time(l_ts_id, 'Level: ' || l_lev || ', nodes: ' || l_ins);
EXIT WHEN l_ins = 0;
l_lev := l_lev + 1;
END LOOP;
Utils.W(Timer_Set.Format_Results(l_ts_id));
RETURN l_ins_tot;
END Ins_Min_Tree_Links;
Construct timer set, with root node in name
Time insert, with level and rows in name
Write timer set
30. Code Timing - Ins_Min_Tree_Links - Results
Timer Set: Ins_Min_Tree_Links: 10001, Constructed at 30 Jul 2022 16:07:35, written at 16:11:01
==============================================================================================
Timer Elapsed CPU Calls Ela/Call CPU/Call
----------------------- ---------- ---------- ---------- ------------- -------------
Level: 0, nodes: 38 0.05 0.04 1 0.04700 0.04000
Level: 1, nodes: 5169 0.04 0.03 1 0.04200 0.03000
Level: 2, nodes: 202118 13.77 13.72 1 13.76500 13.72000
Level: 3, nodes: 358824 104.69 100.59 1 104.69100 100.59000
Level: 4, nodes: 100099 75.15 74.11 1 75.14900 74.11000
Level: 5, nodes: 11298 9.61 9.61 1 9.60600 9.61000
Level: 6, nodes: 1865 1.15 1.14 1 1.14700 1.14000
Level: 7, nodes: 421 0.29 0.30 1 0.28900 0.30000
Level: 8, nodes: 170 0.16 0.16 1 0.16200 0.16000
Level: 9, nodes: 39 0.10 0.09 1 0.09700 0.09000
Level: 10, nodes: 11 0.07 0.08 1 0.07000 0.08000
Level: 11, nodes: 7 0.07 0.08 1 0.07300 0.08000
Level: 12, nodes: 0 0.07 0.06 1 0.07200 0.06000
(Other) 0.39 0.39 1 0.39400 0.39000
----------------------- ---------- ---------- ---------- ------------- -------------
Total 205.60 200.40 14 14.68600 14.31429
----------------------- ---------- ---------- ---------- ------------- -------------
[Timer timed (per call in ms): Elapsed: 0.02061, CPU: 0.02245]
Results for Bacon/only_tv_v Dataset (680,060 node subnetwork - 744,374 node / 22,503,060 link total)
The results show a total elapsed time of 206 seconds
There is a timer for each iteration, showing CPU and elapsed times, with nodes processed
As you’d expect, the largest times correspond to the most nodes inserted…
and with time per node increasing as the solution table fills up
Each iteration corresponds to a single insert, we can get the execution plan…
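The code timing idea used here, a set of named timers where each call charges the time since the previous call to one timer, can be sketched in Python as follows. This is a hypothetical minimal class written for illustration; it is not the Timer_Set package used in the slides, and its method names only loosely follow the calls shown there.

```python
import time

class TimerSet:
    """Hypothetical minimal timer set: elapsed and CPU time per named timer."""
    def __init__(self, name):
        self.name = name
        self.timers = {}                     # timer name -> [elapsed, cpu, calls]
        self._last = (time.perf_counter(), time.process_time())

    def increment_time(self, timer_name):
        """Charge the time since the previous call (or construction) to timer_name."""
        now = (time.perf_counter(), time.process_time())
        entry = self.timers.setdefault(timer_name, [0.0, 0.0, 0])
        entry[0] += now[0] - self._last[0]
        entry[1] += now[1] - self._last[1]
        entry[2] += 1
        self._last = now

    def format_results(self):
        lines = [f"Timer Set: {self.name}"]
        for timer_name, (ela, cpu, calls) in self.timers.items():
            lines.append(f"{timer_name:<30} {ela:10.5f} {cpu:10.5f} {calls:6d}")
        return "\n".join(lines)

ts = TimerSet("demo")
for lev in range(3):
    sum(range(10_000 * (lev + 1)))           # stand-in for the insert at this level
    ts.increment_time(f"Level: {lev}")
report = ts.format_results()
```

Putting variable data such as the level into the timer name gives one line per iteration, while a fixed name would aggregate all iterations into a single line with a call count.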
32. Code Timing - Ins_Node_Roots
Code Timing Output
PROCEDURE Ins_Node_Roots IS
l_root_id PLS_INTEGER;
l_ins_tot PLS_INTEGER;
l_ts_id PLS_INTEGER := Timer_Set.Construct('Ins_Node_Roots');
l_suffix VARCHAR2(60);
BEGIN
EXECUTE IMMEDIATE 'TRUNCATE TABLE node_roots';
LOOP
BEGIN
SELECT id INTO l_root_id FROM nodes WHERE id NOT IN (SELECT node_id
FROM node_roots)
AND ROWNUM = 1;
EXCEPTION
WHEN NO_DATA_FOUND THEN
l_root_id := NULL;
END;
Timer_Set.Increment_Time(l_ts_id, 'SELECT id INTO l_root_id');
EXIT WHEN l_root_id IS NULL;
l_ins_tot := Ins_Min_Tree_Links(l_root_id);
l_suffix := CASE WHEN l_ins_tot = 0 THEN '(1 node)'
WHEN l_ins_tot = 1 THEN '(2 nodes)'
WHEN l_ins_tot = 2 THEN '(3 nodes)'
WHEN l_ins_tot < 40 THEN '(4-39 nodes)'
ELSE '(root node ' || l_root_id || ', size: ' || (l_ins_tot + 1)
|| ')'
END;
Timer_Set.Increment_Time(l_ts_id, 'Insert min_tree_links ' || l_suffix);
INSERT INTO node_roots tgt
SELECT node_id, l_root_id, lev FROM min_tree_links;
Timer_Set.Increment_Time(l_ts_id, 'Insert node_roots ' || l_suffix);
END LOOP;
Utils.W(Timer_Set.Format_Results(l_ts_id));
END Ins_Node_Roots;
Procedure with Code Timing
Construct timer set
Time node selector query
Timer name suffix allows aggregation by subnetwork size group
Time Ins_Min_Tree_Links by size group
Time Insert node_roots by size group
Write timer set
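The overall shape of the Subnetwork Grouper can be sketched in Python as below. A dictionary stands in for the node_roots table, and a simple reachability loop stands in for the call to Ins_Min_Tree_Links. The `for candidate in nodes` loop is only a convenient Python substitute for the root node selector query, not a model of its cost. All names and sample data are hypothetical.

```python
def group_subnetworks(nodes, links):
    """Assign every node to the root node of its subnetwork: returns node -> root."""
    nbrs = {n: set() for n in nodes}
    for a, b in links:
        nbrs[a].add(b)
        nbrs[b].add(a)
    node_roots = {}                        # stands in for the node_roots table
    for candidate in nodes:                # root selector: next node not yet grouped
        if candidate in node_roots:
            continue
        # stand-in for the pathfinder call: collect everything reachable from the root
        reached, frontier = {candidate}, [candidate]
        while frontier:
            next_frontier = []
            for n in frontier:
                for nb in nbrs[n]:
                    if nb not in reached:
                        reached.add(nb)
                        next_frontier.append(nb)
            frontier = next_frontier
        for n in reached:
            node_roots[n] = candidate      # insert the subnetwork against its root
    return node_roots

nodes = [1, 2, 3, 4, 5, 6]
links = [(1, 2), (2, 3), (4, 5)]
roots = group_subnetworks(nodes, links)
```

With this sample, nodes 1 to 3 form one subnetwork, nodes 4 and 5 a second, and node 6 is on its own.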
33. Code Timing - Ins_Node_Roots - Results
Code Timing Output
Timer Set: Ins_Node_Roots, Constructed at 30 Jul 2022 16:15:48, written at 16:44:22
===================================================================================
Timer Elapsed CPU Calls Ela/Call CPU/Call
--------------------------------------------------- ---------- ---------- ---------- ------------- -------------
SELECT id INTO l_root_id 1517.43 1506.68 19642 0.07725 0.07671
Insert min_tree_links (root node 579, size: 680060) 122.95 120.31 1 122.94500 120.31000
Insert node_roots (root node 579, size: 680060) 4.10 4.05 1 4.10400 4.05000
Insert min_tree_links (4-39 nodes) 20.21 23.07 5317 0.00380 0.00434
Insert node_roots (4-39 nodes) 1.56 1.61 5317 0.00029 0.00030
Insert min_tree_links (root node 646, size: 58) 0.01 0.01 1 0.00800 0.01000
Insert node_roots (root node 646, size: 58) 0.00 0.00 1 0.00000 0.00000
Insert min_tree_links (3 nodes) 7.14 7.29 2091 0.00341 0.00349
Insert node_roots (3 nodes) 0.50 0.62 2091 0.00024 0.00030
Insert min_tree_links (1 node) 24.91 24.76 8659 0.00288 0.00286
Insert node_roots (1 node) 2.18 1.75 8659 0.00025 0.00020
Insert min_tree_links (2 nodes) 11.74 11.67 3539 0.00332 0.00330
Insert node_roots (2 nodes) 0.88 1.42 3539 0.00025 0.00040
...
(Other) 0.00 0.00 1 0.00100 0.00000
--------------------------------------------------- ---------- ---------- ---------- ------------- -------------
Total 1714.02 1703.66 58925 0.02909 0.02891
--------------------------------------------------- ---------- ---------- ---------- ------------- -------------
[Timer timed (per call in ms): Elapsed: 0.01282, CPU: 0.01282]
Results for Bacon/only_tv_v Dataset (744,374 nodes and 22,503,060 links)
The results show a total elapsed time of 1,714 seconds, about 89% of it (1,517 seconds) from the SELECT timer
To improve performance we need first to focus on that code section
8,659 calls were made for '(1 node)' suffix timers and 3,539 for the '(2 nodes)' ones
Each of these calls also corresponds to an execution of SELECT id INTO l_root_id: together 12,198 of its 19,642 calls, or about 62%
We can insert these 1/2 node node_roots records in single inserts prior to main algorithm
34. Two Algorithms - Performance Considerations
Min Pathfinder
The algorithm allows us to prune non-shortest paths early
It does this by storing the paths at each iteration, and excluding nodes already reached from future iterations
At the same time, each iteration uses a single SQL insert with subquery to process in an efficient set-based fashion
SQL Tuning
We will find the resulting queries themselves can be tuned using query transformation and hints
Subnetwork Grouper
The algorithm uses Min Pathfinder within a higher level algorithm
It thus benefits from its efficiency to identify the subnetworks
However, code timing identified two main areas in which a still more set-based approach could improve performance:
Firstly, one- and two-node subnetworks do an insert for each node
We could in fact insert all of these in a single set-based insert each, ahead of the main algorithm for the larger subnetworks
Secondly, a root node selector query is executed for each subnetwork
We may be able to find a way of selection that does not execute this at each iteration
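The first improvement, handling one- and two-node subnetworks in bulk before the main loop, amounts to finding isolated nodes and isolated links in a single pass. The Python sketch below does this by counting links per node. It assumes there are no self-links and no duplicate links. The names and sample data are hypothetical, and this is not the SQL developed later in the presentation.

```python
from collections import Counter

def isolated_nodes_and_links(nodes, links):
    """Return (isolated nodes, isolated links).
    An isolated node has no links; an isolated link joins two nodes
    that have no other links."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    isolated_nodes = [n for n in nodes if degree[n] == 0]
    isolated_links = [(a, b) for a, b in links if degree[a] == 1 and degree[b] == 1]
    return isolated_nodes, isolated_links

nodes = [1, 2, 3, 4, 5, 6]
links = [(1, 2), (2, 3), (4, 5)]
iso_nodes, iso_links = isolated_nodes_and_links(nodes, links)
```

Node 6 and link (4, 5) can then be recorded in one step each, leaving only the 3-node subnetwork for the iterative algorithm.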
35. Oracle Standard Profilers
Oracle Standard Profilers (2 slides)
Results from two standard Oracle profiling tools for the Subnetwork Grouper procedure
36. Flat Profiler
VAR RUN_ID NUMBER
DECLARE
l_result PLS_INTEGER;
BEGIN
l_result := DBMS_Profiler.Start_Profiler(
run_comment => 'Profile for Ins_Node_Roots',
run_number => :RUN_ID);
Shortest_Path_SQL_Base.Ins_Node_Roots;
l_result := DBMS_Profiler.Stop_Profiler;
END;
/
@....dprof_queries :RUN_ID
Calling Flat Profiler
Profiler data by time (PLSQL_PROFILER_DATA)
Seconds Calls Unit Line# Line Text
----------- -------- ------------------------- ------- --------------------------------------------------------------------------------------------------------------
1789.829 19642 SHORTEST_PATH_SQL_BASE 85 SELECT id INTO l_root_id FROM nodes WHERE id NOT IN (SELECT node_id FROM node_roots) AND ROWNUM = 1;
128.850 31374 SHORTEST_PATH_SQL_BASE 15 INSERT INTO min_tree_links
31.519 19641 SHORTEST_PATH_SQL_BASE 11 EXECUTE IMMEDIATE 'TRUNCATE TABLE min_tree_links';
7.318 19641 SHORTEST_PATH_SQL_BASE 93 INSERT INTO node_roots tgt
4.828 19641 SHORTEST_PATH_SQL_BASE 12 INSERT INTO min_tree_links VALUES (p_root_node_id, '', 0);
1.897 31374 SHORTEST_PATH_SQL_BASE 31 COMMIT;
0.071 31374 SHORTEST_PATH_SQL_BASE 30 l_ins := SQL%ROWCOUNT;
...
157179 rows selected.
Callouts: the anonymous block starts the profiler, makes the call to be profiled, then stops the profiler; the custom reporting script is passed the run id
The line text is obtained by joining the system view all_source to the profiler package and line number
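To put the listing in proportion, the seconds column can be turned into shares of the listed time. This is a small arithmetic sketch in Python, not part of the profiler scripts; it covers only the seven lines shown above (the remaining rows are elided in the listing), so the shares are relative to the listed lines rather than to the whole run.

```python
# Seconds per profiled line, copied from the flat profiler listing above.
seconds_by_line = {
    85: 1789.829,  # root node selector SELECT
    15: 128.850,   # INSERT INTO min_tree_links (main insert)
    11: 31.519,    # TRUNCATE TABLE min_tree_links
    93: 7.318,     # INSERT INTO node_roots
    12: 4.828,     # INSERT of the root node into min_tree_links
    31: 1.897,     # COMMIT
    30: 0.071,     # l_ins := SQL%ROWCOUNT
}

# Total over the listed lines only; elided rows are not included
listed_total = sum(seconds_by_line.values())
share_by_line = {
    line: 100 * secs / listed_total for line, secs in seconds_by_line.items()
}

for line, share in sorted(share_by_line.items(), key=lambda kv: -kv[1]):
    print(f"line {line:>3}: {share:5.1f}% of listed time")
```

On these figures the root node selector at line 85 accounts for roughly nine tenths of the listed time, which is why it becomes a tuning target.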
37. Hierarchical Profiler
Calling Hierarchical Profiler
VAR RUN_ID NUMBER
BEGIN
  HProf_Utils.Start_Profiling;
  Shortest_Path_SQL_Base.Ins_Node_Roots;
  :RUN_ID := HProf_Utils.Stop_Profiling(
                 p_run_comment => 'Profile for Ins_Node_Roots',
                 p_filename    => 'hp_ins_node_roots_&SUB..html');
END;
/
@....hprof_queries :RUN_ID
Profiler data by function call tree (DBMSHP_FUNCTION_INFO, DBMSHP_PARENT_CHILD_INFO)
Function tree Owner Module Inst. Subtree MicroS Function MicroS Calls
------------------------------------ ------------------ ------------------------- ------ -------------- --------------- -------
INS_NODE_ROOTS SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 1668506464 332444 1
__static_sql_exec_line85 SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 1490685875 1490685875 19642
INS_MIN_TREE_LINKS SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 169705305 1195428 19641
__static_sql_exec_line15 SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 141321652 141321652 31378
__dyn_sql_exec_line11 SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 20610292 9775994 19641
__plsql_vm 1 of 2 10834298 71027 19641
__anonymous_block 1 of 2 10763826 2971230 19642
IS_VPD_ENABLED SYS IS_VPD_ENABLED 1 of 2 6934407 395925 39284
__static_sql_exec_line22 SYS IS_VPD_ENABLED 1 of 2 6538482 6538482 39284
DICTIONARY_OBJ_OWNER SYS DICTIONARY_OBJ_OWNER 1 of 2 812866 812866 39284
DICTIONARY_OBJ_NAME SYS DICTIONARY_OBJ_NAME 1 of 2 45323 45323 39284
__static_sql_exec_line12 SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 4413730 4413730 19641
__static_sql_exec_line31 SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 2164203 2164203 31378
__static_sql_exec_line93 SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 7706832 7706832 19641
__dyn_sql_exec_line81 SHORTEST_PATH_SQL SHORTEST_PATH_SQL_BASE 76008 75449 1
__plsql_vm 2 of 2 559 4 1
__static_sql_exec_line700 SYS DBMS_HPROF 128 128 1
STOP_PROFILING LIB HPROF_UTILS 22 22 1
STOP_PROFILING SYS DBMS_HPROF 0 0 1
Callouts: a custom wrapper package, HProf_Utils, starts and stops profiling around the call to be profiled; the stop call is passed the HTML results filename, and the custom reporting script is passed the run id
38. Tuning 1 - SQL for Isolated Nodes
Tuning 1 - SQL for Isolated Nodes (5 slides)
Recap of join methods and types, then queries with antijoin structures and hints
39. SQL Join Definitions
Join Methods
Nested Loops Join - Nested loops join an outer data set to an inner data set
For each row in the outer data set that matches the single-table predicates, the database retrieves all rows in the inner data set that satisfy the join predicate. If an index is available, then the database can use it to access the inner data set by rowid
Hash Join - The database uses a hash join to join larger data sets
The optimizer uses the smaller of two data sets to build a hash table on the join key in memory, using a deterministic hash function to specify the location in the hash table in which to store each row. The database then scans the larger data set, probing the hash table to find the rows that meet the join condition
Join Types
Antijoin
An antijoin is a join between two data sets that returns a row from the first set when a matching row does not exist in the subquery data set.
Like a semijoin, an antijoin stops processing the subquery data set when the first match is found. Unlike a semijoin, the antijoin only returns a row when no match is found
Extracted from: SQL Tuning Guide, 21c
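The two join methods, applied as antijoins, can be modelled in a few lines of Python. This is only an illustrative sketch of the definitions above, not Oracle's implementation: the function names are hypothetical, the hash table is a plain dictionary, and the "index" is a dictionary from key to a list of matching row identifiers.

```python
from typing import Dict, Iterable, List


def hash_anti_join(build_keys: Iterable[int], probe_keys: Iterable[int]) -> List[int]:
    """Return the build-side keys that no probe-side row matches."""
    # Build phase: hash table on the (smaller) first data set
    unmatched = dict.fromkeys(build_keys, True)
    # Probe phase: scan the second data set once, flagging every key it hits
    for key in probe_keys:
        if key in unmatched:
            unmatched[key] = False
    return [key for key, is_unmatched in unmatched.items() if is_unmatched]


def nested_loops_anti_join(
    outer_keys: Iterable[int], inner_index: Dict[int, List[int]]
) -> List[int]:
    """Return the outer keys with no match, probing the index once per outer row."""
    result = []
    for key in outer_keys:
        # Only the first match matters: stop as soon as one is found
        first_match = next(iter(inner_index.get(key, ())), None)
        if first_match is None:
            result.append(key)
    return result


node_ids = [1, 2, 3, 4, 5]
link_from_ids = [1, 1, 2, 4]                   # one entry per link row
from_index = {1: [101, 102], 2: [103], 4: [104]}  # key -> row identifiers

print(hash_anti_join(node_ids, link_from_ids))
print(nested_loops_anti_join(node_ids, from_index))
```

The hash version reads every probe row once regardless of how many build rows there are, while the nested loops version does one cheap index probe per outer row; which is faster depends on the relative sizes, which is the trade-off explored in the following slides.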
40. SQL for Isolated Nodes: SQL 1 - Not Exists / Or
Execution Plan - Ran on only_tv_v dataset (744,374 nodes and 22,503,060 links)
INSERT INTO node_roots
SELECT nod.id, nod.id, 0
  FROM nodes nod
 WHERE NOT EXISTS (SELECT 1
                     FROM links lnk
                    WHERE lnk.node_id_fr = nod.id
                       OR lnk.node_id_to = nod.id);
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |00:00:25.78 | 191K| 93127 |
| 1 | LOAD TABLE CONVENTIONAL | NODE_ROOTS | 1 | | 0 |00:00:25.78 | 191K| 93127 |
|* 2 | HASH JOIN ANTI | | 1 | 53174 | 8659 |00:00:25.73 | 176K| 93122 |
| 3 | INDEX FAST FULL SCAN | SYS_C0018310 | 1 | 744K| 744K|00:00:00.08 | 1461 | 0 |
| 4 | VIEW | VW_SQ_1 | 1 | 45M| 45M|00:00:22.86 | 174K| 93122 |
| 5 | UNION-ALL | | 1 | | 45M|00:00:15.93 | 174K| 93122 |
| 6 | TABLE ACCESS FULL | LINKS | 1 | 22M| 22M|00:00:02.61 | 87315 | 46561 |
| 7 | TABLE ACCESS FULL | LINKS | 1 | 22M| 22M|00:00:02.19 | 87315 | 46561 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("VW_COL_1"="NOD"."ID")
All the nodes that are present only in the nodes table but not in the links table
Can be expressed in a single SQL statement for the insert
Query obtains the 8,659 isolated nodes in 21 seconds
S5: OR transformed into UNION ALL of two full links scans, S6/7
S4: View of 45M rows used as probe table in hash antijoin, S2…
S3: …with scan of nodes unique index as the build table
UNION ALL results in a single hash antijoin, with a probe table twice the size of links
What if we replaced the NOT EXISTS with explicit antijoins?...
41. SQL for Isolated Nodes: SQL 2 - Outer Joins Unhinted
Execution Plan
INSERT INTO node_roots
SELECT nod.id, nod.id, 0
  FROM nodes nod
  LEFT JOIN links lnk_f
    ON lnk_f.node_id_fr = nod.id
  LEFT JOIN links lnk_t
    ON lnk_t.node_id_to = nod.id
 WHERE lnk_f.node_id_fr IS NULL
   AND lnk_t.node_id_fr IS NULL;
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |00:00:12.48 | 191K| 93127 |
| 1 | LOAD TABLE CONVENTIONAL | NODE_ROOTS | 1 | | 0 |00:00:12.48 | 191K| 93127 |
|* 2 | HASH JOIN ANTI | | 1 | 532 | 8659 |00:00:12.43 | 176K| 93122 |
|* 3 | HASH JOIN ANTI | | 1 | 53174 | 57851 |00:00:08.41 | 88776 | 46561 |
| 4 | INDEX FAST FULL SCAN | SYS_C0018310 | 1 | 744K| 744K|00:00:00.07 | 1461 | 0 |
| 5 | TABLE ACCESS FULL | LINKS | 1 | 22M| 22M|00:00:02.58 | 87315 | 46561 |
| 6 | TABLE ACCESS FULL | LINKS | 1 | 22M| 22M|00:00:01.83 | 87315 | 46561 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("LNK_T"."NODE_ID_TO"="NOD"."ID")
3 - access("LNK_F"."NODE_ID_FR"="NOD"."ID")
Convert NOT EXISTS into outer antijoins
Query obtains the 8,659 isolated nodes in 12 seconds
S4: An index scan of the unique index on nodes as the build table for a hash antijoin, S3
S5: Full scan of the links table as the probe table
S2: Hash antijoin uses result set as the build table
S6: With another full scan of the links table as the second probe table
Where the CBO in SQL-1 used a view/union and a single hash antijoin…
…two outer joins resulted in two hash antijoins, with smaller probe tables, and ran faster
How would this compare with a plan using nested loop joins?
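That the NOT EXISTS / OR form and the outer join form return the same isolated nodes can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Oracle, with a made-up six-node network; it says nothing about Oracle execution plans or timings. One deliberate difference from the slide: the second IS NULL test is on lnk_t.node_id_to, the join column of that alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY);
    CREATE TABLE links (node_id_fr INTEGER NOT NULL, node_id_to INTEGER NOT NULL);
    INSERT INTO nodes (id) VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO links (node_id_fr, node_id_to) VALUES (1, 2), (2, 3), (1, 3), (5, 1);
""")

# SQL 1 form: a single NOT EXISTS with an OR of the two link columns
not_exists_sql = """
    SELECT nod.id
      FROM nodes nod
     WHERE NOT EXISTS (SELECT 1
                         FROM links lnk
                        WHERE lnk.node_id_fr = nod.id
                           OR lnk.node_id_to = nod.id)
     ORDER BY nod.id"""

# SQL 2 form: two outer joins, keeping only rows where neither join matched
outer_join_sql = """
    SELECT nod.id
      FROM nodes nod
      LEFT JOIN links lnk_f ON lnk_f.node_id_fr = nod.id
      LEFT JOIN links lnk_t ON lnk_t.node_id_to = nod.id
     WHERE lnk_f.node_id_fr IS NULL
       AND lnk_t.node_id_to IS NULL
     ORDER BY nod.id"""

isolated_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
isolated_outer_join = [row[0] for row in conn.execute(outer_join_sql)]
print(isolated_not_exists, isolated_outer_join)
```

Nodes 4 and 6 appear in no link in this toy data, and both queries return exactly those two ids.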
42. SQL for Isolated Nodes: SQL 3 - Outer Joins Hinted
Execution Plan
INSERT INTO node_roots
SELECT /*+ gather_plan_statistics USE_NL (lnk_f) USE_NL (lnk_t) */
       nod.id, nod.id, 0
  FROM nodes nod
  LEFT JOIN links lnk_f
    ON lnk_f.node_id_fr = nod.id
  LEFT JOIN links lnk_t
    ON lnk_t.node_id_to = nod.id
 WHERE lnk_f.node_id_fr IS NULL
   AND lnk_t.node_id_fr IS NULL;
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |00:00:01.27 | 624K| 5 |
| 1 | LOAD TABLE CONVENTIONAL | NODE_ROOTS | 1 | | 0 |00:00:01.27 | 624K| 5 |
| 2 | NESTED LOOPS ANTI | | 1 | 532 | 8659 |00:00:00.89 | 622K| 0 |
| 3 | NESTED LOOPS ANTI | | 1 | 53174 | 57851 |00:00:01.04 | 506K| 0 |
| 4 | INDEX FAST FULL SCAN | SYS_C0018310 | 1 | 744K| 744K|00:00:00.11 | 1461 | 0 |
|* 5 | INDEX RANGE SCAN | LINKS_FR_N1 | 744K| 20M| 686K|00:00:00.83 | 505K| 0 |
|* 6 | INDEX RANGE SCAN | LINKS_TO_N1 | 57851 | 22M| 49192 |00:00:00.15 | 115K| 0 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("LNK_F"."NODE_ID_FR"="NOD"."ID")
6 - access("LNK_T"."NODE_ID_TO"="NOD"."ID")
Hint to use nested loops joins: USE_NL (lnk_f) USE_NL (lnk_t)
Obtains the 8,659 isolated nodes in 1.5 seconds
S2, S3: Two nested loops antijoins
S4: Drives off full scan of the unique index on nodes
S5: First join to From index on links
S6: Then join to To index on links
Estimated rows for the two range scans are much higher than the actual rows returned. Let’s look at it…
43. SQL for Isolated Nodes: SQL 3 - Nested Loops Analysis
Execution Plan
----------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |
| 1 | LOAD TABLE CONVENTIONAL | NODE_ROOTS | 1 | | 0 |
| 2 | NESTED LOOPS ANTI | | 1 | 532 | 8659 |
| 3 | NESTED LOOPS ANTI | | 1 | 53174 | 57851 |
| 4 | INDEX FAST FULL SCAN | SYS_C0018310 | 1 | 744K| 744K|
|* 5 | INDEX RANGE SCAN | LINKS_FR_N1 | 744K| 20M| 686K|
|* 6 | INDEX RANGE SCAN | LINKS_TO_N1 | 57851 | 22M| 49192 |
----------------------------------------------------------------------------
E-Rows Anomaly
E-Rows of 20M in S5 and 22M in S6 seem to assume getting all matches
And seem to be across all starts; usually it’s per start
But, as we saw in the definitions, antijoins get only the first match
As reflected in the A-Rows of 686K and 49,192
It is almost as though (to speculate):
The SQL engine is smart enough to know that, in the context of the anti-join, there is no point in bringing back all the joining records when these will all be eliminated later
But that the CBO is not, and chooses a bad plan, when unhinted, for that reason
Anyway, it’s important to note that the CBO does not always choose the optimal join method
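The first-match behaviour can be cross-checked against the plan figures with simple arithmetic. The Python sketch below assumes that each start of a range scan returns at most one row, so the A-Rows of a scan should equal the number of driving rows that found a match and were eliminated by the antijoin.

```python
nodes_total = 744_374          # node count for only_tv_v, driving rows from S4
after_first_antijoin = 57_851  # A-Rows at S3
after_second_antijoin = 8_659  # A-Rows at S2, the isolated nodes

# Driving rows eliminated by each antijoin, one matching row fetched for each
expected_a_rows_s5 = nodes_total - after_first_antijoin
expected_a_rows_s6 = after_first_antijoin - after_second_antijoin

print(expected_a_rows_s5, expected_a_rows_s6)
```

The two differences come to 686,523 and 49,192, in line with the 686K and 49192 shown for S5 and S6, which supports the reading that only the first match is fetched per start.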
44. Tuning 2 - SQL for Isolated Links
Tuning 2 - SQL for Isolated Links (8 slides)
Disastrous ‘Bitmap Or’ expansion, good and bad antijoin plans and efficient group counting query
45. SQL for Isolated Links: SQL 1 - Not Exists / 4-way Or
INSERT INTO node_roots
WITH isolated_links AS (
    SELECT lnk.node_id_fr, lnk.node_id_to
      FROM links lnk
     WHERE NOT EXISTS (
               SELECT 1
                 FROM links lnk_1
                WHERE (lnk_1.node_id_fr = lnk.node_id_to OR
                       lnk_1.node_id_to = lnk.node_id_fr OR
                       lnk_1.node_id_fr = lnk.node_id_fr OR
                       lnk_1.node_id_to = lnk.node_id_to)
                  AND lnk_1.ROWID != lnk.ROWID))
SELECT node_id_fr, node_id_fr, 0
  FROM isolated_links
 UNION
SELECT node_id_to, node_id_fr, 1
  FROM isolated_links
Links that do not connect to any other links
Neither the from node nor the to node is a from or a to node in any other link
NOT EXISTS links record matching any of 4 conditions
And not the driving links record itself
For records passing the NOT EXISTS:
Add both from and to nodes into node_roots
Ran on pre1950 dataset (134,131 nodes and 8,095,294 links)
Obtains the 425 isolated links in 4,103 seconds!
Let’s look at the execution plan…
46. SQL for Isolated Links: SQL 1 - Not Exists / 4-way Or - Execution Plan
Execution Plan (Extract)
S7: CBO transforms the OR conditions into a 4-section BITMAP OR
S6: Then a BITMAP CONVERSION TO ROWIDS and
S5: A links table access to filter the driving instance (S3)
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |00:41:31.41 |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | 0 |00:41:31.41 |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C4F_4F65443 | 1 | | 0 |00:41:31.37 |
|* 3 | FILTER | | 1 | | 425 |01:20:08.08 |
| 4 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:01.57 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED | LINKS | 8095K| 1 | 8094K|01:08:05.02 |
|* 6 | BITMAP CONVERSION TO ROWIDS | | 8095K| | 8094K|01:07:45.55 |
| 7 | BITMAP OR | | 8095K| | 8094K|01:07:35.34 |
|* 8 | BITMAP CONVERSION FROM ROWIDS | | 8095K| | 7978K|00:09:42.09 |
|* 9 | INDEX RANGE SCAN | LINKS_TO_N1 | 8095K| | 3076M|00:08:56.46 |
|* 10 | BITMAP CONVERSION FROM ROWIDS | | 8095K| | 8086K|00:17:19.09 |
|* 11 | INDEX RANGE SCAN | LINKS_TO_N1 | 8095K| | 5926M|00:16:17.18 |
|* 12 | BITMAP CONVERSION FROM ROWIDS | | 8095K| | 7974K|00:09:27.91 |
|* 13 | INDEX RANGE SCAN | LINKS_FR_N1 | 8095K| | 3076M|00:08:41.13 |
|* 14 | BITMAP CONVERSION FROM ROWIDS | | 8095K| | 8086K|00:19:00.44 |
|* 15 | INDEX RANGE SCAN | LINKS_FR_N1 | 8095K| | 6232M|00:17:21.29 |
| 16 | LOAD TABLE CONVENTIONAL | NODE_ROOTS | 1 | | 0 |00:00:00.04 |
| 17 | HASH UNIQUE | | 1 | 16M| 850 |00:00:00.03 |
| 18 | UNION-ALL | | 1 | | 850 |00:00:00.01 |
| 19 | VIEW | | 1 | 8095K| 425 |00:00:00.01 |
| 20 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C4F_4F65443 | 1 | 8095K| 425 |00:00:00.01 |
| 21 | VIEW | | 1 | 8095K| 425 |00:00:00.01 |
| 22 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C4F_4F65443 | 1 | 8095K| 425 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------------------
8095K starts, S5-S15
A-Rows very high
47. SQL for Isolated Links: SQL 2 - 4 Not Exists Subqueries
INSERT INTO node_roots
WITH isolated_links AS (
    SELECT lnk.node_id_fr, lnk.node_id_to
      FROM links lnk
     WHERE NOT EXISTS (
               SELECT 1
                 FROM links lnk_1
                WHERE lnk_1.node_id_fr = lnk.node_id_fr
                  AND lnk_1.ROWID != lnk.ROWID)
       AND NOT EXISTS (
               SELECT 1
                 FROM links lnk_2
                WHERE lnk_2.node_id_to = lnk.node_id_to
                  AND lnk_2.ROWID != lnk.ROWID)
       AND NOT EXISTS (
               SELECT 1
                 FROM links lnk_3
                WHERE (lnk_3.node_id_fr = lnk.node_id_to)
                  AND lnk_3.ROWID != lnk.ROWID)
       AND NOT EXISTS (
               SELECT 1
                 FROM links lnk_4
                WHERE (lnk_4.node_id_to = lnk.node_id_fr)
                  AND lnk_4.ROWID != lnk.ROWID))
SELECT node_id_fr, node_id_fr, 0
  FROM isolated_links
 UNION
SELECT node_id_to, node_id_fr, 1
  FROM isolated_links
Split the NOT EXISTS with 4 conditions into…
A NOT EXISTS for each condition, replicating…
…the ‘not the driving links record’ condition
Obtains the 425 isolated links in 20 seconds, much faster!
Let’s look at the execution plan…
48. SQL for Isolated Links: SQL 2 - 4 Not Exists Subqueries - Execution Plan
Execution Plan (Extract)
S9: Plan starts with a hash antijoin on full scans of links…
S7,5,3: Then a sequence of hash right antijoins on result sets to full scans of links
…where right means the build table/probe table choice is reversed from the default
…making the build table the (smaller) result set
Note that the A-Rows drops rapidly from 116K as the sequence progresses, down to 425 (S3)
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |00:00:12.78 |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | 0 |00:00:12.78 |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C1D_4F65443 | 1 | | 0 |00:00:12.77 |
|* 3 | HASH JOIN RIGHT ANTI | | 1 | 8095K| 425 |00:00:13.60 |
| 4 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:00.69 |
|* 5 | HASH JOIN RIGHT ANTI | | 1 | 8095K| 484 |00:00:09.94 |
| 6 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:00.59 |
|* 7 | HASH JOIN RIGHT ANTI | | 1 | 8095K| 4196 |00:00:05.59 |
| 8 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:00.58 |
|* 9 | HASH JOIN ANTI | | 1 | 8095K| 116K|00:00:04.99 |
| 10 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:00.59 |
| 11 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:00.55 |
| 12 | LOAD TABLE CONVENTIONAL | NODE_ROOTS | 1 | | 0 |00:00:00.02 |
| 13 | HASH UNIQUE | | 1 | 16M| 850 |00:00:00.01 |
| 14 | UNION-ALL | | 1 | | 850 |00:00:00.01 |
| 15 | VIEW | | 1 | 8095K| 425 |00:00:00.01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C1D_4F65443 | 1 | 8095K| 425 |00:00:00.01 |
| 17 | VIEW | | 1 | 8095K| 425 |00:00:00.01 |
| 18 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C1D_4F65443 | 1 | 8095K| 425 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------------------
49. SQL for Isolated Links: SQL 3 - 4 Outer Joins
INSERT INTO node_roots
WITH isolated_links AS (
    SELECT lnk.node_id_fr, lnk.node_id_to
      FROM links lnk
      LEFT JOIN links lnk_1
        ON (lnk_1.node_id_fr = lnk.node_id_fr
       AND lnk_1.ROWID != lnk.ROWID)
      LEFT JOIN links lnk_2
        ON (lnk_2.node_id_fr = lnk.node_id_to
       AND lnk_2.ROWID != lnk.ROWID)
      LEFT JOIN links lnk_3
        ON (lnk_3.node_id_to = lnk.node_id_fr
       AND lnk_3.ROWID != lnk.ROWID)
      LEFT JOIN links lnk_4
        ON (lnk_4.node_id_to = lnk.node_id_to
       AND lnk_4.ROWID != lnk.ROWID)
     WHERE lnk_1.node_id_fr IS NULL
       AND lnk_2.node_id_fr IS NULL
       AND lnk_3.node_id_to IS NULL
       AND lnk_4.node_id_to IS NULL
)
SELECT node_id_fr, node_id_fr, 0
  FROM isolated_links
 UNION
SELECT node_id_to, node_id_fr, 1
  FROM isolated_links
Replace each NOT EXISTS with an outer antijoin
This worked well for isolated nodes, where the plan used hash antijoins and almost halved the time compared with NOT EXISTS
Here, however, it obtains the 425 isolated links in 1,259 seconds, much slower than the 4 NOT EXISTS subqueries!
Let’s look at the execution plan…
51. SQL for Isolated Links: SQL 4 - Group Counting
INSERT INTO node_roots
WITH all_nodes AS (
    SELECT node_id_fr node_id, 'F' tp
      FROM links
     UNION ALL
    SELECT node_id_to, 'T'
      FROM links
), unique_nodes AS (
    SELECT node_id, Max(tp) tp
      FROM all_nodes
     GROUP BY node_id
    HAVING COUNT(*) = 1
), isolated_links AS (
    SELECT lnk.node_id_fr, lnk.node_id_to
      FROM links lnk
      JOIN unique_nodes frn
        ON frn.node_id = lnk.node_id_fr
       AND frn.tp = 'F'
      JOIN unique_nodes ton
        ON ton.node_id = lnk.node_id_to
       AND ton.tp = 'T'
)
SELECT node_id_fr, node_id_fr, 0
  FROM isolated_links
 UNION ALL
SELECT node_id_to, node_id_fr, 1
  FROM isolated_links
Re-define the logic for an isolated link as:
Its from and to node both appear in exactly one link
Avoids the expensive self-join of links in favour of a group counting query
all_nodes:
Gets all node instances with a type of F(rom) or T(o)
unique_nodes:
Selects from all_nodes the nodes having exactly one instance, along with its type
isolated_links:
Selects all links and inner-joins them to unique_nodes on both ends
main section:
Adds both nodes with from node as root
Obtains the 425 isolated links in 1.5 seconds! Let’s look at the execution plan…
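The re-defined logic can be checked against the original NOT EXISTS definition on a toy dataset. The sketch below runs both forms through Python's built-in sqlite3 module, used only as a stand-in for Oracle; the six links are made up, the queries select the isolated links rather than inserting into node_roots, and nothing here reflects Oracle plans or timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (node_id_fr INTEGER NOT NULL, node_id_to INTEGER NOT NULL);
    INSERT INTO links (node_id_fr, node_id_to)
    VALUES (1, 2), (3, 4), (4, 5), (6, 7), (6, 8), (9, 10);
""")

# Original definition: no other link shares either end node
not_exists_sql = """
    SELECT lnk.node_id_fr, lnk.node_id_to
      FROM links lnk
     WHERE NOT EXISTS (
               SELECT 1
                 FROM links lnk_1
                WHERE (lnk_1.node_id_fr = lnk.node_id_to OR
                       lnk_1.node_id_to = lnk.node_id_fr OR
                       lnk_1.node_id_fr = lnk.node_id_fr OR
                       lnk_1.node_id_to = lnk.node_id_to)
                  AND lnk_1.ROWID != lnk.ROWID)
     ORDER BY 1, 2"""

# Group counting: both end nodes occur exactly once across all link ends
group_counting_sql = """
    WITH all_nodes AS (
        SELECT node_id_fr node_id, 'F' tp FROM links
         UNION ALL
        SELECT node_id_to, 'T' FROM links
    ), unique_nodes AS (
        SELECT node_id, Max(tp) tp
          FROM all_nodes
         GROUP BY node_id
        HAVING COUNT(*) = 1
    )
    SELECT lnk.node_id_fr, lnk.node_id_to
      FROM links lnk
      JOIN unique_nodes frn ON frn.node_id = lnk.node_id_fr AND frn.tp = 'F'
      JOIN unique_nodes ton ON ton.node_id = lnk.node_id_to AND ton.tp = 'T'
     ORDER BY 1, 2"""

isolated_by_not_exists = conn.execute(not_exists_sql).fetchall()
isolated_by_group_count = conn.execute(group_counting_sql).fetchall()
print(isolated_by_not_exists, isolated_by_group_count)
```

In the toy data, links (3, 4) and (4, 5) share node 4 and links (6, 7) and (6, 8) share node 6, so only (1, 2) and (9, 10) are isolated, and both queries agree on that.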
52. SQL for Isolated Links: SQL 4 - Group Counting - Execution Plan
Execution Plan (Extract)
The plan shows two LOAD AS SELECTs
The first does a HASH GROUP BY, S4, on a UNION ALL of full scans on links; most of the time goes here
The filter step, S3, shows only 1,797 rows, making the rest of the query very fast – early pruning!
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |00:00:01.83 |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | 0 |00:00:01.83 |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C20_4F65443 | 1 | | 0 |00:00:01.82 |
|* 3 | FILTER | | 1 | | 1797 |00:00:01.91 |
| 4 | HASH GROUP BY | | 1 | 26 | 132K|00:00:01.82 |
| 5 | VIEW | | 1 | 16M| 16M|00:00:00.36 |
| 6 | UNION-ALL | | 1 | | 16M|00:00:00.34 |
| 7 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:00.16 |
| 8 | TABLE ACCESS FULL | LINKS | 1 | 8095K| 8095K|00:00:00.15 |
| 9 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C21_4F65443 | 1 | | 0 |00:00:00.01 |
|* 10 | HASH JOIN | | 1 | 1 | 425 |00:00:00.01 |
|* 11 | VIEW | | 1 | 26 | 901 |00:00:00.01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C20_4F65443 | 1 | 26 | 1797 |00:00:00.01 |
| 13 | NESTED LOOPS | | 1 | 1685 | 896 |00:00:00.01 |
| 14 | NESTED LOOPS | | 1 | 1690 | 896 |00:00:00.01 |
|* 15 | VIEW | | 1 | 26 | 896 |00:00:00.01 |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C20_4F65443 | 1 | 26 | 1797 |00:00:00.01 |
|* 17 | INDEX RANGE SCAN | LINKS_TO_N1 | 896 | 65 | 896 |00:00:00.01 |
| 18 | TABLE ACCESS BY INDEX ROWID | LINKS | 896 | 65 | 896 |00:00:00.01 |
| 19 | LOAD TABLE CONVENTIONAL | NODE_ROOTS | 1 | | 0 |00:00:00.01 |
| 20 | UNION-ALL | | 1 | | 850 |00:00:00.01 |
| 21 | VIEW | | 1 | 1 | 425 |00:00:00.01 |
| 22 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C21_4F65443 | 1 | 1 | 425 |00:00:00.01 |
| 23 | VIEW | | 1 | 1 | 425 |00:00:00.01 |
| 24 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6C21_4F65443 | 1 | 1 | 425 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------------------
53. Tuning 3 - SQL for Root Node Selector
Tuning 3 - SQL for Root Node Selector (4 slides)
Code timing several methods for root node selection
54. SQL for Root Node Selector: Method 0 - Select from Unused Nodes (Unordered)
SELECT id INTO l_root_id
  FROM nodes
 WHERE id NOT IN (SELECT node_id FROM node_roots)
   AND ROWNUM = 1
Code timing showed root node selection took 90% of the time on the Bacon/only_tv_v dataset (744,374 nodes and 22,503,060 links)
Execution plan shows a nested loops antijoin from the nodes index to the root nodes index
We’ll try two variants with different queries and ordering added, then try a different approach
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.37 | 42508 |
|* 1 | COUNT STOPKEY | | 1 | | 0 |00:00:00.37 | 42508 |
|* 2 | FILTER | | 1 | | 0 |00:00:00.37 | 42508 |
| 3 | NESTED LOOPS ANTI SNA| | 1 | 20 | 0 |00:00:00.36 | 40791 |
| 4 | INDEX FAST FULL SCAN| SYS_C0018310 | 1 | 520 | 744K|00:00:00.09 | 1460 |
|* 5 | INDEX UNIQUE SCAN | NODE_ROOTS_N1 | 744K| 714K| 744K|00:00:00.23 | 39331 |
|* 6 | TABLE ACCESS FULL | NODE_ROOTS | 1 | 1 | 0 |00:00:00.01 | 1717 |
---------------------------------------------------------------------------------------------------
Execution Plan (Extract)
Elapsed Times
Root Selection (s)  ms/Call  %Total  Non-Root Selection (s)  %Total  Total (s)
               303       41      58                     221      42        524
The base method, with no ordering, took 303 seconds
55. SQL for Root Node Selector: Method 1 - Select from Unused Nodes (Minimum Id)
SELECT Min(id) INTO l_root_id
  FROM nodes
 WHERE id NOT IN (SELECT node_id FROM node_roots)
The first ordering query takes a Min(id) from nodes not in the solution table
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.50 | 3178 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.50 | 3178 | | | |
|* 2 | HASH JOIN RIGHT ANTI NA| | 1 | 28678 | 0 |00:00:00.50 | 3178 | 37M| 6400K| 30M (0)|
| 3 | TABLE ACCESS FULL | NODE_ROOTS | 1 | 715K| 744K|00:00:00.06 | 1717 | | | |
| 4 | INDEX FAST FULL SCAN | SYS_C0018310 | 1 | 744K| 744K|00:00:00.08 | 1461 | | | |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"="NODE_ID")
Execution Plan
Elapsed Times
Root Selection (s)  ms/Call  %Total  Non-Root Selection (s)  %Total  Total (s)
             2,046      275      92                     208       8      2,233
The first ordering method took 2,046 seconds
This is nearly 7 times slower than the base, unordered method
56. SQL for Root Node Selector: Method 2 - Select from Unused Nodes (Ordered by Id, ROWNUM = 1)
SELECT id INTO l_root_id
  FROM (SELECT id
          FROM nodes
         WHERE id NOT IN (SELECT node_id FROM node_roots)
         ORDER BY 1)
 WHERE ROWNUM = 1
The second ordering query uses a ROWNUM = 1 on an ordered subquery from nodes not in the solution table
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.42 | 26499 |
|* 1 | COUNT STOPKEY | | 1 | | 0 |00:00:00.42 | 26499 |
| 2 | VIEW | | 1 | 1 | 0 |00:00:00.42 | 26499 |
|* 3 | FILTER | | 1 | | 0 |00:00:00.42 | 26499 |
| 4 | NESTED LOOPS ANTI SNA| | 1 | 20 | 0 |00:00:00.41 | 24782 |
| 5 | INDEX FULL SCAN | SYS_C0018310 | 1 | 744K| 744K|00:00:00.10 | 1398 |
|* 6 | INDEX UNIQUE SCAN | NODE_ROOTS_N1 | 744K| 688K| 744K|00:00:00.26 | 23384 |
|* 7 | TABLE ACCESS FULL | NODE_ROOTS | 1 | 1 | 0 |00:00:00.01 | 1717 |
----------------------------------------------------------------------------------------------------
Execution Plan (Steps)
Execution Plan (Predicates)
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM=1)
   3 - filter( IS NULL)
   6 - access("ID"="NODE_ID")
   7 - filter("NODE_ID" IS NULL)
Elapsed Times
Root Selection (s)  ms/Call  %Total  Non-Root Selection (s)  %Total  Total (s)
               289       39      60                     193      40        482
The second ordering method took 289 seconds
This is 7 times faster than the first ordering method and slightly faster than the base, unordered method
57. SQL for Root Node Selector: Method 3 - Fetch from Cursor (Ordered by Id), Check Unused
Cursor SQL
CURSOR c_roots IS
  SELECT id
    FROM nodes
   ORDER BY 1;
OPEN c_roots;
FETCH c_roots INTO l_root_id;
The ordering query is opened once as a cursor, and fetched for each new subnetwork
An existence check is made against the node_roots table; if the node is present, we skip to the next fetch
Cursor Execution Plan
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
| 1 | INDEX FULL SCAN | SYS_C0018310 | 1 | 744K| 1 |00:00:00.01 | 3 |
-------------------------------------------------------------------------------------------
Existence Check SQL
SELECT 1 INTO l_dummy
  FROM node_roots
 WHERE node_id = l_root_id
Existence Check Execution Plan
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
|* 1 | INDEX UNIQUE SCAN| NODE_ROOTS_N1 | 1 | 1 | 1 |00:00:00.01 | 3 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("NODE_ID"=:B1)
Elapsed Times
Root Selection (s)  ms/Call  %Total  Non-Root Selection (s)  %Total  Total (s)
                67       11      27                     185      73        252
We get the root selection time by adding up multiple code timing lines for cursor and check SQL
This third ordering method took 67 seconds, more than 4 times faster than the next best
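The control flow of this method can be sketched in Python as follows. It is an in-memory model only, under stated assumptions: the function and variable names are hypothetical, a sorted iteration stands in for the c_roots cursor, a dictionary lookup stands in for the existence check against node_roots, and a breadth-first walk stands in for the set-based Min Pathfinder inserts.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def group_subnetworks(
    node_ids: Iterable[int], links: List[Tuple[int, int]]
) -> Dict[int, Tuple[int, int]]:
    """Map each node to (root node id, level), with roots chosen in id order."""
    neighbours = defaultdict(set)
    for node_fr, node_to in links:
        neighbours[node_fr].add(node_to)
        neighbours[node_to].add(node_fr)

    node_roots: Dict[int, Tuple[int, int]] = {}
    # Stands in for the c_roots cursor: ordered once, then fetched from
    for candidate in sorted(node_ids):
        # Stands in for the existence check: skip nodes already assigned
        if candidate in node_roots:
            continue
        # New subnetwork: walk outwards from the root, level by level
        node_roots[candidate] = (candidate, 0)
        queue = deque([candidate])
        while queue:
            current = queue.popleft()
            level = node_roots[current][1]
            for neighbour in neighbours[current]:
                if neighbour not in node_roots:
                    node_roots[neighbour] = (candidate, level + 1)
                    queue.append(neighbour)
    return node_roots


roots = group_subnetworks(range(1, 7), [(1, 2), (2, 3), (5, 4)])
print(roots)
```

The point of the design is that the ordered scan is paid for once; each later root selection costs one fetch plus one cheap lookup per skipped node, instead of a fresh antijoin over all nodes.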
58. Tuning Results
Tuning Results (2 slides)
Code timing results for one dataset and before and after results for Subnetwork Grouper for all
59. Code Timing - Ins_Node_Roots - Results on Bacon/only_tv_v after Tuning
Code Timing Output
Timer Set: Ins_Node_Roots, Constructed at 30 Jul 2022 17:38:25, written at 17:42:37
===================================================================================
Timer Elapsed CPU Calls Ela/Call CPU/Call
------------------------------------------------- ---------- ---------- ---------- ------------- -------------
Insert isolated nodes 3: 8659 1.24 1.22 1 1.23700 1.22000
Insert isolated links 5: 7078 5.59 5.30 1 5.59100 5.30000
OPEN c_roots 0.19 0.20 1 0.19000 0.20000
Count nodes 0.01 0.00 1 0.01400 0.00000
FETCH c_roots (first) 0.00 0.00 1 0.00000 0.00000
SELECT 1 INTO l_dummy: Not found 0.70 0.84 7443 0.00009 0.00011
Insert min_tree_links (root node 1, size: 680060) 142.60 137.63 1 142.59600 137.63000
Insert node_roots (root node 1, size: 680060) 3.68 3.64 1 3.68100 3.64000
FETCH c_roots (remaining) 28.60 26.67 664224 0.00004 0.00004
SELECT 1 INTO l_dummy: Found 37.05 37.41 656782 0.00006 0.00006
Insert min_tree_links (3 nodes) 7.44 6.82 2091 0.00356 0.00326
Insert node_roots (3 nodes) 0.53 0.37 2091 0.00025 0.00018
Insert min_tree_links (4-39 nodes) 21.90 20.15 5317 0.00412 0.00379
Insert node_roots (4-39 nodes) 1.67 1.39 5317 0.00031 0.00026
Insert min_tree_links (root node 332, size: 52) 0.01 0.00 1 0.00900 0.00000
Insert node_roots (root node 332, size: 52) 0.00 0.00 1 0.00100 0.00000
...
(Other) 0.00 0.00 1 0.00100 0.00000
------------------------------------------------- ---------- ---------- ---------- ------------- -------------
Total 251.67 241.98 1343341 0.00019 0.00018
------------------------------------------------- ---------- ---------- ---------- ------------- -------------
[Timer timed (per call in ms): Elapsed: 0.00935, CPU: 0.00935]
The total time has come down from 1714 seconds to 252 seconds, a reduction by a factor of about 6.8
The largest contribution is now from the timer Insert min_tree_links (root node 1, size: 680060), at 143 seconds, which is over half the total
The results show the additional pre-insert steps for isolated nodes and isolated links, taking about 1 and 6 seconds
We also see the new cursor fetch step, FETCH c_roots (remaining), and the existence query, SELECT 1 INTO l_dummy: Found, taking about 29 and 37 seconds
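To make the columns of the timing report easier to read, here is a minimal sketch of how a timer set of this kind could accumulate elapsed time, CPU time and call counts per named timer. It is written in Python rather than PL/SQL, and the class and method names are hypothetical illustrations, not the API of the Timer_Set module. The per-call figures are simply the totals divided by the number of calls.

```python
import time


class TimerSet:
    """Hypothetical timer set: accumulates elapsed, CPU and calls per named timer."""

    def __init__(self, name):
        self.name = name
        self.timers = {}  # timer name -> [elapsed_s, cpu_s, calls]
        self._ela = time.perf_counter()
        self._cpu = time.process_time()

    def increment_time(self, timer_name):
        """Charge the time since the previous call (or construction) to timer_name."""
        ela, cpu = time.perf_counter(), time.process_time()
        row = self.timers.setdefault(timer_name, [0.0, 0.0, 0])
        row[0] += ela - self._ela
        row[1] += cpu - self._cpu
        row[2] += 1
        self._ela, self._cpu = ela, cpu

    def format_results(self):
        """Return report lines, including per-call averages (totals / calls)."""
        lines = [
            f"Timer Set: {self.name}",
            f"{'Timer':<20}{'Elapsed':>10}{'CPU':>10}{'Calls':>8}"
            f"{'Ela/Call':>12}{'CPU/Call':>12}",
        ]
        for timer_name, (ela, cpu, calls) in self.timers.items():
            lines.append(
                f"{timer_name:<20}{ela:>10.2f}{cpu:>10.2f}{calls:>8}"
                f"{ela / calls:>12.5f}{cpu / calls:>12.5f}"
            )
        return lines


ts = TimerSet("Demo")
total = 0
for i in range(3):
    total += sum(range(100_000))   # stand-in for a repeated step being timed
    ts.increment_time("Sum step")
sorted(range(1000), reverse=True)  # stand-in for a one-off step
ts.increment_time("Sort step")
print("\n".join(ts.format_results()))
```

Each call to increment_time charges the time since the previous call to the named timer, so a step executed in a loop shows up as one line with a high call count, as with the FETCH and SELECT timers in the output above.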
60. Ins_Node_Roots - Performance - Results before/after Tuning
Dataset             #Nodes       #Links  #Subnetworks  #Maxlev  Base Ela(s)  Tuned Ela(s)
---------------  ---------  -----------  ------------  -------  -----------  ------------
three_subnets           14           13             3        3         0.07           0.5
foreign_keys           289          319            43        5          0.2           0.6
brightkite          58,228      214,078           547       10            7             7
bacon/small            161        3,342             1        5          0.1           0.5
bacon/top250        12,466      583,993            15        6          1.9           4.2
bacon/pre1950      134,131    8,095,294         2,432       13           85            61
bacon/only_tv_v    744,374   22,503,060        12,198       11        1,714           252
bacon/no_tv_v    2,386,567   87,866,033        55,276       10       16,108         2,081
bacon/post1950   2,696,175  101,597,227        60,544       10       19,736         2,930
bacon/full       2,800,309  109,262,592        62,557       10       20,631         3,756
The tuned procedure is between 5.5 and 7.7 times faster on the four largest datasets; on the smallest datasets it is slower (for example 0.5s against 0.07s on three_subnets)
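The speed-up factors quoted for the four largest datasets can be checked directly from the base and tuned elapsed times in the table. The short Python sketch below does the arithmetic; the variable and function names are illustrative only.

```python
# (dataset, base elapsed seconds, tuned elapsed seconds), copied from the table
timings = [
    ("bacon/only_tv_v", 1714, 252),
    ("bacon/no_tv_v", 16108, 2081),
    ("bacon/post1950", 19736, 2930),
    ("bacon/full", 20631, 3756),
]


def speedup(base, tuned):
    """Ratio of base to tuned elapsed time: how many times faster the tuned run is."""
    return base / tuned


factors = {name: round(speedup(base, tuned), 1) for name, base, tuned in timings}
for name, factor in factors.items():
    print(f"{name:<16} {factor:.1f}x")
```

The smallest ratio is for bacon/full and the largest for bacon/no_tv_v, which gives the range of 5.5 to 7.7.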
61. Conclusion
SQL
Be aware of the built-in SQL algorithms at different levels
Understand the use of subquery sequencing in logical query design
Understand how queries can be transformed, and how performance may be affected
By the CBO, and by manual rewriting
Including logical or physical splitting of complex queries
Understand the use of hints to affect the choice of algorithms the CBO makes
Use execution plans to analyse SQL performance
PL/SQL
Use PL/SQL algorithms when there isn’t an appropriate SQL built-in equivalent
But use SQL as fully as possible within these algorithms, in particular to process data in sets
Be familiar with the Oracle standard profilers, and the possibilities offered by custom code timing
For more detail
See my blog and GitHub project…
62. References
1. Algorithm, Computer Hope, March 2021
2. Declarative Language, Britannica.com, Undated
3. SQL Tuning Guide, 21c
4. Shortest Path Analysis of Large Networks by SQL and PL/SQL: Blog, Brendan Furey, August 2022
5. SQL and PL/SQL for Shortest Path Problems: GitHub, Brendan Furey, August 2022
6. Timer_Set - Oracle PL/SQL code timing module: GitHub, Brendan Furey, January 2019
7. Friendship network of Brightkite users, Jure Leskovec, Stanford University, Undated
8. Bacon Numbers Datasets, Oberlin College, December 2016
9. SQL for Shortest Path Problems, Brendan Furey, April 2015
10. SQL for Shortest Path Problems 2: A Branch and Bound Approach, Brendan Furey, May 2015
11. PL/SQL Pipelined Function for Network Analysis, Brendan Furey, May 2015
12. PL/SQL Profiling 1: Overview, Brendan Furey, June 2020