Shortest Path Efficiency Analysis - Logic Programming
Suraj Nair
September 6, 2015
Abstract
Generally, this paper aims to study different implementations of shortest path algorithms and determine
the benefits of each implementation. Specifically, in this report we examine implementations of
Dijkstra’s algorithm for undirected and directed weighted graphs through logic programming as well as
through standard graph theory. We aim to show that the logic programming implementation, while using
less memory, is slower and has fewer capabilities than the standard graph theory implementation. Given
the abundance of memory available today, the memory savings of the logic programming implementation are
not nearly as valuable as the speed and range of capabilities of the graph theory implementation, and we
conclude that for most applications the graph theory implementation is superior.
Method
To find these results, we generated random graphs of a user-specified size. For each pair of nodes in the
graph there is a 50% chance of an edge existing between those nodes, and if such an edge does exist, it is
given a random weight between 0 and 50. Then, using implementations of Dijkstra’s algorithm in Java and
in Prolog, we solve for the shortest paths from the first node to every other node, validate that the answers
are correct, and collect data on the time and space usage of each implementation.
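The graph-generation step described above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual Java source (which is not shown); the class and field names are our own.

```java
import java.util.Random;

// Random undirected weighted graph: for each pair of nodes there is a 50%
// chance of an edge, and each edge gets a random integer weight in [0, 50].
// A weight of -1 means "no edge". (Names are illustrative, not the paper's.)
class RandomGraph {
    final int n;
    final int[][] weight;

    RandomGraph(int n, long seed) {
        this.n = n;
        this.weight = new int[n][n];
        Random rng = new Random(seed);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                weight[i][j] = -1;                 // start with no edges
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (rng.nextBoolean()) {           // 50% chance of an edge
                    int w = rng.nextInt(51);       // weight in [0, 50]
                    weight[i][j] = w;              // undirected: store both ways
                    weight[j][i] = w;
                }
            }
        }
    }
}
```

Seeding the generator makes a run reproducible, which is convenient when validating that both implementations return the same shortest paths for the same graph.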
Dataset
We begin with a dataset with the following structure:
## 'data.frame': 51 obs. of 9 variables:
## $ Number.of.Nodes : int 10 100 100 100 100 100 500 500 500 500 ...
## $ logic_cpu_start : int 208 235 319 400 440 482 722 2474 4172 5873 ...
## $ logic_cpu_end : int 209 255 341 419 463 507 2277 3960 5661 7370 ...
## $ logic_wall_start: int 703557 826379 1070444 1363390 1435559 1508727 1761770 2001670 2239261 25210
## $ logic_wall_end : int 703568 826435 1070516 1363456 1435633 1508812 1763513 2003329 2240907 25227
## $ graph_cpu : int 20 105 104 103 104 107 579 460 546 545 ...
## $ graph_wall : int 26 114 112 112 112 116 594 477 562 560 ...
## $ logic_mem : int 10560 725192 842376 691888 699448 679016 16565232 14614016 80080 16664696 .
## $ graph_mem : int 309344 23051368 23222264 22769592 23289224 23091200 1009160792 437351400 90
After processing and making some of the columns more concise, we end up with a table with the following
fields, representing memory usage, wall time, and CPU time for each implementation at each graph size:
## [1] "Number.of.Nodes" "logic_mem" "graph_mem" "logic_wall"
## [5] "graph_wall" "logic_cpu" "graph_cpu"
Analysis
Now that we have clean data, we can begin our analysis, starting with how memory usage scales for each
implementation.
Memory
[Figure: Comparing Memory Usage. Memory usage in bytes vs. number of nodes (10 to 1000) for the
Graph Theory and Logic Programming implementations.]
Since the graphs are random and have varying numbers of edges, the memory usage is not perfectly aligned
with the number of nodes; however, we can clearly see the difference between the two implementations. For
the graph theory implementation, we see roughly linear growth with the number of nodes, which is to be
expected since for each node we need to create a node object, as well as an edge object for each edge.
The Prolog implementation, on the other hand, stays roughly constant, because the entire graph is
represented as a set of rules and no objects are created. Thus, the logic programming implementation
consistently uses less memory.
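The object-based representation whose memory grows linearly with graph size can be sketched as below. This is a minimal illustration of the pattern described above, not the paper's actual source; the class names are our own.

```java
import java.util.ArrayList;
import java.util.List;

// One Node object per vertex and one Edge object per edge direction, so the
// total number of heap objects grows linearly with the size of the graph.
// (Illustrative sketch; the paper's Java implementation is not shown.)
class ObjectGraph {
    static class Edge {
        final int to;
        final int weight;
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    static class Node {
        final List<Edge> adj = new ArrayList<>();  // adjacency list of this node
    }

    final Node[] nodes;

    ObjectGraph(int n) {
        nodes = new Node[n];
        for (int i = 0; i < n; i++) nodes[i] = new Node();
    }

    void addUndirectedEdge(int u, int v, int w) {
        nodes[u].adj.add(new Edge(v, w));  // one Edge object per direction
        nodes[v].adj.add(new Edge(u, w));
    }
}
```

By contrast, the Prolog program states the same information as a set of facts (for example, an `edge(a, b, 7).` clause per edge), so the program itself allocates no per-vertex or per-edge objects.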
[Figure: Scatterplots of memory usage in bytes vs. number of nodes (0 to 1000), one panel per
implementation (Logic Programming and Graph Theory), each with a line of best fit.]
Here we can see a scatterplot of the memory usage for each graph size. This gives us a clearer picture
of how the memory usage of each implementation scales. Comparing the slopes of the lines of best fit,
we can see that the graph theory implementation uses approximately 23 times more memory than the logic
programming implementation.
Timing
[Figure: Comparing Runtime (Wall-Time) and Comparing Runtime (CPU-Time). Time in milliseconds vs.
number of nodes (10 to 1000) for the Graph Theory and Logic Programming implementations.]
The above two graphs illustrate how the time complexity of each algorithm scales with the size of the graph.
Since the time spent reading in the graph is insignificant compared to the time required to compute the
shortest paths, we find that the CPU time and wall time are almost identical. Furthermore, we see that
as the number of nodes increases, the logic programming implementation falls further behind the graph
theory implementation.
[Figure: Scatterplots of wall time and CPU time in milliseconds vs. number of nodes (0 to 1000), one
panel per implementation (Logic Programming and Graph Theory), each with a line of best fit.]
Here we can see scatterplots of the timing for each graph size. This allows us to compare the speed
difference between the two implementations exactly. Comparing the slopes of the lines of best fit, we
can see that the logic programming implementation uses approximately 8.27 times more wall time than the
graph theory implementation, and 8.6 times more CPU time.
Upon closer inspection of each method, it becomes clear that the reason for the speed difference is
that the graph theory implementation utilizes a binary heap, while the Prolog implementation finds the new
closest node to the start by doing a breadth-first search from the start node to find the closest unassigned
node, then marks it as assigned. Therefore, if we have a graph of N nodes, then in the worst case we
have to explore approximately (N-1) + (N-2) + . . . + 1 = N(N-1)/2 paths. However, this situation can only
occur if every node is connected to every other node, and one direct path through all the nodes is weighted
substantially less than all of the other paths. In practice this method of finding the closest unassigned node is
generally fast, and since the number of paths which need to be explored is the sum of the number of adjacent
unassigned nodes over each assigned node, most graphs in real-world applications will not require searching
too many edges to find the closest node.
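The scan-based selection strategy described above can be illustrated in Java rather than Prolog. The sketch below is our own reconstruction of the idea under stated assumptions (an adjacency matrix where a negative entry means "no edge"), not the paper's code; with this representation each iteration scans all unassigned nodes, giving the roughly quadratic behavior discussed above.

```java
import java.util.Arrays;

// Dijkstra's algorithm with a linear scan for the closest unassigned node,
// mirroring the Prolog implementation's selection strategy. Runs in O(N^2).
// weight[u][v] < 0 means "no edge". (Illustrative sketch, names are ours.)
class DijkstraScan {
    static final int INF = Integer.MAX_VALUE;

    static int[] shortestPaths(int[][] weight, int source) {
        int n = weight.length;
        int[] dist = new int[n];
        boolean[] assigned = new boolean[n];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        for (int iter = 0; iter < n; iter++) {
            int u = -1;
            for (int v = 0; v < n; v++)        // scan: closest unassigned node
                if (!assigned[v] && dist[v] != INF && (u == -1 || dist[v] < dist[u]))
                    u = v;
            if (u == -1) break;                // remaining nodes are unreachable
            assigned[u] = true;
            for (int v = 0; v < n; v++)        // relax every edge out of u
                if (weight[u][v] >= 0 && dist[u] + weight[u][v] < dist[v])
                    dist[v] = dist[u] + weight[u][v];
        }
        return dist;
    }
}
```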
Unlike the logic programming implementation, the graph theory implementation uses a binary heap
represented as an array, so all heap operations take either logarithmic or constant time. Furthermore, it
is a stable heap, so it supports changing the priority of a key within the heap directly. Ultimately, this
makes the graph theory implementation faster, especially for larger graphs, as is evident from the previously
displayed timing data. Additionally, it explains the scaling difference we see in the graphs, where the graph
theory implementation scales at a linearithmic rate, while the logic programming implementation scales at an
approximately quadratic rate.
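The array-backed heap with in-place priority changes can be sketched as an indexed min-heap. This is a simplified stand-in for the heap the Java implementation uses (not the paper's actual source); the key point is `decreaseKey`, which restores heap order in O(log N) by swimming the node up from its recorded position.

```java
import java.util.Arrays;

// Array-backed indexed min-heap: pos[] tracks where each node sits in the
// heap array, so a node's priority can be lowered in place in O(log N).
// (Simplified sketch of the technique, not the paper's implementation.)
class IndexMinHeap {
    private final int[] heap;  // heap[i] = node id at heap position i
    private final int[] pos;   // pos[node] = its heap position, -1 if absent
    private final int[] key;   // key[node] = current priority
    private int size = 0;

    IndexMinHeap(int maxNodes) {
        heap = new int[maxNodes];
        pos = new int[maxNodes];
        key = new int[maxNodes];
        Arrays.fill(pos, -1);
    }

    boolean isEmpty() { return size == 0; }
    boolean contains(int node) { return pos[node] != -1; }

    void insert(int node, int k) {
        key[node] = k;
        heap[size] = node;
        pos[node] = size;
        swim(size++);
    }

    void decreaseKey(int node, int k) {   // assumes k <= key[node]
        key[node] = k;
        swim(pos[node]);                  // restore heap order from node's slot
    }

    int deleteMin() {
        int min = heap[0];
        swap(0, --size);
        pos[min] = -1;
        if (size > 0) sink(0);
        return min;
    }

    private void swim(int i) {
        while (i > 0 && key[heap[(i - 1) / 2]] > key[heap[i]]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void sink(int i) {
        while (2 * i + 1 < size) {
            int c = 2 * i + 1;
            if (c + 1 < size && key[heap[c + 1]] < key[heap[c]]) c++;
            if (key[heap[i]] <= key[heap[c]]) break;
            swap(i, c);
            i = c;
        }
    }

    private void swap(int a, int b) {
        int t = heap[a]; heap[a] = heap[b]; heap[b] = t;
        pos[heap[a]] = a;
        pos[heap[b]] = b;
    }
}
```

Dijkstra's algorithm then performs N `deleteMin` calls and at most one `decreaseKey` per edge, which is what yields the linearithmic scaling seen in the timing data.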
Implementing a Binary Heap in Prolog
To determine whether the logic programming implementation can be optimized to operate at speeds close to
the graph theory implementation, we attempted to implement a binary heap in Prolog. However, since
Prolog stores a heap as a linked list, not an array, the predicates for modifying the heap are less efficient.
In fact, the documentation specifically states that the delete-from-heap rule is extremely inefficient.
Below we can see the average amount of time it takes to retrieve the closest node in Prolog with and without
the heap.
[Figure: Average time in seconds to retrieve the closest node in Prolog, with and without the heap,
for graphs of 100, 300, and 500 nodes.]
Additionally, the heap is unstable, so to change the shortest path to a node, one needs to delete the
priority-key pair and add it back with the new priority. Using a heap with Prolog therefore not only has
a less efficient call to get the smallest value, it also makes that call more often. Consequently, we use
the implementation without the heap.
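The delete-and-reinsert pattern forced by an unstable heap can be sketched in Java with an ordinary priority queue that cannot change a key in place: whenever a shorter path is found, a fresh (distance, node) pair is inserted and stale pairs are skipped when they surface. This is a sketch of the pattern under our own naming, not the paper's Prolog code.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Dijkstra with an "unstable" priority queue: instead of decreaseKey, a new
// (dist, node) pair is inserted on every improvement, and entries that no
// longer match dist[] are discarded when polled. (Illustrative sketch.)
class DijkstraReinsert {
    // weight[u][v] < 0 means "no edge"
    static int[] shortestPaths(int[][] weight, int source) {
        int n = weight.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
        pq.add(new int[]{0, source});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int d = top[0], u = top[1];
            if (d > dist[u]) continue;             // stale pair: skip it
            for (int v = 0; v < n; v++)
                if (weight[u][v] >= 0 && d + weight[u][v] < dist[v]) {
                    dist[v] = d + weight[u][v];
                    pq.add(new int[]{dist[v], v}); // reinsert, no decreaseKey
                }
        }
        return dist;
    }
}
```

The queue can hold up to one entry per edge rather than one per node, which is exactly the extra churn the paper observes when pairing Prolog's heap with Dijkstra's algorithm.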
Real World Applications
Now let us examine the difference between these algorithms when applied to a real-world example. We
will be using the Origin and Destination Survey data for airlines from the United States Department of
Transportation database. Based on this data, we will construct a directed graph with a node for each of the
402 airports in the data and an edge for each real-world flight route. The weight of each edge
will be the distance in miles between the two airports. We begin with data in the following format:
## data.ORIGIN_AIRPORT_ID data.DEST_AIRPORT_ID data.NONSTOP_MILES
## Min. :10135 Min. :10135 Min. : 39
## 1st Qu.:11278 1st Qu.:11274 1st Qu.: 691
## Median :12451 Median :12451 Median :1110
## Mean :12705 Mean :12693 Mean :1311
## 3rd Qu.:14122 3rd Qu.:14113 3rd Qu.:1741
## Max. :16218 Max. :16218 Max. :8061
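Loading rows with these three fields (ORIGIN_AIRPORT_ID, DEST_AIRPORT_ID, NONSTOP_MILES) into a directed, weighted graph can be sketched as below. The parsing details and the sample row values are illustrative assumptions; the paper's own loader is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Directed flight graph keyed by DOT airport ids: origin -> (dest -> miles).
// Each CSV row is ORIGIN_AIRPORT_ID,DEST_AIRPORT_ID,NONSTOP_MILES.
// (Illustrative sketch; the paper's loading code is not shown.)
class FlightGraph {
    final Map<Integer, Map<Integer, Integer>> adj = new HashMap<>();

    void addRoute(int origin, int dest, int miles) {
        adj.computeIfAbsent(origin, k -> new HashMap<>()).put(dest, miles);
    }

    void addCsvRow(String row) {               // e.g. "10135,10397,581" (made-up row)
        String[] f = row.split(",");
        addRoute(Integer.parseInt(f[0].trim()),
                 Integer.parseInt(f[1].trim()),
                 Integer.parseInt(f[2].trim()));
    }
}
```

Because routes are stored per direction, a flight from A to B does not imply an edge from B to A, matching the directed graph described above.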
After creating a directed graph of flight routes from this data, we used both the logic programming and graph
theory implementations of Dijkstra’s algorithm to find the shortest path from a single airport to all other
airports. Below one can see the time used by each implementation.
[Figure: Time in milliseconds used by each implementation (Graph Theory vs. Logic Programming) on the
flight-route graph.]
Conclusions
From the data, we can see that in general the graph theory implementation, when implemented with a
binary heap, is several times faster than the logic programming approach, and thus is the preferred choice for
shortest path implementations in which speed is of the greatest importance. Additionally, the graph theory
implementation uses objects which, while they do require more memory, have the extended capability of
associating as many features as needed with edges and vertices. In practice this is especially important,
such as in the case where an edge has multiple criteria contributing to its total cost.
While these conclusions seem straightforward enough, it is worth noting that there are certainly some
situations in which it would be easier and faster to utilize the logic programming implementation.
Specifically, when dealing with a knowledge base stored as an ontology or in a similar format, there is a
distinct advantage to the logic programming implementation: the user-defined properties and hierarchical
object structure of a knowledge base stored as an ontology translate directly into a set of facts and rules
for a logic program. Specifically, the data property assertions within an ontology relate individuals to
literals and can be used as facts, while the object property assertions define relationships between
individuals and can be used as rules. As a result, logic programming works seamlessly with these sorts of
data structures, while standard graph theory and other methods would require parsing the data, likely from
an XML/RDF format, and creating a new data structure, which for large knowledge bases would take quite a
bit of time. Thus, we can see that there are applications, such as information clustering applications where
we determine the similarity of concepts based on how many properties connect them, both directly and
indirectly, where we may want to use the logic programming implementation of Dijkstra’s algorithm.
References
Algorithms, 4th edition by Robert Sedgewick and Kevin Wayne,
Addison-Wesley Professional, 2011, ISBN 0-321-57351-X.
http://algs4.cs.princeton.edu
United States Department of Transportation,
Airline Origin and Destination Survey (DB1B),
http://www.transtats.bts.gov/Fields.asp?Table_ID=247