Shortest Path Efficiency Analysis - Logic Programming
Suraj Nair
September 6, 2015
Abstract
This paper studies different implementations of shortest-path algorithms and examines the benefits of each implementation. Specifically, we examine implementations of
Dijkstra’s algorithm for undirected and directed weighted graphs through logic programming as well as
through standard graph theory. We aim to show that the logic programming implementation, while using
less memory, is slower and less capable than the standard graph theory implementation. Given
the memory available on today’s machines, the memory savings of the logic programming implementation are
not nearly as valuable as the speed and range of capabilities of the graph theory implementation, and we
conclude that for most applications the graph theory implementation is superior.
Method
To obtain these results, we generate random graphs of a user-specified size. For each pair of nodes in the
graph there is a 50% chance of an edge existing between them; if such an edge exists, it is
given a random weight between 0 and 50. Then, using implementations of Dijkstra’s algorithm in Java and
in Prolog, we solve for the shortest paths from the first node to every other node, validate that the answers
are correct, and collect data on the time and space usage of each implementation.
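The graph-generation procedure above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual code; the class and field names are assumptions:

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of the random-graph generator described above. This is an
// illustrative reconstruction, not the paper's actual code; the class
// and field names are assumptions.
public class RandomGraph {
    // Adjacency matrix; weight[i][j] < 0 means "no edge".
    final int[][] weight;

    RandomGraph(int n, long seed) {
        Random rng = new Random(seed);
        weight = new int[n][n];
        for (int[] row : weight) Arrays.fill(row, -1);
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (rng.nextBoolean()) {        // 50% chance of an edge
                    int w = rng.nextInt(51);    // random weight in [0, 50]
                    weight[i][j] = w;
                    weight[j][i] = w;           // undirected: mirror the edge
                }
            }
        }
    }
}
```

Seeding the generator makes a run reproducible while preserving the 50% edge density and the [0, 50] weight range described above.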
Dataset
We begin with a dataset with the following structure:
## 'data.frame': 51 obs. of 9 variables:
## $ Number.of.Nodes : int 10 100 100 100 100 100 500 500 500 500 ...
## $ logic_cpu_start : int 208 235 319 400 440 482 722 2474 4172 5873 ...
## $ logic_cpu_end : int 209 255 341 419 463 507 2277 3960 5661 7370 ...
## $ logic_wall_start: int 703557 826379 1070444 1363390 1435559 1508727 1761770 2001670 2239261 25210
## $ logic_wall_end : int 703568 826435 1070516 1363456 1435633 1508812 1763513 2003329 2240907 25227
## $ graph_cpu : int 20 105 104 103 104 107 579 460 546 545 ...
## $ graph_wall : int 26 114 112 112 112 116 594 477 562 560 ...
## $ logic_mem : int 10560 725192 842376 691888 699448 679016 16565232 14614016 80080 16664696 .
## $ graph_mem : int 309344 23051368 23222264 22769592 23289224 23091200 1009160792 437351400 90
After processing and consolidating some of the columns, we end up with a table with the following
fields, representing memory usage, wall time, and CPU time for each implementation at each graph size:
## [1] "Number.of.Nodes" "logic_mem" "graph_mem" "logic_wall"
## [5] "graph_wall" "logic_cpu" "graph_cpu"
Analysis
Now that we have clean data, we can begin our analysis. We will begin by looking at how memory usage
scales for each implementation.
Memory
[Figure: Comparing Memory Usage — memory usage in bytes vs. number of nodes (10 to 1000) for the Graph Theory and Logic Programming implementations]
Since the graphs are random and have varying numbers of edges, the memory usage is not perfectly aligned
with the number of nodes; however, we can clearly see the difference between the two implementations. For
the graph theory implementation, memory grows roughly linearly with the number of nodes, which is to be
expected since for each node we create a node object, as well as an edge object for each edge.
The Prolog implementation, on the other hand, stays roughly constant, because the entire
graph is represented as a set of rules and no objects are created. Thus, the logic programming implementation
consistently uses less memory.
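The per-node and per-edge allocation driving this linear growth might look like the following minimal sketch; the class names are illustrative, not taken from the paper's Java implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative node/edge objects of the kind whose per-instance
// allocation drives the linear memory growth described above; the
// class names are assumptions, not the paper's actual code.
public class GraphObjects {
    static class Edge {
        final int to;       // index of the destination node
        final int weight;   // edge weight
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    static class Node {
        final int id;
        final List<Edge> edges = new ArrayList<>();  // one Edge object per incident edge
        Node(int id) { this.id = id; }
    }
}
```

Every node and every edge carries object-header and field overhead, which is why memory scales with graph size here but not in the rule-based Prolog representation.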
[Figure: scatterplots of memory usage in bytes vs. number of nodes, with lines of best fit, for the Logic Programming and Graph Theory implementations]
Here we can see scatterplots of the memory usage at each graph size, which give a clearer picture
of how the memory usage of each implementation scales. Comparing the slopes of the lines of best fit,
we can see that the graph theory implementation uses approximately 23 times more memory than the logic
programming implementation.
Timing
[Figure: Comparing Runtime (Wall-Time) and Comparing Runtime (CPU-Time) — time in milliseconds vs. number of nodes (10 to 1000) for the Graph Theory and Logic Programming implementations]
The above two graphs illustrate how the running time of each implementation scales with the size of the graph.
Since the time spent reading in the graph is insignificant compared to the time required to compute the
shortest paths, the CPU time and wall time are almost identical. Furthermore, we see that
the logic programming implementation falls further behind the graph theory implementation as the number
of nodes increases.
[Figure: four scatterplots of time in milliseconds vs. number of nodes — Logic Programming wall time, Graph Theory wall time, Logic Programming CPU time, and Graph Theory CPU time]
Here we can see scatterplots of the timing at each graph size, which allow us to compare the speed
difference between the two implementations precisely. Comparing the slopes of the lines of best fit, we
can see that the logic programming implementation uses approximately 8.27 times more wall time than the
graph theory implementation, and 8.6 times more CPU time.
Upon closer inspection of each method, it becomes clear that the reason for the speed difference is
that the graph theory implementation uses a binary heap, while the Prolog implementation finds the next
closest node to the start by doing a breadth-first search from the start node for the closest unassigned
node, then marks it as assigned. Therefore, for a graph of N nodes, in the worst case we
have to explore approximately (N-1) + (N-2) + . . . + 1 = N(N-1)/2 paths. However, this situation can only
occur if every node is connected to every other node and one direct path through all the nodes is weighted
substantially less than all the other paths. In practice this method of finding the closest unassigned node is
generally fast: since the number of paths that need to be explored is the sum, over the assigned nodes, of
the number of adjacent unassigned nodes, most graphs in real-world applications will not require searching
many edges to find the closest node.
Unlike the logic programming implementation, the graph theory implementation uses a binary heap
represented as an array, so all operations take either logarithmic or constant time. Furthermore, it
is a stable heap, so it supports changing the priority of a key within the heap directly. Ultimately, this
makes the graph theory implementation faster, especially for larger graphs, as is evident from the previously
displayed timing data. It also explains the scaling difference we see in the plots: the graph
theory implementation scales at a linearithmic rate, while the logic programming implementation scales at
an approximately quadratic rate.
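A heap-based Dijkstra of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses lazy deletion (skipping stale heap entries) rather than the stable, indexed heap with direct decrease-key described above, and all names are assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Minimal heap-based Dijkstra over an adjacency list. A sketch of the
// approach discussed above, not the paper's implementation: it uses
// lazy deletion (skipping stale heap entries) instead of a stable,
// indexed heap with direct decrease-key.
public class Dijkstra {
    // edges[u] holds {v, weight} pairs; returns shortest distances from source.
    public static long[] shortestPaths(List<int[]>[] edges, int source) {
        long[] dist = new long[edges.length];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        // Heap entries are {node, distance-at-insertion}, ordered by distance.
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[1], b[1]));
        pq.add(new long[]{source, 0});
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            int u = (int) top[0];
            if (top[1] > dist[u]) continue;   // stale entry: node already assigned closer
            for (int[] e : edges[u]) {
                long nd = dist[u] + e[1];
                if (nd < dist[e[0]]) {
                    dist[e[0]] = nd;
                    pq.add(new long[]{e[0], nd});  // re-insert instead of decrease-key
                }
            }
        }
        return dist;
    }
}
```

Each heap operation costs logarithmic time, which is what yields the roughly linearithmic scaling observed for the graph theory implementation.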
Implementing A Binary Heap in Prolog
To determine whether the logic programming implementation can be optimized to operate at a speed close to
the graph theory implementation, we attempted to implement a binary heap in Prolog. However, since
Prolog stores a heap as a linked structure rather than an array, the rules for modifying the heap are less efficient.
In fact, the documentation specifically states that the delete-from-heap rule is extremely inefficient.
Below we can see the average amount of time it takes to retrieve the closest node in Prolog with and without
the heap.
[Figure: bar chart of the average time in seconds to retrieve the closest node in Prolog, with and without the heap, for graphs of 100, 300, and 500 nodes]
Additionally, the heap is unstable, so to change the shortest path to a node one must delete the priority-key
pair and re-add it with the new priority. Using a heap in Prolog therefore not only makes the call to
retrieve the smallest value less efficient, it also performs costly heap operations more often. For these
reasons, we use the implementation without the heap.
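Java's java.util.PriorityQueue has the same limitation as the Prolog heap described above: there is no in-place decrease-key, so a priority change means a linear-time delete followed by a reinsert. A minimal illustration (the {priority, key} entry layout is an assumption):

```java
import java.util.PriorityQueue;

// Illustration of the delete-then-reinsert workaround described above.
// java.util.PriorityQueue, like the Prolog heap, offers no in-place
// decrease-key; entries here are {priority, key} pairs.
public class DecreaseKey {
    public static void decrease(PriorityQueue<int[]> pq, int key, int newPriority) {
        pq.removeIf(e -> e[1] == key);        // O(n) scan: the expensive delete
        pq.add(new int[]{newPriority, key});  // O(log n) reinsert with the new priority
    }
}
```

The O(n) scan on every priority change is why heap-based Dijkstra implementations typically prefer either an indexed heap with direct decrease-key or the lazy-deletion approach.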
Real World Applications
Now let us examine the difference between these implementations when applied to a real-world example. We
use the Origin and Destination Survey data for airlines from the United States Department of
Transportation database. From this data, we construct a directed graph with a node for each of the
402 airports in the data and an edge for each real-world flight route, where the weight of each edge
is the distance in miles between the two airports. We begin with data in the following format:
## data.ORIGIN_AIRPORT_ID data.DEST_AIRPORT_ID data.NONSTOP_MILES
## Min. :10135 Min. :10135 Min. : 39
## 1st Qu.:11278 1st Qu.:11274 1st Qu.: 691
## Median :12451 Median :12451 Median :1110
## Mean :12705 Mean :12693 Mean :1311
## 3rd Qu.:14122 3rd Qu.:14113 3rd Qu.:1741
## Max. :16218 Max. :16218 Max. :8061
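Building the directed graph from rows in this format might be sketched as follows; the column meanings come from the summary above (origin id, destination id, nonstop miles), but the class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of building the directed flight graph described above. The
// column meanings follow the summary table (origin airport id,
// destination airport id, nonstop miles); the class and method names
// are illustrative, not the paper's actual code.
public class FlightGraph {
    // origin airport id -> (destination airport id -> distance in miles)
    final Map<Integer, Map<Integer, Integer>> adj = new HashMap<>();

    void addRoute(int origin, int dest, int miles) {
        // Keep the smallest distance if the same route appears more than once.
        adj.computeIfAbsent(origin, k -> new HashMap<>())
           .merge(dest, miles, Math::min);
    }
}
```

Because the graph is directed, each row adds an edge in one direction only; the reverse route, if flown, appears as its own row in the data.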
After creating a directed graph of flight routes from this data, we used both the logic programming and graph
theory implementations of Dijkstra’s algorithm to find the shortest path from a single airport to all other
airports. Below one can see the time used by each implementation.
[Figure: bar chart of time in milliseconds for the Graph Theory and Logic Programming implementations on the flight-route graph]
Conclusions
From the data, we can see that in general the graph theory implementation, when implemented with a
binary heap, is several times faster than the logic programming approach, and is thus the preferred choice
for shortest-path applications in which speed is of the greatest importance. Additionally, the graph theory
implementation uses objects, which, while requiring more memory, have the extended capability of
associating as many features as needed with edges and vertices. In practice this is especially important,
for example when an edge has multiple criteria contributing to its total cost.
While these conclusions seem straightforward, it is worth noting that there are certainly some situations
in which it would be easier and faster to use the logic programming implementation. Specifically, when
dealing with a knowledge base stored as an ontology or in a similar format, there is a distinct advantage
to the logic programming implementation: the user-defined properties and hierarchical
object structure of a knowledge base stored as an ontology translate directly into a set of facts and rules for a
logic program. Specifically, the data property assertions within an ontology relate individuals to literals and
can be used as facts, while the object property assertions define relationships between individuals and other
individuals and can be used as rules. As a result, logic programming works seamlessly with these sorts of
data structures, while standard graph theory and other methods would require parsing the data, likely from
an XML/RDF format, and creating a new data structure, which for large knowledge bases would take quite a
bit of time. Thus, there are applications, such as information clustering, where we
determine the similarity of concepts based on how many properties connect them, both directly and
indirectly, in which we may want to use the logic programming implementation of Dijkstra’s algorithm.
References
Robert Sedgewick and Kevin Wayne. Algorithms, 4th edition. Addison-Wesley Professional, 2011. ISBN 0-321-57351-X.
http://algs4.cs.princeton.edu
United States Department of Transportation. Airline Origin and Destination Survey (DB1B).
http://www.transtats.bts.gov/Fields.asp?Table_ID=247
8

Contenu connexe

Tendances

Homomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning ClassificationHomomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning ClassificationMohammed Ashour
 
A Novel Approach of Caching Direct Mapping using Cubic Approach
A Novel Approach of Caching Direct Mapping using Cubic ApproachA Novel Approach of Caching Direct Mapping using Cubic Approach
A Novel Approach of Caching Direct Mapping using Cubic ApproachKartik Asati
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyAndrii Gakhov
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...NECST Lab @ Politecnico di Milano
 
work load characterization
work load characterizationwork load characterization
work load characterizationRaghu Golla
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...MLconf
 
One More Comments on Programming with Big Number Library in Scientific Computing
One More Comments on Programming with Big Number Library in Scientific ComputingOne More Comments on Programming with Big Number Library in Scientific Computing
One More Comments on Programming with Big Number Library in Scientific Computingtheijes
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmMostafa G. M. Mostafa
 
A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...
A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...
A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...Cemal Ardil
 
Project 2: Baseband Data Communication
Project 2: Baseband Data CommunicationProject 2: Baseband Data Communication
Project 2: Baseband Data CommunicationDanish Bangash
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesKeyon Vafa
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Scientific Review
 
Proximity Detection in Distributed Simulation of Wireless Mobile Systems
Proximity Detection in Distributed Simulation of Wireless Mobile SystemsProximity Detection in Distributed Simulation of Wireless Mobile Systems
Proximity Detection in Distributed Simulation of Wireless Mobile SystemsGabriele D'Angelo
 
Scalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache SparkScalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache SparkLynxAnalytics
 

Tendances (19)

Homomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning ClassificationHomomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning Classification
 
A Novel Approach of Caching Direct Mapping using Cubic Approach
A Novel Approach of Caching Direct Mapping using Cubic ApproachA Novel Approach of Caching Direct Mapping using Cubic Approach
A Novel Approach of Caching Direct Mapping using Cubic Approach
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. Frequency
 
Lec13 multidevice
Lec13 multideviceLec13 multidevice
Lec13 multidevice
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
 
Poster
PosterPoster
Poster
 
work load characterization
work load characterizationwork load characterization
work load characterization
 
Lec08 optimizations
Lec08 optimizationsLec08 optimizations
Lec08 optimizations
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
 
One More Comments on Programming with Big Number Library in Scientific Computing
One More Comments on Programming with Big Number Library in Scientific ComputingOne More Comments on Programming with Big Number Library in Scientific Computing
One More Comments on Programming with Big Number Library in Scientific Computing
 
Lec05 buffers basic_examples
Lec05 buffers basic_examplesLec05 buffers basic_examples
Lec05 buffers basic_examples
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) Algorithm
 
A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...
A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...
A fast-replica-placement-methodology-for-large-scale-distributed-computing-sy...
 
End sem
End semEnd sem
End sem
 
Project 2: Baseband Data Communication
Project 2: Baseband Data CommunicationProject 2: Baseband Data Communication
Project 2: Baseband Data Communication
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian Processes
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
 
Proximity Detection in Distributed Simulation of Wireless Mobile Systems
Proximity Detection in Distributed Simulation of Wireless Mobile SystemsProximity Detection in Distributed Simulation of Wireless Mobile Systems
Proximity Detection in Distributed Simulation of Wireless Mobile Systems
 
Scalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache SparkScalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache Spark
 

En vedette

My favorites
My favoritesMy favorites
My favoritessh19354
 
Thin client SPAs. Stream UI using web standards
Thin client SPAs. Stream UI using web standardsThin client SPAs. Stream UI using web standards
Thin client SPAs. Stream UI using web standardsStarcounter
 
Outros Certificados em Inglês
Outros Certificados em InglêsOutros Certificados em Inglês
Outros Certificados em InglêsJoão Mpaca
 
Estadistica religion y_moral_catolica_cr2015-2016
Estadistica religion y_moral_catolica_cr2015-2016Estadistica religion y_moral_catolica_cr2015-2016
Estadistica religion y_moral_catolica_cr2015-2016miciudadreal
 
st-marine_3rd Officer Panush Vyacheslav CV_form_
st-marine_3rd Officer Panush Vyacheslav CV_form_st-marine_3rd Officer Panush Vyacheslav CV_form_
st-marine_3rd Officer Panush Vyacheslav CV_form_Vyacheslav Panush
 
Issues and Challenges in Implementing Electronic Health Record in Primary Care
Issues and Challenges in Implementing Electronic Health Record in Primary CareIssues and Challenges in Implementing Electronic Health Record in Primary Care
Issues and Challenges in Implementing Electronic Health Record in Primary Carerusai021
 
Help your scrum team strike oil!
Help your scrum team strike oil!Help your scrum team strike oil!
Help your scrum team strike oil!Michael O'Reilly
 

En vedette (9)

My favorites
My favoritesMy favorites
My favorites
 
Thin client SPAs. Stream UI using web standards
Thin client SPAs. Stream UI using web standardsThin client SPAs. Stream UI using web standards
Thin client SPAs. Stream UI using web standards
 
Outros Certificados em Inglês
Outros Certificados em InglêsOutros Certificados em Inglês
Outros Certificados em Inglês
 
Estadistica religion y_moral_catolica_cr2015-2016
Estadistica religion y_moral_catolica_cr2015-2016Estadistica religion y_moral_catolica_cr2015-2016
Estadistica religion y_moral_catolica_cr2015-2016
 
Texto
TextoTexto
Texto
 
писукову дмитрию
писукову дмитриюписукову дмитрию
писукову дмитрию
 
st-marine_3rd Officer Panush Vyacheslav CV_form_
st-marine_3rd Officer Panush Vyacheslav CV_form_st-marine_3rd Officer Panush Vyacheslav CV_form_
st-marine_3rd Officer Panush Vyacheslav CV_form_
 
Issues and Challenges in Implementing Electronic Health Record in Primary Care
Issues and Challenges in Implementing Electronic Health Record in Primary CareIssues and Challenges in Implementing Electronic Health Record in Primary Care
Issues and Challenges in Implementing Electronic Health Record in Primary Care
 
Help your scrum team strike oil!
Help your scrum team strike oil!Help your scrum team strike oil!
Help your scrum team strike oil!
 

Similaire à LogicProgrammingShortestPathEfficiency

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...IRJET Journal
 
SLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKS
SLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKSSLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKS
SLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKSIJCI JOURNAL
 
Performance measures
Performance measuresPerformance measures
Performance measuresDivya Tiwari
 
Simulation of Wireless Sensor Networks
Simulation of Wireless Sensor NetworksSimulation of Wireless Sensor Networks
Simulation of Wireless Sensor NetworksDaniel Zuniga
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPIJSRED
 
Back pressure based packet by packet adaptive
Back pressure based packet by packet adaptiveBack pressure based packet by packet adaptive
Back pressure based packet by packet adaptiveIMPULSE_TECHNOLOGY
 
cis97007
cis97007cis97007
cis97007perfj
 
On modeling controller switch interaction in openflow based sdns
On modeling controller switch interaction in openflow based sdnsOn modeling controller switch interaction in openflow based sdns
On modeling controller switch interaction in openflow based sdnsIJCNCJournal
 
Scimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbinScimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbinAgostino_Marchetti
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problemsRichard Ashworth
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsJigisha Aryya
 
Complier design
Complier design Complier design
Complier design shreeuva
 

Similaire à LogicProgrammingShortestPathEfficiency (20)

FrackingPaper
FrackingPaperFrackingPaper
FrackingPaper
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Eryk_Kulikowski_a4
Eryk_Kulikowski_a4Eryk_Kulikowski_a4
Eryk_Kulikowski_a4
 
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
 
SLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKS
SLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKSSLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKS
SLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKS
 
Performance measures
Performance measuresPerformance measures
Performance measures
 
cug2011-praveen
cug2011-praveencug2011-praveen
cug2011-praveen
 
Simulation of Wireless Sensor Networks
Simulation of Wireless Sensor NetworksSimulation of Wireless Sensor Networks
Simulation of Wireless Sensor Networks
 
A046020112
A046020112A046020112
A046020112
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MP
 
Back pressure based packet by packet adaptive
Back pressure based packet by packet adaptiveBack pressure based packet by packet adaptive
Back pressure based packet by packet adaptive
 
Green scheduling
Green schedulingGreen scheduling
Green scheduling
 
cis97007
cis97007cis97007
cis97007
 
On modeling controller switch interaction in openflow based sdns
On modeling controller switch interaction in openflow based sdnsOn modeling controller switch interaction in openflow based sdns
On modeling controller switch interaction in openflow based sdns
 
Scimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbinScimakelatex.93126.cocoon.bobbin
Scimakelatex.93126.cocoon.bobbin
 
Description Of A Graph
Description Of A GraphDescription Of A Graph
Description Of A Graph
 
StateKeeper Report
StateKeeper ReportStateKeeper Report
StateKeeper Report
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systems
 
Complier design
Complier design Complier design
Complier design
 

LogicProgrammingShortestPathEfficiency

  • 1. Shortest Path Efficiency Analysis - Logic Programming Suraj Nair September 6, 2015 Abstract Generally, this paper aims to study different implementations of shortest path algorithms and determine the various benefits of each implemention. Specifically, in this report we will examine implementations of Dijkstra’s algorithm for undirected and directed weighted graphs through logic programming as well as through standard graph theory. We will aim to show that the logic programming implementation, while using less memory, is actually slower and has less capabilities than the standard graph theory implementation. Due to the availability of compute space today, the memory benefits of the logic programming implementation are not nearly as valuable as the speed and range of capabilities of the graph theory implementation, and we can conclude that for most applications, the graph theory implementation is superior. Method To find these results, we developed random graphs of a user specified size. For each pair of nodes in the graph there is a 50% chance of there existing a edge between those nodes, and if such an edge does exist, it is given a random weight between 0 and 50. Then, using implementations of Dijkstra’s algorithm in Java and in Prolog, we solve for the shortest paths from the first node to every other node, validate that the answers are correct, and collect data regarding the time and space usage of each implementation. Dataset We begin with a dataset with the following structure: ## 'data.frame': 51 obs. of 9 variables: ## $ Number.of.Nodes : int 10 100 100 100 100 100 500 500 500 500 ... ## $ logic_cpu_start : int 208 235 319 400 440 482 722 2474 4172 5873 ... ## $ logic_cpu_end : int 209 255 341 419 463 507 2277 3960 5661 7370 ... 
## $ logic_wall_start: int 703557 826379 1070444 1363390 1435559 1508727 1761770 2001670 2239261 25210 ## $ logic_wall_end : int 703568 826435 1070516 1363456 1435633 1508812 1763513 2003329 2240907 25227 ## $ graph_cpu : int 20 105 104 103 104 107 579 460 546 545 ... ## $ graph_wall : int 26 114 112 112 112 116 594 477 562 560 ... ## $ logic_mem : int 10560 725192 842376 691888 699448 679016 16565232 14614016 80080 16664696 . ## $ graph_mem : int 309344 23051368 23222264 22769592 23289224 23091200 1009160792 437351400 90 After processing and making the some of the columns more concise, we end up with a table with the following fields representing memory usage, wall time, and cpu time for each implementation for each size graph: ## [1] "Number.of.Nodes" "logic_mem" "graph_mem" "logic_wall" ## [5] "graph_wall" "logic_cpu" "graph_cpu" 1
  • 2. Analysis Now that we have clean data, we can begin our analysis. We will begin by looking at how memory usage scales for each implementation. Memory 10 100 200 300 400 500 600 700 800 900 1000 0e+004e+088e+08 Comparing Memory Usage Number of Nodes MemoryUsageinBytes Graph Theory Logic Programming Since the graphs are random, and have varying number of edges, the memory usage is not perfectly aligned with the number of nodes, however we can clearly see the difference between the two implementations. For the graph theory implementation, we see a roughly linear growth with the number of nodes, which is to be expected since for each node we need to create a node object as well as an edge object for each edge. On the other hand, the Prolog implementation implementation stays roughly constant, because the entire graph is represented as a set of rules, and no objects are created. Thus, the logic programming implementation consistently uses less memory. 2
  • 3. 0 200 600 1000 0e+002e+074e+076e+07 Logic Programming Number of Nodes MemoryinBytes 0 200 600 1000 0.0e+006.0e+081.2e+09 Graph Theory Number of Nodes MemoryinBytes Here we can see a scatterplot of the memory usage for each of the nodes. This gives us a more clear picture of how the memory usage of each implementation scales. Comparing the slopes of each of the lines of best fit, we can see that the graph theory implementation uses approximately 23 times more memory than the logic programming implementation. 3
  • 4. Timing 10 100 200 300 400 500 600 700 800 900 1000 02000600010000 Comparing Runtime (Wall−Time) Number of Nodes TimeinMilliseconds Graph Theory Logic Programming 10 100 200 300 400 500 600 700 800 900 1000 02000600010000 Comparing Runtime (CPU−Time) Number of Nodes TimeinMilliseconds Graph Theory Logic Programming 4
The above two graphs illustrate how the running time of each implementation scales with the size of the graph. Since the time spent reading in the graph is insignificant compared to the time required to compute the shortest paths, we find that the CPU time and wall time are almost identical. Furthermore, we see that as the number of nodes increases, the logic programming implementation is slower than the graph theory implementation.

[Figure: time in milliseconds versus number of nodes — scatterplots of Logic Programming wall time, Graph Theory wall time, Logic Programming CPU time, and Graph Theory CPU time.]

Here we can see a scatterplot of timing for each graph size. This allows us to compare exactly the speed difference between the two implementations. Comparing the slopes of the lines of best fit, we can see that the logic programming implementation uses approximately 8.27 times more wall time than the graph theory implementation, and 8.6 times more CPU time.

Upon closer inspection of each of the methods, it becomes clear that the reason for the speed difference is that the graph theory implementation utilizes a binary heap, while the Prolog implementation finds the new closest node to the start by doing a breadth-first search from the start node to find the closest unassigned node, then assigns it as found. Therefore, if we have a graph of N nodes, then in the worst case we have to explore approximately (N-1) + (N-2) + ... + 1 = N(N-1)/2 paths. However, this situation can only occur if every node is connected to every other node, and one direct path through all the nodes is weighted substantially less than all of the other paths.
In practice, this method of finding the closest unassigned node is generally fast: since the number of paths which need to be explored is the sum of the number of adjacent unassigned nodes for each assigned node, most graphs in real-world applications will not require searching many edges to find the closest node. Unlike the logic programming implementation, the graph theory implementation uses a binary heap, which is represented as an array, so all heap operations take either logarithmic or constant time. Furthermore, it is a stable heap, so it supports changing the priority of a key within the heap directly. Ultimately, this makes the graph theory implementation faster, especially for larger graphs, as is evident from the previously displayed timing data. It also explains the scaling difference we see in the graphs, where the graph theory implementation scales at a linearithmic rate, while the logic programming implementation scales at an approximately quadratic rate.
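The heap-based approach described above can be sketched in Java. One caveat: `java.util.PriorityQueue` does not support decrease-key, so this sketch uses the common "lazy deletion" variant (stale entries are skipped when popped) rather than the stable indexed heap the report describes; the report's actual implementation may differ, but the asymptotic behavior is comparable:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of heap-based Dijkstra with lazy deletion: instead of decreasing a
// key in place, we push a fresh (distance, node) entry and skip any entry
// whose distance is stale when it reaches the top of the heap.
public class Dijkstra {
    // adj[u] holds {v, w} pairs; returns shortest distances from src.
    static long[] shortestPaths(int[][][] adj, int src) {
        long[] dist = new long[adj.length];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[src] = 0;
        PriorityQueue<long[]> pq =                 // entries: {distance, node}
                new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        pq.add(new long[]{0, src});
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            long d = top[0];
            int u = (int) top[1];
            if (d > dist[u]) continue;             // stale entry: skip it
            for (int[] e : adj[u]) {
                int v = e[0], w = e[1];
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new long[]{dist[v], v});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Tiny sanity check: path 0-1-2-3 (weights 1, 2, 1) beats the
        // direct 0-2 edge of weight 5.
        int[][][] adj = {
            {{1, 1}, {2, 5}},
            {{0, 1}, {2, 2}},
            {{0, 5}, {1, 2}, {3, 1}},
            {{2, 1}}
        };
        System.out.println(Arrays.toString(shortestPaths(adj, 0)));  // prints [0, 1, 3, 4]
    }
}
```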
Implementing A Binary Heap in Prolog

To determine whether the logic programming implementation can be optimized to operate at a speed close to that of the graph theory implementation, we attempted to implement a binary heap in Prolog. However, since Prolog stores a heap as a linked list, not an array, the predicates for modifying the heap are less efficient. In fact, the documentation specifically states that the delete-from-heap rule is extremely inefficient. Below we can see the average amount of time it takes to retrieve the closest node in Prolog with and without the heap.

[Figure: time in seconds to retrieve the closest node, with and without the heap, for graphs of 100, 300, and 500 nodes.]

Additionally, the heap is unstable, so to change the shortest path to a node, one needs to delete the Priority-Key pair and re-add it with the new priority. Using a heap in Prolog therefore not only makes the call to get the smallest value less efficient, it also makes that call more often. Therefore, we use the implementation without the heap.

Real World Applications

Now let us examine the difference between these algorithms when applied to a real-world example. We will be using the Origin and Destination Survey data for airlines from the United States Department of Transportation database. Based on this data, we will construct a directed graph with a node for each of the 402 airports in the data, and with edges corresponding to real-world flight info. The weight of each edge will be the distance in miles between the two airports. We begin with data in the following format:

##  data.ORIGIN_AIRPORT_ID data.DEST_AIRPORT_ID data.NONSTOP_MILES
##  Min.   :10135          Min.   :10135        Min.   :  39
##  1st Qu.:11278          1st Qu.:11274        1st Qu.: 691
##  Median :12451          Median :12451        Median :1110
##  Mean   :12705          Mean   :12693        Mean   :1311
##  3rd Qu.:14122          3rd Qu.:14113        3rd Qu.:1741
##  Max.   :16218          Max.   :16218        Max.   :8061

After creating a directed graph of flight routes from this data, we used both the logic programming and graph theory implementations of Dijkstra's algorithm to find the shortest path from a single airport to all other airports. Below one can see the time used by each implementation.
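Loading the survey rows into a directed, weighted graph can be sketched as follows. The field names mirror the printed summary (ORIGIN_AIRPORT_ID, DEST_AIRPORT_ID, NONSTOP_MILES), but the class and its methods are hypothetical, and the CSV parsing itself is elided:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the directed flight graph: each survey row
// becomes one directed edge weighted by its nonstop mileage.
public class FlightGraph {
    // origin airport id -> (destination airport id -> nonstop miles)
    final Map<Integer, Map<Integer, Integer>> routes = new HashMap<>();

    void addRoute(int origin, int dest, int miles) {
        routes.computeIfAbsent(origin, k -> new HashMap<>())
              .merge(dest, miles, Math::min);  // keep the shortest recorded distance
    }

    public static void main(String[] args) {
        FlightGraph g = new FlightGraph();
        g.addRoute(10135, 12451, 691);   // ids and miles shaped like the summary above
        g.addRoute(10135, 12451, 1110);  // duplicate route: the minimum is kept
        System.out.println(g.routes.get(10135).get(12451));  // prints 691
    }
}
```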
[Figure: time in milliseconds used by each implementation on the flight-route graph, Graph Theory versus Logic Programming.]

Conclusions

From the data, we can see that in general the graph theory implementation, when built on a binary heap, is several times faster than the logic programming approach, and is thus the preferred choice for shortest path implementations in which speed is of the greatest importance. Additionally, the graph theory implementation uses objects, which, while they do require more memory, have the extended capability of associating as many features as needed with edges and vertices. In practice this is especially important, such as in the case where an edge has multiple criteria contributing to its total cost.

While these conclusions seem straightforward enough, it is worth noting that there are certainly some situations in which it would be easier and faster to utilize the logic programming implementation. Specifically, when dealing with a knowledge base stored as an ontology or a similar format, there is a distinct advantage to the logic programming implementation: the user-defined properties and hierarchical object structure of a knowledge base stored as an ontology translate directly into a set of facts and rules for a logic program. The data property assertions within an ontology relate individuals to literals and can be used as facts, while the object property assertions define relationships between individuals and can be used as rules. As a result, logic programming works seamlessly with these sorts of data structures, while standard graph theory and other methods would require parsing the data, likely from an XML/RDF format, and creating a new data structure, which for large knowledge bases would take quite a bit of time.
Thus, we can see that there are applications, such as information clustering applications where we determine the similarity of concepts based on how many properties connect them, both directly and indirectly, in which we may want to use the logic programming implementation of Dijkstra's algorithm.
References

Sedgewick, Robert, and Kevin Wayne. Algorithms, 4th edition. Addison-Wesley Professional, 2011. ISBN 0-321-57351-X. http://algs4.cs.princeton.edu

United States Department of Transportation. Airline Origin and Destination Survey (DB1B). http://www.transtats.bts.gov/Fields.asp?Table_ID=247