Beyond the bread-and-butter singly linked list are dozens of practical Functional Data Structures available to mask complexity, enable composition, and open possibilities in pattern matching. This session focuses on data structures available today in F#, but has practical application to almost any functional language. The session covers what makes these structures functional, when and why to use them, choosing a structure, time complexity awareness, garbage collection, and practical insights into creating and profiling your own Functional Data Structure. Bibliography at http://jackfoxy.com/fsharp-user-group-working-with-functional-data-structures-bibliography
3. tl;dr
Singly-linked list -- the fundamental purely functional data structure
Time complexity overview
Garbage collection and real-world performance
Reasons to use Purely Functional Data Structures
When not to use Purely Functional Data Structures
Choices and shapes
Build your own Purely Functional Data Structure
acster.com jackfoxy.com @foxyjackfox 2/5/2013 3
5. Theoretical Performance
O(1)
O(log* n) practically O(1)
O(log log n)
O(log n)
O(n) linear time
O(n^2) gets real bad from here on out
…
6. Theoretical Performance (most common)
O(1)
O(log* n) practically O(1)
O(log log n)
O(log n)
O(n) linear time
O(n^2) gets real bad from here on out
O(i) variables other than n require explanation
7. Actual Performance
Processor architecture (instruction look-ahead, cache, etc.)
.NET Garbage Collection
O(n) behavior starts for “large enough size”
Recursive Benchmarks over different Structure Sizes
10^2
10^3 often looks like << O(n)
10^4
10^5 usually settles down to O(n), sometimes looks like > O(n)
10^6
8. List as a recursive structure
[Diagram: element 4 is consed (::) onto the list 3, 2, 1, []; labels mark the added element, the Head, the Tail, and the Empty List]
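A minimal sketch of the same idea in F#, assuming nothing beyond the built-in list type. The names numbers, sum, total, and rest are illustrative and do not come from the talk.

```fsharp
// Build a list by consing elements onto the empty list.
let numbers = 4 :: 3 :: 2 :: 1 :: []

// Recurse over the list by splitting it into head and tail.
let rec sum list =
    match list with
    | [] -> 0                          // the empty list ends the recursion
    | head :: tail -> head + sum tail  // the tail is itself a list

let total = sum numbers
let rest = List.tail numbers
```

Here total is 10 and rest is [3; 2; 1]. Note that sum is not tail recursive, which matters for large lists.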
9. So what the heck would you do with a list?
Demo 1
10. “Getting” the recursive thing
SICP
a.k.a
Abelson
&
Sussman
a.k.a
The Wizard Book
11. Why no update or remove in List ?
Graphics: unattributed, all over the internet
12. Okasaki’s Pseudo-Canonical List Update
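let rec loop i updateElem (l:list<'a>) =
    match (i, l) with
    | _, [] -> raise (System.Exception("subscript"))    // ran off the end of the list
    | 0, _::xs -> updateElem::xs                        // found the target: swap in the new element
    | i', x::xs -> x::(loop (i' - 1) updateElem xs)     // keep x, recurse into the tail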
found it!
4 :: 3 :: 2 :: 1 []
13. Okasaki’s Pseudo-Canonical List Update
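let rec loop i updateElem (l:list<'a>) =
    match (i, l) with
    | _, [] -> raise (System.Exception("subscript"))    // ran off the end of the list
    | 0, _::xs -> updateElem::xs                        // found the target: swap in the new element
    | i', x::xs -> x::(loop (i' - 1) updateElem xs)     // keep x, recurse into the tail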
Do you see a problem?
14. We could just punt
let punt i updateElem (l:list<'a>) =
    let a = List.toArray l
    a.[i] <- updateElem
    List.ofArray a
15. …or try a Hybrid approach
let hybrid i updateElem (l:list<'a>) =
    if (i = 0) then updateElem :: (List.tail l)
    else
        // Copy the first i elements into an array (in reverse order) and keep the tail after index i.
        let rec loop i' (front:'a array) back =
            match i' with
            | x when x < 0 -> front, (List.tail back)
            | x ->
                Array.set front x (List.head back)
                loop (x-1) front (List.tail back)
        let front, back = loop (i - 1) (Array.create i updateElem) l
        // Cons the front elements back onto the updated tail.
        let rec loop2 i' frontLen (front':'a array) back' =
            match i' with
            | x when x > frontLen -> back'
            | x -> loop2 (x + 1) frontLen front' (front'.[x]::back')
        loop2 0 ((Seq.length front) - 1) front (updateElem::back)
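For comparison, a list-only variant of the same two-pass idea. This is a sketch and not one of the options benchmarked in the talk: updateTailRec and split are hypothetical names, and the reversed front is held in a list instead of an array.

```fsharp
// Hypothetical helper: update index i using only lists and tail-recursive loops.
let updateTailRec i updateElem (l: 'a list) =
    // Walk to index i, pushing the elements passed onto an accumulator (reversed).
    let rec split n front back =
        match n, back with
        | _, [] -> raise (System.Exception("subscript"))
        | 0, _ :: xs -> front, xs
        | n', x :: xs -> split (n' - 1) (x :: front) xs
    let revFront, back = split i [] l
    // Cons the reversed front onto the updated tail, restoring the original order.
    List.fold (fun acc x -> x :: acc) (updateElem :: back) revFront
```

Both passes run in constant stack space. Dropping the updateElem cons in the last line would give remove instead of update.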
16. Time complexity of update options
Pseudo-Canonical
O(i)
Punt
O(n)
Hybrid
O(i)
Place your bets !
Graphics: unattributed, all over the internet
17. Actual Performance
Size   10k Random Updates         One-time Worst Case
       (structure, ratio to best, time)
10^2   PC      -     2.9ms        Punt    -     0.2ms
       Hybrid  1.4X  4.0          PC      1.1X  0.2
       Punt    1.5   4.5          Hybrid  4.1   0.8
PC looks perfect!
Graphics: http://www.freebievectors.com/es/material-de-antemano/51738/material-vector-dinamico-estilo-comic-femenino/
18. Actual Performance
Size   10k Random Updates         One-time Worst Case
       (structure, ratio to best, time)
10^2   PC      -     2.9ms        Punt    -     0.2ms
       Hybrid  1.4X  4.0          PC      1.1X  0.2
       Punt    1.5   4.5          Hybrid  4.1   0.8
10^3   Hybrid  -     29.6         Punt    -     0.2
       Punt    1.6   47.6         PC      1.1   0.2
       PC      1.7   50.3         Hybrid  4.1   0.8
10^4   Hybrid  -     320.3        Punt    -     0.3
       Punt    1.7   534.9        PC      1.3   0.4
       PC      2.9   920.2        Hybrid  3.2   0.9
10^5   Hybrid  -     4.67sec      Punt    -     1.0
       Punt    2.0   9.34         Hybrid  1.5   1.5
       PC stack overflow !
19. Benchmarking performance
Hard to reason about actual performance
DS_Benchmark
◦ Open source on Github
◦ Discards outliers
◦ Fully isolates code to benchmark
◦ Fully documented
◦ “how to extend” documented
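To give a feel for what a benchmark has to do, here is a rough timing sketch built on System.Diagnostics.Stopwatch. It is not DS_Benchmark and does not reproduce its isolation or outlier handling: medianMs is a hypothetical helper that takes the median of several runs as a crude substitute for discarding outliers.

```fsharp
open System.Diagnostics

// Hypothetical helper: time an action several times and return the median in milliseconds.
let medianMs runs (action: unit -> unit) =
    action ()   // warm-up run so JIT compilation is not measured
    let timings =
        [ for _ in 1 .. runs do
            let sw = Stopwatch.StartNew()
            action ()
            sw.Stop()
            yield sw.Elapsed.TotalMilliseconds ]
        |> List.sort
    timings.[runs / 2]

// Example: time a round trip through an array, as the "punt" update does.
let data = [ 1 .. 10000 ]
let elapsed = medianMs 11 (fun () -> List.toArray data |> List.ofArray |> ignore)
```

Numbers from a sketch like this vary with machine, runtime, and garbage collector state, so treat them as indicative only.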
20. Shapes: let your imagination run wild!
Graphics: Larry D. Moore Attribution-Share Alike 3.0 Unported license. http://commons.wikimedia.org/wiki/File:Playdoh.jpg
21. Binary Random Access List
Same Cons, Head, Tail signature
Optimized for Lookup and Update
O(log n)
…but not for Remove
Why Not?
Does it with alternate internal structures
32. Trees
Wide variety of applications
Binary (balanced or unbalanced)
Multiway (a.k.a. RoseTree)
33. Red Black Tree Balancing
[Diagram: four unbalanced arrangements of the subtrees a, b, c, d, each rebalanced into the same tree with a, b, c, d in order]
Source: https://wiki.rice.edu/confluence/download/attachments/2761212/Okasaki-Red-Black.pdf
34. Talk about reducing complexity!
type color = Red | Black   // assumed definition; the slide omits it
type 'a t = Node of color * 'a * 'a t * 'a t | Leaf
let balance = function
    | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
    | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
    | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
    | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
        Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
    | x -> Node x
Source: http://fsharpnews.blogspot.com/2010/07/f-vs-mathematica-red-black-trees.html
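A sketch of how a balance function like this is typically used by insert, following the well-known Okasaki-style insertion. The type and function names (Color, Tree, insert, contains) are illustrative assumptions rather than the slide's code, and the last balance case is written out field by field.

```fsharp
type Color = Red | Black

type Tree<'a> =
    | Leaf
    | Node of Color * 'a * Tree<'a> * Tree<'a>

// Same four-case rebalancing as the slide: every red-red violation under a
// black node is rewritten to one red node with two black children.
let balance node =
    match node with
    | Black, z, Node (Red, y, Node (Red, x, a, b), c), d
    | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d
    | Black, x, a, Node (Red, z, Node (Red, y, b, c), d)
    | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) ->
        Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d))
    | color, value, left, right -> Node (color, value, left, right)

let insert x tree =
    // New nodes start red; balance repairs violations on the way back up.
    let rec ins t =
        match t with
        | Leaf -> Node (Red, x, Leaf, Leaf)
        | Node (color, y, a, b) ->
            if x < y then balance (color, y, ins a, b)
            elif x > y then balance (color, y, a, ins b)
            else t
    // The root is always repainted black.
    match ins tree with
    | Node (_, y, a, b) -> Node (Black, y, a, b)
    | Leaf -> Leaf   // not reached: ins always returns a Node

let rec contains x tree =
    match tree with
    | Leaf -> false
    | Node (_, y, a, b) ->
        if x < y then contains x a
        elif x > y then contains x b
        else true
```

Insertion needs only this one balancing function. Removal is considerably more involved and is not attempted here.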
35. Extra Credit
Write the Remove operation for a
Red Black Tree
Here’s how:
http://en.wikipedia.org/wiki/Red-black_tree#Removal
37. To Do:
Benchmark:
RoseTree (lazy)
EagerRoseTree (not yet implemented)
IndexedRoseTree
Multiway as unbalanced binary tree
(polymorphic recursion)
39. Another To Do:
The (not-so-) Naïve Binary Tree:
As seen all over the internet…
…yet often missing: Pre-order
Post-order
In-order
fold traversals (better be tail-recursive).
And maybe a zipper navigator while you are at it!
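One possible shape for a tail-recursive fold, sketched under the assumption of a plain binary tree type. BinTree and foldInOrder are hypothetical names. The traversal keeps its pending work on an explicit list instead of the call stack.

```fsharp
type BinTree<'a> =
    | Leaf
    | Node of BinTree<'a> * 'a * BinTree<'a>

// In-order fold. Pending work is either a subtree still to visit (Choice1Of2)
// or a value ready to be folded (Choice2Of2).
let foldInOrder f state tree =
    let rec loop acc work =
        match work with
        | [] -> acc
        | Choice1Of2 Leaf :: rest -> loop acc rest
        | Choice1Of2 (Node (left, value, right)) :: rest ->
            loop acc (Choice1Of2 left :: Choice2Of2 value :: Choice1Of2 right :: rest)
        | Choice2Of2 value :: rest -> loop (f acc value) rest
    loop state [ Choice1Of2 tree ]

let sample = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
let inOrder = foldInOrder (fun acc v -> v :: acc) [] sample |> List.rev
```

Reordering the three items pushed for a Node gives the other traversals: value, left, right for pre-order and left, right, value for post-order.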
40. Call for Action!
Fsharpx.Collections.Experimental
GitHub fork FSharpx
Implement some interesting structure and tests
Sync back to your fork
Pull request
Out of ideas or just want to practice?
Unimplemented Okasaki structures:
http://github.com/jackfoxy/DS_Benchmark/tree/master/PurelyFunctionalDataStructures
41. When not to use purely functional
Consider Array if performance is critical
Functional dictionary–like structures
(Map) may not perform well enough,
especially after scale 10^4
Consider .NET dictionary–like object
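A small sketch of the trade-off, using only the F# core Map and System.Collections.Generic.Dictionary. The value names are illustrative.

```fsharp
open System.Collections.Generic

// Immutable Map: each add returns a new map and leaves the old one intact.
let m1 = Map.empty |> Map.add "a" 1
let m2 = m1 |> Map.add "b" 2      // m1 still holds only "a"

// Mutable .NET Dictionary: updates happen in place.
let d = Dictionary<string, int>()
d.["a"] <- 1
d.["b"] <- 2
```

The Dictionary gives up persistence: earlier versions are gone after an update, and sharing it across threads needs extra care.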
42. Publishing your functional DS
FSharpx.Collections.readme.md
Include Try value returning option for
values that can throw Exception
Include other common values if < O(n)
Reason about edge cases
(more unit tests better than not enough)
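The Try-value convention might look like the following sketch. SimpleStack is a hypothetical wrapper invented for illustration and is not a type from FSharpx.Collections.

```fsharp
// Hypothetical stack-like wrapper, used only to illustrate the naming pattern.
type SimpleStack<'a> =
    | SimpleStack of 'a list

    // Throws on an empty structure.
    member this.Head =
        match this with
        | SimpleStack (x :: _) -> x
        | SimpleStack [] -> raise (System.Exception("empty"))

    // Option-returning counterpart for callers that prefer not to catch exceptions.
    member this.TryHead =
        match this with
        | SimpleStack (x :: _) -> Some x
        | SimpleStack [] -> None
```

The empty structure is exactly the kind of edge case worth a unit test for both members.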
43. Build your own structure
Leverage Heap as internal structure to
create RandomStack
Demo 3
44. Closing Thought
The functional data structures further from the
“mainstream” (if such a measure were possible) tend to
have less inherent value in their generic form.
Therefore the ultimate functional data structures
collection would combine the characteristics of a
library, a snippet collection, a benchmarking tool, superb
documentation, test cases, and EXAMPLES!
45. Resources
FSPowerPack.Core.Community (NuGet)
FSharpx.Core (GitHub & NuGet)
FSharpx.Collections.Experimental
(GitHub & NuGet)
DS_Benchmark (GitHub)
raw code for structures not yet merged to FSharpx
Editor's notes
The big ideas in the presentation
Immutable is the only requirement for the “definition” of purely functional. Persistence is a side effect of immutability. Immutability and persistence allow for thread safety. Recursive just happens to be the implementation of nearly all purely functional data structures. Incremental is an aspect of recursion and enables efficient GC; structures never require the .NET large object heap.
Time complexity relates how the time component of a process scales
You usually only have to reason about a few time complexity cases
Processor architecture and GC can affect time complexity analysis, especially on repeated operations resulting in a new structure object.
Singly-linked list, arguably the most pervasive functional data structure (setting aside stream/IEnumerable for the moment). Tail is itself a list. So is the empty list.
Summary: recursing through a list with an active pattern to format the data. Doing the same with a LazyList takes more time and more garbage collection. But if the active pattern cuts the recursion short before covering the whole list, LazyList actually saves resources (time), which is especially useful if calculation or other resources are involved.
Read first few chapters to see what singly linked lists are all about
Or in practically any purely functional data structure, for that matter.
This is how we would like to write List update. Remove is the same algorithm, but losing the target element without replacing it. Recursing like this is akin to operating on a Russian doll.
Not tail recursive. (see later slides)
Array is not a functional structure. However, it ends up hidden from the rest of the user code, thus preserving structure immutability. Could end up transitorily using the .NET large object heap. This approach only addresses update, not remove.
Recursive loop to take the tail of the original list after the update position and build an Array from the front. Cons the updated element to the tail. Recursive loop to cons the front elements. Note both loops are tail recursive. Still could use the .NET large object heap. This approach does work for remove.
“i” needs explanation, it is index value of element to update
All the DS stack overflows seem to occur after 10^4 and before 10^5. Also, best time of 10k updates seems to scale perfectly with size until 10^5 (possibly because we crossed over into the large object heap structure?). NOTE: punt is actually quite good for worst case. NOTE 2: worst case does not scale linearly for any of the options, presumably overhead more expensive than performant code.
Hard to reason…: for instance DList Append is O(1), but Deque.OfCatLists outperforms it until a scale of appending 100,000 element structures. Pull requests welcome, guidelines for pull requests. Frequently several choices for the operation you want.
Singly-linked lists are the starting point of functional data structures. Many of the principles of operation remain the same, but changing shapes offer new possibilities. Like the Play-Doh Fun Factory, run your data through data structures to change its shape.
Solves the lookup and update problem for lists, but not Remove. Creative internal structures required for new functional data structures.
Adding function is called “conj” (the inverse of cons). Cons operator, conj operator, empty symbol not actually available. Empty Queue stands in a different relationship than with List, because no real pointer to it. Dashed arrows because “pointing” not same as singly linked list.
Last & Init are the complement to Head & Tail
Either the minimum or maximum element rises to the top of the heap
Attributes of functional linear structures
Somewhat complete collection of canonical sequential data structures
Some minor exceptions
Array at a disadvantage for this benchmark
No known correct implementation of remove in F#
Summary: RandomStack internally implements 2 parts: an IComparable type consisting of a random integer and a value, and a Heap of the IComparable items.
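A rough sketch of the RandomStack idea, not the demo's code. It makes two loudly labeled substitutions: a record with an integer Priority field stands in for the IComparable type, and a hand-rolled skew heap (SkewHeap, merge) stands in for the library Heap. All names are hypothetical.

```fsharp
// Pairs a random priority with the stored value (stand-in for the IComparable type).
type Entry<'a> = { Priority: int; Value: 'a }

// Minimal skew heap ordered by Priority (stand-in for a library Heap).
type SkewHeap<'a> =
    | Empty
    | Node of SkewHeap<'a> * Entry<'a> * SkewHeap<'a>

let rec merge h1 h2 =
    match h1, h2 with
    | Empty, h | h, Empty -> h
    | Node (l1, e1, r1), Node (l2, e2, r2) ->
        if e1.Priority <= e2.Priority then Node (merge r1 h2, e1, l1)
        else Node (merge h1 r2, e2, l2)

let rng = System.Random()

// Push assigns a random priority, so the pop order is random.
let push value heap =
    merge (Node (Empty, { Priority = rng.Next(); Value = value }, Empty)) heap

// Pop returns the value with the smallest priority plus the remaining heap.
let tryPop heap =
    match heap with
    | Empty -> None
    | Node (left, entry, right) -> Some (entry.Value, merge left right)
```

Because the heap only ever compares the random priorities, the stored values need no ordering of their own.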