A presentation on the Trie data structure: how it works and what it can be used for, plus an implementation of Tries in PHP, with occasional references to Rugby League.
Example code to go with the slides can be found at https://github.com/MarkBaker/Tries and https://github.com/MarkBaker/QuadTrees
PHP DataStructures – Beyond SPL (online version)
1. PHP DataStructures – Beyond SPL
A dreamscape made from random noise. Illustration: Google
2. DataStructures
A data structure is a particular way of organizing data in a
computer so that it can be used efficiently.
Different kinds of data structures are suited to different kinds
of applications, and some are highly specialized to specific
tasks.
3. DataStructures in PHP
• Some basic DataStructures available in PHP’s SPL
  • Stack
  • Queue
  • Heap
  • Doubly-Linked List
  • Fixed Array
  • SPL Object Storage
• SPL is the Standard PHP Library
  • (Yet another recursive acronym)
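As a quick illustration of the SPL structures listed above, the following sketch uses each of them once. The class names are those of PHP's bundled SPL extension; the variable names and sample values are made up for the example.

```php
<?php
// Stack: last in, first out.
$stack = new SplStack();
$stack->push('first');
$stack->push('second');
$top = $stack->pop();                 // 'second'

// Queue: first in, first out.
$queue = new SplQueue();
$queue->enqueue('first');
$queue->enqueue('second');
$head = $queue->dequeue();            // 'first'

// Heap: SplMinHeap always extracts the smallest value first.
$heap = new SplMinHeap();
foreach ([5, 1, 3] as $number) {
    $heap->insert($number);
}
$smallest = $heap->extract();         // 1

// Doubly-linked list: can be extended and read at either end.
$list = new SplDoublyLinkedList();
$list->push('b');                     // append at the end
$list->unshift('a');                  // prepend at the beginning
$first = $list->bottom();             // 'a'
$last = $list->top();                 // 'b'

// Fixed array: integer-indexed, with a size set up front.
$fixed = new SplFixedArray(3);
$fixed[0] = 'x';
$size = $fixed->getSize();            // 3

// Object storage: a set of objects, each with optional attached data.
$storage = new SplObjectStorage();
$object = new stdClass();
$storage[$object] = 'meta';
$stored = isset($storage[$object]);   // true
```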
6. Tries
• A Tree structure comprising a hierarchy of “indexed” nodes
• Each node can contain:
  • A series of pointers (keys) to the next node in the hierarchy
  • A bucket for data values
    • This allows for multiple values with the same key
• There are three basic types of Tries:
  • Basic Tries
  • Radix Tries
  • Suffix Tries
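The node layout described above could be sketched as follows. This is a hypothetical class written for illustration (assuming PHP 7.4 or later for typed properties), not the node class used in the MarkBaker/Tries library.

```php
<?php
// Hypothetical node class for illustration only.
final class SketchTrieNode
{
    /** @var array<string, SketchTrieNode> pointers to child nodes, indexed by the next character */
    public array $children = [];

    /** @var array bucket of data values stored at this node */
    public array $values = [];
}

// Link the nodes for the key "cat" by hand: root -> c -> a -> t.
$root = new SketchTrieNode();
$root->children['c'] = new SketchTrieNode();
$root->children['c']->children['a'] = new SketchTrieNode();
$leaf = new SketchTrieNode();
$root->children['c']->children['a']->children['t'] = $leaf;

// The bucket allows more than one value under the same key.
$leaf->values[] = 'cat data';
$leaf->values[] = 'more cat data';
```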
7. Tries – Purpose
• Fast lookup with a partial key
• Example implementation: https://github.com/MarkBaker/Tries
8. Tries – Uses
• Replacement for PHP Arrays (Hashmaps)
• No key collisions
• Duplicate Keys supported
• No Hashing function required
• Partial Key Lookups
• Predictive Text
• Autocomplete
• Spell-Checking
• Hyphenation
9. Tries – Methods
• add($key, $value = null)
  Adds new data to a Trie
• search($prefix)
  Finds data in a Trie by key prefix
• delete($key)
• isNode($key)
• isMember($key)
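To make the method list above concrete, here is a minimal sketch of a basic Trie exposing the same method names. It is not the library's code: the class name SketchTrie is hypothetical, and the behaviour chosen for delete(), isNode() and isMember() is an assumption (delete empties the bucket without pruning nodes, isNode tests whether the key exists as a path, isMember tests whether data is stored under exactly that key). It assumes PHP 7.4 or later.

```php
<?php
// Illustrative sketch only; method semantics are assumptions, not the library's behaviour.
final class SketchTrie
{
    private array $children = [];   // next character => SketchTrie
    private array $values = [];     // bucket of values stored under exactly this key

    public function add(string $key, $value = null): void
    {
        $node = $this;
        for ($i = 0, $n = strlen($key); $i < $n; $i++) {
            $node->children[$key[$i]] ??= new self();
            $node = $node->children[$key[$i]];
        }
        $node->values[] = $value;
    }

    /** Returns [full key => bucket] for every stored key that starts with $prefix. */
    public function search(string $prefix): array
    {
        $start = $this->find($prefix);
        return $start === null ? [] : $start->collect($prefix);
    }

    /** Assumed semantics: empties the bucket but leaves the nodes in place. */
    public function delete(string $key): void
    {
        $node = $this->find($key);
        if ($node !== null) {
            $node->values = [];
        }
    }

    /** Assumed semantics: true if the key exists as a path in the Trie. */
    public function isNode(string $key): bool
    {
        return $this->find($key) !== null;
    }

    /** Assumed semantics: true if data is stored under exactly this key. */
    public function isMember(string $key): bool
    {
        $node = $this->find($key);
        return $node !== null && $node->values !== [];
    }

    // Walks the Trie one character at a time; null if the path does not exist.
    private function find(string $key): ?self
    {
        $node = $this;
        for ($i = 0, $n = strlen($key); $i < $n; $i++) {
            $node = $node->children[$key[$i]] ?? null;
            if ($node === null) {
                return null;
            }
        }
        return $node;
    }

    // Gathers the buckets of this node and of everything below it.
    private function collect(string $key): array
    {
        $found = $this->values === [] ? [] : [$key => $this->values];
        foreach ($this->children as $char => $child) {
            $found += $child->collect($key . $char);
        }
        return $found;
    }
}

$trie = new SketchTrie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
$trie->add('cart', 'cart data');
$results = $trie->search('car');
// $results is ['car' => ['car data'], 'cart' => ['cart data']]
```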
10. Tries – Basic Trie
• Node pointers comprise a single character or byte
11. Tries – Basic Trie
$trie = new Trie();
$trie->add('cat', 'cat data');
C
A
T
12. Tries – Basic Trie
$trie = new Trie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
C
A
T R
13. Tries – Basic Trie
$trie = new Trie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
$trie->add('cart', 'cart data');
C
A
T R
T
14. Tries – Basic Trie
$trie = new Trie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
$trie->add('cart', 'cart data');
$trie->search('car');
C
A
T R
T
(Diagram: the search for 'car' follows the path C, A, R.)
15. Tries – Basic Trie
• The key to a data node is inherent in the path to that node,
so it is not necessary to store the key
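The point above can be shown with a Trie built from plain nested arrays, where only values are stored and each key is rebuilt from the path walked to reach it. The function names and the reserved "\0" bucket index are assumptions of this sketch, not part of the library.

```php
<?php
// Reserved index (an assumption of this sketch) that holds a node's bucket of values.
const TRIE_BUCKET = "\0";

// Adds a value; each array level is indexed by one character of the key.
function trieAdd(array &$trie, string $key, $value): void
{
    $node = &$trie;
    for ($i = 0, $n = strlen($key); $i < $n; $i++) {
        if (!isset($node[$key[$i]])) {
            $node[$key[$i]] = [];
        }
        $node = &$node[$key[$i]];
    }
    $node[TRIE_BUCKET][] = $value;   // only the value is stored, never the key
}

// Yields key => bucket pairs; the key is rebuilt purely from the path.
function trieKeys(array $node, string $path = ''): Generator
{
    foreach ($node as $index => $child) {
        if ($index === TRIE_BUCKET) {
            yield $path => $child;
        } else {
            yield from trieKeys($child, $path . $index);
        }
    }
}

$trie = [];
trieAdd($trie, 'cat', 'cat data');
trieAdd($trie, 'car', 'car data');
trieAdd($trie, 'cart', 'cart data');
$entries = iterator_to_array(trieKeys($trie));
// $entries is ['cat' => ['cat data'], 'car' => ['car data'], 'cart' => ['cart data']]
```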
16. Tries – Radix Trie
• Node pointers comprise one or more characters or bytes
• This means a Radix Trie can be more compact and memory efficient than a basic Trie
• Building the Trie can involve more overhead, because existing nodes may need splitting as keys are added
• Searching the Trie hierarchy may be faster, because there are fewer nodes to traverse
18. Tries – Radix Trie
$radixTrie = new RadixTrie();
$radixTrie->add('cat', 'cat data');
$radixTrie->add('car', 'car data');
CA
T R
19. Tries – Radix Trie
$radixTrie = new RadixTrie();
$radixTrie->add('cat', 'cat data');
$radixTrie->add('car', 'car data');
$radixTrie->add('cart', 'cart data');
CA
T R
T
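The way a Radix Trie arrives at the CA / T R / T shape shown above can be sketched with an insert that splits an existing pointer whenever a new key shares only part of it. The class and function names here are hypothetical and this is not the library's implementation; it assumes PHP 7.4 or later.

```php
<?php
// Hypothetical radix node: pointers are labelled with one or more characters.
final class SketchRadixNode
{
    /** @var array<string, SketchRadixNode> label => child node */
    public array $children = [];
    public array $values = [];
}

function commonPrefixLength(string $a, string $b): int
{
    $max = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $max && $a[$i] === $b[$i]) {
        $i++;
    }
    return $i;
}

function radixAdd(SketchRadixNode $node, string $key, $value): void
{
    if ($key === '') {
        $node->values[] = $value;    // the whole key has been consumed
        return;
    }
    foreach ($node->children as $label => $child) {
        $label = (string) $label;    // PHP turns numeric array keys into integers
        $shared = commonPrefixLength($label, $key);
        if ($shared === 0) {
            continue;
        }
        if ($shared < strlen($label)) {
            // Split the pointer: the shared part gets a new intermediate node.
            $middle = new SketchRadixNode();
            $middle->children[substr($label, $shared)] = $child;
            unset($node->children[$label]);
            $node->children[substr($label, 0, $shared)] = $middle;
            $child = $middle;
        }
        radixAdd($child, substr($key, $shared), $value);
        return;
    }
    // Nothing shares a prefix with the key: one new pointer for the whole remainder.
    $leaf = new SketchRadixNode();
    $leaf->values[] = $value;
    $node->children[$key] = $leaf;
}

// Returns just the pointer labels, as nested arrays, to show the shape.
function radixLabels(SketchRadixNode $node): array
{
    $labels = [];
    foreach ($node->children as $label => $child) {
        $labels[$label] = radixLabels($child);
    }
    return $labels;
}

$root = new SketchRadixNode();
radixAdd($root, 'cat', 'cat data');
radixAdd($root, 'car', 'car data');
radixAdd($root, 'cart', 'cart data');
$shape = radixLabels($root);
// $shape is ['ca' => ['t' => [], 'r' => ['t' => []]]]
```

Adding 'car' is the step that splits the original 'cat' pointer into 'ca' and 't', which is the extra build-time work a basic Trie never has to do.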
20. Tries – Suffix Trie
$suffixTrie = new SuffixTrie();
$suffixTrie->add('cat', 'cat data');
C
A
T
21. Tries – Suffix Trie
$suffixTrie = new SuffixTrie();
$suffixTrie->add('cat', 'cat data');
C A T
A T
T
(Diagram: one branch from the root for each suffix, read downwards: C, A, T; A, T; and T.)
22. Tries – Suffix Trie
$suffixTrie = new SuffixTrie();
$suffixTrie->add('cat', 'cat data');
$suffixTrie->search('at');
C A T
A T
T
(Diagram: the search for 'at' follows the A, T branch.)
23. Tries – Suffix Tries
• Memory hungry
  • Up to n + (n-1) + (n-2) + … + 2 + 1 = n(n+1)/2 nodes (where n is the key length) used for every key/value stored in a Suffix Trie
• Slow to populate
• Can be used to search for “contains” rather than simply “begins with”
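The memory cost above can be checked with a small sketch that inserts every suffix of a key into a nested-array Trie and counts the resulting nodes. The function names are made up for this example. For 'cat' (n = 3) the count is 3 + 2 + 1 = 6; for a key with repeated letters such as 'banana', some suffixes share nodes, so the count falls below the n(n+1)/2 worst case of 21.

```php
<?php
// Inserts every suffix of $key, one character per nested array level.
function suffixTrieAdd(array &$trie, string $key): void
{
    for ($start = 0, $n = strlen($key); $start < $n; $start++) {
        $node = &$trie;
        for ($i = $start; $i < $n; $i++) {
            if (!isset($node[$key[$i]])) {
                $node[$key[$i]] = [];
            }
            $node = &$node[$key[$i]];
        }
        unset($node);   // drop the reference before starting the next suffix
    }
}

// Counts every node below the root.
function countNodes(array $node): int
{
    $count = 0;
    foreach ($node as $child) {
        $count += 1 + countNodes($child);
    }
    return $count;
}

$cat = [];
suffixTrieAdd($cat, 'cat');
$catNodes = countNodes($cat);          // 6: c-a-t, a-t and t share nothing

$banana = [];
suffixTrieAdd($banana, 'banana');
$bananaNodes = countNodes($banana);    // 15: fewer than 21 because suffixes share nodes
```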
24. Tries – Suffix Tries
• It is necessary to store the key with the data
• A search can return duplicate values
  • e.g. “banana” is matched more than once if we search for “a” or “n” or even “ana”
• Data should only be stored once for the “full word”, and subsequent suffixes should only store a pointer to that data
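One way to follow the advice above is sketched below: the data is held once per full key, each suffix only records which full key it belongs to, and a search merges those pointers as a set so that “banana” comes back once however many suffixes matched. The class name and internal layout are assumptions made for illustration, not the library's SuffixTrie; it assumes PHP 7.4 or later.

```php
<?php
// Illustrative sketch only; not the SuffixTrie from MarkBaker/Tries.
final class SketchSuffixTrie
{
    private const KEYS = "\0";     // reserved index holding the set of full keys

    private array $root = [];      // nested arrays, one level per character
    private array $data = [];      // full key => bucket of values, stored once

    public function add(string $key, $value = null): void
    {
        $this->data[$key][] = $value;
        for ($start = 0, $n = strlen($key); $start < $n; $start++) {
            $node = &$this->root;
            for ($i = $start; $i < $n; $i++) {
                if (!isset($node[$key[$i]])) {
                    $node[$key[$i]] = [];
                }
                $node = &$node[$key[$i]];
            }
            $node[self::KEYS][$key] = true;   // pointer back to the full key only
            unset($node);
        }
    }

    /** Returns [full key => bucket] for every key that contains $fragment. */
    public function search(string $fragment): array
    {
        $node = $this->root;
        for ($i = 0, $n = strlen($fragment); $i < $n; $i++) {
            if (!isset($node[$fragment[$i]])) {
                return [];
            }
            $node = $node[$fragment[$i]];
        }
        $keys = [];
        $this->collectKeys($node, $keys);
        return array_intersect_key($this->data, $keys);
    }

    // Merges the key sets found below $node; a set holds each key only once.
    private function collectKeys(array $node, array &$keys): void
    {
        foreach ($node as $index => $child) {
            if ($index === self::KEYS) {
                $keys += $child;
            } else {
                $this->collectKeys($child, $keys);
            }
        }
    }
}

$suffixTrie = new SketchSuffixTrie();
$suffixTrie->add('banana', 'banana data');
$suffixTrie->add('cat', 'cat data');
$ana = $suffixTrie->search('ana');   // ['banana' => ['banana data']], returned once
$a = $suffixTrie->search('a');       // both keys contain "a"
```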
26. QuadTrees
• A Tree structure that partitions a 2-Dimensional space by recursively subdividing it into quadrants (or regions)
• Each node can contain:
  • A series of pointers (keys) to the next node in the hierarchy
  • A bucket for data values
• There are different types of QuadTrees:
  • Point QuadTrees
  • Region QuadTrees
  • Edge QuadTrees
  • Polygonal Map (PM) QuadTrees
27. QuadTrees – Purpose
• Fast Geo-spatial or Graph lookup
• Sparse data compression
• Example implementation: https://github.com/MarkBaker/QuadTrees
28. QuadTrees – Uses
• Spatial Indexing
• Storing Sparse Data, e.g.
  • Spreadsheet format data
  • Pixel data in images
• Collision Detection
• Points within a field of vision
29. QuadTrees – Methods
• insert($xyCoordinate, $value = null)
  Adds new data to a QuadTree
• search($boundingBox)
  Finds data in a QuadTree
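The two methods above might look roughly like this in a point QuadTree. This is a simplified sketch rather than the code from MarkBaker/QuadTrees: the class name is hypothetical, coordinates and bounding boxes are passed as plain numbers instead of coordinate and bounding-box objects, and the maximum depth is an assumed safeguard against endless splitting when many points coincide. It assumes PHP 7.4 or later.

```php
<?php
// Illustrative point QuadTree sketch; not the library's implementation.
final class SketchQuadTree
{
    private const MAX_DEPTH = 16;   // assumed guard for coincident points

    private float $minX;
    private float $minY;
    private float $maxX;
    private float $maxY;
    private int $bucketSize;
    private int $depth;
    private array $points = [];     // bucket of [x, y, value] entries
    private array $quadrants = [];  // four child nodes once this node has split

    public function __construct(float $minX, float $minY, float $maxX, float $maxY, int $bucketSize = 4, int $depth = 0)
    {
        $this->minX = $minX;
        $this->minY = $minY;
        $this->maxX = $maxX;
        $this->maxY = $maxY;
        $this->bucketSize = $bucketSize;
        $this->depth = $depth;
    }

    public function insert(float $x, float $y, $value = null): bool
    {
        if ($x < $this->minX || $x > $this->maxX || $y < $this->minY || $y > $this->maxY) {
            return false;           // the point lies outside this node's region
        }
        if ($this->quadrants === []) {
            if (count($this->points) < $this->bucketSize || $this->depth >= self::MAX_DEPTH) {
                $this->points[] = [$x, $y, $value];
                return true;
            }
            $this->subdivide();     // bucket is full: split into four quadrants
        }
        foreach ($this->quadrants as $quadrant) {
            if ($quadrant->insert($x, $y, $value)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the values of all points inside the given bounding box. */
    public function search(float $minX, float $minY, float $maxX, float $maxY): array
    {
        if ($maxX < $this->minX || $minX > $this->maxX || $maxY < $this->minY || $minY > $this->maxY) {
            return [];              // no overlap: the whole subtree is skipped
        }
        $found = [];
        foreach ($this->points as [$x, $y, $value]) {
            if ($x >= $minX && $x <= $maxX && $y >= $minY && $y <= $maxY) {
                $found[] = $value;
            }
        }
        foreach ($this->quadrants as $quadrant) {
            $found = array_merge($found, $quadrant->search($minX, $minY, $maxX, $maxY));
        }
        return $found;
    }

    // Creates the four quadrants and moves this node's points down into them.
    private function subdivide(): void
    {
        $midX = ($this->minX + $this->maxX) / 2;
        $midY = ($this->minY + $this->maxY) / 2;
        $next = $this->depth + 1;
        $this->quadrants = [
            new self($this->minX, $this->minY, $midX, $midY, $this->bucketSize, $next),
            new self($midX, $this->minY, $this->maxX, $midY, $this->bucketSize, $next),
            new self($this->minX, $midY, $midX, $this->maxY, $this->bucketSize, $next),
            new self($midX, $midY, $this->maxX, $this->maxY, $this->bucketSize, $next),
        ];
        foreach ($this->points as [$x, $y, $value]) {
            foreach ($this->quadrants as $quadrant) {
                if ($quadrant->insert($x, $y, $value)) {
                    break;
                }
            }
        }
        $this->points = [];
    }
}

$quadTree = new SketchQuadTree(-180, -90, 180, 90, 2);
$quadTree->insert(1, 1, 'a');
$quadTree->insert(2, 2, 'b');
$quadTree->insert(50, 50, 'c');
$quadTree->insert(-50, -50, 'd');
$nearby = $quadTree->search(0, 0, 10, 10);   // contains 'a' and 'b' only
```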
38. QuadTrees – Spatial Indexing
• With a larger bucket size
  • The QuadTree is smaller, with fewer nodes using less memory
  • More points need checking in each node
  • Faster to insert / slower to search
• With a smaller bucket size
  • The QuadTree is larger, with more nodes using more memory
  • Fewer points in each node to check
  • Slower to insert / faster to search
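The trade-off above can be illustrated by working out the shape a QuadTree would take for the same points under different bucket sizes. The sketch below does not build a real tree; it only counts nodes and reports the fullest bucket. The function name, the depth limit of 16 and the 8 x 8 grid of sample points are all assumptions made for the example.

```php
<?php
// Returns the node count and the largest bucket for the given points and bucket size.
function quadTreeShape(array $points, float $minX, float $minY, float $maxX, float $maxY, int $bucketSize, int $depth = 0): array
{
    if (count($points) <= $bucketSize || $depth >= 16) {
        return ['nodes' => 1, 'largestBucket' => count($points)];   // leaf node
    }
    $midX = ($minX + $maxX) / 2;
    $midY = ($minY + $maxY) / 2;
    // Distribute the points over the four quadrants.
    $quadrants = [[], [], [], []];
    foreach ($points as [$x, $y]) {
        $quadrants[($x >= $midX ? 1 : 0) + ($y >= $midY ? 2 : 0)][] = [$x, $y];
    }
    $bounds = [
        [$minX, $minY, $midX, $midY],
        [$midX, $minY, $maxX, $midY],
        [$minX, $midY, $midX, $maxY],
        [$midX, $midY, $maxX, $maxY],
    ];
    $shape = ['nodes' => 1, 'largestBucket' => 0];
    foreach ($quadrants as $i => $subset) {
        [$x1, $y1, $x2, $y2] = $bounds[$i];
        $child = quadTreeShape($subset, $x1, $y1, $x2, $y2, $bucketSize, $depth + 1);
        $shape['nodes'] += $child['nodes'];
        $shape['largestBucket'] = max($shape['largestBucket'], $child['largestBucket']);
    }
    return $shape;
}

// 64 sample points, one at the centre of each cell of an 8 x 8 grid.
$points = [];
for ($i = 0; $i < 8; $i++) {
    for ($j = 0; $j < 8; $j++) {
        $points[] = [$i + 0.5, $j + 0.5];
    }
}

$small = quadTreeShape($points, 0, 0, 8, 8, 1);    // 85 nodes, 1 point per bucket
$large = quadTreeShape($points, 0, 0, 8, 8, 16);   // 5 nodes, 16 points per bucket
```

With a bucket size of 1 there are many nodes but almost nothing to scan in each; with a bucket size of 16 the tree is tiny but each search has up to 16 points to check per node it visits.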
39. QuadTrees – Region QuadTree
• Used for Sparse-data Compression
• Used for Level-based Aggregations
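As a rough sketch of the sparse-data compression use above, a Region QuadTree can replace any square block of identical cells with a single leaf. The function names are made up for this example, and it assumes a square grid of scalar values whose side is a power of two.

```php
<?php
// Returns a single value for a uniform block, otherwise an array of four
// sub-blocks in the order: top-left, top-right, bottom-left, bottom-right.
function compressRegion(array $grid, int $row, int $col, int $size)
{
    $first = $grid[$row][$col];
    $uniform = true;
    for ($r = $row; $r < $row + $size && $uniform; $r++) {
        for ($c = $col; $c < $col + $size; $c++) {
            if ($grid[$r][$c] !== $first) {
                $uniform = false;
                break;
            }
        }
    }
    if ($uniform) {
        return $first;   // one leaf stands in for size * size identical cells
    }
    $half = intdiv($size, 2);
    return [
        compressRegion($grid, $row, $col, $half),
        compressRegion($grid, $row, $col + $half, $half),
        compressRegion($grid, $row + $half, $col, $half),
        compressRegion($grid, $row + $half, $col + $half, $half),
    ];
}

// Counts the leaves of a compressed tree.
function countLeaves($node): int
{
    return is_array($node) ? array_sum(array_map('countLeaves', $node)) : 1;
}

$grid = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
];
$tree = compressRegion($grid, 0, 0, 4);
// $tree is [0, 0, 0, [1, 1, 1, 0]]: 7 leaves instead of 16 cells
$leaves = countLeaves($tree);
```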
41. QuadTrees
• The same principles can be applied to 3-Dimensional space
using an Octree
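To show how the idea carries over, the sketch below splits a 3-Dimensional box into eight octants in the same way a QuadTree node splits into four quadrants, and works out which octant a point belongs to. The function names and the bit layout of the octant index (bit 0 for x, bit 1 for y, bit 2 for z) are assumptions made for this example.

```php
<?php
// Index 0-7 of the octant that contains the point, relative to the box midpoint.
function octantIndex(float $x, float $y, float $z, float $midX, float $midY, float $midZ): int
{
    return ($x >= $midX ? 1 : 0) | ($y >= $midY ? 2 : 0) | ($z >= $midZ ? 4 : 0);
}

// Splits a box, given as [x, y, z] minimum and maximum corners, into eight octants.
function subdivideBox(array $min, array $max): array
{
    $mid = [
        ($min[0] + $max[0]) / 2,
        ($min[1] + $max[1]) / 2,
        ($min[2] + $max[2]) / 2,
    ];
    $boxes = [];
    for ($i = 0; $i < 8; $i++) {
        $boxMin = [];
        $boxMax = [];
        for ($axis = 0; $axis < 3; $axis++) {
            $upperHalf = (($i >> $axis) & 1) === 1;
            $boxMin[$axis] = $upperHalf ? $mid[$axis] : $min[$axis];
            $boxMax[$axis] = $upperHalf ? $max[$axis] : $mid[$axis];
        }
        $boxes[$i] = ['min' => $boxMin, 'max' => $boxMax];
    }
    return $boxes;
}

$octants = subdivideBox([0.0, 0.0, 0.0], [2.0, 2.0, 2.0]);
$index = octantIndex(1.5, 0.5, 1.5, 1.0, 1.0, 1.0);   // 5: upper x, lower y, upper z
$box = $octants[$index];
// $box spans from [1.0, 0.0, 1.0] to [2.0, 1.0, 2.0]
```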
42. PHP DataStructures – Beyond SPL
Questions?
43. Who am I?
Mark Baker
Design and Development Manager
InnovEd (Innovative Solutions for Education) Learning Ltd
Coordinator and Developer of:
Open Source PHPOffice library
PHPExcel, PHPWord, PHPPowerPoint, PHPProject, PHPVisio
Minor contributor to PHP core
Other small open source libraries available on github
@Mark_Baker
https://github.com/MarkBaker
http://uk.linkedin.com/pub/mark-baker/b/572/171