Infos2014
Information Retrieval using Dynamic Indexing
Sura I. Mohammed
Computer science department
Faculty of computer science
Cairo University
Egypt, Cairo
suib200684@yahoo.com
Hussien M. Sharaf
ITC department
Arab Open University
Egypt, Cairo
hussiensharaf@from-masr.com
Fatma A. Omara
Computer science department
Faculty of computer science
Cairo University
Egypt, Cairo
f.omara@fci-cu.edu.eg
Abstract—As the demand for information retrieval grows rapidly,
indexing structures have become essential for supporting fast
retrieval. This paper introduces a new data structure for information
retrieval called the Dynamic Ordered Multi-field Index (DOMI). It is
based on radix trees organized in segments, together with a hash
table that points to the roots of each segment; each segment is
dedicated to storing the values of a single field. The hash table is
used to access the needed segments directly, without traversing the
upper segments. DOMI therefore improves look-up performance for
queries that address a single field. For queries that address
multiple fields, each relevant segment of the radix tree is traversed
in turn without visiting unrelated branches. Segmentation also gives
DOMI flexibility for minimizing communication overhead in a
distributed system: every field in the radix tree is represented by
one segment, and each segment can be stored as one block.
In addition, the proposed DOMI consumes less space than indexes
built using B or B+ trees. Hence, it is more suitable for
data-intensive workloads such as Big Data.
Keywords: Dynamic Order Multi-field Index (DOMI), Data query,
indexing structures, Big Data.
I. INTRODUCTION
As data sets grow larger, research has sought flexible and
efficient query mechanisms for extracting data from a set of
fields in streaming data. Several existing data structures, such
as B and B+ trees, can be used for this task. In particular, index
mechanisms are needed to retrieve information quickly according
to its location within large data sets; the main objective of
indexing is to optimize query speed [22].
Supporting dynamic queries over a large data set involves two
major phases. The first phase prepares the data set. The second
phase answers a query using an index built over the data produced
by the first phase. This paper considers the second phase.
An index is an efficient data structure for retrieving objects
given the value of one or more elements of those objects [4].
Such an index should scale gracefully to large numbers of keys
and be insensitive to the length or content of inserted strings [2].
Query-based data processing can produce useful information,
provided that the answers to queries reflect the stored
information directly and avoid searching unnecessary field
values.
The basic idea of a field index is to ignore unnecessary fields
while processing queries. The logical unit of an entry in a field
index is a tuple, whose order is decided by the index designer.
The tuple order therefore directly affects the performance of
query execution.
Interest in processing queries over large data sets has
contributed to improvements in index structures, such as
projection strategy indices (RB+), Value-List Index (B+ tree),
and more complex index structures that speed up query
evaluation [15]. Although these indices achieve efficient query
processing, querying huge data sets, especially streaming data,
still suffers from time overhead when answering queries that
involve multiple fields. The problem arises mainly in
traditional search trees because of their sequential access
(i.e., static order). With a static order index, the search
must traverse all values in each entry to find matching values
for the required field, which might be a non-leading field. In
that case, the search is nearly sequential.
Fig. 1 illustrates the problem of using a static order index to
answer queries that involve a multi-field index, i.e., a single
index that references multiple fields (e.g., State, City,
Zipcode, Web Site). According to Fig. 1, the index is not
helpful when the query is processed using the State and Zip
fields, because unnecessary fields are searched to answer the
query. The only field that benefits from the index structure is
the leading field, while the rest of the fields are nearly
unsorted.
Fig 1. Single Multi-field Index
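The limitation of a static order can be illustrated with a minimal
Python sketch. It is not taken from the paper: the rows, zip codes,
and web-site names are hypothetical, and a sorted list stands in for
the multi-field index of Fig. 1. Only the leading field (State) can
be located by binary search; a query on a non-leading field has to
examine every entry.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows in the field order (State, City, Zipcode, Web Site).
ROWS = [
    ("Ohio", "Findlay", "45840", "d.example"),
    ("California", "Los Angeles", "90001", "b.example"),
    ("Nevada", "Las Vegas", "89101", "c.example"),
    ("California", "Garden Grove", "92841", "a.example"),
]

# Static order: entries are sorted by State, then City, then Zipcode, ...
INDEX = sorted(ROWS)
STATE_KEYS = [row[0] for row in INDEX]


def by_state(state):
    """Leading field: binary search finds the matching range directly."""
    lo = bisect_left(STATE_KEYS, state)
    hi = bisect_right(STATE_KEYS, state)
    return INDEX[lo:hi]


def by_zip(zipcode):
    """Non-leading field: the order gives no help, so all entries are scanned."""
    return [row for row in INDEX if row[2] == zipcode]


print(by_state("California"))
print(by_zip("45840"))
```

In this sketch the State query touches only the matching range,
whereas the Zipcode query degenerates into the sequential search
described above.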
This paper resolves the multi-field search problem caused by
static ordering of large data sets by introducing a dynamic
ordered index (DOMI) that holds the values of multiple fields
and uses the radix tree as its basic structure.
The main advantages of the radix tree relative to other trees
such as B/B+ trees are that it is efficient in terms of storage,
fast on look-up operations, and maintains the data in sorted
order [16] [9]. It also supports other operations such as prefix
lookup and row update.
The dynamic property of the proposed DOMI structure is achieved
by using a hash table together with radix trees. The hash table
allows a search to proceed directly to the root node of a field.
Hence, it helps a query to access the data items in parallel
within a reasonable time. In other words, the dynamic index
allows a search to proceed directly to the target portion of the
index that can answer a query.
The remainder of this paper is organized as follows. Section II
discusses related work. Section III gives some general
observations on the advantages of radix trees over B/B+ trees
and on how radix trees can be used to index fields. Section IV
describes how to build a multi-field index using a radix tree.
Section V presents the proposed DOMI structure and shows how to
build an index in which the order of fields can change
dynamically according to the given query. Finally, Section VI
presents the conclusions of the paper.
II. RELATED WORK
More organizations run into problems with processing big data
every day [7]. In practice, data stores contain millions of
records describing real-world objects, and searching is the most
common operation used to retrieve data. Data indexing is
therefore required to improve retrieval performance [19].
Indexing huge amounts of digital information is difficult, and
the problem of storing, indexing, and searching data sets has
gained increasing attention. The survey in [6] discusses spatial
indexing and addresses the problem of designing efficient
indexes to support spatial objects. More generally, the reason
for creating an index for a data set is to speed up access to a
subset of the data [21]. Index structures differ in terms of
structure, query support, data type support, and application
[19]. Tuple reconstruction is an important component of column
stores and affects the performance of query execution. It is
therefore necessary to perform tuple reconstruction before query
execution by using main indexes and joining address-mapping
indexes [8].
The key-prefix property of the radix tree allows efficient
indexing in main memory. Query processing in QPPT keeps index
materialization costs low and uses optimized prefix trees, which
are known to be main-memory optimized, to achieve balanced
read/write performance [12].
The Adaptive Index Buffer, on the other hand, reduces the cost
of table scans by quickly indexing tuples in memory until the
partial index has been adapted to the workload again, but it
covers only a subset of the values of a column [14].
As a structure for storing tuples of fields, the B-tree is a
simple existing method: vertical partitions can be stored in
traditional B-tree indexes with practically zero overhead for
storing the tuples [10]. Even so, a B-tree is usually used to
search tuples of fields in a static order. Given a query, the
search begins at the root and checks each child sequentially
until the queried value is found.
In this paper, a radix tree is used to store field values. It
follows the adaptive radix tree, which dynamically chooses
compact internal data structures to overcome the common problem
of worst-case space consumption and to enable efficient parallel
access [9]. The radix tree saves storage space by exploiting the
common prefixes in the string set.
Index Fabric, which is based on the B-tree, uses a segmented
Patricia approach to allow a search to proceed directly to a
block-sized portion of the index that can answer a query [2]. In
Index Fabric, the search proceeds from segment to segment until
the desired data segment is found.
Compared with existing approaches, using a radix tree to build
indexes is very helpful, especially for querying streaming data.
Efficient search is the most important criterion for selecting
data structures, because search is normally carried out on-line
(and thus needs a quick response) and is carried out many times
[20]. Traditionally, indexing uses auxiliary data structures
such as B-trees, hash indexes, and bitmap indexes. These data
structures have excellent performance. Indices provide quick and
easy access to data and save time and operations when searching
and inserting data.
This paper uses two different data structures for index
construction.
III. RADIX TREE (RT) OVERVIEW
Using a radix tree reduces the storage space required for
storing values. It also retrieves information efficiently [3].
For object storage, the radix tree uses a simple key/value
model. It also enables parallel access to sub-radix trees.
Generally, a radix tree is a hierarchical structure composed of
internal nodes and leaf nodes, where [3]:
• Internal nodes contain pairs of the form (key, P). An entry in
an internal node contains a pointer (P) to a lower-level node
in the sub-tree, and the key is the field name. The inner-node
structure can also hold values of more complex data types.
• Leaf nodes store the values corresponding to the keys.
The useful properties of the radix tree are [13]:
• Look-up: determines whether a string exists in the tree.
• Insertion: either adds a new outgoing edge labeled with all
remaining elements of the input string, or finds the longest
common prefix, splits it into two edges, and then adds the
suffix.
• Ordering: the keys are sorted lexicographically, so the tree
also supports other operations (e.g., range scan, prefix
lookup, update).
With respect to the performance of operations, k-ary search
trees do not support incremental update operations [17], and B+
trees have expensive update operations [18]. Radix trees, on the
other hand, do not have such expensive update operations,
because they need minimal restructuring of nodes compared with
B+ trees, which need a time-consuming insertion algorithm.
One advantage of the radix tree is that it reveals early whether
there are possible matches. In other search trees, such as
binary trees, the decision is likely to be slower, because the
search has to pass through several levels of tree nodes and the
results of comparisons cannot be predicted easily [9]. In
summary, the reasons for using a radix tree are that it provides
fast look-up and efficient insertions and updates, and that it
supports range scans and prefix look-ups because the data is
sorted.
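The look-up, insertion, and ordering properties listed above can be
sketched in a few lines of Python. This is a minimal illustration
under simplifying assumptions, not the structure used in the paper:
edges are labeled with plain strings, and the names RadixNode,
insert, lookup, and sorted_keys are hypothetical.

```python
class RadixNode:
    def __init__(self):
        self.children = {}    # edge label (str) -> RadixNode
        self.is_key = False   # True if a stored string ends at this node


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(node, key):
    if key == "":
        node.is_key = True
        return
    for label in list(node.children):
        n = common_prefix_len(label, key)
        if n == 0:
            continue
        child = node.children[label]
        if n < len(label):
            # Split the edge at the longest common prefix.
            mid = RadixNode()
            mid.children[label[n:]] = child
            del node.children[label]
            node.children[label[:n]] = mid
            child = mid
        insert(child, key[n:])   # add the remaining suffix (may be empty)
        return
    # No edge shares a prefix: add one new edge with the whole remainder.
    leaf = RadixNode()
    leaf.is_key = True
    node.children[key] = leaf


def lookup(node, key):
    if key == "":
        return node.is_key
    for label, child in node.children.items():
        if key.startswith(label):
            return lookup(child, key[len(label):])
    return False


def sorted_keys(node, path=""):
    """Return all stored keys in lexicographic order."""
    out = [path] if node.is_key else []
    for label in sorted(node.children):
        out.extend(sorted_keys(node.children[label], path + label))
    return out


root = RadixNode()
for value in ["Ohio", "Nevada", "A World Link", "A White Rose"]:
    insert(root, value)
print(sorted_keys(root))
```

The two values that begin with "A W" share a single edge for that
prefix, and the work done by insert and lookup is bounded by the
length of the key rather than by the number of stored keys.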
IV. BUILDING INDEX FIELD USING RADIX TREE (RT)
This section discusses how radix trees are used to build
convenient indexes for storing data and answering queries.
Queries can be processed efficiently with specific indexes.
Internal nodes are used as an index to insert and locate data in
the radix tree efficiently, in minimal time.
There are two types of nodes:
• The Root Node (RN), such as (FN1, FN2, …, FNk), where each
field FNi belongs to the set of field headers in the original
data. Each RN is used as the root of a Sub-Radix Tree (SRT).
Each internal node stores one element only.
• The Data Node (DN), which holds one of two kinds of content.
The first is a prefix value pv, which is a common prefix of
two values v. This saves space for two inner nodes by
truncating the path to the leaf. The second is a tuple
(V, {offset}), where the value V has two types:
• v is a complete value that exists in the original data file.
• sv is a suffix value for the prefix value of the parent
node.
An entry of a Data Node consists of a pointer to the data and an
offset, or set of offsets, which gives the item’s location in
the original file.
Fig. 2 shows the structure of both the Root Node and the Data
Node in the radix tree. In this paper, the radix tree is built
on a qualifying set of fields. A candidate index in the data set
is chosen according to certain criteria, the relationships
between the fields themselves, and the optimal number of fields.
The fields must be pre-processed before index construction. One
possible pre-processing step is to organize the values of each
record in the form of tuples.
Fig 2. Single node of tree
Definition 1:
An RN is defined as follows:
RN = (FN, {p1, p2, …, pi}), where FN indicates a field name in
the original data and each pi points to a node of the other type
(a DN) in the tree.
Definition 2:
A DN is defined using one of three forms:
• DN = (v, {o1, o2, …, oi}, {p1, p2, …, pi}), where v is a
complete value from the domain of the values to be indexed,
each o is an offset or position of this value in the original
data set, and each p points to a node, which could be a DN in
the same SRT or an RN in a new SRT.
• DN = (pv, {p1, p2, …, pi}), where pv is a common prefix of two
values v, and each p points to a node that contains a suffix
value.
• DN = (sv, {o1, o2, …, oi}, {p1, p2, …, pi}), where sv is a
suffix value for the prefix value of the parent node, and each
p points to a node, which could be a DN in the same SRT or an
RN in a new SRT.
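Definitions 1 and 2 can be written down as two small record types.
The Python sketch below is one possible encoding, assuming that the
three DN forms are distinguished by a kind tag ("v", "pv", or "sv");
the class names, the tag, and the helper full_values are hypothetical
and are not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataNode:
    value: str                     # v, pv, or sv, depending on kind
    kind: str                      # "v", "pv", or "sv" (assumed tag)
    offsets: List[int] = field(default_factory=list)   # empty for "pv"
    pointers: List[Union["DataNode", "RootNode"]] = field(default_factory=list)


@dataclass
class RootNode:
    field_name: str                # FN
    pointers: List[DataNode] = field(default_factory=list)


def full_values(node, path=""):
    """Rebuild (value, offsets) pairs of one SRT from prefixes and suffixes."""
    out = []
    for child in node.pointers:
        if isinstance(child, RootNode):
            continue               # an RN starts a new SRT in the next segment
        text = path + child.value
        if child.kind in ("v", "sv"):
            out.append((text, child.offsets))
        out.extend(full_values(child, text))
    return out


rose = DataNode("hite Rose", "sv", offsets=[0])
link = DataNode("orld Link", "sv", offsets=[1])
prefix = DataNode("A W", "pv", pointers=[rose, link])
company = RootNode("Company Name", pointers=[prefix])
print(full_values(company))
```

The pv node carries no offsets of its own; only the two sv nodes
below it record positions in the original data set.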
The values of each field are grouped into a segment that may
contain one or more Sub-Radix Trees (SRTs). The order of
segments is initially decided according to the relationships
between the fields. Fig. 3 illustrates the segments {1, 2, …, n},
numbered from the highest to the lowest. Each segment contains
one or more block-sized sub-radix trees (SRTi), as shown in
Fig. 4. The Root Node (RN) of an SRT stores a key such as FN1,
and each of its Data Nodes (DNs) refers to a value that belongs
to FN1.
Fig 3. Segments in Radix Tree
A value such as v1FN1 is inserted in one of two ways. The first
is to add a new node that stores the complete value, as shown in
Fig. 3. The second is to find the longest common prefix, such as
pv2FN1, and then add another node to hold each remaining suffix
value. Storing a common prefix value only once saves storage
space. In this case, the different suffixes are stored in
separate nodes (sv2.1FN1, sv2.2FN1), and so on for each value.
The Radix Tree (RT) index stores values (v1FN1, sv2.2FN1)
together with the position (offset) of each value, which refers
to the location of the actual spatial data it represents.
Multiple values can reference the same offset.
The search compares each tuple generated from a user’s query,
which has the form of a query tuple (field, value, operator),
where the operator can be equal, greater than, and so on, with
each tuple of the original data set, which has the form of a
data-set row tuple (field, value, data type). The search process
descends from the root at the highest segment and proceeds
through nodes within the same segment or moves on to the next
segment. The result of the search in one segment is a pointer to
data if the search key matches the data key, a pointer to
another segment, or null. Thus, a query may have to pass through
more than one segment to find its answer.
More than one SRT under a visited node may need to be searched;
hence it might not be possible to guarantee good performance. A
dynamic index provides a good way to avoid a sequential search
of the whole tree. The dynamic index is based on the radix tree,
which helps to retrieve data quickly according to its location
and requires visiting only a small number of nodes.
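The comparison between a query tuple and a data-set row tuple can be
sketched as follows. The operator symbols, the data-type names, and
the function matches are assumptions made for illustration; the
paper only states that the operator can be equal, greater than, and
so on.

```python
import operator

# Assumed operator symbols and data-type names (not fixed by the paper).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}
CASTS = {"str": str, "int": int, "float": float}


def matches(query, row):
    """query = (field, value, operator); row = (field, value, data type)."""
    q_field, q_value, op = query
    r_field, r_value, dtype = row
    if q_field != r_field:
        return False
    cast = CASTS[dtype]          # compare in the declared type of the row
    return OPS[op](cast(r_value), cast(q_value))


print(matches(("Zipcode", "90000", ">"), ("Zipcode", "92841", "int")))
```

Casting both sides to the declared data type keeps numeric
comparisons from falling back to string order.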
A single segment where sub-radix trees (SRTs) are grouped
together is shown in Fig. 4. Each SRT has a root node that
stores the field name.
Fig 4. Sub Trees (ST) in one segment
Definition 3:
An SRT is defined as follows:
SRT = (N, E, n0) is an acyclic graph where:
• N is a set of nodes {n0, …, nk}, where k > 0 and n0 ∈ N,
• E is the set of links {e0, …, em}, where m >= 0 and each link
eij is a pair (ni, nj) such that ni ∈ N, nj ∈ N, and ni ≠ nj
for every pair (ni, nj) ∈ E,
• n0 is the only RN in a single SRT: every ni ∈ N \ {n0} must be
of type DN.
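The conditions of Definition 3 can be checked mechanically. The
sketch below is an illustration only: it assumes that nodes are
given as a dictionary from node name to type ("RN" or "DN") and
links as directed pairs, and it uses Kahn's topological-sort
algorithm for the acyclicity test. The function name is_valid_srt is
hypothetical.

```python
def is_valid_srt(nodes, edges, root):
    """nodes: name -> "RN" or "DN"; edges: list of (ni, nj); root: n0."""
    # n0 must be in N and must be the only RN.
    if nodes.get(root) != "RN":
        return False
    if any(kind != "DN" for name, kind in nodes.items() if name != root):
        return False
    # Every link joins two distinct nodes of N.
    if any(a not in nodes or b not in nodes or a == b for a, b in edges):
        return False
    # Acyclicity via Kahn's algorithm: all nodes must be removable.
    indegree = {name: 0 for name in nodes}
    for _, b in edges:
        indegree[b] += 1
    ready = [name for name, deg in indegree.items() if deg == 0]
    removed = 0
    while ready:
        current = ready.pop()
        removed += 1
        for a, b in edges:
            if a == current:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(nodes)


nodes = {"City": "RN", "Los Angeles": "DN", "Findlay": "DN"}
edges = [("City", "Los Angeles"), ("City", "Findlay")]
print(is_valid_srt(nodes, edges, "City"))
```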
V. DYNAMIC ORDER MULTI-FIELD INDEX (DOMI)
To handle data sets efficiently and to optimize the indexing of
a data set repository, a dynamic index structure should be used.
This is achieved by building the dynamic index on the radix tree
as its basic structure, which helps to retrieve data quickly.
This paper introduces the Dynamic Order Multi-field Index (DOMI)
as a new index structure that supports efficient query
processing. Using DOMI reduces both search time and storage
overhead.
A dynamic index structure can be constructed from two different
data structures: the SRT and the Hash Table (HT). The hash table
allows each SRT to be accessed randomly and independently, and
provides a way to locate data in constant time. The root node of
each SRTi is saved in the HT. The HT consists of several entries
of the form (key, value): the key is a field name, and the value
is a list of pointers, each of which points to the position of a
root node in the radix tree.
The proposed data structure, the dynamic ordered multi-field
index (DOMI), finds the data-set row tuples in the original data
source that match the query tuple by accessing any field
directly. Every search starts by consulting the hash table and
locating the pointers of the required RNs using the field names.
Fig 5. Hash Table (HT)
Fig. 5 illustrates the design of the dynamic re-ordering
multi-field index (DOMI) for querying data sets. Each key in the
HT has one or more pointers, each pointing to the root node of
an SRT. If a segment contains more than one SRT, as segment 2
does in Fig. 4, its key has a list of pointers: FN2 →
{P1, P2, …, Pn}. These pointers refer directly to the root nodes
of the segment that includes the desired values to answer the
queries.
Definition 4:
An HT is defined as follows:
HT = ({FN1, FN2, …, FNn}, {P1, P2, …, Pn}), where
(1) FN is the set of RNs, one for each segment,
(2) P is the set of pointers for each RN.
When an input query q is given in the form of a tuple T = (FN,
value) and the FN of the query matches an FN in the HT, the HT
returns the answer to q according to {FNk, Pn ⊆ SRTi, SRT ∈ RT}.
The search process starts by comparing the field in the query
with the field names in the hash table. If they match, the
search follows the link that connects the HT to a particular
SRT, where the desired data is found. If there is no match, the
field does not exist in the index, and the search terminates.
When a query tuple is received, it is sent immediately to the HT
to determine the suitable root in the right segment, and the
query is then processed accordingly.
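Definition 4 maps naturally onto a dictionary from field name to a
list of root-node pointers. The following Python sketch is only an
illustration of that mapping: the class RootNode, its segment
attribute, and the helpers register_root and roots_for are
hypothetical names, and the body of each SRT is omitted.

```python
class RootNode:
    """Root node (RN) of one sub-radix tree; the SRT body is omitted."""

    def __init__(self, field_name, segment):
        self.field_name = field_name
        self.segment = segment


def register_root(ht, rn):
    """Add a pointer to rn under its field name: FN -> {P1, P2, ...}."""
    ht.setdefault(rn.field_name, []).append(rn)


def roots_for(ht, field_name):
    """Expected constant-time access to the roots of one field."""
    return ht.get(field_name, [])


ht = {}
register_root(ht, RootNode("ST", 1))
register_root(ht, RootNode("City", 2))
register_root(ht, RootNode("City", 2))        # second SRT in segment 2
register_root(ht, RootNode("Company Name", 3))

print([rn.segment for rn in roots_for(ht, "City")])
```

A query on City reaches both roots of segment 2 through the
dictionary without visiting the ST root in segment 1, and a field
name that is absent from the table yields an empty list, which
corresponds to the terminating case described above.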
A. Preliminaries
A binary search tree of height h supports each of the basic
dynamic-set operations, such as SEARCH, PREDECESSOR, SUCCESSOR,
MINIMUM, MAXIMUM, INSERT, and DELETE, in O(h) time. These
operations are fast if the height of the search tree is small.
If the height is large, they may be no faster than on a linked
list [23].
Hash tables, on the other hand, support the dictionary
operations INSERT, DELETE, and SEARCH. In the worst case,
hashing requires O(n) time to perform a SEARCH operation, but
the expected time for hash-table operations is O(1) [23].
For a radix tree, the worst-case complexity of look-up, insert,
and delete operations is O(l), where l is the maximum length of
a string in the set [24]. Table 1 shows the worst-case time
complexity of operations (e.g., insert and look-up) for
different data structures, where n is the number of elements and
l is the maximum length of a key.
Table 1. Comparison between Index Structures
Index Structure   Time complexity
B-tree            O(log n)
B+-tree           O(log n)
R-tree            Does not use space efficiently and has no
                  worst-case time complexity bound [19].
Radix Tree        O(l)
Hash Table        O(1)
Most index structures have time complexity in terms of
O(log n), but they differ in factors, terms, and conditions when
they are used to develop algorithms [19]. Radix trees, on the
other hand, have a number of interesting properties that
distinguish them from other search trees [9]:
• The height (and complexity) of a radix tree depends on the
length of the keys, not on the number of elements in the tree.
• Radix trees do not need rebalancing operations, and all
insertion orders result in the same tree.
• The keys are stored in lexicographic order.
• The path to a Data Node represents the key of that leaf. Keys
are therefore stored implicitly and can be reconstructed from
paths.
B. Insertion Algorithm
Fig. 6 presents the pseudo-code of a simple insert algorithm
that inserts the values of data-set row tuples, in (key/value)
form, into the segments of the radix tree.
Fig 6. Simple Algorithm to insert (RN&DN)
C. Insertion Example
The insertion of values is explained for two cases. Fig. 7
illustrates the start of segment 1 with the key ‘ST’, followed
by the value ‘Nevada’ on the right side of the root ST.
A new entry ‘ST=California’ is inserted into the upper segment.
The search for the new value ends at a null pointer leaving a
non-leaf node. A node is then created for the value “California”
and inserted as a Data Node (child) of the root node “ST”
(line 2 of the algorithm). The DN of this value can simply be
inserted under the existing root node, which now has a new
child. Whenever a new entry of another tuple is inserted into
SRT1 in this way, no new pointer for the RN ‘ST’ is added to the
hash table. If an existing DN in segment 1 contains a prefix
value pv, it is compared with the new value of the entry. The
node can be split, producing a suffix value sv for each of the
two values (line 3 of the algorithm). The symbols (…..) in a
node represent the offsets that are inserted together with the
values.
Fig 7. Inserted new Data Node (case 1)
Note that a middle segment contains more than one SRT. A new
value can be inserted into a middle segment, where each value is
inserted according to the SRT above it. Segment 2 contains more
than one RN ‘City’. To insert a new value, the insertion must
follow the path of the SRT in the segment above. Inserting a new
entry ‘City=Los Angeles’ appends a new DN to the existing RN2 of
segment 2: a node is created for the value “Los Angeles” and
added as a child DN of the RN “City”. Since no new RN is added
to the index, nothing needs to be added to the HT. A new entry
“Company Name=A World Link” is then inserted by descending until
a leaf node is reached, but the stored value is not the same as
the new value, so a new path has to be generated for the common
prefix. The old value “A White Rose” and the new value “A World
Link” have the same prefix “A W”, which branches out to two
different leaves, each of which contains the suffix of one
value.
It can be noted that some segments of the DOMI did not require
the creation of RNs, because the insertion was performed on
existing (previously inserted) RNs. In this case, there is no
need to add new pointers for RNs to the hash table, which
remains unchanged. The insertion operation can therefore be
performed in O(|V|), where |V| is the length of the value to be
inserted.
Algorithm (Parameters V: pair of (v, {offset1, …, offseti}),
SRTi: sub radix tree, pv: prefix value, vi,j: different
values, HT: hash table, pi: new pointer)
1) IF N is a Root Node:
2)    For each value V of TData set_Row, expand a
      new Leaf Node into SRTi and add V, or
3)    IF a Data Node contains pvj = pvi, split the node
      and insert sv for each v.
4) IF N equals null:
5)    Add a new Root Node inside the segment, then call
      steps (2 or 3).
6)    Add the new pi of the root node into HT.
Fig. 8 illustrates the insertion operation in case 2. A new DN
is created to insert the value ‘Ohio’ into segment 1.
Fig 8. Inserted new Root Node (case 2)
In segment 2, an RN ‘City’ must be inserted first, and then the
value ‘Findlay’ can be inserted as a child node. Since no RN
exists under the DN ‘Ohio’, a new RN ‘City’ is created (see
line 5) as an extension of the value directly above it, and a
new DN is then created to insert the value ‘Findlay’. The same
steps are applied in segment 3. Segment 2 and segment 3 of the
DOMI therefore require the creation of RNs, which means adding
new pointers for the RNs (‘City’, ‘Company name’) to the hash
table (see line 6).
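The two insertion cases can be traced with a short Python sketch.
It is a simplified reading of Fig. 6, not the authors'
implementation: a plain dictionary stands in for each sub-radix
tree, so the prefix split of line 3 is not modeled, and the names
Domi, RootNode, DataNode, and child_root are hypothetical. The
sketch focuses on when a new RN is created and when the hash table
changes.

```python
class DataNode:
    def __init__(self, value):
        self.value = value
        self.offsets = []
        self.child_root = None      # RN of the SRT in the next segment


class RootNode:
    def __init__(self, field_name):
        self.field_name = field_name
        self.children = {}          # value -> DataNode (stand-in for an SRT)


class Domi:
    def __init__(self, first_field):
        self.root = RootNode(first_field)
        self.ht = {first_field: [self.root]}   # field name -> list of RNs

    def insert(self, row, offset):
        """row: ordered (field, value) pairs, one pair per segment."""
        rn = self.root
        for i, (field_name, value) in enumerate(row):
            if rn.field_name != field_name:
                raise ValueError("row does not follow the segment order")
            dn = rn.children.get(value)
            if dn is None:                      # line 2: add a new DN
                dn = rn.children[value] = DataNode(value)
            dn.offsets.append(offset)
            if i + 1 < len(row):
                if dn.child_root is None:       # lines 4-5: no RN below yet
                    next_field = row[i + 1][0]
                    dn.child_root = RootNode(next_field)
                    # line 6: register the new RN in the hash table
                    self.ht.setdefault(next_field, []).append(dn.child_root)
                rn = dn.child_root


index = Domi("ST")
index.insert([("ST", "California"), ("City", "Garden Grove"),
              ("Company Name", "A White Rose")], 0)
index.insert([("ST", "California"), ("City", "Los Angeles"),
              ("Company Name", "A World Link")], 1)
index.insert([("ST", "Ohio"), ("City", "Findlay"),
              ("Company Name", "A World Link")], 2)
print({name: len(roots) for name, roots in index.ht.items()})
```

In this trace, inserting ‘City=Los Angeles’ reuses the RN that
already exists under ‘California’, so the ‘City’ entry of the hash
table is unchanged, whereas inserting ‘ST=Ohio’ creates a new
‘City’ RN and adds a pointer for it.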
The time complexity of the insertion operation in DOMI depends
on the time complexity of the Multi-Segments Radix Tree Index
(MSRTI) and the Hash Table (HT): a new value must be inserted
into the appropriate DN, and the RN for this value must be
inserted into the HT.
The time complexity is therefore determined as follows:
l = max(|V1|, |V2|, …, |Vn|), where l is the length of the
longest value.
MSRTI: O(n l), where n is the number of segments, which equals
the number of fields, and l is the length of the longest value.
DOMI: O(n l) + O(1) = O(n l), since O(1) is negligible.
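As a small worked instance of this bound, consider an illustrative
row with n = 3 fields whose values are “California” (10
characters), “Los Angeles” (11 characters), and “A World Link” (12
characters). The row is chosen here only as an example:

```latex
l = \max(|V_1|, |V_2|, |V_3|) = \max(10, 11, 12) = 12,
\qquad
n \cdot l = 3 \cdot 12 = 36,
\qquad
O(n\,l) + O(1) = O(n\,l)
```

Under these assumptions, inserting the row costs at most about 36
character steps across the three segments, plus a constant-time
hash table update whenever a new RN is created.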
D. Search Algorithm
Fig. 9 presents the algorithm for searching values in the DOMI
data structure.
Fig 9. Simple Algorithm for search in DOMI
Usually, a tree search must descend from the highest tree, so
more than one SRT might be traversed.
E. Search Example
Fig. 10 describes how to search DOMI for a specific value stated
in a given query. Given a query, the search begins from the RNs
that are stored in the HT. It checks the RNs until it finds a
field name that matches the query field name. If there is a
match, the search follows the direct link that refers to a
particular block-sized SRTi. The search then continues from the
RN of SRTi down to a DN.
Line 1 of Fig. 9 shows the comparison between a key of Tquery
and an RN of the HT. The following example illustrates how
queries are processed using the DOMI structure.
A query Q is stated as “City=Garden Grove and Company name=A
White Rose”. It is sent immediately to the HT. If ‘City’ and
‘Company name’ of Tquery match the corresponding FNs in the HT,
the search process follows the pointers of those FNs. P1 in the
pointers of ‘City’ points directly to segment 2, which includes
the value “Garden Grove”, without traversing segment 1. P1 in
the pointers of ‘Company name’ points directly to segment 3
without traversing segment 1 and segment 2 sequentially.
Segment 3 includes the node that stores the value “A White
Rose”. The search then continues from that node down to a DN to
reach the desired data. The leftmost DN of segment 3 represents
the common prefix of the two values “A White Rose” and “A World
Link”.
Input: Q = (k1, v1, k2, v2), tuples of the query.
Output: all occurrences of Q in the data set.
RN: root node, p: pointer of RN
/*Search begins at the hash table (HT)*/
1) Check whether a ki of Tquery matches an RNi of HT; if so,
2) follow the pi of RNi in HT toward SRTi.
3) Return output.
4) Otherwise, if ki of Tquery ≠ RNi of HT, return null.
The leaf DN contains a pair consisting of a value and the
position (offset) of that value, which refers to the location of
the actual spatial data within the input stream. If no FN
matches the search key, the key does not exist, and the search
terminates.
Fig 10. Search in DOMI
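The search of Fig. 9 can be sketched in Python as follows. This is
a minimal illustration under stated assumptions: each SRT is
replaced by a dictionary from value to a set of offsets, the
conditions of the query are combined by intersecting offset sets
(the paper does not specify how the partial results of an "and"
query are merged), and the names RootNode and search are
hypothetical.

```python
class RootNode:
    def __init__(self, field_name, values):
        self.field_name = field_name
        self.values = values        # stand-in for an SRT: value -> offsets


def search(ht, query):
    """query: list of (field, value) pairs combined with AND.

    Returns the set of matching offsets, or None if a field is unknown.
    """
    result = None
    for field_name, value in query:
        roots = ht.get(field_name)          # line 1: consult the HT
        if roots is None:
            return None                     # line 4: no matching FN
        hits = set()
        for rn in roots:                    # line 2: follow each pointer
            hits |= rn.values.get(value, set())
        result = hits if result is None else result & hits
    return result                           # line 3


# Hand-built hash table with hypothetical offsets.
ht = {
    "ST": [RootNode("ST", {"California": {0, 1}, "Ohio": {2}})],
    "City": [
        RootNode("City", {"Garden Grove": {0}, "Los Angeles": {1}}),
        RootNode("City", {"Findlay": {2}}),
    ],
    "Company Name": [
        RootNode("Company Name", {"A White Rose": {0}}),
        RootNode("Company Name", {"A World Link": {1}}),
        RootNode("Company Name", {"A World Link": {2}}),
    ],
}

print(search(ht, [("City", "Garden Grove"), ("Company Name", "A White Rose")]))
```

Neither condition of the example query touches the ‘ST’ roots,
which mirrors the direct access to segment 2 and segment 3
described in the example.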
VI. CONCLUSIONS
Big data environments need more efficient index structures to
speed up the evaluation of queries. This paper has introduced a
new index structure, the Dynamic Ordered Multi-Field Index
(DOMI). DOMI is based on a collection of radix trees together
with a single hash table. The hash table allows random access to
any sub-radix tree without traversing the trees in the upper
segments. In addition, the use of radix trees decreases space
consumption by storing common prefix values only once, and
provides efficient time complexity for the insertion and search
operations. For these reasons, we believe that the proposed DOMI
offers an attractive alternative to other structures for
indexing ever-growing big data.
REFERENCES
[1] Jeffrey Dean, Sanjay Ghemawat, “MapReduce: Simplified Data
Processing on Large Clusters”, Google, Inc., 2004.
[2] Brian F. Cooper, Neal Sample, Michael J. Franklin, Gísli R.
Hjaltason, Moshe Shadmon, “A Fast Index for Semistructured
Data”, Proceedings of the 27th VLDB Conference, Roma, Italy,
2001.
[3] Christophe Cérin, Michel Koskas, Jean-Sébastien Gay, Gaël
Le Mahec, “Efficient Data-Structures and Parallel Algorithms
for Association Rules Discovery”, Proceedings of Fifth Mexican
International Conference, in IEEE, 2004.
[4] Anand Rajaraman, Jure Leskovec, Jeffrey D. Ullman, “Mining
of Massive Datasets”, 2012.
[5] Andrew S. Tanenbaum, Maarten Van Steen, “Distributed
Systems: Principles and Paradigms”, 2007.
[6] V. Gaede, O. Günther, “Multidimensional Access Methods”,
ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, June 1998.
[7] Kevin McGowan, “Big data, Fast Processing Speeds”, in SAS
Solutions on Demand, Cary NC, 2013.
[8] Xiangwu Ding, Wenbing Yu, Jiajin Le, “An Adaptive
Projection Strategy and Its Implementation in Column Stores”,
in IEEE, 2011.
[9] Viktor Leis, Alfons Kemper, Thomas Neumann, “The Adaptive
Radix Tree: ARTful Indexing for Main-Memory Databases”, ICDE,
2013.
[10] Goetz Graefe, “Efficient columnar storage in B-trees”, in
ACM, 2007.
[11] Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han,
Bhavani Thuraisingham, “A Multi-partition Multi-chunk Ensemble
Technique to Classify Concept-Drifting Data Streams”, in
Springer-Verlag Berlin Heidelberg, 2009.
[12] K. Ramamohanarao, John W. Lloyd, “Dynamic Hashing
Schemes”, in ACM Computing Surveys, 1998.
[13] Per-Ake Larson, “Linear hashing with separators—a dynamic
hashing scheme achieving one-access”, in ACM Transactions on
Database Systems, 1988.
[14] Hannes Voigt, Tobias Jaekel, Thomas Kissinger, Wolfgang
Lehner, “Adaptive Index Buffer”, in 28th International
Conference on Data Engineering Workshops, in IEEE, 2012.
[15] P. O’Neil, D. Quass, “Improved Query Performance with
Variant Indexes”, in ACM SIGMOD international conference on
Management of data, pp. 38-49, 1997.
[16] J. Corbet, “Trees I: Radix trees”,
http://lwn.net/Articles/175432.
[17] B. Schlegel, R. Gemulla, W. Lehner, “k-ary search on
modern processors”, in DaMoN workshop, 2009.
[18] R. Bayer, E. McCreight, “Organization and maintenance of
large ordered indices”, in SIGFIDET, 1970.
[19] P. Patel, D. Garg, “Comparison of Advance Tree Data
Structures”, in IJCA International Journal of Computer, 2012.
[20] Guojun Lu, “Techniques and Data Structures for Efficient
Multimedia Retrieval Based on Similarity”, in IEEE, 2002.
[21] Lisa A. Horwitz, “Techniques for Managing Large Data Sets:
Compression, Indexing and Summarization”, Applications, 2012.
[22] Ajit Singh, Deepak Garg, “Implementation and Performance
Analysis of Exponential Tree Sorting”, International Journal of
Computer Applications, pp. 34-38, June 2011.
[23] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
Clifford Stein, “Introduction to Algorithms, Third Edition”,
2009.