Tag-Based Browsing of Digital Collections with Inverted Indexes and Browsing Cache

Joaquín Gayoso-Cabada, Mercedes Gómez-Albarrán,
José Luis Sierra
Fac. Informática
Universidad Complutense de Madrid

Contents
Introduction
The Tag-Based Browsing Model
Tag-Based Browsing with Inverted Indexes
Adding a Browsing Cache
Conclusions and Future Work

Introduction
Clavy: an experimental platform for learning object repositories with reconfigurable structures
Clavy makes it possible to rearrange the hierarchical organization of elements in metadata schemata.
These reconfigurations affect functionalities such as learning object presentation and browsing.
In particular, although from a user’s point of view Clavy supports a guided browsing paradigm…
… internally it supports freer, more flexible browsing mechanisms…
… able to take into account all the possible ways of browsing the repositories

Introduction
Clavy browsing is internally supported by a tag-based browsing system
Element-value pairs are abstracted as tags
The browsing system maintains:
– A set of active tags
– The set of filtered objects
– The set of additionally selectable tags, which can further shrink the set of filtered objects without emptying it
Updating the browsing snapshot when the set of active tags changes can be computationally intensive
To mitigate this cost, we propose a strategy based on inverted indexes and a browsing cache

Digital Collections
Resource  Tagging
r1        Cave-Painting, Cantabrian, Prehistoric
r2        Cave-Painting, Levant, Prehistoric
r3        Megalithic, Cantabrian, Prehistoric
r4        Tartesian, Plateau, Protohistoric
r5        Phoenician, Penibaetic, Protohistoric
r6        Punic, Levant, Protohistoric
Resources ⟶ Content of learning objects
Tags ⟶ Element-value pairs

Browsing
Browsing state:
– F ⟶ Set of selected tags.
– R_F ⟶ Set of filtered resources.
– S_F ⟶ Set of selectable tags.
Browsing actions:
– +t ⟶ Select the tag t.
– xt ⟶ Remove the tag t.
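
The browsing state above can be computed directly from its definition by scanning the whole collection. The following is a minimal sketch of that baseline, not the Clavy implementation; the names `COLLECTION` and `browsing_state` are hypothetical, and the data is the example collection from the slides.

```python
# Example collection from the slides: resource -> set of tags.
COLLECTION = {
    "r1": {"Cave-Painting", "Cantabrian", "Prehistoric"},
    "r2": {"Cave-Painting", "Levant", "Prehistoric"},
    "r3": {"Megalithic", "Cantabrian", "Prehistoric"},
    "r4": {"Tartesian", "Plateau", "Protohistoric"},
    "r5": {"Phoenician", "Penibaetic", "Protohistoric"},
    "r6": {"Punic", "Levant", "Protohistoric"},
}


def browsing_state(collection, active):
    """Return (R_F, S_F) for the active tag set F by scanning every resource."""
    active = set(active)
    # R_F: resources carrying every active tag (all resources when F is empty).
    filtered = {r for r, tags in collection.items() if active <= tags}
    all_tags = set().union(*collection.values())
    selectable = set()
    for tag in all_tags - active:
        shrunk = {r for r in filtered if tag in collection[r]}
        # A tag is selectable if adding it shrinks R_F without emptying it.
        if 0 < len(shrunk) < len(filtered):
            selectable.add(tag)
    return filtered, selectable


filtered, selectable = browsing_state(COLLECTION, {"Prehistoric"})
print(sorted(filtered))
print(sorted(selectable))
```

Every call rescans all resources for every candidate tag, which is the cost that the strategies on the following slides aim to reduce.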

Browsing with Inverted Indexes
Inverted Indexes
For each tag t, the inverted index returns the set index(t) of all the resources tagged with t:
index(Cave-Painting) = {r1, r2}
index(Megalithic) = {r3}
index(Tartesian) = {r4}
index(Phoenician) = {r5}
index(Punic) = {r6}
index(Cantabrian) = {r1, r3}
index(Levant) = {r2, r6}
index(Plateau) = {r4}
index(Penibaetic) = {r5}
index(Prehistoric) = {r1, r2, r3}
index(Protohistoric) = {r4, r5, r6}
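
An inverted index like the one listed above can be derived from the resource tagging in a single pass. This is a minimal sketch; `COLLECTION` and `build_inverted_index` are hypothetical names, and the data is the example collection from the slides.

```python
from collections import defaultdict

# Example collection from the slides: resource -> set of tags.
COLLECTION = {
    "r1": {"Cave-Painting", "Cantabrian", "Prehistoric"},
    "r2": {"Cave-Painting", "Levant", "Prehistoric"},
    "r3": {"Megalithic", "Cantabrian", "Prehistoric"},
    "r4": {"Tartesian", "Plateau", "Protohistoric"},
    "r5": {"Phoenician", "Penibaetic", "Protohistoric"},
    "r6": {"Punic", "Levant", "Protohistoric"},
}


def build_inverted_index(collection):
    """Map each tag to the set of resources tagged with it."""
    index = defaultdict(set)
    for resource, tags in collection.items():
        for tag in tags:
            index[tag].add(resource)
    return dict(index)


INDEX = build_inverted_index(COLLECTION)
print(sorted(INDEX["Levant"]))
```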

Browsing with Inverted Indexes
The Browsing Strategy
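Notation: index(t) is the inverted-index entry of tag t, R the set of all resources, and T the set of all tags.
+t browsing action:
– F ← F ∪ {t}
– R_F ← R_F ∩ index(t)
– S_F ← {t’ ∈ S_F − {t} | 0 < |R_F ∩ index(t’)| < |R_F|}
xt browsing action:
– F ← F − {t}
– R_F ← ⋂_{t’ ∈ F} index(t’) (or all the resources if F = ∅)
– S_F ← {t’ ∈ T − F | 0 < |R_F ∩ index(t’)| < |R_F|}
F = ∅ is managed as a particular case:
– R_F ← R
– S_F ← {t | |index(t)| < |R|}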
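
The two browsing actions above can be sketched over an in-memory inverted index as follows. This is an illustrative sketch under stated assumptions, not the Clavy code: the class `Browser` and its method names are hypothetical, and `INDEX` reproduces the example inverted index from the slides.

```python
# Example inverted index from the slides: tag -> set of resources.
INDEX = {
    "Cave-Painting": {"r1", "r2"}, "Megalithic": {"r3"},
    "Tartesian": {"r4"}, "Phoenician": {"r5"}, "Punic": {"r6"},
    "Cantabrian": {"r1", "r3"}, "Levant": {"r2", "r6"},
    "Plateau": {"r4"}, "Penibaetic": {"r5"},
    "Prehistoric": {"r1", "r2", "r3"},
    "Protohistoric": {"r4", "r5", "r6"},
}
ALL_RESOURCES = set().union(*INDEX.values())


class Browser:
    """Browsing state (F, R_F, S_F) updated through +t and xt actions."""

    def __init__(self, index, resources):
        self.index = index
        self.resources = set(resources)
        self.reset()

    def reset(self):
        # Particular case F = {}: R_F is the whole collection, and a tag
        # is selectable when it does not cover every resource.
        self.active = set()
        self.filtered = set(self.resources)
        self.selectable = {t for t, rs in self.index.items()
                           if len(rs) < len(self.resources)}

    def _narrowing(self, candidates):
        # Keep the tags that shrink R_F without emptying it.
        size = len(self.filtered)
        return {t for t in candidates
                if 0 < len(self.filtered & self.index[t]) < size}

    def select(self, tag):
        """+t: one intersection, then re-check only the old selectable tags."""
        self.active.add(tag)
        self.filtered &= self.index[tag]
        self.selectable = self._narrowing(self.selectable - {tag})

    def remove(self, tag):
        """xt: recompute R_F from the remaining tags and re-check all tags."""
        self.active.discard(tag)
        if not self.active:
            self.reset()
            return
        self.filtered = set.intersection(*(self.index[t] for t in self.active))
        self.selectable = self._narrowing(set(self.index) - self.active)


browser = Browser(INDEX, ALL_RESOURCES)
browser.select("Prehistoric")
browser.select("Cave-Painting")
print(sorted(browser.filtered), sorted(browser.selectable))
```

The asymmetry is visible in the sketch: `select` only examines the previously selectable tags, whereas `remove` has to intersect all remaining index entries and re-examine every tag outside F.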

Adding a Browsing Cache
The cache consists of three stores:
– Filtered resource store: F ⟶ R_F
– Selectable tag store: F ⟶ S_F
– Representative store: R_F ⟶ F
[Figure: cache snapshots CACHE#1 to CACHE#6 along a browsing session over the example collection, where t1 = Cave-Painting, t2 = Megalithic, t6 = Cantabrian, t7 = Levant, t10 = Prehistoric, and index(t) denotes the inverted-index entry of tag t]
Browsing session shown in the figure:
– +Prehistoric (CACHE#1): F = {Prehistoric}, R_F = R1 = R0 ∩ index(t10) = {r1, r2, r3}, S_F = {Cave-Painting, Megalithic, Cantabrian, Levant} (the tags t with 0 < |R1 ∩ index(t)| < |R1|). The stores gain {t10} ⟶ R1, R1 ⟶ {t10}, and {t10} ⟶ {t1, t2, t6, t7}.
– +Cave-Painting (CACHE#2): F = {Prehistoric, Cave-Painting}, R_F = R2 = R1 ∩ index(t1) = {r1, r2}, S_F = {Cantabrian, Levant} (the tags t with 0 < |R2 ∩ index(t)| < |R2|). The stores gain {t10, t1} ⟶ R2, R2 ⟶ {t10, t1}, and {t10, t1} ⟶ {t6, t7}.
– xCave-Painting (CACHE#3): F = {Prehistoric}. The cache content is unchanged.
– xPrehistoric (CACHE#4): F = ∅. The cache content is unchanged.
– +Cave-Painting (CACHE#5): F = {Cave-Painting}, R_F = R5 = R4 ∩ index(t1), S_F = {Cantabrian, Levant}. The stores gain {t1} ⟶ R5 and {t1} ⟶ {t6, t7}; the representative store gains no entry.
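
One way the three stores could cooperate is sketched below. It relies on an observation that follows from the definition of S_F: the selectable tags depend only on R_F, so two tag sets that filter the same resources can share one selectable-tag computation through the representative store. The class `BrowsingCache`, its methods, and the `computed` counter are hypothetical, and the sketch omits eviction and size bounds; it is not the authors' implementation.

```python
# Example inverted index from the slides: tag -> set of resources.
INDEX = {
    "Cave-Painting": {"r1", "r2"}, "Megalithic": {"r3"},
    "Tartesian": {"r4"}, "Phoenician": {"r5"}, "Punic": {"r6"},
    "Cantabrian": {"r1", "r3"}, "Levant": {"r2", "r6"},
    "Plateau": {"r4"}, "Penibaetic": {"r5"},
    "Prehistoric": {"r1", "r2", "r3"},
    "Protohistoric": {"r4", "r5", "r6"},
}


class BrowsingCache:
    def __init__(self, index):
        self.index = index
        self.resources = frozenset().union(*index.values())
        self.filtered_store = {}        # F   -> R_F
        self.selectable_store = {}      # F   -> S_F
        self.representative_store = {}  # R_F -> first F seen with that R_F
        self.computed = 0               # S_F sets computed from scratch

    def filtered(self, active):
        key = frozenset(active)
        if key not in self.filtered_store:
            result = self.resources
            for tag in key:
                result = result & self.index[tag]
            self.filtered_store[key] = result
        return self.filtered_store[key]

    def selectable(self, active):
        key = frozenset(active)
        if key in self.selectable_store:
            return self.selectable_store[key]
        rf = self.filtered(key)
        # Register this tag set as representative unless R_F is already known.
        rep = self.representative_store.setdefault(rf, key)
        if rep in self.selectable_store:
            # Another tag set filters the same resources: reuse its S_F.
            tags = self.selectable_store[rep]
        else:
            self.computed += 1
            tags = frozenset(t for t, rs in self.index.items()
                             if 0 < len(rf & rs) < len(rf))
        self.selectable_store[key] = tags
        return tags


cache = BrowsingCache(INDEX)
session = [set(), {"Prehistoric"}, {"Prehistoric", "Cave-Painting"},
           {"Prehistoric"}, set(), {"Cave-Painting"}]
for state in session:
    cache.selectable(state)
print(cache.computed, len(cache.representative_store))
```

Replaying the session from the slide, the revisited states are answered from the selectable tag store, and {Cave-Painting} reuses the tags computed for {Prehistoric, Cave-Painting} because both filter {r1, r2}.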

Conclusions
A browsing strategy based on a suitable combination of
inverted indexes and multilevel caches has been proposed
to speed up the browsing process in Clavy
Currently we are working on the empirical evaluation of our
approach in Chasqui, a real-world repository in the Pre-
Columbian American archeology field.
Preliminary experiments suggest that the browsing cache can substantially speed up navigation compared with a more basic, uncached strategy (based solely on inverted indexes).
The price to pay is the overhead of cache management, as well as the larger memory footprint of the technique.
However, the experiments also show that: (i) the cache management overhead is offset by avoiding the explicit computation of the information associated with many browsing states, and (ii) the cache size stays within reasonable ranges, even when it is not upper-bounded.

Future Work
To improve the cache strategy by combining it with our
previous work on navigation automata.
To generalize the browsing strategy to support navigation
through links among resources.
To combine browsing and search, letting users browse
search results according to the browsing model described.
