This document proposes a tag-based browsing system for digital collections that uses inverted indexes and a browsing cache to improve performance. Tags representing element-value pairs are used to filter resources. A browsing cache stores browsing states like filtered resources and selectable tags to speed up navigating when tag filters change. Preliminary experiments show the cache can substantially improve browsing speed over an uncached system using just inverted indexes, though with increased memory usage. Future work aims to integrate browsing automata and links between resources.
Tag-Based Browsing of Digital Collections with Inverted Indexes and Browsing Cache
1. Tag-Based Browsing of Digital
Collections with Inverted Indexes and
Browsing Cache
Joaquín Gayoso-Cabada, Mercedes Gómez-Albarrán,
José Luis Sierra
Fac. Informática
Universidad Complutense de Madrid
3. 3
Introduction
Clavy: an experimental platform for learning
object repositories with reconfigurable
structures
Clavy makes it possible to rearrange the
hierarchical organization of elements in
metadata schemata.
These reconfigurations affect functionalities such as
learning object presentation and browsing.
In particular, although from a user’s point of
view Clavy supports a guided browsing
paradigm…
… internally it supports more free and flexible
browsing mechanisms…
… able to take account of all the possible ways of
browsing the repositories
4. 4
Introduction
Clavy browsing is internally supported by a tag-
based browsing system
Element-value pairs are abstracted as tags
The browsing system maintains:
– A set of active tags
– The set of filtered objects
– The set of additionally selectable tags: those that
further narrow the set of filtered objects without
emptying it
Updating the browsing snapshot when the set of
active tags changes can be computationally
intensive
To mitigate this cost, we propose a strategy
based on inverted indexes and a browsing
cache
6. 6
The Tag-Based Browsing Model
Browsing
Browsing state:
– $F$: set of selected tags.
– $R_F$: set of filtered resources.
– $S_F$: set of selectable tags.
Browsing actions:
– +t: select the tag t.
– xt: remove the tag t.
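The browsing state and the two actions can be sketched directly from their definitions, with no index and no cache. This is only an illustrative baseline: the names `BrowsingState`, `compute_state`, `select`, and `remove`, as well as the three-resource collection, are hypothetical and do not come from Clavy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrowsingState:
    selected: frozenset    # F: selected tags
    filtered: frozenset    # R_F: resources carrying every tag in F
    selectable: frozenset  # S_F: tags that narrow R_F without emptying it


def compute_state(tagging, selected):
    """Derive R_F and S_F from scratch for a given set of selected tags."""
    selected = frozenset(selected)
    filtered = frozenset(r for r, tags in tagging.items() if selected <= tags)
    all_tags = set().union(*tagging.values())
    selectable = frozenset(
        t for t in all_tags - selected
        if 0 < sum(1 for r in filtered if t in tagging[r]) < len(filtered)
    )
    return BrowsingState(selected, filtered, selectable)


def select(tagging, state, tag):
    """+t action: add a tag to F and recompute the state."""
    return compute_state(tagging, state.selected | {tag})


def remove(tagging, state, tag):
    """xt action: drop a tag from F and recompute the state."""
    return compute_state(tagging, state.selected - {tag})


# Hypothetical collection, used only for illustration.
tagging = {
    "a": {"red", "round"},
    "b": {"red", "square"},
    "c": {"blue", "round"},
}
s0 = compute_state(tagging, set())
s1 = select(tagging, s0, "red")
print(sorted(s1.filtered), sorted(s1.selectable))  # ['a', 'b'] ['round', 'square']
```

Every action here rescans the whole collection, which is the cost that the inverted indexes and the browsing cache are meant to reduce.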
7. 7
Browsing with Inverted Indexes
Inverted Indexes
For each tag t, the inverted index returns
the set of all the resources tagged with t, written idx(t) below
Resource tagging:
Resource   Tags
r1         Cave-Painting, Cantabrian, Prehistoric
r2         Cave-Painting, Levant, Prehistoric
r3         Megalithic, Cantabrian, Prehistoric
r4         Tartesian, Plateau, Protohistoric
r5         Phoenician, Penibaetic, Protohistoric
r6         Punic, Levant, Protohistoric
Inverted index:
idx(Cave-Painting) = {r1, r2}
idx(Megalithic) = {r3}
idx(Tartesian) = {r4}
idx(Phoenician) = {r5}
idx(Punic) = {r6}
idx(Cantabrian) = {r1, r3}
idx(Levant) = {r2, r6}
idx(Plateau) = {r4}
idx(Penibaetic) = {r5}
idx(Prehistoric) = {r1, r2, r3}
idx(Protohistoric) = {r4, r5, r6}
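The inverted index for this example can be derived mechanically from the resource tagging. The following is a minimal sketch; the function name `build_inverted_index` and the dictionary layout are assumptions made for illustration, not the Clavy implementation.

```python
from collections import defaultdict

# Resource tagging copied from the example collection.
TAGGING = {
    "r1": {"Cave-Painting", "Cantabrian", "Prehistoric"},
    "r2": {"Cave-Painting", "Levant", "Prehistoric"},
    "r3": {"Megalithic", "Cantabrian", "Prehistoric"},
    "r4": {"Tartesian", "Plateau", "Protohistoric"},
    "r5": {"Phoenician", "Penibaetic", "Protohistoric"},
    "r6": {"Punic", "Levant", "Protohistoric"},
}


def build_inverted_index(tagging):
    """Map each tag to the set of resources that carry it."""
    index = defaultdict(set)
    for resource, tags in tagging.items():
        for tag in tags:
            index[tag].add(resource)
    return dict(index)


index = build_inverted_index(TAGGING)
print(sorted(index["Cantabrian"]))     # ['r1', 'r3']
print(sorted(index["Protohistoric"]))  # ['r4', 'r5', 'r6']
```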
8. 8
Browsing with Inverted Indexes
The Browsing Strategy
+t browsing action:
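– $F \leftarrow F \cup \{t\}$
– $R_F \leftarrow R_F \cap \mathrm{idx}(t)$
– $S_F \leftarrow \{t' \in S_F - \{t\} \mid 0 < |R_F \cap \mathrm{idx}(t')| < |R_F|\}$
xt browsing action:
– $F \leftarrow F - \{t\}$
– $R_F \leftarrow \bigcap_{t' \in F} \mathrm{idx}(t')$ (or all the
resources if $F = \emptyset$)
– $S_F \leftarrow \{t' \in T - F \mid 0 < |R_F \cap \mathrm{idx}(t')| < |R_F|\}$
$F = \emptyset$ is managed as a
particular case:
– $R_F \leftarrow R$
– $S_F \leftarrow \{t \mid |\mathrm{idx}(t)| < |R|\}$
Here $\mathrm{idx}(t)$ is the inverted-index entry for $t$, $T$ is the set of all tags, and $R$ is the set of all resources.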
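The update rules above can be sketched as set operations over the inverted index of the example collection. This is a hedged illustration rather than the Clavy code: the function names (`initial_state`, `select_tag`, `remove_tag`) and the representation of a state as a `(selected, filtered, selectable)` tuple are assumptions.

```python
# Inverted index of the example collection: tag -> resources tagged with it.
INDEX = {
    "Cave-Painting": {"r1", "r2"},
    "Megalithic": {"r3"},
    "Tartesian": {"r4"},
    "Phoenician": {"r5"},
    "Punic": {"r6"},
    "Cantabrian": {"r1", "r3"},
    "Levant": {"r2", "r6"},
    "Plateau": {"r4"},
    "Penibaetic": {"r5"},
    "Prehistoric": {"r1", "r2", "r3"},
    "Protohistoric": {"r4", "r5", "r6"},
}
ALL_RESOURCES = set().union(*INDEX.values())


def narrowing_tags(index, candidates, filtered):
    """Tags among the candidates that shrink `filtered` without emptying it."""
    return {t for t in candidates
            if 0 < len(filtered & index[t]) < len(filtered)}


def initial_state(index, all_resources):
    """Particular case F = {}: every resource is filtered in."""
    selectable = {t for t, rs in index.items() if len(rs) < len(all_resources)}
    return set(), set(all_resources), selectable


def select_tag(index, state, tag):
    """+t: intersect R_F with the index entry, then prune S_F."""
    selected, filtered, selectable = state
    new_filtered = filtered & index[tag]
    new_selectable = narrowing_tags(index, selectable - {tag}, new_filtered)
    return selected | {tag}, new_filtered, new_selectable


def remove_tag(index, all_resources, state, tag):
    """xt: rebuild R_F from the remaining tags, then recompute S_F."""
    new_selected = state[0] - {tag}
    if not new_selected:
        return initial_state(index, all_resources)
    new_filtered = set.intersection(*(index[t] for t in new_selected))
    candidates = index.keys() - new_selected
    return new_selected, new_filtered, narrowing_tags(index, candidates, new_filtered)


s0 = initial_state(INDEX, ALL_RESOURCES)
s1 = select_tag(INDEX, s0, "Prehistoric")
s2 = select_tag(INDEX, s1, "Cave-Painting")
s3 = remove_tag(INDEX, ALL_RESOURCES, s2, "Cave-Painting")
print(sorted(s2[1]), sorted(s2[2]))  # ['r1', 'r2'] ['Cantabrian', 'Levant']
```

Selecting a tag only needs one intersection plus a scan of the current selectable tags, whereas removing a tag has to rebuild the filtered set and scan every tag, which is why the removal action is the more expensive of the two in this sketch.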
9. 9
Adding a Browsing Cache
The browsing cache comprises three stores:
– Filtered resource store: $F \longrightarrow R_F$
– Selectable tag store: $F \longrightarrow S_F$
– Representative store: $R_F \longrightarrow F$
[Figure: evolution of the browsing cache (CACHE#1 to CACHE#6) during a browsing session over the example collection. Tags are numbered in inverted-index order: t1 = Cave-Painting, t2 = Megalithic, t6 = Cantabrian, t7 = Levant, t10 = Prehistoric, t11 = Protohistoric. idx(t) denotes the inverted-index entry for t.]
Browsing session shown in the figure:
– +Prehistoric ⟶ CACHE#1
– +Cave-Painting ⟶ CACHE#2
– xCave-Painting ⟶ CACHE#3
– xPrehistoric ⟶ CACHE#4
– +Cave-Painting ⟶ CACHE#5
Browsing states reached:
– F = {Prehistoric}: R_F = {r1, r2, r3}, S_F = {Cave-Painting, Megalithic, Cantabrian, Levant}
– F = {Prehistoric, Cave-Painting}: R_F = {r1, r2}, S_F = {Cantabrian, Levant}
– F = {Cave-Painting}: R_F = {r1, r2}, S_F = {Cantabrian, Levant}
Cache entries shown:
– Filtered resource store: (t10) ⟶ $R_F^1$, (t10,t1) ⟶ $R_F^2$, (t1) ⟶ $R_F^5$, where $R_F^1 = R_F^0 \cap \mathrm{idx}(t_{10})$, $R_F^2 = R_F^1 \cap \mathrm{idx}(t_1)$, and $R_F^5 = R_F^4 \cap \mathrm{idx}(t_1)$
– Representative store: $R_F^1$ ⟶ {t10}, $R_F^2$ ⟶ {t10,t1}
– Selectable tag store: (t10) ⟶ {t1,t2,t6,t7}, (t10,t1) ⟶ {t6,t7}, (t1) ⟶ {t6,t7}
Selectable tags are those satisfying $0 < |R_F \cap \mathrm{idx}(t)| < |R_F|$:
– For $R_F^1$: $|R_F^1 \cap \mathrm{idx}(t)|$ is 2 for t1 and t6, 1 for t2 and t7, and 0 for t3, t4, t5, t8, t9, and t11
– For $R_F^2$: $|R_F^2 \cap \mathrm{idx}(t)|$ is 0 for t2 and 1 for t6 and t7
– For the empty tag set: $|\mathrm{idx}(t)|$ is 2 for t1, t6, and t7, 3 for t10 and t11, and 1 for the remaining tags
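One plausible reading of the three stores can be sketched as three dictionaries. The sketch assumes that the representative store lets two tag sets with the same filtered resources share one selectable-tag computation, which is safe because the selectable tags depend only on the filtered resources. The class name `BrowsingCache`, its `state` method, the `computed` counter, and the absence of any eviction policy are all assumptions made for illustration; the slides do not specify these details.

```python
# Inverted index of the example collection: tag -> resources tagged with it.
INDEX = {
    "Cave-Painting": {"r1", "r2"},
    "Megalithic": {"r3"},
    "Tartesian": {"r4"},
    "Phoenician": {"r5"},
    "Punic": {"r6"},
    "Cantabrian": {"r1", "r3"},
    "Levant": {"r2", "r6"},
    "Plateau": {"r4"},
    "Penibaetic": {"r5"},
    "Prehistoric": {"r1", "r2", "r3"},
    "Protohistoric": {"r4", "r5", "r6"},
}


class BrowsingCache:
    """Unbounded cache of browsing states (hypothetical sketch)."""

    def __init__(self, index):
        self.index = index
        self.all_resources = frozenset().union(*index.values())
        self.filtered = {}        # filtered resource store: F -> R_F
        self.selectable = {}      # selectable tag store: F -> S_F
        self.representative = {}  # representative store: R_F -> F
        self.computed = 0         # selectable-tag sets computed from the index

    def state(self, selected):
        """Return (F, R_F, S_F), reusing cached information when possible."""
        key = frozenset(selected)
        if key not in self.filtered:
            resources = self.all_resources
            for tag in key:
                resources = resources & self.index[tag]
            self.filtered[key] = resources
        resources = self.filtered[key]
        if key not in self.selectable:
            rep = self.representative.get(resources)
            if rep is not None:
                # Same filtered resources as a known state: share its S_F.
                self.selectable[key] = self.selectable[rep]
            else:
                self.selectable[key] = frozenset(
                    t for t, rs in self.index.items()
                    if 0 < len(resources & rs) < len(resources)
                )
                self.representative[resources] = key
                self.computed += 1
        return key, resources, self.selectable[key]


cache = BrowsingCache(INDEX)
cache.state([])                                # initial state
cache.state(["Prehistoric"])                   # +Prehistoric
cache.state(["Prehistoric", "Cave-Painting"])  # +Cave-Painting
cache.state(["Prehistoric"])                   # xCave-Painting: cache hit
cache.state([])                                # xPrehistoric: cache hit
_, resources, selectable = cache.state(["Cave-Painting"])  # +Cave-Painting
print(sorted(resources), sorted(selectable), cache.computed)
# ['r1', 'r2'] ['Cantabrian', 'Levant'] 3
```

In this run only three selectable-tag sets are computed from the index for six browsing steps: two steps revisit cached tag sets, and the last one reuses the entry of {Prehistoric, Cave-Painting} through the representative store because both tag sets filter the same resources.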
10. 10
Conclusions
A browsing strategy based on a suitable combination of
inverted indexes and multilevel caches has been proposed
to speed up the browsing process in Clavy
Currently we are working on the empirical evaluation of our
approach in Chasqui, a real-world repository in the Pre-
Columbian American archeology field.
Preliminary experiments suggest that the browsing cache
can substantially speed up navigation with respect to a more
basic, un-cached strategy (solely based on inverted indexes).
The price to pay is the overhead generated by cache
management, as well as the higher memory footprint caused
by the technique.
However, the experiments also show that: (i) the
cache management overhead is compensated by eliminating
the explicit computation of the information associated with many
browsing states, and (ii) the cache size stays within
reasonable ranges, even when it is not upper-bounded.
11. 11
Future Work
To improve the cache strategy by combining it with our
previous work on navigation automata.
To generalize the browsing strategy to support navigation
through links among resources.
To combine browsing and search, letting users browse
search results according to the browsing model described.