IUI 2015 talk slides: Ahn, J., Brusilovsky, P., and Han, S. (2015) Personalized Search: Reconsidering the Value of Open User Models. In: Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI 2015), Atlanta, Georgia, USA, ACM, pp. 202-212.
5. Weber, G. and Brusilovsky, P. (2001) ELM-ART: An adaptive versatile system for Web-based instruction. International Journal of Artificial Intelligence in Education 12 (4), 351-384.
7. Ahn, J.-w., Brusilovsky, P., Grady, J., He, D., and Syn, S.Y. (2007) Open user profiles for adaptive news systems: help or harm? In: Proceedings of the 16th International Conference on World Wide Web (WWW '07), Banff, Canada, May 8-12, 2007, ACM, pp. 11-20.
18. STUDY DESIGN
A user study was conducted to test the advantages of Adaptive VIBE+NE's concept-based visual open user modeling. The study was designed to simulate the work situation of an information analyst engaged in a sufficiently complex exploratory search with information-foraging and sense-making stages [21]. The tasks and the documents were provided by an expanded TDT4 (Topic Detection and Tracking) document collection that contains 28,390 English documents published from October 2000 to January 2001. The original TDT4 topics were enriched to resemble the tasks performed by intelligence analysts [16], yielding 18 topics with ground-truth information. To select topics that were as equally difficult and as comparable as possible, we devised a measure based on the distribution of the ground-truth information in the corpus. In some topics the answers are concentrated in a small number of documents (easier, because users need to explore fewer documents), whereas other topics disperse the answers across many documents (more difficult). We therefore defined the standard deviation of the relevant passage count per document as a pseudo topic complexity measure (Equation 1). Three TDT4 topics with equivalent topic complexity were selected as the study topics (Table 1).
$\mathit{Complexity}_{topic} = \sqrt{\frac{1}{|Doc_{rel}|} \sum_{i=1}^{|Doc_{rel}|} \left( |Passage_{rel,i}| - \mu \right)^{2}} \qquad (1)$

where $|Doc_{rel}|$ is the number of relevant documents for the topic, $|Passage_{rel,i}|$ is the relevant passage count of the $i$-th relevant document, and $\mu$ is the mean relevant passage count per document.
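As an illustration of Equation 1, here is a minimal Python sketch; the function name and the per-document passage counts are hypothetical, and in the actual study the counts would come from the ground truth of the expanded TDT4 collection:

```python
import math

def topic_complexity(rel_passage_counts):
    """Equation 1: population standard deviation of the relevant
    passage count over a topic's relevant documents."""
    n = len(rel_passage_counts)
    mu = sum(rel_passage_counts) / n  # mean relevant passages per document
    return math.sqrt(sum((c - mu) ** 2 for c in rel_passage_counts) / n)

# Hypothetical topics: relevant passage counts per relevant document.
print(topic_complexity([3, 120, 1, 45, 2]))   # very uneven spread -> high value
print(topic_complexity([10, 12, 9, 11, 10]))  # even spread        -> low value
```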
The participants were recruited from the University of Pittsburgh and Carnegie Mellon University. They were expected to play the role of information analysts who could proficiently operate the search systems for the task, and were required to meet the following criteria: (1) native English speakers or equivalent language ability; (2) sufficient infor[…]
Figure 3: Study procedure (Introduction Statement → Entry Questionnaire → Training → Search Session 1: VIBE or VIBE+NE → Post-questionnaire → Search Session 2: VIBE or VIBE+NE → Post-questionnaire → Exit Interview)
[…] the participants were asked to fill out post-task questionnaires after each session and took part in 10-minute exit interviews.
Table 1: Topic difficulty: distribution of relevant information
Topic ID          40009   40021              40048
Complexity_topic  71.46   73.98              64.12
                          (most difficult)   (least difficult)
19. Table 2: Comparison of x-coordinates of relevant and non-relevant document cluster centroids (pixels)
Mean x-coordinate   Relevant   Non-relevant   Difference
Overall             458.03     379.55         78.48
VIBE                414.43     372.36         42.07
VIBE+NE             492.58     397.66         94.92
Adaptive VIBE visually separated the relevant and non-relevant documents and placed the relevant document clusters closer to the user model (to the right in Figure 1). This is similar to the behavior of search systems that promote relevant documents to the top of a ranked list. However, the NE-based user models showed a larger separation (around 100 pixels) than the keyword-based user models (VIBE, around 69 pixels), and the relevant document cluster in VIBE+NE is even closer to the user model than in VIBE (492.58 versus 414.43). The difference is statistically significant (Kruskal-Wallis rank sum test, p < 0.001). This result confirms the aforementioned simulation result [3] and suggests that VIBE+NE has stronger relevant-document discrimination power than the baseline VIBE.
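The x-coordinate comparison can be reproduced with a standard Kruskal-Wallis rank sum test. A minimal sketch, assuming the logged centroid x-coordinates are available as two lists (the sample values below are invented):

```python
from scipy.stats import kruskal

# Hypothetical centroid x-coordinates (pixels) collected from logged
# VIBE+NE layouts: one value per relevant / non-relevant cluster centroid.
relevant_x     = [492.6, 481.0, 505.3, 470.2, 498.8]
non_relevant_x = [397.7, 402.1, 388.4, 391.9, 400.5]

# Kruskal-Wallis rank sum test on the two groups, as reported in the paper.
statistic, p_value = kruskal(relevant_x, non_relevant_x)
print(f"H = {statistic:.2f}, p = {p_value:.4f}")
```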
Table 3 provides another metric of document cluster quality. The Davies-Bouldin Validity Index (DB-index) [13] measures clustering quality by comparing within-cluster document-to-centroid distances against between-cluster centroid distances; a smaller DB-index indicates better clustering.
Table 3: Comparison of DB-index between systems
System    VIBE   VIBE+NE   p
Overall   1.70   1.69      < 0.001
Figure 5: [caption and accompanying column truncated in the extraction] The DB-index of VIBE+NE was smaller than that of VIBE, and the difference was significant (p < 0.001). [A truncated passage follows on the position of system-recommended documents and the ability to separate documents by relevance relative to the visual user model, so that relevant documents are easily identified.]
visual separation of relevant docs
X-coordinates of relevant/non-relevant document cluster centroids
DB-index comparison between VIBE and VIBE+NE (smaller DB-index, better clustering)
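scikit-learn ships a DB-index implementation; here is a sketch under the assumption that each document's 2-D VIBE position and its relevant/non-relevant cluster label are available (the coordinates below are invented):

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Hypothetical 2-D document positions from a VIBE layout and their
# cluster labels (1 = relevant cluster, 0 = non-relevant cluster).
positions = np.array([
    [492.6, 230.3], [487.9, 224.8], [505.1, 241.0],   # relevant
    [372.4, 198.2], [368.0, 205.1], [380.5, 190.7],   # non-relevant
])
labels = np.array([1, 1, 1, 0, 0, 0])

# Smaller DB-index = tighter clusters relative to the distance between
# their centroids, i.e. better visual separation.
print(davies_bouldin_score(positions, labels))
```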
24. Figure 6: Comparison of relevant document open count (x-axis: opened relevant document count, 0 to 30)
Documents are placed closer to more similar POIs in Adaptive VIBE, and their positions are updated dynamically while users drag related POIs. This dynamic visualization feature lets users manipulate the layout of the user model POIs and instantly learn the effect of the manipulation on the retrieved documents. Therefore, we compared the participants' POI movement (or POI dragging) event counts with system and user precision. System precision was calculated as the precision of the top-10 documents, and user precision was calculated as the precision of the documents the user opened. Figure 7 shows the correlation between POI manipulation counts and system/user precision. The regression lines suggest no statistical evidence that POI manipulation degraded system or user performance. Figures 8 and 9 break down the overall correlations into keyword POI and NE POI correlations, respectively. Among them, only keyword POI versus user precision shows a significantly negative result (Figure 8, below; p = 0.0179). However, the system precision (Figure 8, above) still does not show any significant degradation. This suggests that the system could maintain high performance regardless of user POI manipulation, but that the users eventually made wrong decisions.
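A sketch of the precision-versus-manipulation analysis (cf. the regressions in Figures 7-9), assuming per-session logs of POI drag counts, the top-ranked documents, and the documents the user opened; all identifiers and numbers here are hypothetical:

```python
from scipy.stats import linregress

def precision(docs, relevant):
    """Fraction of the given documents that are relevant."""
    return sum(d in relevant for d in docs) / len(docs) if docs else 0.0

relevant = {"d1", "d3", "d5", "d8"}
sessions = [  # top-10 lists shortened for brevity
    {"drags": 2,  "top10": ["d1", "d2", "d3"], "opened": ["d1", "d4"]},
    {"drags": 15, "top10": ["d3", "d5", "d6"], "opened": ["d5", "d8"]},
    {"drags": 30, "top10": ["d1", "d7", "d9"], "opened": ["d2", "d9"]},
]

drags       = [s["drags"] for s in sessions]
system_prec = [precision(s["top10"], relevant) for s in sessions]
user_prec   = [precision(s["opened"], relevant) for s in sessions]

# Regress precision on POI manipulation count and inspect slope and p-value.
for name, y in (("system", system_prec), ("user", user_prec)):
    fit = linregress(drags, y)
    print(f"{name}: slope={fit.slope:.4f}, p={fit.pvalue:.4f}, R2={fit.rvalue**2:.3f}")
```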
Figure 7: POI movement versus performance (all POIs)
Table 8: Subjective feedback: topic difficulty
Topic                   40009   40021   40048
Mean Topic Difficulty   2.77    3.27    2.68
25. position of system-recommended docs
In fact, the R-squared score is relatively low (R² = 0.179), and the graph shows that a few outliers produced the negative result. Moreover, the NE POI manipulations show no performance degradation (Figure 9), which hints at the advantages of semantic named entities during user manipulation of the visual user model elements. These analyses suggest that visualization-based open user model manipulation could overcome the disadvantage of text-based open and editable user modeling observed in the previous studies.
Table 7: Subjective feedback: positive reactions
System          VIBE   VIBE+NE
Average Score   3.18   3.39
SD              0.98   0.93
Positive count  4      9
Subjective Feedback
We asked the participants to rate the two systems on a 5-point Likert scale (1=dislike, 5=like). The difference was close to significant (Table 7, Kruskal-Wallis rank sum test). Table 7 also compares the relative positive reactions, counting whether subjects preferred VIBE or VIBE+NE: more subjects preferred VIBE+NE to VIBE […] (20). Table 8 compares the topic difficulty perceived by the participants (1=easy, 5=difficult). Although we tried to make the three topic difficulties equivalent, there were perceived differences. The most difficult topic was 40021 and the easiest one was 40048, […] significant (Kruskal-Wallis rank sum test […]).
User preference on two systems (1=dislike, 5=like)
Topic difficulty (subjective)