This document summarizes a slide deck on how users examine and interact with query auto-completion (QAC) rankings. Eye-tracking data were collected from 25 participants performing search tasks, yielding metrics such as fixation time, query formulation time, and result clicks. Statistical models were used to estimate the effects of QAC ranking quality and suggestion position on user behavior. The results show a strong position bias and that ranking changes affect query effectiveness. The discussion covers different ways to measure QAC ranking quality and references related work.
An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge
1. Joint work with Bhaskar Mitra, Milad Shokouhi, and Filip Radlinski
5. How do searchers examine QAC rankings?
How does the quality of QAC rankings affect examination and usage?
Are QAC examination and usage affected by position bias?
7. Example QAC rankings for the prefix "massachu|":

Original condition (production ranking):
massachusetts
massachusetts state lottery
massachusetts unemployment
massachusetts registry of motor vehicles
massachusetts secretary of state
massachusetts department of revenue
massachusetts department of education
massachusetts general hospital

Random condition (same suggestions, shuffled):
massachusetts unemployment
massachusetts department of education
massachusetts secretary of state
massachusetts registry of motor vehicles
massachusetts
massachusetts general hospital
massachusetts department of revenue
massachusetts state lottery
Conditions were counterbalanced in blocks so that at most 2 subsequent tasks are in the same condition.
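The "at most 2 subsequent tasks in the same condition" constraint can be sketched as a simple rejection rule when assigning conditions. This is an illustrative sketch of the constraint only, not the authors' exact assignment procedure (which also balances overall condition counts):

```python
import random

def counterbalanced_sequence(n_tasks, conditions=("original", "random"), max_run=2):
    """Draw one condition per task such that at most `max_run` consecutive
    tasks share the same condition. Illustrative sketch; does not by itself
    guarantee equal overall counts per condition."""
    seq = []
    for _ in range(n_tasks):
        # A condition is disallowed once it has just run max_run times in a row.
        allowed = [c for c in conditions if seq[-max_run:] != [c] * max_run]
        seq.append(random.choice(allowed))
    return seq
```

With two conditions the `allowed` list is never empty, since at most one condition can be on a maximal run at any time.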
8. Same tasks for all participants, covering navigational and informational (closed) task types. Included difficult-to-spell names (schwarzenegger) and terms that can be abbreviated (wsj).
Example search tasks:
Find the homepage of the Massachusetts General Hospital in Boston, USA. What is their physical address? (navigational)
Japan is the 10th most populated country in the world. How many people live there? (easy informational)
How many matches did Roger Federer win against Rafael Nadal in 2007? (complex informational)
9. Eye tracker: Tobii TX300
unobtrusive; tracks natural head movement
300 Hz temporal resolution
accuracy up to 0.4° visual angle
size of each QAC suggestion on screen: 0.67°
http://www.tobii.com/Global/Analysis/Downloads/Product_Descriptions/Tobii_TX300_EyeTracker_Product_Description.pdf
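The visual angle subtended by an on-screen element follows from its physical size and the viewing distance via theta = 2·atan(size / (2·distance)). A minimal sketch; the 7.6 mm row height and 65 cm viewing distance below are hypothetical numbers chosen only to illustrate how a figure like the slide's 0.67° arises:

```python
import math

def visual_angle_deg(size_mm, distance_mm):
    """Visual angle subtended by an on-screen element of the given size
    viewed from the given distance: 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))

# Hypothetical numbers: a ~7.6 mm suggestion row viewed from ~65 cm
# subtends roughly the 0.67 degrees reported on the slide.
angle = visual_angle_deg(7.6, 650)
```

Since the tracker's accuracy (0.4°) is below the suggestion height (0.67°), fixations can be attributed to individual QAC rows.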
10. Make searchers type: provide instructions and search task descriptions on screen (avoid copy-paste).
Participants: 25, with diverse backgrounds, levels of education, and computer experience.
Instruction: participate in a study of search quality; start each search from bing.com, then search any way you like.
14. [Timeline diagram of one search interaction: the time axis marks fixations and saccades (anywhere on the screen), mouse clicks, typed and control characters, the interval during which QAC suggestions are shown, and fixations on QAC suggestions.] Measures defined over this timeline:
task completion time (TCT)
time to first result click (TFC)
query formulation time (QFT)
time to first fixation (TFF)
cumulative fixation time (CFT): sum of fixation time on QAC suggestions
Query and task characteristics:
QU: QAC suggestion used
QR: QAC rank
QL: query length
CS: characters saved
UQ: unique queries submitted
UR: unique result pages
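Of these measures, characters saved (CS) is the least self-explanatory: it captures how much typing a QAC suggestion spared the searcher. A minimal sketch of one plausible way to compute it; the helper name and exact definition are illustrative, not taken from the paper:

```python
def characters_saved(typed_prefix, submitted_query, used_suggestion):
    """Characters saved (CS): characters of the submitted query the
    searcher did not have to type because a QAC suggestion was selected.
    Illustrative definition; the paper's operationalization may differ."""
    if not used_suggestion:
        return 0
    return max(len(submitted_query) - len(typed_prefix), 0)

# Example: typing "massachu" and selecting "massachusetts general hospital"
cs = characters_saved("massachu", "massachusetts general hospital", True)
```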
18. [Timeline diagram as on slide 14; fixations on QAC suggestions define TFF and CFT.] Model estimates for examination of QAC suggestions (β1: coefficient for the random condition):

response        type     n    β0       estimate   β1       estimate
CFT > 0         binary   331   3.468*  0.97       -0.220   0.96
CFT | CFT > 0   log      284   7.124*  1241 ms    -0.043   1189 ms
TFF | CFT > 0   log      284   6.503*  667 ms     -0.094   607 ms
* marks coefficients that are estimated to differ significantly from zero.
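The "estimate" columns are the model coefficients mapped back to the response scale through the inverse link function of each model family. Assuming β1 is the additive offset for the random condition on the linear-predictor scale (an inference from the paired estimate columns, not stated explicitly on the slide), the columns can be reproduced as follows; a sketch, not the authors' analysis code:

```python
import math

def inverse_link(response_type, eta):
    """Map a linear predictor eta back to the response scale:
    logit link for binary responses, log link for log/Poisson models."""
    if response_type == "binary":
        return 1.0 / (1.0 + math.exp(-eta))  # probability
    return math.exp(eta)                     # mean (ms, counts, ...)

# CFT | CFT > 0 (log link): beta0 = 7.124, beta1 = -0.043
original_ms = inverse_link("log", 7.124)            # approx. 1241 ms
random_ms = inverse_link("log", 7.124 - 0.043)      # approx. 1189 ms
# CFT > 0 (logit link): beta0 = 3.468
p_examined = inverse_link("binary", 3.468)          # approx. 0.97
```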
19. [Figure: fixations on and usage of QAC suggestions by suggestion rank and condition. X-axis: QAC suggestion rank; y-axes: mean fixation time (milliseconds) and QAC usage (percent); series: fixations (original, random) and usage (original, random).] Condition has little effect, suggesting a strong position bias.
21. [Timeline diagram as on slide 14; typed characters, control characters, and mouse clicks define QFT.] Model estimates for query formulation and QAC usage (β1: coefficient for the random condition):

response   type     n    β0        estimate   β1       estimate
QFT        log      331   8.680*   5884 ms     0.058   6235 ms
QL         Poisson  331   3.224*   25         -0.007   25
QU         binary   331  -0.915*   0.29       -0.508   0.19
CS | QU    Poisson   99   2.192*   9           0.223*  11
QR | QU    Poisson   99   0.344*   1.4         0.044   1.5
* marks coefficients that are estimated to differ significantly from zero.
23. [Timeline diagram as on slide 14; result clicks define TFC and TCT.] Model estimates for search outcomes (β1: coefficient for the random condition):

response        type     n    β0        estimate   β1       estimate
UQ              Poisson  331   0.357*   1.4         0.044   1.5
UR = 0          binary   331  -3.654*   0.03       -0.022   0.02
UR | UR > 0     Poisson  282   0.703*   2.0         0.161*  2.4
TFC | UR > 0    log      282   8.625*   5569 ms    -0.036   5372 ms
TCT ≥ ts        binary   331  -3.217*   0.04        0.764   0.08
TCT | TCT < ts  log      297  11.096*   65.9 s     -0.021   64.5 s
* marks coefficients that are estimated to differ significantly from zero.
28. How to measure QAC ranking quality?
Rank-based (e.g., MRR extracted from logs) [Shokouhi '13]
QAC usage [Kharitonov et al. '13]
Manual judgment of suggestions [Bhatia et al. '11]
Result page quality [Liu et al. '12]
Effort-based (e.g., minimal keystrokes, MKS) [Duan & Hsu '11]
A/B tests [Kohavi et al. '13]
Interleaving [Hofmann et al. '13]
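For the rank-based option, a common log-based metric is mean reciprocal rank over QAC impressions: each impression contributes the reciprocal of the rank at which the eventually submitted query appeared in the suggestion list. A minimal sketch under that assumption; the input encoding (1-based ranks, `None` for "not in the ranking") is illustrative:

```python
def mean_reciprocal_rank(ranks):
    """MRR over logged QAC impressions. `ranks` holds the 1-based rank of
    the suggestion the user eventually submitted, or None if the submitted
    query was not in the ranking (contributing 0). Illustrative sketch."""
    if not ranks:
        return 0.0
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

# Four impressions: submitted query at ranks 1, 3, absent, and 2.
mrr = mean_reciprocal_rank([1, 3, None, 2])
```

Note that, given the position bias shown on slide 19, such log-based metrics conflate suggestion quality with where suggestions happened to be displayed.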
29. Effects of ranking changes: strong position bias; effect on query effectiveness.
Next:
30. [Bhatia et al. '11] S. Bhatia, D. Majumdar, P. Mitra: Query suggestions in the absence of query logs (SIGIR 2011).
[Duan & Hsu '11] H. Duan, B.-J. P. Hsu: Online spelling correction for query completion (WWW 2011).
[Hofmann et al. '13] K. Hofmann, S. Whiteson, M. de Rijke: Fidelity, soundness, and efficiency of interleaved comparison methods (ACM TOIS 31(4), 2013).
[Hofmann et al. '14] K. Hofmann, B. Mitra, M. Shokouhi, F. Radlinski: An eye-tracking study of user interactions with query auto completion (CIKM 2014).
[Kharitonov et al. '13] E. Kharitonov, C. Macdonald, P. Serdyukov, I. Ounis: User model-based metrics for offline query suggestion evaluation (CIKM 2013).
[Kohavi et al. '13] R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu, N. Pohlmann: Online controlled experiments at large scale (KDD 2013).
[Li et al. '14] Y. Li, A. Dong, H. Wang, H. Deng, Y. Chang, C. Zhai: A two-dimensional click model for query auto-completion (SIGIR 2014).
[Liu et al. '12] Y. Liu, R. Song, Y. Chen, J.-Y. Nie, J.-R. Wen: Adaptive query suggestion for difficult queries (SIGIR 2012).
[Mitra et al. '14] B. Mitra, M. Shokouhi, F. Radlinski, K. Hofmann: On user interactions with query auto-completion (SIGIR 2014).
[Shokouhi '13] M. Shokouhi: Learning to personalize query auto-completion (SIGIR 2013).