SIGIR 2013 Study Group Materials

SIGIR 2013
(Users and Interactive IR I)
Denso IT Laboratory, 山本光穂
Figures in these slides are quoted from the papers.
Wednesday, October 9, 2013

An Effective Implicit Relevance Feedback Technique
Using Affective, Physiological and Behavioural Features
• What is Implicit Relevance Feedback?
  • A technique that improves search result quality by estimating the user's intent and then presenting related documents and results [1]
  • The following features are known to give good results when used to estimate the user's intent [2]
    • dwell time: the time the user spends on a document
    • task intention: the search intent
  • Task intention cannot be asked of the user directly, so it has to be estimated from other values
[1] Accurately interpreting clickthrough data as implicit feedback (SIGIR 2005), Thorsten Joachims et al.
[2] A study on the effects of personalization and task information on implicit feedback performance (CIKM 2006), Ryen W. White et al.

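What this paper sets out to verify
• Analyze whether affective and physiological information captured during search, such as facial expressions and heart rate, can be used to estimate the task intention used in Implicit Relevance Feedback
  • Which affective information (facial expressions) and physiological information (heart rate, perspiration, etc.) can be used for search intent?
  • For which search intents (seeking information, re-finding, entertainment) is this information effective?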

Verification flow (1)
1. Participants use a video retrieval system to carry out the following four search tasks
• INS Task (information seeking intent)
• INF Task (re-finding search intent)
• ENA Task (entertainment-based search intent where searchers adjust their arousal level)
• ENM Task (entertainment-based search intent where searchers adjust their mood)
• For each task, participants are given a scenario for that task.
Figure 1: A snapshot of the video retrieval system for query “avengers”.
Excerpts from the paper:
[...] NeuroSky MindKit-EM™ features two key technologies: (i) ThinkGear-EM™ headset and (ii) eSense-EM™ software (i.e. brainwave interpretation software). The headset is used to extract, filter, and amplify brainwave (EEG) signals and convert that information into digital mental state outputs for eSense-EM™ software. The EEG signals read by the MindKit-EM™ are detected on the forehead via points Fp1 (electrode placement system by the International Federation in Encephalography and Clinical Neurophysiology). The headset has three dry active sensors: one sensor located on the forehead and two sensors are located behind the ears as ground/reference sensors. It also has electronic circuitry that filters and amplifies the brainwaves. The eSense-EM™ software further processes and analyses the obtained brainwave signals into two useful neuro-sensory values: the user’s Attention and Meditation levels at any given moment. The output of eSense-EM™ software has been tested over a wide population and under different environmental [...]
[...] the output of the BodyMedia SenseWear® Pro3 Armband; and the Attention or Meditation data (referred to as “NV”) from the output of the eSense-EM™ software. For our behavioural signal, we considered the dwell time (referred to as “DT”) logged by the system as our dwell time feature. Finally the task intention was considered as task feature (referred to as “Task”).
Preprocessing: For each visited video, the value of each sensory feature (for both affective and physiological features) was calculated by averaging the data logged by its sensory device during the dwell time period. Since none of the instruments we used normalised the data, we scaled signal values before applying any classification method, to avoid having attributes in greater numeric ranges dominating those in smaller numeric ranges.
3.4 Video Retrieval System
For the completion of the search tasks we used a custom-made search environment (named VideoHunt) that was [...]
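The preprocessing described in the excerpt (average each sensor's readings over the dwell time of a visited video, then scale the values) can be sketched as follows. This is a minimal illustration, not the authors' code: the log format, the sample values, and the choice of min-max scaling are all assumptions, since the excerpt only says that the signal values were scaled.

```python
from statistics import mean


def average_during_dwell(samples, start, end):
    """Average the readings whose timestamp falls inside [start, end]."""
    values = [value for timestamp, value in samples if start <= timestamp <= end]
    return mean(values) if values else None


def min_max_scale(column):
    """Linearly scale a list of numbers into [0, 1] (assumed scaling method)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical heart rate log: (seconds since session start, beats per minute).
hr_log = [(0, 70), (1, 72), (2, 74), (5, 90), (6, 94)]
# Hypothetical visits: (video id, dwell start, dwell end).
visits = [("v1", 0, 2), ("v2", 5, 6)]

# One averaged HR value per visited video, then scaled across videos.
hr_per_video = [average_during_dwell(hr_log, start, end) for _, start, end in visits]
hr_scaled = min_max_scale(hr_per_video)
```

Scaling each feature to a common range keeps a signal with large raw values, such as heart rate, from dominating one with small raw values.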

Verification flow (2)
2. Collect the following four kinds of affective and physiological signals
1) FX (affective features; 19 features obtained from eMotion)
   • MU (7 emotion features obtained from motion unit features: happiness, sadness, anger, fear, disgust, and surprise) + AU (12 features obtained from Ekman’s action units)
2) HR (heart rate data)
3) AB (galvanic skin response, skin temperature, near-body ambient temperature, and heat flux; measured with the BodyMedia SenseWear Pro3)
4) NV (Attention or Meditation data; obtained with eSense-EM)
3. Build a task intent model from the data above plus the dwell time logged while using the video retrieval system, and estimate its accuracy
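Feature sets such as "DT+HR" or "DT+FX+AB+HR+NV" are combinations of the feature groups listed above. A minimal sketch of assembling one input vector per visited video from such a label is shown below. The dictionary layout and all numeric values are hypothetical, and the group sizes for AB (4) and NV (2) are assumptions read off the slide's descriptions.

```python
def parse_feature_set(label):
    """Split a label such as 'DT+HR' into its feature group names."""
    return label.split("+")


def build_vector(features, groups):
    """Concatenate the selected feature groups into one flat vector."""
    vector = []
    for name in groups:
        vector.extend(features[name])
    return vector


# Hypothetical, already scaled features for one visited video.
visit = {
    "DT": [0.4],                  # dwell time
    "FX": [0.1] * 19,             # 19 affective features
    "HR": [0.6],                  # heart rate
    "AB": [0.2, 0.5, 0.3, 0.7],   # assumed: 4 armband signals
    "NV": [0.8, 0.3],             # assumed: attention and meditation
}

dt_hr = build_vector(visit, parse_feature_set("DT+HR"))
all_features = build_vector(visit, parse_feature_set("DT+FX+AB+HR+NV"))
```

Any classifier can then be trained on these vectors, one feature set at a time, to compare their prediction accuracy.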

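Task details and user scenarios (1)
The verification assumes the following four search intents
1) INS Task (information seeking intent)
• The user lacks knowledge of the topic, but knows enough to come up with query keywords.
• Scenario: "Imagine that you have graduated from university and are interviewing at a company in a provincial area. At one point in the interview you are asked which region you would like to work in. You are very enthusiastic about the interview, but you lack knowledge on the subject, so you want to research it beforehand."
2) INF Task (information re-finding search intent)
• The user is looking for one specific document. There is only one target document. (In INS, any document that matches the conditions will do.)
• The user has seen the target document at least once before.
• Scenario: "Imagine you are talking with friends about a video you watched a few days ago. They showed interest in the video, so you want to send them a link to it. You remember its content, but you do not remember its title, tags, or any similar information at all."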

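Task details and user scenarios (2)
3) ENA Task (entertainment-based search intent where searchers adjust their arousal level)
• The user searches for information for enjoyment or pleasure.
• The goal is to raise the arousal level (to shake off drowsiness).
• Scenario: "Imagine you are working as a night watchman at a factory. You have just finished a round of checks in the factory and have a little time before the next one. You are a little tired, and you decide to watch some videos for a change of pace until the next round."
4) ENM Task (entertainment-based search intent where searchers adjust their mood)
• The user searches for information for enjoyment or pleasure.
• The user watches videos to change the current mood.
• Scenario: "Imagine you are travelling with your boyfriend/girlfriend. For some reason the two of you can normally only rarely meet. Only a few days of the trip remain, and your partner is sad that you will soon be apart. You decide to watch some videos to change this mood."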

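Results (accuracy of estimating the user's search intent)
• Combining affective and physiological information improves prediction accuracy by about 5% over the random baseline
• Adding dwell time on top improves accuracy by about 15%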
Table 2: This table shows the prediction accuracy of a model trained on different sets of features (presented as rows), given different se[...]ntentions (presented as columns). The best performing set of features for each condition and search intention is highlighted in bold.
Features            | INS               | ENA               | ENM               | INF          | ALL - INF         | ALL
Random [BL1](*)     | 54.88%            | 64.06%            | 64.53%            | 98.83%       | 61.19%            | 50.60%
DT [BL2](†)         | 62.40%            | 65.62%            | 66.16%            | 98.83%       | 71.31%            | 72.74%
DT+Task [BL3](‡)    | –%                | –%                | –%                | –%           | 69.65%            | 76.63%
FX                  | 55.63%* (+1.3%)   | 66.4%** (+3.6%)   | 64.53% (+0%)      | 98.83% (+0%) | 62.43%** (+2.0%)  | 64.54%** (+27.4%)
AB                  | 54.88% (+0%)      | 64.06% (+0%)      | 64.53% (+0%)      | 98.83% (+0%) | 61.19% (+0%)      | 50.60% (+0%)
HR                  | 57.89%** (+5.4%)  | 64.06% (+0%)      | 64.53% (+0%)      | 98.83% (+0%) | 62.93%** (+2.8%)  | 53.27%** (+5.2%)
NV                  | 55.63%* (+1.3%)   | 64.06% (+0%)      | 64.53% (+0%)      | 98.83% (+0%) | 61.19% (+0%)      | 55.73%** (+10.1%)
FX+AB+HR+NV         | 55.63%* (+1.3%)   | 69.53%** (+8.5%)  | 64.53% (+0%)      | 98.83% (+0%) | 67.16%** (+9.7%)  | 65.98%** (+30.3%)
DT+FX               | 67.66%†† (+8.4%)  | 68.75%†† (+4.7%)  | 71.63%†† (+8.2%)  | 98.83% (+0%) | 72.88%†† (+2.2%)  | 77.04%†† (+5.9%)
DT+AB               | 66.91%†† (+7.2%)  | 67.96%†† (+3.5%)  | 81.56%†† (+23.2%) | 98.83% (+0%) | 71.64% (+0.4%)    | 76.22%†† (+4.5%)
DT+HR               | 63.15%† (+1.2%)   | 73.43%†† (+11.9%) | 82.26%†† (+24.3%) | 98.83% (+0%) | 72.13%† (+1.1%)   | 76.22%†† (+4.5%)
DT+NV               | 64.41%†† (+3.2%)  | 70.31%†† (+7.1%)  | 74.46%†† (+12.5%) | 98.83% (+0%) | 72.13%† (+1.1%)   | 75.40%†† (+5.6%)
DT+FX+AB+HR+NV      | 66.16%†† (+6.0%)  | 75%†† (+14.2%)    | 80.14%†† (+21.1%) | 98.83% (+0%) | 75.37%†† (+5.6%)  | 77.04%†† (+5.9%)
DT+Task+FX+AB+HR+NV | –%                | –%                | –%                | –%           | 76.36%‡‡ (+9.6%)  | 78.89%‡‡ (+2.9%)
Excerpts from the paper:
[...] the prediction accuracy of a model trained on dwell time and task features significantly (i.e. “DT+Task” row). The results also show that the prediction accuracy of such a model is even higher than a model trained on all features except task one (i.e. “DT+FX+AB+HR+NV” row). This show that researches on task prediction are complementary to this study rather than contradictory.
An interesting finding is that the discriminative power of sensory signals changes once they are combined with dwell time, even though they show no such power individually. For example, a sensory feature that was not discriminative on its own for a task (e.g. “HR” feature for “ENM” task), when combined with dwell time, can result in the highest prediction accuracy (i.e. “DT+HR” features for “ENM” task)
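The percentages in parentheses in Table 2 appear to be relative improvements over the matching baseline: Random [BL1] for sensor-only rows, DT [BL2] for rows that add dwell time, and DT+Task [BL3] for the row that also adds the task feature. That reading is an inference from the numbers, not something the slide states. A small check under that assumption:

```python
def relative_improvement(accuracy, baseline):
    """Relative gain of `accuracy` over `baseline`, in percent."""
    return (accuracy - baseline) / baseline * 100.0


# Accuracies (in %) copied from Table 2.
fx_all = relative_improvement(64.54, 50.60)        # FX vs Random [BL1], ALL column
dt_fx_ins = relative_improvement(67.66, 62.40)     # DT+FX vs DT [BL2], INS column
full_all_inf = relative_improvement(76.36, 69.65)  # DT+Task+FX+AB+HR+NV vs DT+Task [BL3], ALL - INF

for label, value in [("FX / ALL", fx_all),
                     ("DT+FX / INS", dt_fx_ins),
                     ("DT+Task+FX+AB+HR+NV / ALL - INF", full_all_inf)]:
    print(f"{label}: {value:+.1f}%")
```

The results land within a few tenths of a point of the table's +27.4%, +8.4%, and +9.6%; the small differences are consistent with the table having been computed from unrounded accuracies.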

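How Do Users Respond to Voice Input Errors?
Lexical and Phonetic Query Reformulation in Voice
• Evaluates user behaviour and search result quality when voice input errors occur in voice search
  • Investigates the performance of voice-input queries
  • How do users correct the query when a voice query is entered incorrectly?
  • Investigates search performance when query reformulation is used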

Experimental method
• Users searched on given topics; the search query (qv), the system's recognition result (qtr), and the user's history of selected search results were collected
(1) The user enters a voice query (2) The recognition result is shown (3) Search results are displayed; the user clicks a result if it is the information they were looking for
Figure 1. Screenshots of the Google search app on iPad. Panels (a), (b), (c).
Excerpts from the paper:
[...] indicate its various statuses, which includes: starting or stopping “listening” a voice query; displaying the transcribed query; and failing to generate the transcribed query. These audio cues are very useful in our transcriptions of the experiment recordings.
3.2 Search Tasks and Topics
Our experiment setting is similar to the one adopted by the TREC session track [17], in which users can issue multiple queries to work on one search topic.

How Do Users Respond to Voice Input Errors?
Lexical and Phonetic Query Reformulation in Voice
• Search tasks
  • 50 TREC tasks (30 from the Robust track, 20 from the Web track)
• Search time
  • Two minutes
• What users may do
  • Reformulate queries
  • Use Google's query suggestions
  • Browse and click search results
• The experiment was run with 20 native speakers

Experiment flow
EXPERIMENT PROCEDURE (90 MIN)
User Background Questionnaire → Training (One TREC Topic) → (10 Topics) → 10 min Break → (15 Topics) → Interview
For each topic: Work on a TREC topic for 2 min → Post-task questionnaire

Results 1: Query error rate and search performance
• 908 queries have voice input errors (55% of 1,650)
  • 810 by speech recognition error
  • 98 by improper system interruption
• When a word simply fails to be recognized, search performance is not affected much.
• When a word is misrecognized, in contrast, search performance is affected greatly.
% of all 1,650 voice queries: No Error 45%, Speech Rec Error 49%, Improper System Interruption 6%
QUERY TRANSCRIPTION
• qv (a voice query’s actual content)
  • manually transcribed from the recording
  • two authors had an agreement of 100%, except on casing, plurals, and prepositions
• qtr (the system’s transcription of a voice query)
  • available from the log
INDIVIDUAL QUERIES: WORDS
• Missing words: words in qv but not in qtr
• Incorrect words: words in qtr but not in qv
INDIVIDUAL QUERIES: PERFORMANCE
• Significant decline of search performance (nDCG@10)
                      | No Errors (742 Queries) | Speech Rec Errors (810 Queries)
                      | mean    SD              | mean     SD
nDCG@10 of qv         | 0.275   0.20            | 0.264    0.22
nDCG@10 of qtr        | 0.275   0.20            | 0.083    0.16
Δ nDCG@10 (qtr - qv)  | -       -               | -0.182   0.23
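The definitions of missing and incorrect words are plain multiset differences between qv and qtr, so they can be sketched in a few lines. The function name and the lowercase, whitespace-based tokenization are assumptions made for illustration; the slides do not say how the authors tokenized queries.

```python
from collections import Counter


def word_errors(qv, qtr):
    """Return (missing, incorrect) word lists for one voice query.

    missing:   words in qv (what was said) but not in qtr (the transcription)
    incorrect: words in qtr but not in qv
    """
    spoken = Counter(qv.lower().split())
    transcribed = Counter(qtr.lower().split())
    missing = sorted((spoken - transcribed).elements())
    incorrect = sorted((transcribed - spoken).elements())
    return missing, incorrect


# Example pair taken from the slides: "the sun" was transcribed as "the son".
missing, incorrect = word_errors("the sun", "the son")
```

Using `Counter` rather than `set` keeps repeated words distinct, so a query that says a word twice but is transcribed with it once still reports one missing word.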

Results 2: How users reformulate queries
TEXTUAL PATTERNS
• Query Term Addition (ADD)
• Query Term Substitution (SUB)
  • SUB word pairs are manually coded (93% agreement)
ADD example:
   | Voice Query          | Transcribed Query    | ADD words
q1 | the sun              | the son              |
q2 | the sun solar system | the sun solar system | solar system
SUB example:
   | Voice Query      | Transcribed Query | SUB words
q1 | art theft        | test              |
q2 | art embezzlement | are in Dublin     | theft → embezzlement
q3 | stolen artwork   | stolen artwork    | embezzlement → stolen, art → artwork
Users reformulate queries for two reasons: (1) to correct the speech recognition result, and (2) to improve the search results.

TEXTUAL PATTERNS
• Query Term Removal (RMV)
• Query Term Reordering (ORD)
RMV example:
   | Voice Query                    | Transcribed Query
q1 | advantages of same sex schools | andy just open it goes
q2 | same sex schools               | same sex schools
ORD example:
   | Voice Query                         | Transcribed Query
q1 | interruptions to ireland peace talk | is directions to ireland peace talks
q2 | ireland peace talk interruptions    | ireland peace talks interruptions
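Three of the textual patterns (ADD, RMV, ORD) can be approximated by comparing the words of two consecutive voice queries. The sketch below is a simplified illustration under assumed rules, not the coding scheme used in the paper: SUB is left out because the slides say SUB word pairs were coded manually, and the function name and tokenization are hypothetical.

```python
def lexical_patterns(prev_query, next_query):
    """Compare two consecutive voice queries word by word."""
    prev_words = prev_query.lower().split()
    next_words = next_query.lower().split()
    added = [w for w in next_words if w not in prev_words]    # ADD
    removed = [w for w in prev_words if w not in next_words]  # RMV
    # ORD: the words both queries share appear in a different order.
    shared_prev = [w for w in prev_words if w in next_words]
    shared_next = [w for w in next_words if w in prev_words]
    return {"ADD": added, "RMV": removed, "ORD": shared_prev != shared_next}


# Query pairs taken from the examples on the slides.
add_case = lexical_patterns("the sun", "the sun solar system")
rmv_case = lexical_patterns("advantages of same sex schools", "same sex schools")
ord_case = lexical_patterns("interruptions to ireland peace talk",
                            "ireland peace talk interruptions")
```

Note that the ORD example also drops the word "to", so a single reformulation can show more than one pattern at once.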

After an error, users tend to enter the next query by reordering it or by removing words from it
• When previous query has voice input error
  • Increased use of SUB & ORD
  • Less use of ADD & RMV
Patterns      | Prev Q Error | Prev Q No Error | Overall
ADD           | 90.50%       | 32.98%          | 53.82%
SUB           | 15.04%       | 16.34%          | 14.87%
RMV           | 66.75%       | 37.93%          | 48.37%
ORD           | 33.51%       | 43.03%          | 39.58%
(All Lexical) | 99.74%       | 77.36%          | 85.47%

Results 2: How users reformulate queries (speech)
• WE
  • Say the whole query again with emphasis
• REP
  • Repeat the query
PHONETIC PATTERNS
• Partial Emphasis (PE)
  • Overstate a specific part of a query
PE Type                       | Example           | Explanation
Stressing (STR)               | rap and crime     | put stress on “rap”
Slow down (SLW)               | rap and c-r-i-m-e | slow down at “crime”
Spelling (SPL)                | P·u·e·r·t·o Rico  | spell out each letter in “Puerto”
Different Pronunciation (DIF) | Puerto Rico       | pronounce “Puerto” differently

Results 2: How search results are corrected
When the previous query contained an error, some users correct the query through the way they speak it.
(This is reportedly not very effective → speech recognition assumes standard pronunciation)
• Use of phonetic patterns is nearly always associated with previous voice input errors
Patterns       | Prev Q Error | Prev Q No Error | Overall
STR/SLW        | 0%           | 14.84%          | 9.46%
SPL            | 0%           | 0.60%           | 0.39%
DIF            | 0%           | 0.90%           | 0.57%
WE             | 0.26%        | 9.30%           | 6.02%
(All Phonetic) | 0.26%        | 25.64%          | 16.44%
Repeat         | 0%           | 20.54%          | 13.58%

Results 3: Search performance when query reformulation is used
Accuracy reportedly improves slightly.
• Overall slight improvement (10% in nDCG@10)
• But highly depends on whether or not voice input error happened after query reformulation
• Did not reduce the likelihood of voice input errors
The reformulated query has / is | nDCG@10 before | nDCG@10 after | # of cases
No Error                        | 0.150          | 0.233         | 474 (40%)
Speech Rec Error                | 0.104          | 0.079         | 597 (51%)
Interruption                    | 0.156          | 0.056         | 79 (6.7%)
Query Suggestion                | 0.201          | 0.223         | 32 (2.7%)
Overall                         | 0.129          | 0.143         | 1,182
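nDCG@10, the measure used in the tables above, can be sketched as follows. This assumes the common formulation with a log2 rank discount and linear gains, and it takes the ideal ranking from the same judged list; the slides do not say which variant or which ideal ranking the authors used.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k results (log2 discount)."""
    return sum(gain / math.log2(rank + 1)
               for rank, gain in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k=10):
    """DCG normalised by the DCG of the best possible ordering of `gains`."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of the results returned for one query,
# listed in ranked order.
perfect_order = ndcg_at_k([3, 2, 1, 0])
relevant_at_rank_2 = ndcg_at_k([0, 1])
```

A ranking that already lists results in decreasing relevance scores 1.0, and pushing the only relevant result down to rank 2 lowers the score to 1/log2(3).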

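Mining Touch Interaction Data on Mobile Devices to
Predict Web
• Predicting the relevance of search results from behaviour logs in a touch-device environment
• Research questions
  • Do users view search result documents differently on touch-enabled mobile devices than on a desktop computer with mouse and keyboard?
  • On touch-enabled mobile devices, does behaviour differ when users view highly relevant result documents compared with less relevant ones?
• Experimental method
  • The experiment was run with the following users
    • Smartphone users: 26
    • Desktop PC users: 30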

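Experimental results
• Comparison of desktop search and mobile search
  • On mobile, page dwell time is longer and the amount of scrolling is larger
• Inactivity features are effective for estimating relevance
  • inactive = probably because the user is reading the page carefully
• The more the user gestures, the more likely the page is non-relevant
  • scanning: the user is likely still looking for a relevant page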
