The recent increase in the volume and variety of video content available online presents growing challenges for video search. Users face increased difficulty in formulating effective queries, and search engines must deploy highly effective algorithms to provide relevant results. This talk addresses these challenges by introducing two novel approaches. First, we discuss a principled framework for multimedia retrieval that moves beyond 'what' users are searching for to also encompass 'why' they search. This 'why' is the reason, purpose or immediate goal behind a user's information need, which we refer to as the underlying 'user intent'. We identify useful intent categories for online video search, present validation experiments showing that these categories display enough invariance to be modeled successfully by a video search engine, and demonstrate with a large crowdsourcing user study the potential of these categories to improve video retrieval. Second, we present a novel approach that predicts for which queries result optimization is most useful, i.e., which queries will fail within a user's search session on a video search engine. Being able to predict when a video search query will fail is likely to make video search result optimization more efficient and to allow optimization techniques to be deployed more effectively. The approach combines features derived from the search log of a video search engine (capturing users' behavior) with features derived from the video search results list (capturing the visual variance of search results), with the objective of predicting whether a particular query is likely to fail in the context of a particular search session.
User Intent in Online Video Search
1. The User at the Wheel of the Online Video Search Engine
Christoph Kofler (c.kofler@tudelft.nl)
Delft University of Technology, Delft, The Netherlands
1
2. In this talk…
Two of our approaches presented at ACM Multimedia 2012, Nara, Japan:
1. User intent in video search
2. Query failure prediction in video search sessions
2
3. I.
Intent and its Discontent
ACM Multimedia 2012 Brave New Ideas
Work with Alan Hanjalic and Martha Larson
Slide credit: Martha Larson
3
5. Many results, but no satisfaction
Top ranked results are about koi ponds, but we are discontent: There is no information specifically about the significance of koi ponds.
5
6. Many queries, no satisfaction
Query suggestions
Query reformulation
Refinement strategies don’t always work.
6
7. Video Search Engine Workflow
Information need → Query “koi pond” → Video search engine → Results list
• So what went wrong?
Openclipart.org: samukunai
7
11. Video Search Engine Workflow
Information need → Query “koi pond” → Video search engine → Results list
• So what went wrong?
We neglect the goal that the user is trying to reach…
…our video search is “blind” to user intent.
11
12. User information need
Information need (What + Why) → Query “koi pond” → Video search engine → Results list
User information need has two parts:
• Topic = What the user is searching for.
• Intent = Why the user is searching for it.
12
13. Removing the Intent Roadblock
The main research roadblock has been the question:
Which intent categories are both useful to users and technically within reach?
1. Categories of Intent: Which ones are useful to users?
2. Indexing Intent: Is intent technically feasible?
3. Impact of Intent: Could intent prevent discontent?
13
16. Natural Language Information Needs
• We harvested natural language information needs related to
video search from Yahoo! Answers.
• We analyzed 281 cases in which the user has clearly stated
the goal behind the information need.
16
17. User Search Intent Categories
• In an iterative process, we manually clustered the information
needs to identify the dominant user search intent categories
(using a card-sorting methodology).
Intent category | Description
I. Information | Obtain knowledge and/or gather information
II. Experience: Learning | Learn something practically by experience
III. Experience: Exposure | Experience a person, place, entity or event
IV. Affect | Change mood or affective state
V. Object | Video is its own goal
17
19. Wider View on Video Intent
Search Intent: Creation Intent:
Video Intent
19
20. Is intent within our reach?
• We carry out a feasibility experiment using simple features from:
• Shot patterns
• Speech recognition transcripts
• User-contributed metadata: title, description, tags
Information Intent versus Affect Intent
20
21. Evaluating Classifiers for Intent
• Evaluate with two large sets of Internet video (from blip.tv)
• Train a classifier that assigns intent categories to videos.
• See paper for the experiment details; here selected results are
reported for the smaller, 350 hour set.
21
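The following slides report classifier quality as a weighted F-measure (WFM). The slides do not spell out the definition, so the sketch below assumes the common one: the per-class F1 scores averaged with weights proportional to each class's share of the gold labels. The function name and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import Counter


def weighted_f_measure(gold, predicted):
    """Support-weighted mean of per-class F1 scores (assumed WFM definition)."""
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the gold labels.
        score += (count / total) * f1
    return score


# Toy example with two of the intent categories (made-up labels).
gold = ["Information", "Information", "Affect", "Affect"]
predicted = ["Information", "Affect", "Affect", "Affect"]
wfm = weighted_f_measure(gold, predicted)
print(round(wfm, 3))
```

With this definition, a classifier that does well on a frequent class is rewarded more than one that does well on a rare class, which is worth keeping in mind when comparing the WFM values on the next slides.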
22. Features from shot patterns
• Shot patterns show promise.
• Weighted F-measure 0.53
• They are especially good in distinguishing
“Information” vs. “Affect”
Shot pattern from an “Information” video (correctly classified)
Shot pattern from an “Affect” video (correctly classified)
22
23. Features from ASR transcripts
• Speech recognition transcripts perform better (WFM 0.67)
• They don’t reach the performance of tags (WFM 0.77)
“Egon comes packaged on a really nice looking blister cover that
features some great super natural colors and images from the
films. The back of the package features a really cool bio…”
Transcript excerpt from an “Experience: Exposure” video (correctly classified)
“It’s Thursday, April 10 2008. I am Robert Ellis, and this is your
Thursday snack. Welcome back to political lunch. Barack Obama
has painted himself in some ways,…”
Transcript excerpt from an “Information” video (correctly classified)
23
25. Experiment on User Perception of Intent
• Workers were presented with a set of three videos returned by
YouTube in response to a query.
• The videos are about the same topic, i.e., “what”
• We ask if the videos have the same intent, i.e., “why”.
Short excerpt of the user study survey:
25
26. User Agreement on Video Intent
• Setup: For each of the 883 queries, three workers filled in
the survey (total 294 workers).
• Results: For 55% of the queries, 2/3 workers agreed that
the set contained videos representing at least two different
intent categories.
• Conclusions:
• If online video search engines become “intent-aware”,
users will indeed notice the difference.
26
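The agreement figure above can be read as a simple majority count. The sketch below assumes that "2/3 workers agreed" means at least two of the three workers for a query answered that the video set mixes intent categories; the queries and answers are invented for illustration and are not data from the study.

```python
def majority_says_mixed(answers, min_votes=2):
    """answers: one boolean per worker; True means 'the set mixes intents'."""
    return sum(1 for answer in answers if answer) >= min_votes


def share_of_mixed_queries(judgments):
    """Fraction of queries for which enough workers saw more than one intent."""
    mixed = sum(1 for answers in judgments.values() if majority_says_mixed(answers))
    return mixed / len(judgments)


# Hypothetical judgments: three workers per query.
judgments = {
    "query A": [True, True, False],
    "query B": [True, True, True],
    "query C": [False, True, False],
    "query D": [False, False, False],
}
share = share_of_mixed_queries(judgments)
print(share)
```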
27. Examples of Agreement on Intent
Query: ‘human metabolism glycolosis’
• Agreed on “Information”
• Agreed on “Information”
• Agreed on “Affect”
Query: ‘motorcycle’
• Agreed on “Experience: Learning”
• Agreed on “Experience: Learning”
• Agreed on “Affect”
27
29. Take-home message
• Intent can help us develop video search engines that get
users where they want to go.
• We have removed the video search intent roadblock: We
have shown which intent categories are important and that
they are in reach.
More challenges lie in the
road ahead.
29
30. Challenge 1: Evaluating Intent
• Quantifying the ability of intent to prevent discontent.
“My search engine finds topics, but is it getting me where I want to go?”
Flickr: sean dreilinger
30
31. Challenge 2: Isolating Intent
• Addressing videos that fit multiple intents.
“I’m not relaxing, I’m a biologist studying fish feeding habits.”
31
32. Challenge 3: Implementing Intent
Query “koi pond” → Video search engine → Results list
• Implementing intent into the video search engine workflow.
“Intent fits anywhere and everywhere”
32
33. II.
When Video Search Goes Wrong
ACM Multimedia 2012 Multimedia Search and Retrieval
Work with Linjun Yang, Martha Larson, Tao Mei, Alan Hanjalic, Shipeng Li
Delft University of Technology, Delft, The Netherlands
Microsoft Research Asia, Beijing, China
33
34. Searching gets complex!
• Searching for videos on the Internet becomes increasingly
complex
• Users face increased difficulty in formulating effective and
successful text-based video search queries
34
37. Deployment of existing algorithms
Algorithms improving the performance of video search engines have been developed for the whole search pipeline
1. Not effectively deployed
2. “Expensive” for both user and search engine
37
38. How can we improve?
Predicting when users will fail in their search session…
…can help to more effectively deploy these algorithms
Focus of this
contribution!
Concept-based retrieval … Particular query suggestion
Better search results for user and “cheaper” for engine
38
39. Approach and Motivation
• Context-aware Query Failure Prediction
• Prediction of success or failure of a query at query time…
• …within a user’s search session with the video search engine
Patterns of users’ interaction with the search engine
Visual features from the search results list produced by the query
• When does a query ‘fail’? When no search result is clicked
39
40. Terminology: Query performance prediction (QPP)
• Predict retrieval performance of query
• Correlates with precision
• How topically coherent are search results? (clear vs. ambiguous)
• Statistics involve
• Query string
• Background collection
• Search results
• No search session context
40
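One widely used QPP statistic of this kind is a clarity score, which compares the word distribution of the search results with that of the background collection. The sketch below is a simplified illustration and not the baseline implementation from the paper: it estimates both language models from video titles only, uses no smoothing, and assumes that every title in the results also occurs in the collection. All titles are made up.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Relative word frequencies over a list of short texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def clarity(result_titles, collection_titles):
    """KL divergence (in bits) between result-list and collection models.

    Assumes result_titles is a subset of collection_titles, so every
    result word has non-zero collection probability.
    """
    results = unigram_model(result_titles)
    collection = unigram_model(collection_titles)
    return sum(p * math.log2(p / collection[word]) for word, p in results.items())


# Hypothetical background collection of video titles.
collection = [
    "koi pond tour", "koi pond build", "koi pond filter",
    "cat video funny", "news report today", "guitar lesson beginner",
]
focused = collection[:3]                               # topically coherent results
mixed = [collection[0], collection[3], collection[4]]  # topically scattered results

print(clarity(focused, collection) > clarity(mixed, collection))
```

A topically coherent results list diverges more from the background collection and therefore scores higher. Note that this statistic uses only the query's results and the collection, with no search session context.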
43. Why QPP in Video Search is not enough: User Perspective
[Figure: histogram of queries by success rate. x-axis: proportion of success rate for queries (0%, 1-9%, 10-19%, …, 80-89%, 90-100%); y-axis: frequency (0 to 0.5); series: All engines, YouTube, Google video, Bing video, Yahoo! video. The leftmost bin is labeled “(Almost) all fail”, the rightmost “(Almost) all successful”.]
Example: koi history: 100K submitted, 60K successful, i.e., a 60% success rate
43
44. Why QPP in Video Search is not enough: User Perspective
Query performance prediction is not trivial in the majority of the cases, since query success highly depends on the query’s context.
44
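The success rate in the koi history example, and the histogram bin it falls into, can be computed as follows. The bin labels follow the axis of the chart; the function names are illustrative, and placing rates between 0% and 1% in the "1-9%" bin is an assumption.

```python
def success_rate(submitted, successful):
    """Percentage of submissions of a query string that were successful."""
    return 100.0 * successful / submitted


def bucket(rate):
    """Map a success rate in percent to the histogram bin used on the slide."""
    if rate == 0:
        return "0%"
    if rate < 10:
        return "1-9%"  # assumption: rates in (0, 1) are also placed here
    if rate >= 90:
        return "90-100%"
    low = int(rate // 10) * 10
    return f"{low}-{low + 9}%"


rate = success_rate(100_000, 60_000)
print(rate, bucket(rate))
```

Queries in the middle bins, like this one, sometimes succeed and sometimes fail, so the query string alone cannot settle the prediction.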
45. Video Search Transaction Logs
Time | Current URL | Previous URL | Query/Action | Vertical
10:46:12 | …search?q=koi+documentary | - | koi documentary | video
10:46:20 | …search?q=koi+history | …search?q=koi+documentary | koi history | video
10:46:25 | …q=koi+history&view=detail&mid=E9589097DCE1DDD7D17DE9589097DCE1DDD7D17 | …search?q=koi+history | <results click> | video
45
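Under the failure definition used in this talk (a query fails when no search result is clicked), the log excerpt above can be turned into labels with a single pass. This is a minimal sketch: the list-of-actions representation and the function name are assumptions for illustration, not the log format or code of the actual system.

```python
CLICK = "<results click>"


def label_queries(actions):
    """Return (query, failed) pairs; a query fails if no click follows it
    before the next query or the end of the session."""
    labels = []
    for action in actions:
        if action == CLICK:
            if labels:
                query, _ = labels[-1]
                labels[-1] = (query, False)  # the latest query got a click
        else:
            labels.append((action, True))    # assume failure until a click is seen
    return labels


# The Query/Action column of the session shown on the slide.
session = ["koi documentary", "koi history", CLICK]
labels = label_queries(session)
print(labels)
```

In this session "koi documentary" is labeled as failed, because the user reformulated without clicking, while "koi history" is labeled as successful.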
46. Context-aware Query Failure Prediction
• Exploratory investigation of users’ search sessions, stored in the transaction log, to find characteristics indicative of query failure
• Context is derived from the query’s context within a user’s search session
46
47. Context-aware Query Failure Prediction
USER FEATURES:
QPP + Session Context
47
48. User Features (excerpt)
• General search session statistics
• Duration
• Number of interactions
• Search engine vertical switches
• Query formulation strategies and clarity
• Query reformulation types
• Differences between clarity of queries within session
• Overlapping query terms
• Mutually exclusive query topics
• Click-through data
• Click behavior in search results
• Dwell time on search results
48
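A few of the session-level features listed above can be sketched directly from a transaction log excerpt. The feature names, the tuple layout and the use of Jaccard overlap between consecutive queries are illustrative assumptions; the paper's actual feature definitions may differ.

```python
from datetime import datetime

CLICK = "<results click>"


def session_features(events):
    """events: (HH:MM:SS, query_or_action) tuples in chronological order."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in events]
    queries = [action for _, action in events if action != CLICK]
    # Jaccard overlap of terms between each query and the next one.
    overlaps = []
    for first, second in zip(queries, queries[1:]):
        a, b = set(first.split()), set(second.split())
        overlaps.append(len(a & b) / len(a | b))
    return {
        "duration_s": (times[-1] - times[0]).total_seconds(),
        "interactions": len(events),
        "clicks": len(events) - len(queries),
        "mean_term_overlap": sum(overlaps) / len(overlaps) if overlaps else 0.0,
    }


events = [
    ("10:46:12", "koi documentary"),
    ("10:46:20", "koi history"),
    ("10:46:25", CLICK),
]
features = session_features(events)
print(features)
```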
49. Why QPP in Video Search is not enough: Engine Perspective
49
50. Context-aware Query Failure Prediction
• Exploit visual information of thumbnails of the produced search results list
• Consistency of visual content of search results on a conceptual level reflects the topical focus of the results list
50
51. Context-aware Query Failure Prediction
ENGINE FEATURES:
QPP + Visual Search Results
51
52. Engine Features (excerpt)
• Show the potential of the visual information to be helpful
for query failure prediction
• Light-weight features to be
• Deployed during query time
• Covering the whole query space
• Higher-level representations are not scalable
• Video search results are represented by standard local and
global features
52
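One light-weight way to quantify the visual consistency of a results list is the mean pairwise similarity of the thumbnails' feature vectors. The sketch below assumes each thumbnail has already been reduced to a small global feature vector (for example, a coarse color histogram); the vectors are invented, and cosine similarity is a stand-in for whatever distance the actual engine features use.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def visual_consistency(thumbnails):
    """Mean pairwise cosine similarity of thumbnail feature vectors."""
    pairs = list(combinations(thumbnails, 2))
    if not pairs:
        return 1.0  # zero or one thumbnail: nothing to disagree with
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 3-bin color histograms for two results lists.
coherent = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
scattered = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

print(visual_consistency(coherent) > visual_consistency(scattered))
```

The pairwise computation is quadratic in the number of results, which stays cheap for a short results list and fits the requirement that features be usable at query time.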
53. Model Training and Prediction
• Supervised learning trains generic classifiers on development
set using the extracted features
• One binary classifier for feature sets representing user and
engine features
53
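The slide only states that generic binary classifiers are trained on the extracted features. As an illustration of the overall idea, the sketch below trains a tiny logistic regression with stochastic gradient descent on a combined feature vector. Logistic regression, the two features and all data points are assumptions chosen for brevity, not the classifiers or data used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, epochs=500, rate=0.5):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            weights = [w - rate * error * v for w, v in zip(weights, x)]
            bias -= rate * error
    return weights, bias


def predict_failure(model, x):
    """True if the model predicts that the query will fail."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5


# Hypothetical features per query: [share of earlier failed queries in the
# session (user feature), visual consistency of the results (engine feature)].
train_x = [[1.0, 0.2], [0.8, 0.3], [0.0, 0.9], [0.2, 0.8]]
train_y = [1, 1, 0, 0]  # 1 = query failed (no click), 0 = query succeeded

model = train(train_x, train_y)
print(predict_failure(model, [0.9, 0.1]), predict_failure(model, [0.1, 0.9]))
```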
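54. [Diagram: two-stage pipeline. Offline: feature extraction of user features and engine features from the training data, followed by model training. Online: for the current query in a search session (Q1, Q2, Q3, Q4, ?), feature extraction of user features and engine features feeds the model for context-aware prediction.]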
54
56. Dataset
• Development set
• 24K search sessions
• 108K queries
• Test set
• 150K search sessions
• 1.1M queries
• 392K unique queries exclusively occur in the test set
• For each query, we collected information from 25 most-
relevant search results
• Textual information: titles of videos
• Visual information: static visual thumbnails
56
57. Baselines, Training, Evaluation
• Compare against a set of query performance prediction
baselines and the dominant class baseline
• Ground truth from clicks in search session
(from transaction log)
57
58. Performance
Features | WF | F (q. i. success) | F (q. i. failure)
Best QPP baseline | 0.6862 | 0.748 | 0.593
Feature combination from engine features | 0.7356 | 0.788 | 0.656
Feature combination from user features | 0.7678 | 0.820 | 0.688
Feature combination from user and engine features | 0.7744 | 0.830 | 0.690
• Engine features: +4% improvement
• User features: +8% improvement
• Combined features: +9% improvement
58
60. Discussion & Take home messages
1. Simple visual features from search results help to
extend query performance prediction
• Able to outperform conventional text-only query performance
prediction
• Performance increase (+4%) is quite modest, but promising
• Consistent with our expectations for our relatively simple
visual representations
• Can positively influence wrong predictions by user features-
only classifiers
60
61. Discussion & Take home messages
2. Features from the user context help the most for
query failure prediction
Three classes of query types benefited from our user features
(+8%)
1. User presumably wants recommendations over general results, e.g., ‘youtube’
2. Particular type of requested content is not available, e.g., ‘free movies’
3. Wrong video search engine usage (wrong vertical) or misspellings, e.g., ‘yahoo mail’, ‘micheal jackon’
61
62. Discussion & Take home messages
2. Features from the user context help the most for
query failure prediction
• ‘Long tail’ queries
• 36% of video queries in test set were submitted once
• Contribution of session context features is independent of the
frequency of query submission
• Challenge: ‘Cold start’ queries do not have enough session context
• Only very little information is needed to address the cold start
issue
62
63. Discussion & Take home messages
3. Context-aware Query Failure Prediction approach is
applicable using little session data
• Solely focuses on local search sessions
• No user profiles or global search patterns were involved in
the learning process
63
64. Future Work
1. Improvement of engine features using visual
information from the video search results list
• Higher-level representation of thumbnails
• Additional sources of visual information
2. Enhancing the performance of an entire range of
video search engine optimization techniques
3. Experimenting with additional definitions of query
failure (e.g., dwell time on search results)
64
65. The User at the Wheel of the
Online Video Search Engine
Christoph Kofler (c.kofler@tudelft.nl)
Delft University of Technology, Delft, The Netherlands
THANK YOU FOR YOUR ATTENTION!
65
Editor's notes
Approaches: or combinations of these
We don’t know which method works well when, or when to deploy a particular method. The user does not get proper search results (as heard yesterday in BNI), and the engine has to do unnecessary computation which might not affect the user and is therefore pointless.
Extreme cases are well predicted by QPP because in these cases no context is necessary to make a relatively good prediction. In (almost) all of these cases, when the particular query string is submitted in a search session, independently of where in the session, it will either succeed or fail. So QPP would do a good job here. However…
One source to infer context are transaction logs…
We looked into queries which fall in the middle category on the plot before, i.e., which have a lot of successful and failed query instances throughout different search sessions. Then we manually investigated these search session in order to infer characteristics of the user which point to success or failure of these queries, dependent on the session context.
In the paper we came up with 5 observations pointing to query failure. These are related to the iterative search goal development throughout the session, the satisfaction of the user with the results thus far in the session and so on. Due to time limitations, I am referring you to the paper at this point and just want to mention some features which we extracted from these observations and which are indicative of query failure: (i) general Internet browser session and search session statistics, (ii) query (re)formulation behavior and clarity of search goal expressiveness, and (iii) click-through data in the video search results lists generated by the queries in the search session. There are two types of pre-query session histories: the session query history and the query-specific reformulation history. Features are extracted from these local search session histories relative to the current query. We do not learn user profiles or global search patterns.
We heard yesterday in the CBIR session that a more specific query does not necessarily lead to more visually consistent search results. So visual features give additional information, next to the text-based search results, which could be exploited with respect to query failure.
High consistency should then indicate that the search engine has achieved good performance on the query that generated the results list.
Both, NSCQ and QC baselines achieve a good balance between correctly classified instances of -qif and +qif, however QC outperforms NCSQ. The relatively strong performance of the conventional QPP baseline demonstrates the potential and the strength of the text-retrieval methods to transfer to video retrieval problems. For the remainder of the experiments we compare performance against the best-performing conventional QPP baseline achieved by the query clarity score.
Our user indicator-based query failure prediction methods statistically significantly outperform the conventional QPP baseline (QC in Table 2) and achieve an 8% improvement in absolute performance solely by taking local search context into account. The best-performing method is the classifier built on features derived from ‘User familiarity’. Another strong performer is ‘Previous dissatisfaction’, reflecting previous failures in the session. For the observation ‘Query iterations’, using local features from the query-specific reformulation region of the search session increases the performance compared to using the entire query history results, suggesting the value of using narrow local context. The relatively poor performance achieved by observation ‘Goal-directedness’ suggests that search goal clarity evolving over a search session is not consistent. Early and late fusions perform well but do not succeed in outperforming individual well-performing observations. Looking at F-measure values of individual classes shows that classifying +qif using the proposed classifiers is more conservative than classifying –qif instances. Observations clearly achieve a much better result for –qif than for +qif. The characteristics of successful queries are presumably more stable, most likely reflecting the relatively greater stability of the characteristics of the successful query.
is a clear sign that the visual component of video search results should not be ignored, but rather potentially makes an important contribution to query failure prediction