An introductory presentation about the current state of personalization in (Web) search for Bibliotekarforbundet's series of 'gå-hjem-møder' (after-work meetings). Presented on May 17, 2016 at Aalborg University Copenhagen.
2. Outline
• Past
- What is the basic foundation of search engines?
• Present
- How do search engines personalize the results?
• Future
- What direction are we moving in?
4. Search is everywhere!
• Some statistics
- 82.6% of internet users use search engines
- 93% of online experiences begin with a search engine
- Google receives ~3.3 billion searches per day
- Since 2015 half of all searches come from mobile
- Size of Google’s index exceeds 100 million GB
- 80% of users prefer personalized search
6. Content
• 2nd generation Web search
- Early 1990s
- Examples: Lycos, AltaVista, AllTheWeb, ...
• Ranking signals
- Term frequency (TF)
‣ Term more frequent in document → more important for that document
- Inverse document frequency (IDF)
‣ Term occurs in few other documents in the index → more important for that document
- TF·IDF
‣ Combined term score of both TF and IDF
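A minimal sketch of how the TF·IDF score above could be computed. The toy collection and the function names are hypothetical, and the logarithmic IDF shown here is one common variant rather than the formula of any particular search engine.

```python
import math
from collections import Counter

# Hypothetical toy collection of tokenized documents.
docs = {
    "d1": "new york hotels cheap hotels".split(),
    "d2": "new york sightseeing".split(),
    "d3": "oxford english dictionary".split(),
}

def tf(term, tokens):
    # Term frequency: how often the term occurs in this document.
    return Counter(tokens)[term]

def idf(term, collection):
    # Inverse document frequency: log(N / number of documents containing the term).
    df = sum(1 for tokens in collection.values() if term in tokens)
    return math.log(len(collection) / df) if df else 0.0

def tf_idf(term, doc_id, collection):
    # Combined score: frequent in this document AND rare in the collection.
    return tf(term, collection[doc_id]) * idf(term, collection)

hotels_score = tf_idf("hotels", "d1", docs)  # frequent in d1, found nowhere else
new_score = tf_idf("new", "d1", docs)        # occurs once, also found in d2
```

In this toy collection "hotels" outscores "new" for d1, because it is both more frequent in d1 and rarer across the collection.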
8. Content-based ranking
• Figure: vector representation of queries and documents (axes X, Y, Z, ...)
- One vector component for each of the unique words in the index
- Each component holds the frequency of that term in the query/document
- Example vectors from the figure:
‣ 0 0 1 0 0 0 0 0 0 0 1
‣ 6 0 0 0 0 9 0 3 7 0 0
‣ 8 0 4 0 0 0 2 0 0 0 3
‣ 0 4 0 5 0 0 0 0 0 0 0
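One way to turn such term vectors into a ranking is cosine similarity between the query vector and each document vector. This is a sketch under assumptions: the slide does not name a similarity measure, and which example vector plays the role of the query and which the documents (`doc_a`, `doc_b`, `doc_c`) is chosen here for illustration only.

```python
import math

def cosine(u, v):
    # Cosine of the angle between two term-frequency vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Assumed roles: one example vector acts as the query, the others as documents.
query = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
documents = {
    "doc_a": [8, 0, 4, 0, 0, 0, 2, 0, 0, 0, 3],
    "doc_b": [6, 0, 0, 0, 0, 9, 0, 3, 7, 0, 0],
    "doc_c": [0, 4, 0, 5, 0, 0, 0, 0, 0, 0, 0],
}

# Documents sorted by decreasing similarity to the query.
ranked = sorted(documents, key=lambda d: cosine(query, documents[d]), reverse=True)
```

Only `doc_a` shares terms with the query, so it is ranked first; the other two have a similarity of zero.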
10. Links
• 3rd generation Web search
- Take the link structure of the Web into account
- Second half of 1990s
- Examples: Google (PageRank), Ask! (HITS)
• Ranking signals
- Website popularity
‣ More incoming links → higher popularity
‣ More incoming links from popular pages → higher popularity
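The two popularity principles above can be illustrated with a simplified PageRank-style power iteration. This is a sketch, not Google's production algorithm: the link graph, the damping factor of 0.85, and the number of iterations are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    # links maps each page to the list of pages it links to.
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its popularity on to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

C has the most incoming links and ends up most popular. A has a single incoming link, but it comes from the popular page C, so A ranks well above B and D.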
13. Personalization
• Definition
- Providing search results tailored to the individual user
• History
- 1998: Yahoo! MyWeb
- 2004: Google introduces personalized search
- 2007: iGoogle
14. Personalization
• Pros & cons
+ Saves time by reducing number of results to inspect
+ Better decision making by filtering out inferior information
– Filter bubble (as much a personal decision as an algorithmic restriction)
– Users as products (using search history for advertising)
16. Personal
• Information about the user him/herself
• Ranking signals
- Language
‣ Language preferences can be used to filter out results
- Demographics
‣ Taken from the user's Google+ profile or predicted → can be used for re-ranking results
‣ Results selected by other users from similar cohorts can be ranked higher
• Figure: re-ranking the results P, Q, and R
- original relevance score + % times selected by demographically similar users = combined score
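The combination of the two scores can be sketched as a simple sum. All numbers below are invented for illustration, and the `weight` parameter is a hypothetical knob that the slide does not mention.

```python
def combined_score(relevance, cohort_share, weight=1.0):
    # original relevance score + share of demographically similar users who selected the result
    return relevance + weight * cohort_share

# Hypothetical scores for the three results P, Q and R.
results = {
    "P": {"relevance": 0.60, "cohort_share": 0.10},
    "Q": {"relevance": 0.55, "cohort_share": 0.40},
    "R": {"relevance": 0.50, "cohort_share": 0.05},
}

reranked = sorted(
    results,
    key=lambda r: combined_score(results[r]["relevance"], results[r]["cohort_share"]),
    reverse=True,
)
```

With these made-up values, Q overtakes P after re-ranking because it was selected far more often by users from a similar cohort.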
17. Social
• Information about a user’s social network
• Ranking signals
- Social network connections
‣ Results selected by friends for similar searches could be given more weight
‣ Web pages shared by friends could be given more weight
• Figure: re-ranking the results P, Q, and R
- original relevance score + shared by friends? + % times selected by friends = combined score
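A sketch of how the "% times selected by friends" signal could be derived from a click log and folded into the score. The log format, the user names, the relevance values, and the fixed `share_bonus` for pages shared by friends are all assumptions for illustration.

```python
from collections import Counter

def friend_selection_share(click_log, friends, query):
    # Share of the friends' clicks for this query that went to each result.
    clicks = Counter(
        result for user, q, result in click_log if user in friends and q == query
    )
    total = sum(clicks.values())
    return {result: count / total for result, count in clicks.items()} if total else {}

def social_score(relevance, share, shared_by_friends, share_bonus=0.2):
    # original relevance + % times selected by friends + bonus if shared by friends
    return relevance + share + (share_bonus if shared_by_friends else 0.0)

# Hypothetical click log: (user, query, selected result).
click_log = [
    ("mary", "sightseeing New York", "P"),
    ("bob", "sightseeing New York", "P"),
    ("jane", "sightseeing New York", "R"),
    ("eve", "sightseeing New York", "Q"),  # eve is not a friend
]
friends = {"mary", "bob", "jane"}
relevance = {"Q": 0.9, "P": 0.5, "R": 0.4}
shared = {"R"}  # results that friends have shared

shares = friend_selection_share(click_log, friends, "sightseeing New York")
ranking = sorted(
    relevance,
    key=lambda r: social_score(relevance[r], shares.get(r, 0.0), r in shared),
    reverse=True,
)
```

Here Q has the best original relevance score but drops to last place, because no friend selected or shared it.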
18. Activity: Query logs
• Information about the queries submitted by the user and
other users in the past
• Ranking signals
- Query suggestion
‣ Other users entered queries A and B in the same session → B might be a good suggestion for a user entering query A
19. Activity: Query suggestion
• Example query log
- Session 1 (john): (1) hotels New York, (2) hotels Manhattan, (3) affordable hotels Manhattan, (4) sightseeing New York, (5) One World Trade Center
- Session 2 (mary): (1) oed, (2) oxford english dictionary
- Session 3 (jane): (1) youtube drumpf john oliver
- Session 4 (bob): (1) oed, (2) oxford english dictionary
- Session 5 (alice): (1) sights New York, (2) sightseeing New York, (3) Brooklyn Bridge, (4) One World Trade Center
• Resulting suggestions
- oed → oxford english dictionary
- sightseeing New York → One World Trade Center
- sightseeing New York → Brooklyn Bridge
• Ranking principle: Queries are similar if they have been issued in the same session.
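The ranking principle can be sketched as a co-occurrence count over the example sessions. One assumption is added here that the slide does not state: only queries issued later in the same session count as suggestions, which reproduces the suggestions shown for this log.

```python
from collections import Counter, defaultdict

# The five example sessions from the slide.
sessions = {
    "john": ["hotels New York", "hotels Manhattan", "affordable hotels Manhattan",
             "sightseeing New York", "One World Trade Center"],
    "mary": ["oed", "oxford english dictionary"],
    "jane": ["youtube drumpf john oliver"],
    "bob": ["oed", "oxford english dictionary"],
    "alice": ["sights New York", "sightseeing New York", "Brooklyn Bridge",
              "One World Trade Center"],
}

def build_suggestions(sessions):
    # For every query, count which queries followed it within the same session.
    followers = defaultdict(Counter)
    for queries in sessions.values():
        for i, query in enumerate(queries):
            for later in queries[i + 1:]:
                if later != query:
                    followers[query][later] += 1
    return followers

def suggest(followers, query, k=3):
    # The k queries that most often followed `query`, most frequent first.
    return [q for q, _ in followers[query].most_common(k)]

followers = build_suggestions(sessions)
oed_suggestions = suggest(followers, "oed")
nyc_suggestions = suggest(followers, "sightseeing New York")
```

"One World Trade Center" followed "sightseeing New York" in two sessions and "Brooklyn Bridge" in one, so the former is suggested first.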
20. Activity: Query logs
• Information about the queries submitted by the user and
other users in the past
• Applications
- Query suggestion
‣ Other users entered queries A and B in the same session → B might be a good suggestion for a user entering query A
- Spelling correction
‣ Immediately after query X, other users entered query Y → Y might be the correct version of query X
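A sketch of log-based spelling correction along these lines. The sessions are invented, and the string-similarity filter (via `difflib.SequenceMatcher` with a threshold of 0.8) is an added assumption to separate likely corrections from ordinary follow-up queries; the slide only describes the "immediately after" signal.

```python
from collections import Counter
from difflib import SequenceMatcher

def correction_pairs(sessions, min_similarity=0.8):
    # Count (X, Y) pairs where Y was entered immediately after X and looks similar to X.
    pairs = Counter()
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            similar = SequenceMatcher(None, first, second).ratio() >= min_similarity
            if first != second and similar:
                pairs[(first, second)] += 1
    return pairs

def correct(pairs, query):
    # Most frequent immediate reformulation of `query`, or None if there is none.
    candidates = Counter({y: n for (x, y), n in pairs.items() if x == query})
    return candidates.most_common(1)[0][0] if candidates else None

# Hypothetical query log, one list of consecutive queries per session.
sessions = [
    ["oxfrod dictionary", "oxford dictionary"],
    ["oxfrod dictionary", "oxford dictionary"],
    ["oxfrod dictionary", "weather copenhagen"],
    ["hotels New York", "hotels Manhattan"],
]

pairs = correction_pairs(sessions)
fixed = correct(pairs, "oxfrod dictionary")
```

The pair "hotels New York" → "hotels Manhattan" is a reformulation rather than a misspelling, and the similarity filter keeps it out of the correction candidates.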
21. Activity: Browse logs
• Information about the results clicked on by the user and
other users in the past
• Ranking signals
- Similar results in the same session
- Similar results in the same user browsing history
• Example browse log
- Session 1 (query: sightseeing New York): (1) http://www.nycgo.com, (2) http://www.lonelyplanet.com/new-york, (3) http://www.citypass.com/new-york, (4) https://oneworldobservatory.com/, (5) http://www.esbnyc.com/
- Session 2 (query: sightseeing New York): (1) http://www.lonelyplanet.com/new-york
- Candidate results for Session 2 (clicked in Session 1): https://oneworldobservatory.com/, http://www.esbnyc.com/
22. Context
• Information about the context in which the search is performed
• Ranking signals
- Location
‣ Used to prioritize locally relevant results
‣ Essential for mobile search
- Device
‣ Has the page been optimized for the user’s current device?
- Date & time
‣ Seasonal influences, home vs. work, ...
- ...
23. Learning to rank
• Learning the optimal combination of all ranking signals
- Goal: to do this continuously and automatically using machine learning
‣ Predict for each query-result pair whether the result is relevant for that user’s
query at this specific time
• Machine learning is the science of teaching a computer how
to perform a task without explicitly programming it
- Detect common patterns in the data
‣ Our data → different ranking signals related to query and document
- Associate those patterns with specific outcomes
‣ Our outcomes → overall relevance score
- The more examples for the computer, the better!
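The idea can be sketched with a very small pointwise model: logistic regression trained by gradient descent on labeled examples, each pairing a ranking signal vector with a relevance outcome. The three signals, the training examples, and the hyperparameters are all hypothetical, and real learning-to-rank systems use far richer models and far more data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, learning_rate=0.5):
    # Learn one weight per ranking signal (plus a bias) from labeled examples.
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for signals, relevant in examples:
            predicted = sigmoid(bias + sum(w * s for w, s in zip(weights, signals)))
            error = predicted - relevant
            weights = [w - learning_rate * error * s for w, s in zip(weights, signals)]
            bias -= learning_rate * error
    return weights, bias

def score(model, signals):
    # Predicted probability that a result with these signals is relevant.
    weights, bias = model
    return sigmoid(bias + sum(w * s for w, s in zip(weights, signals)))

# Hypothetical examples: (content similarity, link popularity, selected by friends) → relevant?
examples = [
    ([0.9, 0.8, 1.0], 1),
    ([0.8, 0.2, 1.0], 1),
    ([0.7, 0.9, 0.0], 1),
    ([0.3, 0.4, 0.0], 0),
    ([0.2, 0.9, 0.0], 0),
    ([0.1, 0.1, 1.0], 0),
]

model = train(examples)
likely_relevant = score(model, [0.95, 0.5, 1.0])
likely_irrelevant = score(model, [0.05, 0.5, 0.0])
```

The learned weights replace a hand-tuned combination of signals; results for a new query would be ordered by their predicted score.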
24. Learning to rank
• Example 1: ranking signal vector (sample signal value shown: 0.904)
- Document
‣ Similarity with query vector
‣ Recency
‣ Readability score
‣ Language
‣ Spam score
- Query
‣ Type of information need
‣ Entities (company, person)
‣ Trending topic?
- Personal
‣ Preferred language?
‣ Selected by demographically similar users
- Links
‣ PageRank
‣ Personalized PageRank
‣ TrustRank
25. Learning to rank
• Example 1: ranking signal vector (Document, Query, Personal, and Links signals, plus those below) → Relevance: ✓
- Social
‣ Selected by friends
‣ Shared by friends
- Activity
‣ Selected by similar users
‣ Selected for related queries
- Context
‣ Optimized for current device?
‣ Related to current location
‣ Related to current date/time
26. Learning to rank
• Each example pairs a ranking signal vector with a relevance label
- Example 1: ✓
- Example 2: ✗
- Example 3: ✗
- Example 4: ✗
- Example 5: ✓
- Example 6: ✗
- ...
• 3.3 billion examples per day!
27. Personalization in academic search
• What ranking signals are available in academic search?
- Content
‣ Publications, teaching materials, supervised theses, homepages, grants, ...
- Links
‣ Citation networks, ...
- Personal
‣ LinkedIn endorsements, expertise areas, ...
- Social
‣ LinkedIn, Academia.edu, ResearchGate, Mendeley, CiteULike, ...
28. Personalization in academic search
- Activity
‣ Teaching, supervision, organization, service to the profession, ...
- Context
‣ Research vs. teaching, active project, previously read, ...
30. Task-awareness
• Search is rarely a goal in itself → often associated with the
completion of a larger task
- Tasks are complex, involving a nontrivial sequence of steps
- Tasks are knowledge-intensive, requiring access to and manipulation of
large quantities of information
- Example: Planning a family vacation
• Awareness of the background task is essential to take
personalization to the next level
- Detecting & supporting multiple search strategies
- Supporting filtering, sorting, and aggregating of results