2. The WebLog
AnonID  Query                        QueryTime         ItemRank  ClickURL
142     rentdirect.com               01/03/2006 07:17
142     www.prescriptionfortime.com  12/03/2006 12:31
142     staple.com                   17/03/2006 21:19
142     staple.com                   17/03/2006 21:19
142     www.newyorklawyersite.com    18/03/2006 08:02
142     www.newyorklawyersite.com    18/03/2006 08:03
142     westchester.gov              20/03/2006 03:55  1         http://www.westchestergov.com
142     space.comhttp                24/03/2006 20:51
The WebLog is the AOL web log that was made available to the public in 2006
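As a rough illustration of how the click information in such a log could be read, here is a minimal sketch. It assumes the file is tab-separated with the five column names shown in the header row; `SAMPLE` and `read_clicks` are hypothetical names, and the two sample rows are taken from the excerpt above.

```python
import csv
import io

# Two rows from the log excerpt; the tab separator is an assumption.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "142\trentdirect.com\t01/03/2006 07:17\t\t\n"
    "142\twestchester.gov\t20/03/2006 03:55\t1\thttp://www.westchestergov.com\n"
)

def read_clicks(handle):
    """Yield (query, url) pairs for the rows that record a click."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        url = (row.get("ClickURL") or "").strip()
        if url:  # rows without a ClickURL are queries with no click
            yield row["Query"].strip(), url

clicks = list(read_clicks(io.StringIO(SAMPLE)))
```

Only rows with a ClickURL are kept, since the rest of the approach relies on query/URL pairs.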
3. The goal
Building a query suggestion application
that exploits the information observed in the AOL
WebLog.
Constraints:
1) The application relies on observed queries
2) The application needs to be fast!
4. The approach
Exploiting the relation between the queries typed
and the URLs clicked by AOL users:
If two queries share “a lot of URLs”
then they are strongly related to
each other
5. “a lot of URLs”….
Several approaches can be followed for linking
observed queries to clicked URLs.
We were inspired by the paper “Query-URL Bipartite
Based Approach to Personalized Query
Recommendation” by Li, Yang, Liu,
Kitsuregawa, Proceedings of the Twenty-Third
AAAI Conference on Artificial Intelligence (2008)
6. Idea 1/2
Let q(i) be the i-th query and u(k) be the k-th
URL clicked after a query is typed.
A Bipartite Graph can be built such that each q(i)
belonging to the query set is linked to every URL u(k)
clicked after it.
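The bipartite graph can be sketched as a mapping from each query to a counter of the URLs clicked after it. This is only one possible representation; `build_bipartite` and the example click pairs are hypothetical and not taken from the dataset.

```python
from collections import Counter, defaultdict

def build_bipartite(clicks):
    """Map each query q(i) to a Counter of the URLs u(k) clicked after it."""
    graph = defaultdict(Counter)
    for query, url in clicks:
        graph[query][url] += 1  # one edge per (query, url), weighted by clicks
    return graph

# Hypothetical click pairs, for illustration only.
example_clicks = [
    ("cheap flights", "http://a.example"),
    ("cheap flights", "http://b.example"),
    ("low cost flights", "http://a.example"),
    ("cheap flights", "http://a.example"),
]
graph = build_bipartite(example_clicks)
```

Keeping the click counts on the edges, rather than only the presence of a link, makes them available later when the edges between queries are weighted.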
7. Idea 2/2
Once a Bipartite Graph has been built, a relation
between any two queries belonging to the query set
can be established according to the URLs clicked
after them.
An Affinity Graph over the query set can then be
defined. The edges between two queries have to be
weighted in order to exploit the graph in a
suggestion task.
8. Weighting the Edges
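\[
w(i,j) = \frac{\sum_{k=1}^{U} \text{Number of times URL}(k)\text{ is clicked by } q(i) \text{ and } q(j)}
{\sum_{k=1}^{U} \text{Number of times any URL}(k)\text{ is clicked by } q(i) \;+\; \sum_{k=1}^{U} \text{Number of times any URL}(k)\text{ is clicked by } q(j)}
\]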
Let q(i) be the i-th query and u(k) be the k-th URL clicked
after a query is typed.
w(i,j) is equal to 1 if the same URLs are clicked after q(i) and after q(j)
w(i,j) is equal to 0 if none of the URLs clicked after q(i) match those
clicked after q(j)
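A minimal sketch of this weight, under one explicit assumption: the numerator is read as the clicks that both queries gave to the URLs they share, which is the reading that yields 1 for identical URL sets and 0 for disjoint ones. The function name `edge_weight` and the sample counters are hypothetical.

```python
from collections import Counter

def edge_weight(clicks_i, clicks_j):
    """w(i,j) for two queries, each given as a Counter of url -> click count."""
    shared = clicks_i.keys() & clicks_j.keys()
    # Assumption: clicks on shared URLs are counted for both queries.
    numerator = sum(clicks_i[u] + clicks_j[u] for u in shared)
    denominator = sum(clicks_i.values()) + sum(clicks_j.values())
    return numerator / denominator if denominator else 0.0

# Hypothetical click counts, for illustration only.
q1 = Counter({"http://a.example": 2, "http://b.example": 1})
q2 = Counter({"http://a.example": 1})
q3 = Counter({"http://c.example": 4})

w12 = edge_weight(q1, q2)  # 3 shared clicks out of 4 in total
w13 = edge_weight(q1, q3)  # no shared URL
```

Here q1 and q2 share one URL that received 2 + 1 clicks out of 4 clicks overall, so `w12` is 0.75, while `w13` is 0.0.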
9. Managing “over-clicked URLs”
In the AOL 2006 WebLog dataset there are a number
of URLs that are over-clicked by users, independently
of the query typed before clicking them.
[Figure: bar chart of click count per URL. Horizontal axis: URLs; vertical axis: Click Count, from 0 to 180,000]
10. Managing “over-clicked URLs”
Those URLs introduce noise into the query recommendation
algorithm. For this reason we selected only the URLs with
fewer than 1,000 clicks.
[Figure: bar chart of click count per URL after filtering. Horizontal axis: URLs; vertical axis: Click Count, from 0 to 1,000]
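The selection of URLs with fewer than 1,000 clicks could be sketched as follows. The helper name `drop_overclicked` and its `max_clicks` parameter are hypothetical; only the 1,000-click threshold comes from the slide.

```python
from collections import Counter

def drop_overclicked(clicks, max_clicks=1000):
    """Keep only the (query, url) pairs whose URL has fewer than max_clicks clicks."""
    clicks = list(clicks)
    totals = Counter(url for _, url in clicks)  # total clicks per URL, over all queries
    return [(query, url) for query, url in clicks if totals[url] < max_clicks]

# Hypothetical data with a lowered threshold so the effect is visible.
pairs = [
    ("q1", "http://popular.example"),
    ("q2", "http://popular.example"),
    ("q3", "http://popular.example"),
    ("q1", "http://rare.example"),
]
kept = drop_overclicked(pairs, max_clicks=3)
```

The count is taken per URL across all queries, which matches the idea of URLs being clicked independently of the query typed.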
11. Affinity Graph Representation
Once the edge weights are computed, we build a main
dictionary with one entry per query q(i): the key is q(i)
and the value is an ordered dictionary.
The keys of the ordered dictionary are the
queries sharing at least 1 URL with q(i), and its values
are the weights w(i,j).
The main dictionary is used to feed the query
suggestion API and provide a reliable result in
milliseconds.
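The structure described above might look like the sketch below. It is not the authors' code: the names `weight`, `build_affinity` and `suggest` are hypothetical, the inner dictionaries are assumed to be ordered by decreasing weight, and the weight uses the same reading of the formula as an assumption (clicks on shared URLs counted for both queries).

```python
from collections import Counter, OrderedDict, defaultdict

def weight(ci, cj):
    """w(i,j) from two Counters of url -> clicks (assumed reading of the formula)."""
    shared = ci.keys() & cj.keys()
    total = sum(ci.values()) + sum(cj.values())
    return sum(ci[u] + cj[u] for u in shared) / total if total else 0.0

def build_affinity(graph):
    """graph: query -> Counter(url -> clicks). Returns query -> OrderedDict(query -> w)."""
    by_url = defaultdict(set)  # inverted index: url -> queries that clicked it
    for query, urls in graph.items():
        for url in urls:
            by_url[url].add(query)
    affinity = {}
    for qi, urls_i in graph.items():
        related = set()
        for url in urls_i:
            related |= by_url[url]
        related.discard(qi)  # a query is not its own suggestion
        scored = sorted(((weight(urls_i, graph[qj]), qj) for qj in related), reverse=True)
        affinity[qi] = OrderedDict((qj, w) for w, qj in scored)
    return affinity

def suggest(affinity, query, n=5):
    """Return up to n related queries; a plain dictionary lookup."""
    return list(affinity.get(query, {}))[:n]

# Hypothetical click counts, for illustration only.
graph = {
    "cheap flights": Counter({"http://a.example": 2, "http://b.example": 1}),
    "low cost flights": Counter({"http://a.example": 1}),
    "flight deals": Counter({"http://b.example": 1, "http://c.example": 3}),
    "pizza": Counter({"http://d.example": 1}),
}
affinity = build_affinity(graph)
suggestions = suggest(affinity, "cheap flights")
```

Because all weights are computed ahead of time, answering a request reduces to one lookup and a slice, which is consistent with the speed constraint stated earlier.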