Lexical signatures, which consist of a small number of words chosen to represent the “aboutness” of a page, have previously been proposed for discovering the new URI of a missing web page. However, prior methods relied on computing the lexical signature before the page was lost, or on using cached or archived versions of the page to calculate it. We demonstrate a system for constructing a lexical signature for a page from its link neighborhood, that is, its “backlinks”: the pages that link to the missing page. After testing various methods, we show that one can construct a lexical signature for a missing web page using only ten backlink pages. Further, we show that only the first level of backlinks is useful in this effort. The text that the backlinks use to point to the missing page serves as input for creating a four-word lexical signature. That lexical signature successfully finds the target URI in more than half of the test cases.
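As a rough illustration of the idea in the abstract, the sketch below builds a four-term signature from a handful of backlink anchor texts. It is a minimal sketch, not the authors' method: it ranks terms by raw frequency across the anchor texts, whereas the actual term weighting is not stated here. The function name `lexical_signature`, the tiny stop-word list, and the sample anchor texts are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "at",
              "is", "here", "click"}


def lexical_signature(anchor_texts, size=4):
    """Return the `size` most frequent non-stop-word terms across anchor texts.

    Raw term frequency is a simplification chosen for this sketch.
    """
    counts = Counter()
    for text in anchor_texts:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            if term not in STOP_WORDS:
                counts[term] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:size]]


# Hypothetical anchor texts collected from backlink pages.
anchors = [
    "IKEA furniture catalog",
    "Ikea catalog online",
    "the IKEA store catalog",
    "furniture store",
]
signature = lexical_signature(anchors)
print(" ".join(signature))
```

The resulting terms would then be submitted as a search engine query in the hope that the missing page's new URI appears among the top results.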
Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures
1. Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures Martin Klein, Jeb Ware, Michael L. Nelson {mklein,jware,mln}@cs.odu.edu JCDL 2011 Ottawa, Canada 07/14/2011
7. Link Neighborhood The Problem #1 A is about IKEA #2 B Bjoern Oslo Dorm room Nobel Herring #3 C extract
8. Lexical Signatures (LSs) First introduced by Phelps and Wilensky [Phelps2000]. Small set of terms capturing “aboutness” of a document, “lightweight” metadata. Resource Abstract 10,000 terms 200 terms
10. 5..8 for tags [Klein2011] How many backlinks to include? The more backlink levels the better? What radius on the backlink page to use?
11. The Radius on a Backlink Page: Entire page, Paragraph, Anchor text
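To make the notion of a radius concrete, the following sketch pulls the anchor text of every link to a given URI out of a backlink page, together with a configurable number of words on either side. It is one possible reading of the "anchor text ± n words" radius, not the implementation used in the study; the paragraph and entire-page radii are not covered. The class `AnchorContext`, the function `anchor_windows`, exact string matching on `href`, and the example URI are assumptions for illustration, and script or style content is not filtered out.

```python
from html.parser import HTMLParser


class AnchorContext(HTMLParser):
    """Collect page words and the word spans of anchors pointing at a target URI."""

    def __init__(self, target_uri):
        super().__init__()
        self.target_uri = target_uri
        self.words = []     # all text words in document order
        self.spans = []     # (start, end) word indexes of matching anchors
        self._start = None  # start index of the currently open matching anchor

    def handle_starttag(self, tag, attrs):
        # Exact href comparison is a simplification (no URI normalization).
        if tag == "a" and dict(attrs).get("href") == self.target_uri:
            self._start = len(self.words)

    def handle_endtag(self, tag):
        if tag == "a" and self._start is not None:
            self.spans.append((self._start, len(self.words)))
            self._start = None

    def handle_data(self, data):
        self.words.extend(data.split())


def anchor_windows(html, target_uri, radius=10):
    """Return anchor text plus up to `radius` words before and after each link."""
    parser = AnchorContext(target_uri)
    parser.feed(html)
    parser.close()
    windows = []
    for start, end in parser.spans:
        lo = max(0, start - radius)
        windows.append(" ".join(parser.words[lo:end + radius]))
    return windows


# Hypothetical backlink page and missing-page URI.
page = ('<p>one two three <a href="http://example.com/x">missing page</a> '
        'four five six</p>')
print(anchor_windows(page, "http://example.com/x", radius=2))
```

Setting `radius=0` yields the anchor text alone, so the same helper can be used to compare the anchor-only radius against wider windows.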
25. References Concluding Remarks
[Jones73] K. Spärck Jones, “Index Term Weighting”, Information Storage and Retrieval, pp. 619-633, 1973
[Klein2008] M. Klein, M. L. Nelson, “Revisiting Lexical Signatures to (Re-)Discover Web Pages”, ECDL 2008, pp. 371-382
[Klein2010a] M. Klein, J. Shipman, M. L. Nelson, “Is This a Good Title”, Hypertext 2010, pp. 3-12
[Klein2010b] M. Klein, M. L. Nelson, “Evaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure”, JCDL 2010, pp. 59-68
[Klein2011] M. Klein, M. L. Nelson, “Find, New, Copy, Web, Page – Tagging for the (Re-)Discovery of Web Pages”, TPDL 2011, to appear
[Phelps2000] T. A. Phelps, R. Wilensky, “Robust Hyperlinks Cost Just Five Words Each”, technical report, University of California at Berkeley, 2000
28. The Results – Backlink Level: Anchor text ± 10 words, level-radius-rank