... because research without search is just "re".
A talk teaching the basics of searching, made for the PhD students of the Department of Electronics and Information at Politecnico di Milano
2. 2
Table of contents
Introduction
... (ellipsis left on purpose)
Conclusions
Davide Eynard
ReSearch - 2008/06/06
3. 3
This seminar is not...
“Le risorse elettroniche per la ricerca” (“Electronic resources for research”),
a transversal course for the PhD students of Politecnico di Milano
This (June 2008?) will be the fourth edition
Very good material from previous editions is available at
http://www.biblio.polimi.it/documenti
Main topics:
• query languages
• online libraries, journals and ebooks
• tools to create and manage your bibliography
• search engines, deep Web, open archives, advanced browsing
• social publishing (blogs and RSS) and social bookmarking
• POLIsearch
• using the university proxy to access online resources
• notes on copyright issues
• search techniques (like PICO and SPICE)
4. 4
So... why?
Searching (and now, in particular,
being able to effectively search on
the Internet) is very important for
our research and, more generally,
in our lives.
Even if they are interested, some
students skip the course as it does
not give enough credits!
If you're interested in these topics,
ask for a solution (i.e. increase the
credits, together with the teaching
material).
8. 8
So... what?
What is the real purpose of this lecture, then?
What are the contents?
Is this a short version of the PhD course?
NAH!
There's so much material about search that we could prepare
ten complementary PhD courses...
Moreover, I already had some material I wanted to recycle
http://searchlores.org – a precious source for seekers
PowerBrowsing – an old project of mine
BUT I also have something new to tell you, I promise!
9. 9
The Web
Search engines cover (at best) ¼ of the web
Different SEs may return different results (as their indexes only partially overlap)
Quality of results is measured in terms of precision and recall
See (for instance) http://www.searchlores.org
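As a reminder of what precision and recall measure, here is a minimal sketch. The document ids and the relevance judgments are made up for illustration; in practice the full set of relevant documents is rarely known.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved vs. relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: which fraction of what we got back is actually relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: which fraction of the relevant documents did we get back?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and hypothetical relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```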
10. 10
The Internet
The Web vs Not the Web
The Web: Blogs, Wikis, Forums, Folksonomies, ...
Not the Web: IRC, Email, Usenet, IM, File sharing (Emule, Bittorrent), P2P, ...
11. 11
Search engines
How are search engines used?
Mostly with queries of one or a few words
• (which ones? Have a look at Zeitgeist!)
Mostly, users look only at the first hits
• (check here and here)
Yet the main operators are available...
quotes
allinanchor
inurl
filetype
intitle
related
... and of course boolean ones
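The operators listed above can be combined in a single query. The sketch below just assembles the query string; `build_query` is a hypothetical helper written for this example, and the `q` parameter of the Google search URL is assumed to behave as it commonly does.

```python
from urllib.parse import urlencode


def build_query(*terms, phrase=None, exclude=(), **operators):
    """Assemble a query string from plain terms, a quoted phrase,
    excluded words and name:value operators (e.g. filetype, intitle)."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')  # quotes: exact phrase
    parts.extend(f"-{word}" for word in exclude)  # minus sign: NOT
    parts.extend(f"{name}:{value}" for name, value in operators.items())
    return " ".join(parts)


query = build_query(
    "ontology",
    phrase="semantic web",
    exclude=["jobs"],
    filetype="pdf",
    intitle="tutorial",
)
print(query)  # ontology "semantic web" -jobs filetype:pdf intitle:tutorial

# URL-encode the query so that it can be pasted into a browser.
url = "https://www.google.com/search?" + urlencode({"q": query})
print(url)
```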
12. 12
True or false?
How true is boolean search?
(that is, how truly boolean...)
“I want this term or this other and not that one” is fine...
... but don't try to think in sets!
[Figure: set diagrams over “semantic” and “web”, labelled “semantic AND web” and “semantic AND semantic”]
... but it doesn't work like this!
13. 13
Vector Space Model
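In the VSM, documents are represented as vectors in a multidimensional Euclidean space
The coordinate of document d on axis t is given by
d_t = TF(d, t) * IDF(t)
where TF(d, t) is the frequency of term t in document d and IDF(t) is the inverse document frequency of t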
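A minimal sketch of the formula above, assuming raw counts for TF and log(N / df) for IDF. Other weighting variants exist and the slide does not say which one is meant; the three tiny documents are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df). Rare terms get a higher weight.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    # Assumed TF variant: raw count of the term in the document.
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical mini-collection.
docs = [
    "semantic web ontology".split(),
    "semantic search engine".split(),
    "web search engine ranking".split(),
]
vectors = tfidf_vectors(docs)

for doc, vector in zip(docs, vectors):
    print(" ".join(doc), "->", {t: round(w, 3) for t, w in vector.items()})
```

In the first document "ontology" appears in only one of the three documents, so its coordinate is larger than that of "semantic", which appears in two.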
14. 14
The epanaleptical approach
Some search engines are based on models that are much closer to the VSM than to sets + boolean logic.
Epanaleptical approach:
just repeat the word many times
if it's more than one word, surround them with quotes
Examples (nice academic drawbacks):
semantic web
semantic web + collaborative systems
slam
performance evaluation
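Why repetition can matter in a vector model: a toy sketch using plain term counts and cosine similarity. This is an assumption made for illustration, not how any particular search engine actually ranks, and the two documents are invented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(query, doc):
    """Toy ranking function: cosine between raw term-count vectors."""
    return cosine(Counter(query.split()), Counter(doc.split()))


# Two hypothetical documents with opposite emphasis.
doc_a = "semantic semantic semantic web"
doc_b = "semantic web web web"

for query in ("semantic web", "semantic semantic web"):
    print(f"{query!r}: A={score(query, doc_a):.3f} B={score(query, doc_b):.3f}")
```

With the plain query the two documents tie; repeating "semantic" tilts the query vector towards that axis, so document A scores higher. In a pure set-based boolean model the repetition would change nothing.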
15. 15
To google or not to google
Use Google to find anything
“local” searches can be run from Google too
try it with blogs, forums, wikis, etc.
• phpBB trick
• MediaWiki trick
Use alternative search engines
search for related:www.google.com
16. 16
Search techniques
Word search (+ suffixes)
Webbits (here and here)
• (and the “index of” trick)
Concept related search and specific search engines
Arrows: using communities of practice to enhance search
• What are diy, gtd, seo, slam, etc.?
Foster serendipity
• check upper dirs
• follow links
• look at the status bar
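"Check upper dirs" can be automated. The sketch below lists the parent directories of a URL, nearest first; the example URL is hypothetical and whether a server exposes those directories is a separate matter.

```python
from urllib.parse import urlsplit, urlunsplit


def upper_dirs(url):
    """Return the parent directories of a URL, nearest first, down to the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing path segment at a time.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        # Query string and fragment are discarded on purpose.
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


# Hypothetical URL of a document found through a search engine.
for parent in upper_dirs("http://example.org/papers/2008/talk.pdf"):
    print(parent)
```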
17. 17
Exploit collaboration
Blogs/News
Ok, I suppose you all know about RSS feeds...
• You can recognize them
• You can mash them up
• You can use them for other media
... but how can you find interesting ones?
• AideRSS technique
• ... and a tutorial that explains how to use it
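One way to recognize feeds programmatically is to look for the autodiscovery `<link>` tags that many sites put in their page head. A minimal sketch with the standard library; the sample page is invented and real pages may advertise their feeds differently.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> that points to a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html):
    finder = FeedFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page source.
page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="News" href="/feed.xml">
</head><body></body></html>"""

print(find_feeds(page))  # ['/feed.xml']
```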
18. 18
Exploit collaboration
Folksonomies
del.icio.us
ma.gnolia
Bibliography sharing
bibsonomy
CiteULike
Social networks/groups
Ever searched for Facebook groups?
19. 19
DIY
AKA Do It Yourself
AKA means Also Known As
• Also means... well, just jokin'!
In this case it means taking a personal, custom approach, using ready-made
tools or creating new ones.
How can you do it?
Know thy enemy
• WWW, HTTP, HTML (see powerbrowsing)
• Human patterns
• PC patterns
Build models
Exploit tools or regularities in contents
20. 20
Web Technologies
There are some things you should know to build a well-behaved
bot:
• HTTP
◦ GET and POST
◦ Referer
◦ UserAgent
◦ Cookie
◦ Proxy
• HTML
◦ Form
◦ Dynamically generated code
Have a look at this tutorial, and at some DEI examples.
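A rough sketch of how those HTTP pieces fit together in a polite bot, using only the standard library. The bot name, the URLs and the robots.txt content are hypothetical, and nothing is sent over the network here: the requests are only built and inspected.

```python
import urllib.parse
import urllib.request
import urllib.robotparser

# Hypothetical bot identity: tell site owners who you are and how to reach you.
USER_AGENT = "ReSearchBot/0.1 (hypothetical; contact: you@example.org)"


def allowed(robots_txt, url, user_agent=USER_AGENT):
    """Check a URL against the text of a robots.txt file."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url, referer=None, form=None):
    """Build a GET request, or a POST request when form data is given."""
    headers = {"User-Agent": USER_AGENT}
    if referer:
        headers["Referer"] = referer
    # Form fields are URL-encoded and sent in the body, which makes it a POST.
    data = urllib.parse.urlencode(form).encode("ascii") if form else None
    return urllib.request.Request(url, data=data, headers=headers)


robots = "User-agent: *\nDisallow: /private/\n"
print(allowed(robots, "http://example.org/search"))
print(allowed(robots, "http://example.org/private/notes.html"))

get_req = build_request("http://example.org/search?q=semantic+web",
                        referer="http://example.org/")
post_req = build_request("http://example.org/search", form={"q": "semantic web"})
print(get_req.get_method(), get_req.header_items())
print(post_req.get_method(), post_req.data)
```

To actually send a request, pass it to `urllib.request.urlopen`; cookies and proxies can be handled by an opener built with `urllib.request.build_opener`, `HTTPCookieProcessor` and `ProxyHandler`.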
21. 21
Tools and examples
Web tools
• Program Committee Searcher
• Changedetection
• Wayback machine
• Mashup tools
• SpeakinAbout
Client tools
• user agent switcher
• spiders/scrapers
• custom made tools ;-)
• Firefox search plugins
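Two of the web tools above are simple enough to imitate with a custom-made tool. This is a sketch under assumptions: it is not how Changedetection works internally, the page contents are invented, and the Wayback Machine URL pattern is the commonly used `web.archive.org/web/<timestamp>/<url>` form.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Hash a page so that only the digest needs to be stored between checks."""
    return hashlib.sha256(content).hexdigest()


def has_changed(old_fingerprint: str, new_content: bytes) -> bool:
    """True when the freshly downloaded content differs from the stored digest."""
    return fingerprint(new_content) != old_fingerprint


def wayback_url(url: str, timestamp: str = "2008") -> str:
    """Build a Wayback Machine address for a snapshot near the given timestamp.
    The timestamp is a prefix of YYYYMMDDhhmmss (assumed URL pattern)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Hypothetical page contents from two successive visits.
first_visit = b"<html><body>Call for papers: deadline June 6</body></html>"
second_visit = b"<html><body>Call for papers: deadline June 20</body></html>"

stored = fingerprint(first_visit)
print(has_changed(stored, first_visit))   # False
print(has_changed(stored, second_visit))  # True
print(wayback_url("http://searchlores.org"))
```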
22. 22
To conclude: did you know...
that we have people working on very interesting stuff about
searching, libraries and documents here (and, in the real world,
about 100 m from us)?
that here you can find all the info you need to set up the
university proxy, so you can access restricted document
libraries from anywhere?
that on the OPAC you can find recent doctoral theses ready to
read, in pdf format?
... and that you have a lot of polimi-related news here?