UB Utrecht                  HvA-MIC                     GO Opleidingen




      searching the internet
better with Google / Google not always best


                      Eric Sieverts
                         @sieverts

                                               CODARTS, 04-03-2013
agenda


    • searching the web
    • smart searching
    • google options
    • beyond google
    • beyond general web search


         for all links see: http://sieverts.pbworks.com/codarts


agenda (overview diagram)

    the general web ?=? everything
    general web search: how to …
    specific material search: how to …
    importance of specific material types?
    when & why
an ever changing google landscape




            •   unreliable numbers
            •   irreproducible results
            •   disappearing functions
            •   changing interfaces

building block approach

    systematic searching in structured information systems (like JStor etc.)
      start analytically with so-called building block approach
      e.g.: subject "modern american composers"
         – it breaks up in 3 facets
         – collect keywords for each facet
         – combine keywords with OR and AND operators

       modern                 american              composers
       ------                 --------              ---------
       modern                 american              composer
       contemporary           america               composers
       20th century           usa                   songwriters
       twentieth century      united states         …
       …                      …

       keywords within a column are combined with OR;
       the three columns are combined with AND
    it makes a query:
    (modern OR contemporary OR "twentieth century" OR "20th century")
       AND (america OR american OR usa OR "united states")
       AND (composer OR composers OR songwriter OR songwriters)
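
The building block steps above can be sketched in a few lines of Python. This is only an illustration: the function names `or_group` and `build_query` are made up for this example and are not part of any search system.

```python
def or_group(terms):
    """Join the keywords of one facet with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(facets):
    """Combine the OR-groups of all facets with AND."""
    return " AND ".join(or_group(terms) for terms in facets)


# the three facets of the subject "modern american composers"
facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]

print(build_query(facets))
```

Keeping each facet as its own list makes it easy to add or drop a keyword without retyping the whole query.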
building block approach

    also with Google ?
    web search engines are not specifically designed for such structured
    queries, but they can handle them


    Google and Yahoo make it even easier, since you may omit the parentheses
    and the AND operator (AND is the default):

    modern OR contemporary OR "twentieth century" OR "20th century"
    america OR american OR usa OR "united states"
    composer OR composers OR songwriter OR songwriters

    (an AND is implied between the three OR groups)
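
A minimal sketch of the same idea for a web search engine: the facets are joined with spaces only, relying on the implied AND described above. The function names are hypothetical, and the search address with its `q` parameter is an assumption about how a query is usually passed to Google in a URL.

```python
from urllib.parse import quote_plus


def google_style(facets):
    """OR within a facet, a plain space (implied AND) between facets."""
    groups = []
    for terms in facets:
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append(" OR ".join(quoted))
    return " ".join(groups)


def search_url(query, base="https://www.google.com/search"):
    """Encode the query for use in a search URL (base address is an assumption)."""
    return f"{base}?q={quote_plus(query)}"


query = google_style([
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
])
print(query)
print(search_url(query))
```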

relevance ranking (1)

    Google (and other web search engines) are primarily
    focused on presenting search results in order of relevance
    how do they know what is relevant?
     – they interpret the importance of words for the subject matter of
       the retrieved documents
       (your search terms present in title, url, headings, ... ?)
         • you can enhance importance of a certain term for your
           query by repeating that word a couple of times
     – they estimate the importance of the relation between words in
       the retrieved documents: whether ..
        • your search words occur close together
        • your search words occur in same order as you entered them
          >> formulate your query the way you expect it to be worded
word order matters
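
Why closeness and word order can matter is easiest to see in a toy model. The scoring below is invented for illustration only; it is not the formula of Google or any other engine, and `proximity_score` is a hypothetical name.

```python
def proximity_score(query, document):
    """Toy score: higher when the query words occur close together and in query order."""
    q = query.lower().split()
    d = document.lower().split()
    positions = []
    for word in q:
        if word not in d:
            return 0.0  # a missing query word scores zero in this toy model
        positions.append(d.index(word))  # first occurrence only
    span = max(positions) - min(positions) + 1
    closeness = len(q) / span  # 1.0 when the words are adjacent
    in_order = positions == sorted(positions)
    return closeness * (2.0 if in_order else 1.0)


print(proximity_score("modern composers", "modern composers of america"))
print(proximity_score("modern composers", "composers who are modern"))
```

The first document gets the higher score because the two words are adjacent and appear in the order they were entered.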
relevance ranking (2)

     Google (and other web search engines) are primarily
     focused on presenting search results in order of relevance
     how do they know what is relevant?
      – importance or quality of retrieved web pages is deduced from
        the number and the importance of links from other sites
        (for each site a pagerank is calculated)
      – importance of retrieved web pages for your personal interest is
        deduced on basis of your previous search and browse behaviour,
        which is monitored whenever you're logged in
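
The link-based idea can be sketched with the textbook PageRank iteration. This is the simplified published form, not Google's production ranking; the damping value 0.85 and the four-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page starts each round with a small base share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # a page without outgoing links spreads its rank evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# hypothetical mini web: a, b and d all link to c; c links back to a
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page c ranks highest because it receives the most links, and page a outranks b and d because its single incoming link comes from the important page c.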

     since every search engine uses somewhat different algorithms for its
     relevance calculations (and their coverage differs as well), there
     tends to be little overlap between the top 10 results from different engines
search terms

     use of proper search terms is crucial for search success
     think of :
      –   singular / plural , verbs / nouns / adjectives , conjugations , ...
      –   spelling variations (behavior / behaviour)
      –   compound terms (writer / songwriter)
      –   synonyms, acronyms (compact disc / compact disk / cd / digital disc)

     how would the answer to my question be formulated in a
     relevant document? "think as if being a document"
      –   the right terms
      –   as an "exact phrase" or in most probable word order
      –   use wildcard for variable words ("modern * * composers")
      –   use known examples from a list to be found
      –   choose between popular and scientific terms, etc.
refining searches

 if results are too broad, too diverse
  – add another essential term or set of terms to your query
  – see what your search engine suggests
    while you enter your query




   – exclude an unwanted term with NOT (francis bacon NOT philosopher)
     NB: Google does not understand NOT; use a minus sign instead:
         francis bacon -philosopher
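
A small sketch of that rewrite, for queries carried over from a database that uses NOT. The helper name `not_to_minus` is hypothetical, and it only handles an upper-case NOT followed by a word or a quoted phrase.

```python
import re


def not_to_minus(query):
    """Rewrite 'NOT term' as '-term', the exclusion syntax used in the Google search box."""
    return re.sub(r"\bNOT\s+(?=\S)", "-", query)


print(not_to_minus("francis bacon NOT philosopher"))
print(not_to_minus('bacon NOT "roger bacon" NOT recipe'))
```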
nice interactive infographic "how search works"
     http://www.google.com/insidesearch/howsearchworks/thestory/
is Google outsmarting us ?
     Google tries to improve and to broaden your queries
     •   automatic spelling corrections (veilgheid >> veiligheid)
     •   automatic search for words with same word stem (singular/plural,
         verb, conjugation, inflection, …)
     •   expands acronyms (jfk >> john f kennedy | wwii >> world war II)
     •   adds some synonyms (vaccination >> immunization)
     •   transforms separate words to compound term & vice versa
         (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food)
     •   may leave out term as optional if not differentiating enough

     you are never sure what, when, or not
     more often and more elaborate in English than in Dutch
     • personalisation based on previous search behaviour

     but what if you don't like all of this?  >> use "verbatim"

verbatim: searches only literally for the exact words you entered
on google.nl: "woord voor woord"
some more "how to"


     • domain search:   site:edu OR site:edu.* [for all edu (sub)domains]
                        site:shell.com OR site:philips.com
     • url search:      inurl:novelty
     • title search:    intitle:catalytic
     • filetype search: filetype:pdf
                        filetype:xls OR filetype:xlsx
                        filetype:doc OR filetype:docx
                        filetype:rss
                        [more types than shown in the advanced search drop-down menu]
     • exact search:    "greenhouses" [or VERBATIM for all words]

advanced search

     Google is hiding its advanced search screen :
     you must perform a simple search
     first, to get the "cog wheel"




some more "how to"

     some of this can be done from the advanced search screen
     but regular search box offers greater flexibility,
     once you know the syntax
     • domain search: [in combination with real search terms]
                         site:codarts.nl
                         site:edu OR site:edu.* [for all edu (sub)domains]
                         site:last.fm OR site:spotify.com
     • url search:       inurl:course
     • title search:     intitle:guitar



some more "how to" (2)

     • filetype search:    filetype:pdf
                           filetype:xls OR filetype:xlsx
                           filetype:doc OR filetype:docx
                           filetype:rss
                           [more types than shown in the advanced search drop-down menu]
     • numeric search:     10..20              [includes all values in between]
                           $10..$20            [not for other currencies]
     • punctuation:        &, %, dot, ...      [can be searched]
                           €, /, ", comma, ... [are ignored]
     • exact search:       "greenhouses"       [or VERBATIM for all words]
     • synonym search:     ~guitar
     • time limitations:   [after search, hidden in top menu]
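
Once the syntax is known, such queries can also be put together programmatically. The helper below is a hypothetical sketch that only concatenates the operators listed on these slides; it does not check whether a search engine still supports each of them.

```python
def build_operator_query(terms, sites=(), filetypes=(), intitle=None,
                         inurl=None, number_range=None):
    """Combine plain search terms with site:, filetype:, intitle:, inurl: and a numeric range."""
    parts = [terms] if terms else []
    if sites:
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    if filetypes:
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")
    return " ".join(parts)


print(build_operator_query("guitar lessons", sites=["codarts.nl"], filetypes=["pdf"]))
print(build_operator_query("guitar", sites=["last.fm", "spotify.com"], intitle="guitar"))
print(build_operator_query("guitar strings", number_range=(10, 20)))
```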

synonym search
date limitations
who searches for “Bach” is probably more interested
       in data about him, than in websites about him; and
       most probably in "J.S." instead of one of his relatives




Google's "Knowledge Graph"
knows 500 million objects
with 3.5 billion properties and
even more mutual relations
(but only in English)
it also interprets the intention of your query (sometimes ;-)




general search engines besides google
 • Bing         microsoft, large
 • Yahoo!       content=Bing, large
 • Blekko       uses hashtags to search more [domain-] selective
                also many predefined hashtags; e.g. /likes for Facebook
 • DuckDuckGo assures privacy, no personalisation, no filter-bubble,
                rather small, !Bang-function offers many extras
 • Gigablast    green search engine, rather small, some unique functions
 • Exalead      french, many advanced functions, primarily demo system
 • Millionshort leaves out results from most popular sites → the long tail
 • WolframAlpha knowledge engine, facts, calculations
 together, these others have 30% market share in US; in NL only 3%
 •   Yandex        in Russia more popular than Google
 •   Baidu         in China more popular than Google
 •   Naver, Daum   in South Korea more popular than Google
 •   Seznam        in Czechia more popular than Google
material type specific search
     science   google scholar, microsoft academic, scirus,
               oaister, scientific commons, science.gov
     reference wikipedia, quora, wolfram|alpha, answers.com
     news     google news, yahoo news, bing news, cnn, bbc
     old news way-back-machine, historische kranten KB
     images google image, yahoo image, bing image, flickr,
                tineye (ip-check), panoramio (geo-search)
     video      google video, youtube, youtube edu channel,
                bing video, blinkx, voxalead-news
     tweets     twitter search, topsy, postpost, snapbird
     social     socialsearcher, socialmention, whostalkin, kurrently
     forums     google groups, omgili, boardtracker
     blogs      google blogs, icerocket, [rss] CTRLQ, RSS SearchHub
scientific search

     books
       –   Google Books (full text search)
       –   Hathitrust Digital Library (open book scan project / part of G-books)
       –   Librarything (catalog of 58,000,000 books from 1,000,000 owners)
       –   GoodReads (reviews, recommendations, friends, ...)
       –   Open Textbook Catalog (open access textbooks)

     journal articles
       –   licensed databases (like JStor, ...)
       –   Google Scholar (articles, dissertations, reports, ...)
       –   sEURch / UvA-library ("discovery" systems of EUR / UvA)
       –   Scirus / SciVerse (journal articles -Elsevier- , database content, webpages)
       –   Magportal (also -English- popular magazines)
       –   DeepDyve (scientific articles "for rent" - for 24 hours)

Google Books

     •   all pages scanned and full-text searchable
     •   important to discover specific subjects/terms - not primary book topic
     •   often limitations on display and browsability
         (no preview / snippet view / limited preview / full preview)
     •   content from publishers and large libraries
     •   problems with viewing copyrighted material also from libraries
     •   build your personal ‘My Library’
     •   Dutch books not only from Ghent University (and soon the KB),
         but also from the US/UK
     •   also some ‘magazines’
     •   metadata on about-this-book-page


Google Scholar

     •   > 100 million scientific publications (mostly articles)
     •   differences between availability (and hence searchability) of
         full-text (majority), bibliographic-only, and citation data
     •   competitor of Web of Science, Scopus, Scirus, ...
     •   indexing many selected -even licensed- sources (publishers,
         abstract-databases, university sites, institutional repositories, ...)
     •   includes numbers of citations! [and links to them]
     •   number of citations important factor for relevance ranking
         (!! reason why recent publications get low rankings)
     •   advanced search limited, many mistakes in metadata (authors etc.)
     •   accessibility of full-text often a problem because of licences
     •   often many versions of same article (including sometimes free ones)
     •   coupling with library subscriptions to allow smoother linking
     •   no info about sources, updates etc.
[screenshot with annotations:]
     – open access
     – ## of citations: if this article is interesting,
       these 23 more recent ones probably are too
     – subscription univ. utrecht
facts and reference

     encyclopedias
       – wikipedia
       – internet movie database
       – ...
     Q&A (human powered)
       – Quora
       – Yahoo-answers
     direct answers, facts and calculations
       – Wolfram|Alpha
     dictionaries, translations
       –   answers.com (metasearch)
       –   Roget thesaurus
       –   Bartleby
       –   Google Translate
       –   Google Translated search
       –   Synoniemen.net (Dutch)
wikipedia

     •   >250 languages
     •   “wisdom of the crowds” ?=? “wisdom” for all topics?
     •   quite good for “factual” topics
     •   many detailed specific topics (>20 million lemmas, >1 million NL)
     •   there are policies & guidelines
         & management: stewards, administrators
     •   to search wikipedia, use Google rather than the internal search
         limit to:               site:wikipedia.org
         gives more complete results
         and searches all language versions together




google's
"translated search"
is now almost hidden
translates original query
(here in english)
into chosen languages
and translates results
back into english
... and pages selected
from the result list are
translated into English too
old stuff : web & news

     •   web archive
          – "way-back machine": old versions of websites, back to 1996
            access thru the -original- url, NO search
            internal site links will mostly work
          – also other archived materials (a.o. music)
     •   historical Dutch newspapers
           – historische kranten KB (1618-1995 ; full-text search)
     •   historical international newspapers
           – British newspapers 1800-1900
           – historic American newspapers
           – international overview
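
Because the way-back machine is accessed through the original url, an archive address can be composed directly. The sketch below assumes the commonly seen address layout of web.archive.org (a date stamp followed by the original url, or `*` for an overview of all captures); the function name and the example url are hypothetical.

```python
from datetime import date


def wayback_url(original_url, when=None):
    """Build an archive address for the capture closest to `when` (assumed URL layout)."""
    stamp = when.strftime("%Y%m%d") if when else "*"
    return f"https://web.archive.org/web/{stamp}/{original_url}"


print(wayback_url("http://example.com/", date(2005, 1, 1)))
print(wayback_url("http://example.com/"))
```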



… and the very oldest one from february 1998:




twitter & social search

     twitter search (often limited to messages from past 1 - 2 weeks only)
           – twitter (also advanced search)
           – topsy (best one at the moment, also older messages)
           – postpost (search your own timeline - everything you're following)
           – snapbird (search thru all tweets of particular person -
                        you have to know twittername)
     real time / social search
           – socialsearcher (facebook | twitter | g+ : side by side)
           – socialmention (also weblogs)
           – samepoint, whostalkin, kurrently, … (also weblogs)
     forum discussions
         – omgili, boardtracker, ...
         – Google groups

multimedia search / images

     mostly search by keywords
       – Google-image (simple image recognition)
       – Yahoo-image (also pictures from Flickr)
       – Bing-image
       – Flickr (photo upload-site; search on user tags;
                      filter on “Creative Commons” material)
       – photographs on twitter (twicsy, picfog, topsy, skylines.io, …)
       – special sites (beeldbank nationaal archief, wikimedia commons, ...)

     special techniques:
       – geographical (panoramio [google-maps], worldc.am [instagram], ...)
       – Google (search by example)
       – Tineye (search for -almost- exact copies; a.o. copyright infringed?)

image search

     Content based image retrieval (CBIR)
     •   search on colors
          – examples: Tineye, Chromatik, Picitup, Google, ...
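
How a search on colors can work in principle is shown by the toy sketch below: each image is reduced to a coarse color histogram, and two histograms are compared by their overlap. This is a generic textbook technique, not a description of how the services named above work; the function names and pixel data are made up.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per coarse RGB bucket and normalise to fractions (needs at least one pixel)."""
    step = 256 // bins_per_channel
    hist = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    total = len(pixels)
    return {k: v / total for k, v in hist.items()}


def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint ones."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))


# three tiny "images", given as lists of (red, green, blue) pixels
red = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
mixed = color_histogram([(250, 10, 5)] * 2 + [(0, 0, 255)] * 2)

print(histogram_similarity(red, mixed))
print(histogram_similarity(red, blue))
```

An image that is half red scores halfway between a fully red image and a fully blue one, which is the kind of ordering a color search needs.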




image search

 Content based image retrieval
 • search by example

     – draw it yourself
       Retrievr, ...

     – existing image
       Google (visually similar)
       Tineye (almost exact copies)
       Retrievr, ...
       example found on the web or
       uploaded from your own computer



example




google looks for most probable
keywords to describe this image
and in the search box combines
them already with the image




           ... and how about these
           "visually similar images" ?
photoshopped
advertisement,
but what's the
  original ?
multimedia search / video
     (mostly) uploaded material
      – YouTube (growth: 70 hours/minute ; also many "how to" videos)
        also: YouTube-channels / YouTube-education / YouTube-teachers /
        YouTube-movies / YouTube-shows / …
      – Vimeo

     (mostly) broadcast material
      – Blinkx (35 million hours video, speech recognition?)
      – VoxaleadNews (speech recognition in several languages - also NL!
        hence "full-text" search on spoken words)
      – Bing-video (not easy to find from European home page)
      – Google-video (also videos from YouTube; metadata search only)
      – Dutch TV-programs:
          • Uitzending gemist (limited search functionality)
          • Beeld & Geluid (metadata search; use “uitgebreid zoeken”)
          • Academia (selection from Beeld & Geluid for higher education)
the end
     any questions?





Semantisch Zoeken - knowledge graph, semantisch web, linked data, rdf, ontolo...
 
Semantisch zoeken - over knowledge graph, semantisch web, rdf enz.
Semantisch zoeken - over knowledge graph, semantisch web, rdf enz.Semantisch zoeken - over knowledge graph, semantisch web, rdf enz.
Semantisch zoeken - over knowledge graph, semantisch web, rdf enz.
 
Zin en onzin van metadata
Zin en onzin van metadataZin en onzin van metadata
Zin en onzin van metadata
 
40 jaar informatiegebruik
40 jaar informatiegebruik40 jaar informatiegebruik
40 jaar informatiegebruik
 
UBU 3.0: semantisch web & linked data voor de UB?
UBU 3.0: semantisch web & linked data voor de UB?UBU 3.0: semantisch web & linked data voor de UB?
UBU 3.0: semantisch web & linked data voor de UB?
 
Metadata, standaarden, interoperabiliteit, semantisch web en linked data
Metadata, standaarden, interoperabiliteit, semantisch web en linked dataMetadata, standaarden, interoperabiliteit, semantisch web en linked data
Metadata, standaarden, interoperabiliteit, semantisch web en linked data
 
Searchtrends
SearchtrendsSearchtrends
Searchtrends
 
A pair of shoes in the thesaurus; some reflexions on human and computer indexing
A pair of shoes in the thesaurus; some reflexions on human and computer indexingA pair of shoes in the thesaurus; some reflexions on human and computer indexing
A pair of shoes in the thesaurus; some reflexions on human and computer indexing
 
Een digitale bibliotheek of alleen Google?
Een digitale bibliotheek of alleen Google?Een digitale bibliotheek of alleen Google?
Een digitale bibliotheek of alleen Google?
 
Project Panorama: vistas on validated information
Project Panorama: vistas on validated informationProject Panorama: vistas on validated information
Project Panorama: vistas on validated information
 
Lifehacking met RSS en Netvibes? De strijd tegen informatie overload
Lifehacking met RSS en Netvibes? De strijd tegen informatie overloadLifehacking met RSS en Netvibes? De strijd tegen informatie overload
Lifehacking met RSS en Netvibes? De strijd tegen informatie overload
 
Vinden dankzij / ondanks metadata
Vinden dankzij / ondanks metadataVinden dankzij / ondanks metadata
Vinden dankzij / ondanks metadata
 
UBU-2.0 : allesopeenrijtje-2.0
UBU-2.0 : allesopeenrijtje-2.0UBU-2.0 : allesopeenrijtje-2.0
UBU-2.0 : allesopeenrijtje-2.0
 

Recently uploaded

Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 

Recently uploaded (20)

Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 

Searching the internet - better with Google / Google not always best

  • 1. UB Utrecht HvA-MIC GO Opleidingen searching the internet better with Google / Google not always best Eric Sieverts @sieverts CODARTS, 04-03-2013
  • 2. agenda • searching the web • smart searching • google options • beyond google • beyond general web search for all links see: http://sieverts.pbworks.com/codarts 2
  • 3. agenda: the web ?=? everything / general web search: how to … / specific material search: how to … / importance of specific material types? / when & why
  • 4. an ever changing google landscape • unreliable numbers • irreproducible results • disappearing functions • changing interfaces 4
  • 5. 5
  • 6. building block approach: systematic searching in structured information systems (like JSTOR etc.); start analytically with the so-called building block approach; e.g. subject "modern american composers" – it breaks up into 3 facets – collect keywords for each facet – combine keywords with OR and AND operators; facet "modern": modern OR contemporary OR 20th century OR twentieth century OR … ; AND facet "american": american OR america OR usa OR united states OR … ; AND facet "composers": composer OR composers OR songwriters OR …
  • 7. building block approach: the three facets (modern, american, composers), each a set of keywords joined with OR and the facets joined with AND, make a query: (modern OR contemporary OR "twentieth century" OR "20th century") AND (america OR american OR usa OR "united states") AND (composer OR composers OR songwriter OR songwriters)
  • 8. building block approach also with Google? web search engines are not specifically designed for such structured queries, but it can be done; Google and Yahoo make it even easier, since you may omit the parentheses and the AND operator (AND is the default): modern OR contemporary OR "twentieth century" OR "20th century" [implied AND] america OR american OR usa OR "united states" [implied AND] composer OR composers OR songwriter OR songwriters
  • 9. relevance ranking (1): Google (and other web search engines) are primarily focused on presenting search results in order of relevance; how do they know what is relevant? – they interpret the importance of words for the subject matter of the retrieved documents (are your search terms present in title, url, headings, ...?) • you can enhance the importance of a certain term for your query by repeating that word a couple of times – they estimate the importance of the relation between words in the retrieved documents: whether • your search words occur close together • your search words occur in the same order as you entered them >> formulate your query the way you expect it to be formulated in the documents you want to find
  • 11. relevance ranking (2): Google (and other web search engines) are primarily focused on presenting search results in order of relevance; how do they know what is relevant? – importance or quality of retrieved web pages is deduced from the number and the importance of links from other sites (for each site a pagerank is calculated) – importance of retrieved web pages for your personal interest is deduced on the basis of your previous search and browse behaviour, which is monitored whenever you're logged in; since every search engine uses somewhat different algorithms for its relevance calculations (and their coverage is different as well), there tends to be little overlap between the top 10 results from different engines
  • 12.
  • 13. search terms use of proper search terms is crucial for search success think of : – singular / plural , verbs / nouns / adjectives , conjugations , ... – spelling variations (behavior / behaviour) – compound terms (writer / songwriter) – synonyms, acronyms (compact disc / compact disk / cd / digital disc) how would the answer to my question be formulated in a relevant document? "think as if being a document" – the right terms – as an "exact phrase" or in most probable word order – use wildcard for variable words ("modern * * composers") – use known examples from a list to be found – use of popular <> scientific terms etc. 13
  • 14. refining searches: if results are too broad, too diverse – add another essential term or set of terms to your query – see what your search engine suggests while you enter your query – exclude an unwanted term with NOT (francis bacon NOT philosopher); NB: Google does not understand NOT; use the minus sign instead: francis bacon -philosopher
  • 15. nice interactive infographic "how search works" http://www.google.com/insidesearch/howsearchworks/thestory/ 15
  • 16. is Google outsmarting us? Google tries to improve and to broaden your queries • automatic spelling corrections (veilgheid >> veiligheid) • automatic search for words with the same word stem (singular/plural, verb, conjugation, inflection, …) • expands acronyms (jfk >> john f kennedy | wwii >> world war II) • adds some synonyms (vaccination >> immunization) • transforms separate words to a compound term & vice versa (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food) • may leave out a term as optional if not differentiating enough [more often and more elaborate in English than in Dutch; never sure what/when or not] • personalisation based on previous search behaviour; but what if you don't like all of this? >> "verbatim"
  • 17.
  • 18.
  • 19. only literally searched for the exact words you entered; on google.nl: "woord voor woord"
  • 20. some more "how to" • domain search: site:edu OR site:edu.* [for all edu (sub)domains]; site:shell.com OR site:philips.com • url search: inurl:novelty • title search: intitle:catalytic • filetype search: filetype:pdf; filetype:xls OR filetype:xlsx; filetype:doc OR filetype:docx; filetype:rss [more types than shown in advanced search drop-down menu] • exact search: "greenhouses" [or VERBATIM for all words]
  • 21. advanced search Google is hiding its advanced search screen : you must perform a simple search first, to get the "cog wheel" 21
  • 22. some more "how to" some of this can be done from the advanced search screen but regular search box offers greater flexibility, once you know the syntax • domain search: [in combination with real search terms] site:codarts.nl site:edu OR site:edu.* [for all edu (sub)domains] site:last.fm OR site:spotify.com • url search: inurl:course • title search: intitle:guitar 22
  • 23. some more "how to" (2) • filetype search: filetype:pdf; filetype:xls OR filetype:xlsx; filetype:doc OR filetype:docx; filetype:rss [more types than shown in advanced search drop-down menu] • numeric search: 10..20 [includes all values in between]; $10..$20 [not for other currencies] • punctuation: &, %, dot, ... [can be searched]; €, /, ", comma, ... [is ignored] • exact search: "greenhouses" [or VERBATIM for all words] • synonym search: ~guitar • time limitations: [after search, hidden in top menu]
  • 26. 26
  • 27. whoever searches for "Bach" is probably more interested in data about him than in websites about him, and most probably in "J.S." rather than one of his relatives; Google's "Knowledge Graph" knows 500 million objects with 3.5 billion properties and even more mutual relations (but only in English)
  • 28. it also interprets the intention of your query (sometimes ;-) 28
  • 29.
  • 30. general search engines besides google • Bing: Microsoft, large • Yahoo!: content = Bing, large • Blekko: uses slashtags to search more [domain-]selectively; also many predefined slashtags, e.g. /likes for Facebook • DuckDuckGo: assures privacy, no personalisation, no filter bubble, rather small; !Bang function offers many extras • Gigablast: green search engine, rather small, some unique functions • Exalead: French, many advanced functions, primarily a demo system • Millionshort: leaves out results from the most popular sites → the long tail • WolframAlpha: knowledge engine, facts, calculations; together, these others have 30% market share in the US; in NL only 3% • Yandex: in Russia more popular than Google • Baidu: in China more popular than Google • Naver, Daum: in South Korea more popular than Google • Seznam: in Czechia more popular than Google
  • 31. material type specific search • science: google scholar, microsoft academic, scirus, oaister, scientific commons, science.gov • reference: wikipedia, quora, wolfram|alpha, answers.com • news: google news, yahoo news, bing news, cnn, bbc • old news: way-back-machine, historische kranten KB • images: google image, yahoo image, bing image, flickr, tineye (ip-check), panoramio (geo-search) • video: google video, youtube, youtube edu channel, bing video, blinkx, voxalead-news • tweets: twitter search, topsy, postpost, snapbird • social: socialsearcher, socialmention, whostalkin, kurrently • forums: google groups, omgili, boardtracker • blogs: google blogs, icerocket, [rss] CTRLQ, RSS SearchHub
  • 32. scientific search • books – Google Books (full text search) – Hathitrust Digital Library (open book scan project / part of G-books) – Librarything (catalog of 58,000,000 books from 1,000,000 owners) – GoodReads (reviews, recommendations, friends, ...) – Open Textbook Catalog (open access textbooks) • journal articles – licensed databases (like JSTOR, ...) – Google Scholar (articles, dissertations, reports, ...) – sEURch / UvA-library ("discovery" systems of EUR / UvA) – Scirus / SciVerse (journal articles -Elsevier- , database content, webpages) – Magportal (also -English- popular magazines) – DeepDyve (scientific articles "for rent" - for 24 hours)
  • 33. Google Books • all pages scanned and full-text searchable • important to discover specific subjects/terms - not primary book topic • often limitations on display and browsability (no preview / snippet view / limited preview / full preview) • content from publishers and large libraries • problems with viewing copyrighted material also from libraries • build your personal ‘My Library’ • NL-books not only from Gent University (and soon KB), also from US/UK • also some ‘magazines’ • metadata on about-this-book-page 33
  • 34.
  • 35.
  • 36.
  • 37. Google Scholar • > 100 million scientific publications (most articles) • differences between availability (and hence searchability) of full-text (majority), bibliographic-only, and citation data • competitor of Web of Science, Scopus, Scirus, ... • indexing many selected -even licensed- sources (publishers, abstract-databases, university sites, institutional repositories, ...) • includes numbers of citations! [and links to them] • number of citations important factor for relevance ranking (!! reason why recent publications get low rankings) • advanced search limited, many mistakes in metadata (authors etc.) • accessibility of full-text often a problem because of licences • often many versions of same article (including sometimes free ones) • coupling with library subscriptions to allow smoother linking • no info about sources, updates etc. 37
  • 38. open access | if this article is interesting, these 23 more recent ones probably are too | number of citations | subscription Univ. Utrecht
  • 39.
  • 40.
  • 41. facts and reference • encyclopedias – wikipedia – internet movie database – ... • Q&A (human powered) – Quora – Yahoo-answers • direct answers, facts and calculations – Wolfram|Alpha • dictionaries, translations – answers.com (metasearch) – Roget thesaurus – Bartleby – Google Translate – Google Translated search – Synoniemen.net (Dutch)
  • 42. wikipedia • >250 languages • “wisdom of the crowds” ?=? “wisdom” for all topics? • quite good for “factual” topics • many detailed specific topics (>20 million lemmas, >1 million NL) • there are policies & guidelines & management: stewards, administrators • for searching the wikipedia use Google rather than internal search limit to: site:wikipedia.org gives more complete results and searches directly in all language versions together 42
  • 43.
  • 44.
  • 46. translates original query (here in english) into chosen languages and translates results back into english
  • 47. ... and pages selected from the result list are translated in English too
  • 48.
  • 49.
  • 50. old stuff: web & news • web archive – "way-back machine": old versions of websites, back to 1996; access through the original url, NO search; internal site links will mostly work – also other archived materials (among others music) • historical Dutch newspapers – historische kranten KB (1618-1995; full-text search) • historical international newspapers – British newspapers 1800-1900 – historic American newspapers – international overview
  • 51.
  • 52.
  • 53. … and the very oldest one from february 1998: 53
  • 54. twitter & social search twitter search (often limited to messages from past 1 - 2 weeks only) – twitter (also advanced search) – topsy (best one at the moment, also older messages) – postpost (search your own timeline - everything you're following) – snapbird (search thru all tweets of particular person - you have to know twittername) real time / social search – socialsearcher (facebook | twitter | g+ : side by side) – socialmention (also weblogs) – samepoint, whostalkin, kurrently, … (also weblogs) forum discussions – omgili, boardtracker, ... – Google groups 54
  • 55. 55
  • 56. 56
  • 57. 57
  • 58. 58
  • 59.
  • 60.
  • 61.
  • 62. multimedia search / images mostly search by keywords – Google-image (simple image recognition) – Yahoo-image (also pictures from Flickr) – Bing-image – Flickr (photo upload-site; search on user tags; filter on “Creative Commons” material) – photographs on twitter (twicsy, picfog, topsy, skylines.io, …) – special sites (beeldbank nationaal archief, wikimedia commons, ...) special techniques: – geographical (panoramio [google-maps], worldc.am [instagram], ...) – Google (search by example) – Tineye (search for -almost- exact copies; a.o. copyright infringed?) 62
  • 63. 63
  • 64. image search Content based image retrieval (CBIR) • search on colors – examples: Tineye, Chromatik, Picitup, Google, ... 64
  • 65. image search Content based image retrieval • search by example – draw it yourself Retrievr, ... – existing image Google (visually similar) Tineye (almost exact copies) Retrievr, ... example found on the web or uploaded from your own computer 65
  • 66.
  • 68. google looks for most probable keywords to describe this image and in the search box combines them already with the image ... and how about these "visually similar images" ?
  • 69.
  • 70.
  • 72.
  • 73.
  • 74. multimedia search / video • (mostly) uploaded material – YouTube (growth: 70 hours/minute; also many "how to" videos) also: YouTube-channels / YouTube-education / YouTube-teachers / YouTube-movies / YouTube-shows / … – Vimeo • (mostly) broadcast material – Blinkx (35 million hours of video, speech recognition?) – VoxaleadNews (speech recognition in several languages - also NL! hence "full-text" search on spoken words) – Bing-video (not easy to find from European home page) – Google-video (also videos from YouTube; metadata search only) – Dutch TV programs: • Uitzending gemist (limited search functionality) • Beeld & Geluid (metadata search; use "uitgebreid zoeken") • Academia (selection from Beeld & Geluid for higher education)
  • 75.
  • 76. ?
  • 77. the end any questions? 77
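
The building block approach from slides 6 to 8 can be sketched in a few lines of Python. This is only an illustration and not part of the original presentation: the function names `build_boolean_query` and `build_google_query` are hypothetical, and the Google-style form simply follows the statement on slide 8 that parentheses and the AND operator may be omitted.

```python
# Minimal sketch of the building block approach (slides 6-8).
# All names here are illustrative assumptions, not an existing API.

# One list of keywords per facet of the subject "modern american composers".
FACETS = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]


def quote(term: str) -> str:
    """Put multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join the keywords of a single facet with OR."""
    return " OR ".join(quote(term) for term in terms)


def build_boolean_query(facets: list[list[str]]) -> str:
    """Explicit form for structured systems: (a OR b) AND (c OR d) ..."""
    return " AND ".join(f"({or_group(terms)})" for terms in facets)


def build_google_query(facets: list[list[str]]) -> str:
    """Form described on slide 8: no parentheses, AND implied between facets."""
    return " ".join(or_group(terms) for terms in facets)


if __name__ == "__main__":
    print(build_boolean_query(FACETS))
    print(build_google_query(FACETS))
```

Keeping each facet as its own list makes it easy to add or drop a synonym and regenerate both query forms without retyping the operators.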

Editor's Notes

  1. Assignment: refine the search until the first 50 results no longer include any irrelevant ones, paying attention to these points; use a thesaurus or Word synonyms; truncation