UB Utrecht               HvA-MIC                 GO Opleidingen




     searching the internet
what patent searchers should know

                    Eric Sieverts

                                          WON, 11-12-2012
agenda

•   searching the web
•   the volatile google landscape
•   smart searching
•   dating and back to the past
•   reliability
•   google options
•   beyond google
•   beyond general web search
•   the social landscape
agenda

• the general web ?=? everything
• general web search: how to …
• specific material search: how to …
• importance of specific material types?
• when & why
an ever changing google landscape




        •   unreliable numbers
        •   irreproducible results
        •   disappearing functions
        •   changing interfaces
"coping" with numbers of results

in structured databases the effect of how you combine terms on the
number of results generally meets expectations, but:
• with Google (and other web search) numbers are unstable,
  irreproducible and unreliable, with inexplicable effects
   –   refine with an AND-relation may increase number of results
   –   expand with an OR-relation may decrease number of results
   –   numbers are only extrapolations from small part of search index
   –   depends on distribution of the index over servers
   –   depends on Google version, browser, whether logged in, history, ...
   –   not just Google: Bing results also depend on geographic setting
• Danny Sullivan explains why Google cannot count:
  "Why Google Can't Count Results Properly"
  http://searchengineland.com/why-google-cant-count-results-properly-53559
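
The "expected" behaviour of a structured database can be sketched with plain sets. This is a toy illustration only: the index, terms and record ids below are invented for the example. With exact counts, AND can never return more records than either term alone, and OR can never return fewer, which is precisely what web search result estimates do not guarantee.

```python
# Toy "structured database": each term maps to the set of record ids that
# contain it. Terms and ids are made up for illustration.
index = {
    "catalytic": {1, 2, 3, 5, 8},
    "converter": {2, 3, 8, 13},
    "exhaust": {3, 21},
}


def count_and(*terms):
    """Exact number of records containing every term (AND-relation)."""
    return len(set.intersection(*(index[t] for t in terms)))


def count_or(*terms):
    """Exact number of records containing at least one term (OR-relation)."""
    return len(set.union(*(index[t] for t in terms)))


if __name__ == "__main__":
    print(count_and("catalytic", "converter"))  # refining can only shrink
    print(count_or("catalytic", "converter"))   # expanding can only grow
```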
Google as a vanishing machine

some services and options disappear completely
 –   timeline, wonder wheel, toolbar, ...
 –   + operator
 –   real time results, code search
 –   google buzz, google wave, google directory, ...

others are only hidden
 – links for advanced search and for settings hidden under “cog wheel”
    (sometimes dependent on browser)
 – Scholar, Patents and Groups no longer mentioned in menus
 – backlink search no longer in advanced search
 – search for "similar" pages & "cache"-link are hidden in "invisible"
   pop-up page preview
 – …
like faceted search in for instance Scopus

refinements and additional functions
like in modern "web scale discovery" systems
(but meanwhile this is already an "old" interface!)
tools & facets from clear left column
to blurry top menu (for mobile's sake?)

   google.nl [until 2 weeks ago]
   google.com
all options by material type, in old interface
Google tries outsmarting us

Google tries to improve and to broaden your queries
• automatic spelling corrections (veilgheid >> veiligheid)
• search for words with same word stem (singular/plural, verb,
  conjugation, inflection, …)
• expands acronyms (jfk >> john f kennedy | wwii >> world war II)
• adds synonyms (vaccination >> immunization)
• transforms separate words to compound term & vice versa
  (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food)
• may leave out term as optional if not differentiating enough

  (never sure what/when or not;
   more often and more elaborate in English than in Dutch)
• personalises search, based on previous search behaviour

and if you don't like all of this ........        >> "verbatim"

"verbatim"
• new option introduced early 2012
• option recently moved to top menu
• on google.nl: "woord voor woord" (word for word)
standard semantic coding
     allowed Google to make a
     recipe search engine
     "embedded metadata"




         standardisation of
    property descriptions in HTML
        of recipe pages, with
"microformats"/"rich snippets markup"
Google's "Knowledge Graph"
knows 500 million objects
with 3.5 billion properties
(but only in English)
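
The "embedded metadata" behind the recipe search can be illustrated with a small sketch. The slides do not say which vocabulary is used, so the sample below assumes schema.org microdata (`itemprop` attributes). The sample page and the `ItempropCollector` class are hypothetical, and real crawlers handle nesting and many more cases than this.

```python
from html.parser import HTMLParser

# Hypothetical recipe page fragment with schema.org-style microdata (assumed).
SAMPLE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Apple pie</h1>
  <span itemprop="author">Jane Doe</span>
  <meta itemprop="cookTime" content="PT1H">
  <span itemprop="recipeYield">8 servings</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collects itemprop values from flat (non-nested) markup."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # property whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            # <meta itemprop=... content=...> carries the value in an attribute
            self.properties[name] = attrs["content"]
        else:
            self._current = name
            self.properties[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def extract_properties(html):
    """Return a dict of itemprop name -> text or content value."""
    parser = ItempropCollector()
    parser.feed(html)
    return {name: value.strip() for name, value in parser.properties.items()}


if __name__ == "__main__":
    print(extract_properties(SAMPLE))
```

Because every recipe page labels the same properties in the same way, a search engine can filter on them (cooking time, yield, author) instead of guessing from the page text.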
publication dates

• limitation while searching google
   – before search:     only "past day/week/month/year"
   – after search:      also limitation on custom range "from .. to .."
• how reliable are google's dates? NOT reliable
• how else to determine date?
   – look at page text (especially top and bottom or blogging date)
   – look in page source (HTML) for metadata
   – try entering javascript in browser URL bar

                javascript:alert(document.lastModified)
     but does NOT work for CMS generated pages
   – look for indexing date in Google cache
   – try to find recent time stamped version in Web Archive
     (waybackmachine)
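
The `document.lastModified` trick usually reflects the HTTP `Last-Modified` header, so the same check can be done outside the browser. A minimal sketch, assuming the server sends that header at all; for CMS generated pages it is often missing or simply the time of the request. The function names are hypothetical.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(header_value):
    """Parse a value like 'Tue, 11 Dec 2012 09:30:00 GMT'; None if unusable."""
    if not header_value:
        return None
    try:
        return parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request and return the parsed Last-Modified date, if any."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers.get("Last-Modified"))


if __name__ == "__main__":
    print(parse_last_modified("Tue, 11 Dec 2012 09:30:00 GMT"))
```

A missing or very recent date says little; treat it as one clue next to the page text, the metadata in the source and the archive copies.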
disappeared / old versions of pages
• recently disappeared: try search engine cache
  not just google!: Bing, Yahoo, Exalead
disappeared / old versions of pages

for older versions: try web archive (waybackmachine)
http://archive.org
• links within same site are mostly working
• if particular page has not been crawled, they show which
  other pages on that site have been crawled
• some pages/sites have only recently been crawled
• other pages/sites go far back in time
• if domain name has changed, you must use the old name
• some sites don't want to be crawled
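
Looking up an older version can be scripted by building Wayback Machine addresses. A sketch under assumptions: the two URL patterns below are the ones the Internet Archive uses at the time of writing (snapshot addresses and the availability lookup), the helper names are hypothetical, and nothing is fetched here. The archive redirects to the capture closest to the requested timestamp.

```python
from datetime import datetime
from urllib.parse import urlencode


def snapshot_url(page_url, when):
    """Address of the archived copy closest to `when` (a datetime)."""
    return "https://web.archive.org/web/{}/{}".format(
        when.strftime("%Y%m%d%H%M%S"), page_url
    )


def availability_query(page_url, when):
    """Lookup address that reports the closest capture, if there is one."""
    params = {"url": page_url, "timestamp": when.strftime("%Y%m%d")}
    return "https://archive.org/wayback/available?" + urlencode(params)


def parse_snapshot_timestamp(stamp):
    """Turn a 14-digit capture timestamp (YYYYMMDDhhmmss) into a datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


if __name__ == "__main__":
    day = datetime(2012, 12, 11)
    # Use the OLD domain name if the site has moved since.
    print(snapshot_url("http://www.domain.zz/", day))
    print(availability_query("http://www.domain.zz/", day))
```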
but sometimes:
intermezzo about
trust and integrity
reliability & integrity - general

general website assessment criteria
•   professional lay-out
•   indication of author/organisation (“about us”)
•   data about organisation: address, telephone, map/driving directions
•   indication of targeted audience
•   not too many advertisements and pop-ups (although every site has them)
•   clear navigation
•   internal search option
•   speed of web server
•   backlinks from well known organisations **
•   up to date-ness (with date given)
•   language use
•   interpret the URL/domain-name (eg: edu, edu.au, edu.sg, edu.ng, edu.lb, ac.uk,
    gov, gov.uk, gov.hk, gov.au, gov.on.ca, gob.es, gob.mx, gob.ve, gob.ec, ...)
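
Interpreting the domain name can be partly automated. A minimal sketch that only knows the suffixes listed above; the category labels and the function name are hypothetical, and a matching suffix is a hint about the kind of organisation, not proof of reliability.

```python
from urllib.parse import urlsplit

# Suffixes taken from the list above; real-world lists are much longer.
ACADEMIC = ("edu", "edu.au", "edu.sg", "edu.ng", "edu.lb", "ac.uk")
GOVERNMENT = ("gov", "gov.uk", "gov.hk", "gov.au", "gov.on.ca",
              "gob.es", "gob.mx", "gob.ve", "gob.ec")


def domain_category(url):
    """Return 'academic', 'government' or 'other' based on the host suffix.

    The URL needs a scheme (http:// or https://); otherwise no host is found.
    """
    host = (urlsplit(url).hostname or "").lower()
    for label, suffixes in (("academic", ACADEMIC), ("government", GOVERNMENT)):
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return label
    return "other"


if __name__ == "__main__":
    for url in ("https://www.ox.ac.uk/about", "http://www.shell.com/"):
        print(url, "->", domain_category(url))
```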
reliability & integrity - organisation

Information about organisation
• Google pagerank (backlinks)
  use for instance: http://www.prchecker.info/
                       http://www.checkpagerank.net/
• Alexa rank (web traffic)
  see for instance: http://www.alexa.com/
                       http://www.seomastering.com/alexa-rank-checker.php
• domain owner
  use for instance:    http://centralops.net/co/DomainDossier.aspx
                       http://whois.domaintools.com/
• search for "backlinks"
reliability & integrity - backlinks

search backlinks to particular web-page/-site
• Google:         link:http://www.domain.zz/folder/file.html
                  very incomplete result
• Yahoo site explorer: died last year
• DuckDuckGo: link:http://www.domain.zz/folder/file.html
              often > google; no total numbers given
• OpenSiteExplorer: linking pages + linking domains
                very complete; also domain & page authority
                paid subscription if more than 3 queries /day
• Exalead:        link:http://www.domain.zz/
                  no backlinks to specific page, but to whole site
• Alexa:          100 most important domains backlinking to site
the 35 sites mentioned under "reputation"
• after 9: no more results
• total list: 30 results
backlinks - variable ratios

reported # backlinks    google    DDG     OSE
homepage1                 17        9    2016
deeppage1                  4        0      30
deeppage2                  9       30     224
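
How variable the ratios are becomes clearer when each count is divided by the Google count for the same page. A small sketch using only the numbers in the table; the dictionary layout and function name are just one way to write it down.

```python
# Reported backlink counts from the table (DDG = DuckDuckGo,
# OSE = OpenSiteExplorer).
reported = {
    "homepage1": {"google": 17, "ddg": 9, "ose": 2016},
    "deeppage1": {"google": 4, "ddg": 0, "ose": 30},
    "deeppage2": {"google": 9, "ddg": 30, "ose": 224},
}


def ratio_to_google(page, tool):
    """How many backlinks `tool` reports per backlink reported by Google."""
    counts = reported[page]
    return counts[tool] / counts["google"]


if __name__ == "__main__":
    for page in reported:
        print(page,
              "DDG/google = %.1f" % ratio_to_google(page, "ddg"),
              "OSE/google = %.1f" % ratio_to_google(page, "ose"))
```

No tool is a fixed multiple of another, so counts from one tool cannot be converted into counts from a different one.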
some more "how to"


• domain search:   site:edu OR site:edu.*   [for all edu (sub)domains]
                   site:shell.com OR site:philips.com
• url search:      inurl:novelty
• title search:    intitle:catalytic
• filetype search: filetype:pdf
                   filetype:xls OR filetype:xlsx
                   filetype:doc OR filetype:docx
                   filetype:rss
                   [more file types than shown in advanced search drop-down menu]
• exact search:    "greenhouses"   [or VERBATIM for all words]
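
These operators can be combined into one query string. The helper below is a hypothetical sketch: it only joins the operators shown above with spaces and OR. How an engine groups OR with the other terms is up to that engine, so check the results rather than trust the syntax.

```python
from urllib.parse import urlencode


def any_of(operator, values):
    """Join alternatives, e.g. site:shell.com OR site:philips.com."""
    return " OR ".join("{}:{}".format(operator, v) for v in values)


def build_query(terms=(), exact=None, sites=(), filetypes=(),
                intitle=None, inurl=None):
    """Assemble a query string from plain terms and the operators above."""
    parts = list(terms)
    if exact:
        parts.append('"{}"'.format(exact))
    if sites:
        parts.append(any_of("site", sites))
    if filetypes:
        parts.append(any_of("filetype", filetypes))
    if intitle:
        parts.append("intitle:" + intitle)
    if inurl:
        parts.append("inurl:" + inurl)
    return " ".join(parts)


def google_url(query):
    """Search address with the query in the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query(exact="greenhouses", filetypes=["xls", "xlsx"])
    print(q)
    print(google_url(q))
```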
general search engines besides google
• Bing         microsoft, large
• Yahoo!       content=Bing, large
• Blekko       uses hashtags to search more [domain-] selective
               also many predefined hashtags; e.g. /likes for Facebook
• DuckDuckGo assures privacy, no personalisation, no filter-bubble,
               rather small, !Bang-function offers many extras
• Gigablast    green search engine, rather small, some unique functions
• Exalead      french, many advanced functions, primarily demo system
• Millionshort leaves out results from most popular sites → the long tail
• WolframAlpha knowledge engine, facts, calculations
together, these others have 30% market share in US; in NL only 3%
•   Yandex        in Russia more popular than Google
•   Baidu         in China more popular than Google
•   Naver, Daum   in South Korea more popular than Google
•   Seznam        in Czechia more popular than Google
material type specific search

blogs     google blogs, icerocket, technorati
          [rss] CTRLQ, RSS SearchHub
video     google video, youtube, youtube edu channel,
          bing video, blinkx, voxalead-news
images    google image, yahoo image, bing image, flickr,
          tineye (ip-check), panoramio (geo-search)
science   google scholar, microsoft academic, scirus,
          oaister, scientific commons, science.gov
news      google news, yahoo news, bing news, cnn, bbc,
          historische kranten KB (Dutch historical newspapers),
          historic american newspapers (LOC)
tweets    twitter search, topsy, tweetzi, postpost, snapbird
social    socialsearcher, socialmention, samepoint, whostalkin, kurrently
forums    google groups, omgili, boardtracker
tweets & social search
• Twitter in 140 characters
   – often with shortened links
   – often with photo- or video-link
   – often with hashtags (#agreeduponkeyword)

  search (often limited to last 1 - 2 weeks,
                            and .... to those 140 characters)
   –   twitter-search (also advanced search), tweetzi, …
   –   topsy (also older messages)
   –   postpost (your own timeline - i.e. everything you're following)
   –   snapbird (full tweet history of 1 person – by his/her twittername)
   –   twicsy (photos on twitter)
   –   ...
  overview/review of tools: All the easiest ways to search old tweets

tweets & social search
• “Real time / social search engines”
   – socialsearcher, socialmention, samepoint, whostalkin, kurrently, …
     (tweets + blogs + facebook + …)
   – Google personal results / Google+ ("search plus your world")
   – real-time pictures: skylines

• Forum discussions
   – omgili, boardtracker, ...
   – Google groups (also old newsgroup discussions)

for research methods:
   – advice from Henk van Ess (in Dutch): "de digitale detective" (2012)
   – How to: use social media in newsgathering (2012)
   – 100+ Social Media Monitoring Tools (2010)
the end
any questions?

Searching the internet - what patent searchers should know

  • 1. UB Utrecht HvA-MIC GO Opleidingen searching the internet what patent searchers should know Eric Sieverts WON, 11-12-2012
  • 2. agenda • searching the web • the volatile google landscape • smart searching • dating and back to the past • reliability • google options • beyond google • beyond general web search • the social landscape
  • 3. the general agenda importance web of specific ?=? material everything types? general specific web material search search how to … how to … when & why
  • 4. an ever changing google landscape • unreliable numbers • irreproducible results • disappearing functions • changing interfaces
  • 5. "coping" with numbers of results in structured databases the effect on the number of results of how you combine terms, generally meets expectations, but: • with Google (and other web search) numbers are not stable, irreproducible, unreliable, with inexplicable effects – refine with an AND-relation may increase number of results – expand with an OR-relation may decrease number of results – numbers are only extrapolations from small part of search index – depends on distribution of the index over servers – depends on Google version, browser, whether logged in, history, ... – not just Google: Bing results also depend on geographic setting • Danny Sullivan explains why Google can not calculate: http://searchengineland.com/why-google-cant-count-results-properly-53559 Why Google Can’t Count Results Properly
  • 6. Google as a vanishing machine some services and options disappear completely – timeline, wonder wheel, toolbar, ... – + operator – real time results, code search – google buzz, google wave, google directory, ... others are only hidden – links for advanced search and for settings hidden under “cog wheel” (sometimes dependent on browser) – Scholar, Patents and Groups no longer mentioned in menus – backlink search no longer in advanced search – search for "similar" pages & "cache"-link are hidden in "invisible" pop-up page preview – …
  • 9. like faceted search in for instance Scopus
  • 10. but meanwhile this is already an "old" interface! refinements and additional functions like in modern "web scale discovery" systems
  • 11. tools & facets from clear left column to blurry top menu (for mobile's sake?) google.nl [until 2 weeks ago] google.com
  • 12. all options by material type, in old interface
  • 13. Google tries outsmarting us: Google tries to improve and to broaden your queries • automatic spelling corrections (veilgheid >> veiligheid) • search for words with same word stem (singular/plural, verb conjugation, inflection, …) • expands acronyms (jfk >> john f kennedy | wwii >> world war II) • adds synonyms (vaccination >> immunization) • transforms separate words to compound term & vice versa (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food) • may leave out term as optional if not differentiating enough • personalises search, based on previous search behaviour [never sure what/when or not; often more elaborate in English than in Dutch] and if you don't like all of this ........ >> "verbatim"
  • 15. verbatim: new option introduced early 2012; option recently moved to top menu; on google.nl: "woord voor woord"
  • 18. standard semantic coding allowed Google to make a recipe search engine "embedded metadata" standardisation of property descriptions in HTML of recipe pages, with "microformats"/"rich snippets markup"
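The embedded metadata mentioned on this slide can be read out of a page mechanically. The following is only a minimal sketch, assuming microdata-style `itemprop` attributes; the sample recipe markup and its property names (`name`, `cookTime`, `ingredients`) are hypothetical illustrations, not taken from the slides, and nested tags inside a property are not handled.

```python
from html.parser import HTMLParser


class ItempropParser(HTMLParser):
    """Collect values of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.props = {}       # property name -> list of values
        self._current = None  # property whose text content we are reading

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        if prop is None:
            return
        if "content" in attributes:
            # e.g. <meta itemprop="cookTime" content="PT2H">
            self.props.setdefault(prop, []).append(attributes["content"])
        else:
            self._current = prop

    def handle_data(self, data):
        if self._current and data.strip():
            self.props.setdefault(self._current, []).append(data.strip())

    def handle_endtag(self, tag):
        self._current = None


def extract_itemprops(html):
    parser = ItempropParser()
    parser.feed(html)
    return parser.props


# Hypothetical recipe page fragment, for illustration only.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Pea soup</h1>
  <meta itemprop="cookTime" content="PT2H">
  <span itemprop="ingredients">split peas</span>
  <span itemprop="ingredients">leek</span>
</div>
"""

if __name__ == "__main__":
    print(extract_itemprops(SAMPLE))
```

Because every recipe site describes the same properties in the same way, a search engine can filter on them (ingredient, cooking time) without understanding the surrounding page text.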
  • 19. Google's "Knowledge Graph" knows 500 million objects with 3.5 billion properties (but only in English)
  • 21. publication dates • limitation while searching google – before search: only "past day/week/month/year" – after search: also limitation on custom range "from .. to .." search tools:
  • 22. publication dates • limitation while searching google – before search: only "past day/week/month/year" – after search: also limitation on custom range "from .. to .." • how reliable are google's dates? NOT • how else to determine date? – look at page text (especially top and bottom or blogging date) – look in page source (HTML) for metadata – try entering javascript in browser URL bar javascript:alert(document.lastModified) but does NOT work for CMS generated pages – look for indexing date in Google cache – try to find recent time stamped version in Web Archive (waybackmachine)
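The step "look in page source (HTML) for metadata" can be partly automated. The sketch below scans saved HTML for `<meta>` tags whose name suggests a date; the list of key names is an assumption (common conventions, far from exhaustive), and a date found this way is only what the page claims about itself.

```python
from html.parser import HTMLParser

# Meta names/properties that often carry a date.
# Assumption: illustrative selection only, real pages use many variants.
DATE_KEYS = {
    "date",
    "dc.date",
    "dcterms.modified",
    "last-modified",
    "article:published_time",
    "article:modified_time",
}


class DateMetaParser(HTMLParser):
    """Collect <meta> tags whose name/property looks like a date field."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = (
            attributes.get("name")
            or attributes.get("property")
            or attributes.get("http-equiv")
            or ""
        ).lower()
        if key in DATE_KEYS and attributes.get("content"):
            self.found[key] = attributes["content"]


def find_date_metadata(html):
    """Return a dict of date-like meta keys and their content values."""
    parser = DateMetaParser()
    parser.feed(html)
    return parser.found


# Hypothetical page head, for illustration only.
SAMPLE = (
    '<html><head>'
    '<meta name="DC.date" content="2012-12-11">'
    '<meta property="article:modified_time" content="2012-12-10T09:00:00Z">'
    '<meta name="author" content="someone">'
    '</head><body></body></html>'
)

if __name__ == "__main__":
    print(find_date_metadata(SAMPLE))
```

As with `document.lastModified`, pages generated by a CMS may report the moment of generation rather than the moment of writing, so treat any value found as a hint to cross-check against the page text, the cache and the web archive.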
  • 26. disappeared / old versions of pages • recently disappeared: try search engine cache not just google! : Bing Yahoo Exalead
  • 27. disappeared / old versions of pages for older versions: try web archive (waybackmachine) http://archive.org • links within same site are mostly working • if particular page has not been crawled, they show which other pages on that site have been crawled • some pages/sites have only recently been crawled • other pages/sites go far back in time • if domain name has changed, you must use the old name • some sites don't want to be crawled
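Looking up an archived version can also be done by constructing the address directly. This sketch only builds the URLs and makes no network request; it assumes the Wayback Machine's public availability endpoint (`archive.org/wayback/available`) and its `web.archive.org/web/<timestamp>/<url>` address pattern, and the function names are made up for this example.

```python
from urllib.parse import urlencode


def wayback_availability_url(page_url, timestamp=None):
    """Build a query asking which snapshot is closest to a timestamp.

    timestamp is a string of digits such as "20121211" (YYYYMMDD);
    if omitted, the service is assumed to return the most recent snapshot.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def wayback_snapshot_url(page_url, timestamp):
    """Build the address of the archived copy nearest to the timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


if __name__ == "__main__":
    print(wayback_availability_url("example.com/page", "20121211"))
    print(wayback_snapshot_url("http://example.com/", "2012"))
```

If the domain name has changed, pass the old name as `page_url`, as the slide advises.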
  • 33. reliability & integrity - general general website assessment criteria • professional lay-out • indication of author/organisation (“about us”) • data about organisation: address, telephone, map/driving directions • indication of targeted audience • not too many advertisements and pop-ups (although every site has them) • clear navigation • internal search option • speed of web server • backlinks from well known organisations ** • up to date-ness (with date given) • language use • interpret the URL/domain-name (eg: edu, edu.au, edu.sg, edu.ng, edu.lb, ac.uk, gov, gov.uk, gov.hk, gov.au, gov.on.ca, gob.es, gob.mx, gob.ve, gob.ec, ...)
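Interpreting the domain name, the last criterion above, can be sketched as a longest-suffix lookup. The suffix table contains only the examples listed on the slide; the category labels and the function name are assumptions for illustration, and a match says nothing more than "this domain space is reserved for this kind of organisation".

```python
from urllib.parse import urlparse

# Suffixes taken from the slide; category labels are illustrative.
SUFFIX_HINTS = {
    "edu": "education",
    "edu.au": "education",
    "edu.sg": "education",
    "edu.ng": "education",
    "edu.lb": "education",
    "ac.uk": "education",
    "gov": "government",
    "gov.uk": "government",
    "gov.hk": "government",
    "gov.au": "government",
    "gov.on.ca": "government",
    "gob.es": "government",
    "gob.mx": "government",
    "gob.ve": "government",
    "gob.ec": "government",
}


def classify_host(url):
    """Return a hint for the kind of organisation behind a URL's domain."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try the longest suffix first, so gov.on.ca is checked before ca.
    for length in (3, 2, 1):
        suffix = ".".join(labels[-length:])
        if suffix in SUFFIX_HINTS:
            return SUFFIX_HINTS[suffix]
    return "unknown"


if __name__ == "__main__":
    for example in ("www.ox.ac.uk", "https://www.usa.gov/", "gov.example.com"):
        print(example, "->", classify_host(example))
```

Matching on the end of the host name, rather than searching for "gov" anywhere in it, avoids being fooled by look-alike hosts such as the hypothetical `gov.example.com`.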
  • 34. reliability & integrity - organisation Information about organisation • Google pagerank (backlinks) use for instance: http://www.prchecker.info/ http://www.checkpagerank.net/ • Alexa rank (web traffic) see for instance: http://www.alexa.com/ http://www.seomastering.com/alexa-rank-checker.php • domain owner use for instance: http://centralops.net/co/DomainDossier.aspx http://whois.domaintools.com/ • search for "backlinks"
  • 37. reliability & integrity - backlinks search backlinks to particular web-page/-site • Google: link:http://www.domain.zz/folder/file.html very incomplete result • Yahoo site explorer: died last year • DuckDuckGo: link:http://www.domain.zz/folder/file.html often > google; no total numbers given • OpenSiteExplorer: linking pages + linking domains very complete; also domain & page authority paid subscription if more than 3 queries /day • Exalead: link:http://www.domain.zz/ no backlinks to specific page, but to whole site • Alexa: 100 most important domains backlinking to site
  • 38. the 35 sites mentioned under "reputation" after 9 no more results
  • 41. total list: 30 results
  • 44. backlinks - variable ratios: reported # backlinks (google / DDG / OSE) • homepage1: 17 / 9 / 2016 • deeppage1: 4 / 0 / 30 • deeppage2: 9 / 30 / 224
  • 45. some more "how to" • domain search: site:edu OR site:edu.* [for all edu (sub)domains]; site:shell.com OR site:philips.com • url search: inurl:novelty • title search: intitle:catalytic • filetype search: filetype:pdf; filetype:xls OR filetype:xlsx; filetype:doc OR filetype:docx; filetype:rss [more than just those shown in advanced search drop-down menu] • exact search: "greenhouses" [or VERBATIM for all words]
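The operators on this slide combine into a single query string. The helper below is a minimal sketch with made-up function names; it only assembles text in the notation shown on the slide and does not contact any search engine.

```python
def any_of(operator, values):
    """Join operator:value terms with OR, e.g. filetype:doc OR filetype:docx."""
    return " OR ".join(f"{operator}:{value}" for value in values)


def build_query(terms, sites=(), filetypes=(), intitle=None, inurl=None):
    """Assemble a query from plain terms and the operators on the slide."""
    parts = list(terms)
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if sites:
        parts.append(any_of("site", sites))
    if filetypes:
        parts.append(any_of("filetype", filetypes))
    return " ".join(parts)


if __name__ == "__main__":
    print(
        build_query(
            ['"greenhouses"'],
            sites=["shell.com", "philips.com"],
            filetypes=["pdf"],
            intitle="catalytic",
        )
    )
```

Keeping the exact-phrase quotes inside the term itself (`'"greenhouses"'`) leaves it to the searcher to decide per word whether it must match exactly.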
  • 48. general search engines besides google • Bing: microsoft, large • Yahoo!: content=Bing, large • Blekko: uses slashtags to search more [domain-]selectively; also many predefined slashtags, e.g. /likes for Facebook • DuckDuckGo: assures privacy, no personalisation, no filter-bubble, rather small, !Bang-function offers many extras • Gigablast: green search engine, rather small, some unique functions • Exalead: french, many advanced functions, primarily demo system • Millionshort: leaves out results from most popular sites → the long tail • WolframAlpha: knowledge engine, facts, calculations • together, these others have 30% market share in US; in NL only 3% • Yandex: in Russia more popular than Google • Baidu: in China more popular than Google • Naver, Daum: in South Korea more popular than Google • Seznam: in Czechia more popular than Google
  • 49. material type specific search • blogs: google blogs, icerocket, technorati; [rss] CTRLQ, RSS SearchHub • video: google video, youtube, youtube edu channel, bing video, blinkx, voxalead-news • images: google image, yahoo image, bing image, flickr, tineye (ip-check), panoramio (geo-search) • science: google scholar, microsoft academic, scirus, oaister, scientific commons, science.gov • news: google news, yahoo news, bing news, cnn, bbc, historische kranten KB (historic Dutch newspapers), historic american newspapers (LOC) • tweets: twitter search, topsy, tweetzi, postpost, snapbird • social: socialsearcher, socialmention, samepoint, whostalkin, kurrently • forums: google groups, omgili, boardtracker
  • 57. tweets & social search • Twitter in 140 characters – often with shortened links – often with photo- or video-link – often with hashtags (#agreeduponkeyword) • search (often limited to last 1 - 2 weeks, and .... to those 140 characters) – twitter-search (also advanced search), tweetzi, … – topsy (also older messages) – postpost (your own timeline - i.e. everything you're following) – snapbird (full tweet history of 1 person – by his/her twittername) – twicsy (photos on twitter) – ... • overview/review of tools: All the easiest ways to search old tweets
  • 66. tweets & social search • “Real time / social search engines” – socialsearcher, socialmention, samepoint, whostalkin, kurrently, … (tweets + blogs + facebook + …) – Google personal results / Google+ ("search plus your world") – real-time pictures: skylines • Forum discussions – omgili, boardtracker, ... – Google groups (also old newsgroup discussions) • for research methods: – advice from Henk van Ess (in Dutch): "de digitale detective" (2012) – How to: use social media in newsgathering (2012) – 100+ Social Media Monitoring Tools (2010)