Discover the Invisible Web

Jeffrey Franklin and David Rakowski
What is the Invisible Web?
• a/k/a the "hidden," "deep," and "dark" web
• Google currently indexes approximately 1 trillion Web pages
• It is estimated that the invisible web is 400-550 times bigger and
  contains 7,500 terabytes of information (as compared to 19
  terabytes of information that Google currently indexes)
    – http://aip.completeplanet.com/aip-engines/help/help_deepwebfaqs.jsp
• "The term ‘invisible web’ mainly refers to the vast repository of
  information that search engines and directories don't have direct
  access to, like databases."
    – http://websearch.about.com/od/invisibleweb/a/invisible_web.htm
Examples

• Resides in a database or a table
• Created dynamically
• Accessible only to registered users
• Stored in subdirectories deep within a website
• Flash, zip, or executable files (generally not indexed)
• Exists in real time
• Social media: can be hit or miss
• Excluded by the owner (robots.txt)
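
The robots.txt exclusion in the last bullet can be checked programmatically. Below is a minimal sketch using Python's standard urllib.robotparser module; the sample rules and the example.com URLs are invented for illustration and do not come from the slides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/report.html",
    ):
        print(url, is_crawlable(SAMPLE_ROBOTS_TXT, url))
```

A page that a well-behaved crawler is told to skip this way will usually be missing from general search engines, even though anyone with the direct link can still open it.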
How Does it Differ From the Visible Web?

"The ‘visible web’ is what you can find using general web search engines"
Learn to Find “Invisible Documents”
• Include the word “database” as part of your
  search
• Limit by filetype: .pdf, .doc, .xls, .ppt
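
Both tips above can be combined into a single search string. The sketch below assumes a search engine that supports a filetype: operator (Google does; syntax on other engines may differ). The helper name build_query and its parameters are hypothetical.

```python
from typing import Optional


def build_query(topic: str, filetype: Optional[str] = None,
                add_database: bool = False) -> str:
    """Build a search string that applies the two tips from the slide."""
    parts = [topic.strip()]
    if add_database:
        # Tip 1: include the word "database" in the search.
        parts.append("database")
    if filetype:
        # Tip 2: limit by file type; accepts "pdf", ".pdf", or ".PDF".
        parts.append("filetype:" + filetype.lstrip(".").lower())
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query("court records", add_database=True))
    print(build_query("annual report", filetype=".PDF"))
```

The resulting strings can be pasted into the search box as they are.
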
Where do old web pages go? Learn to locate them
Wayback Machine (www.archive.org)
Public.Resource.Org (http://public.resource.org)
CyberCemetery (http://govinfo.library.unt.edu/)
Google Instant Preview
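
For the Wayback Machine, lookups can also be scripted. The sketch below assumes the Internet Archive's availability endpoint at https://archive.org/wayback/available and the JSON shape shown in the sample response; the sample itself is invented for illustration, and the real service should be checked before relying on it. The helper names are hypothetical, and the network call is deliberately left out.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

# Hypothetical response body, invented to show the assumed JSON shape.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20100101000000/http://example.com/",
            "timestamp": "20100101000000",
            "status": "200",
        }
    }
})


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived copy's URL, or None if no snapshot is reported."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    print(availability_url("example.com", "20100101"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

Fetching the built URL with any HTTP client and passing the response text to closest_snapshot would complete the lookup.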


Editor's Notes

  1. THIS SLIDE INTENTIONALLY LEFT BLANK
  2. Think about this in terms of electronic discovery….