Adventures
In SearchLand



Valeria de Paiva
July 2009
PARC
Outline
●   Personal background
●   What is a search engine?
●   How do they work?
●   SearchLand?
●   Cuil!
●   Adventures...
●   and Opportunities
Yours truly...
●   Pure mathematics in Cambridge
●   Work on Category Theory
●   Programming languages
●   Natural language & KR in PARC
●   Search...



                BRIDGE
Search engines...
●   Until last year my
    understanding of search
    engines was like my
    understanding of telephones
    or cars...
●   I know when they're working
    and how to use them.
●   I have no idea why or how
    they work...
●   Assuming you're like this too,
    some tidbits...
Search Engines are like Librarians
●   Have to have loads of documents a
    pesky user might want to see.
●   Need to know the contents of the
    documents, to give the appropriate
    document.
●   Need to aggregate the records of the
    contents of the documents in the index.
●   When the user asks for a document,
    the librarian has to consult its index,
    decide on the most appropriate
    answers (the hits), find and deliver
    them in a timely and pleasant manner
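A minimal sketch of the librarian's index, assuming a toy in-memory inverted index over whitespace-split words. The names `build_index` and `lookup` and the sample documents are illustrative only; nothing here comes from a real engine.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return ids of documents containing every word of the query (the hits)."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical toy collection.
docs = {
    1: "alice follows the white rabbit",
    2: "the rabbit hole goes deep",
    3: "search engines crawl the web",
}
index = build_index(docs)
hits = lookup(index, "the rabbit")
```

Here the librarian never rereads the documents at query time: the hits come from intersecting the per-word sets recorded in the index.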
Metaphor continued...
●   There is a building up step:
    collecting and indexing documents


●   There is a serving up process:
    reading the query in, massaging it,
    finding the results, ranking results
    and serving results.


●   These correspond to the modules of the search
    engine: crawler, indexer, query analyzer, finding and
    ranking algorithms, webserver magic
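The two phases can be sketched roughly as follows. This is a toy under stated assumptions: the `WEB` dictionary stands in for the real web, ranking is a plain sum of term counts, and every function name is hypothetical rather than taken from any actual search engine.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory "web": url -> (page text, outgoing links).
WEB = {
    "a.example": ("search engines index the web", ["b.example"]),
    "b.example": ("librarians index books", ["a.example", "c.example"]),
    "c.example": ("web search needs a fresh index of the web", []),
}


def crawl(seed):
    """Building up, step 1: follow links from a seed and collect page text."""
    seen, frontier, pages = set(), [seed], {}
    while frontier:
        url = frontier.pop()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        frontier.extend(links)
    return pages


def build_index(pages):
    """Building up, step 2: word -> {url: number of occurrences}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index


def analyze(query):
    """Serving up, step 1: 'massage' the query (here: lowercase and split)."""
    return query.lower().split()


def find_and_rank(index, terms):
    """Serving up, steps 2-3: score each page by its summed term counts."""
    scores = Counter()
    for term in terms:
        scores.update(index.get(term, {}))
    # Sort by descending score, then by url so ties are deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


def serve(query, index):
    """Serving up, final step: return the ranked hits for a query."""
    return find_and_rank(index, analyze(query))


index = build_index(crawl("a.example"))
results = serve("Web index", index)
```

The index is built once and reused for every query, which is why the building-up and serving-up steps can live in separate modules.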
Metaphor gone too far...
●   Books don't arrive at a library in
    tens of thousands every day.
    Search engines crawl the web all the time
    (and freshness is a real problem).
●   Libraries get rid of books once a year.
    Search engines would re-index every five minutes
    if they could.
●   Libraries simply hand off their goods;
    search engines differentiate themselves by how they
    deliver their goods.
Search Engine Basics
A search engine has modules
   –   Crawler
   –   Indexer
   –   Query analyzer
   –   Searcher
   –   Ranking
   –   Webserver
                   Why writing your own search engine is hard
                   Patterson, ACM Queue, 2004
                   Building Nutch: Open Source,
                   Cafarella and Cutting, 2004
                   Search technologies for the internet
                   Henzinger, Science, 2007
Search Engine Scheme


[Diagram with boxes: WEB (users), WEB (data), Web server, crawler, mining, indexer, ranking, Index server, Query server]
SearchLand...
●   So far, so good.
●   Like Alice in Wonderland, in
    the Oxford meadows with her sister
●   Then she follows the rabbit down the
    hole and things begin to change...
Getting there
●   PARC: a big change from academia.
    There are things that you cannot tell
    your friends about your industrial
    research
●   Timing is an art: you cannot publish
    too early, as IP has to be protected.
    Wait too long and there's nothing to
    publish.
●   But PARC is still much closer to academia
    than I realized. It's research! It must
    become a product. Pretty soon. But it isn't
    one to begin with.
Are we there yet?
●   Start-up landscape is different:
    no offices, an open plan with
    individual desks and machines
●   No book shelves, no work phones
●   Not four All Hands meetings per year,
    but one every week.
●   Release of new code once a week,
    usually more
●   Life moves fast...
SearchLand: Cool Cuil!
●   How did I get there?
    Anna Patterson and Tom Costello
    are friends of many years.
    How did they get there?
●   They did a search start-up called Xift in 1999. Then
    Anna designed, wrote and sold Recall, the largest
    search engine in 2004, to Google. She was also the architect
    of Google's TeraGoogle in early 2006.
●   Tom worked at IBM on the prototype of WebFountain
    and on Storage Systems Strategy worldwide.
●   Then they decided to work together at Cuil.
The reasons for Cuil
●   There are many search engines.
    But their results tend to be very similar.
    Are we seeing everything?
●   Reports estimate we can see only 15%
    of the existing web. This is decreasing
●   Probing the web is mostly popularity
    based. You're likely to see what others
    have seen before.
    But your seeing increases the
    popularity of what you saw, thereby
    reducing the pool of available stuff.
●   Deep Web too?...
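The popularity loop described above can be illustrated with a deliberately crude toy model. It assumes an engine that shows only the most-clicked pages and users who click whatever is shown; the function `simulate` and the starting counts are hypothetical, and no real engine is claimed to rank this way.

```python
def simulate(clicks, shown=2, rounds=100):
    """Each round, show the `shown` most-clicked pages; each shown page gets one more click."""
    clicks = dict(clicks)  # work on a copy so the caller's counts are untouched
    for _ in range(rounds):
        # Rank by descending click count, then by name so ties are deterministic.
        ranked = sorted(clicks, key=lambda page: (-clicks[page], page))
        for page in ranked[:shown]:
            clicks[page] += 1
    return clicks


# Hypothetical starting counts: pages "c" and "d" are only slightly less popular.
start = {"a": 5, "b": 4, "c": 3, "d": 3}
end = simulate(start)
```

In this model the pages that start just below the cut-off never gain a single click, however many rounds are run: being seen is what makes a page popular, and being popular is what gets it seen.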
The reasons for Cuil
Much rubbish on the web.
Some say all we don't see is
web rot: web spam, porn,
mindless duplication of non-
content...
Cuil says let's check it out, let's
analyze contents of the pages.
People want to find information
important to them, even when
it's not popular.
[e.g. vanity search yields long
lost brother]
The reasons for Cuil
●   Cost and natural resources
●   Users don't pay directly for
    using search engines and their
    server farms
●   But costs to the environment
    should be part of the equation
●   Cuil can serve a bigger index
    using a small fraction of the
    number of machines
●   Cheaper for the environment
    and for the company
The reasons for Cuil

          ●   Cuil doesn't need to
              know your search
              history and habits.
          ●   So we don't.
          ●   no names, no IP
              addresses, and no
              cookies
          ●   Your search history is
              your business, not
              ours.
The reasons for Cuil
●   There is (too much) information on the web.
●   Cuil 'organizes' the web so that you can find
    information that you didn't know you wanted..
Organizing the web...
●   Images can help.
●   Longer snippets help.
●   Tabs and categories show new stuff.
●   Definitions – easier than going to a dictionary
●   Timelines -- show you the evolution of your concept
●   Maplines – new connections
●   Videos from Hulu, maps from Mapquest.
Organization is fundamental
●   Definitions – easier than going to a dictionary
●   Timelines - show the evolution of your concept
●   Maplines – new connections
●   Videos from Hulu, maps from Mapquest.
Adventures
●   There are many.
●   Talking about three:
●   Launch!
        –   And the blogosphere...
●   Timelines
●   Languages
Launching a product
●   It's different from anything
    I had ever done before.
●   Launched July 28th, less than
    three months from my start.
●   Hoped for a “soft” launch in the
    middle of the summer..
●   Unbelievable “flood” of interest
After the hype, the blogs...
●   Hadn't realized how much the
    Valley runs on blogs
●   Didn't know about tech
    celebrities or Valleywag...
●   Had no idea how many people
    make a living doing SEO
●   Unbelievable that people went
    to the trouble of “faking” bad
    results.
Timelines
●   Launched in March'09
●   Dynamic timelines, not
    precomputed for a few subjects
●   Project completed in less than
    six weeks
●   Too many? Algorithm still
    needs improvement
●   But a personal battle won...
Multiple Languages
●   Launched in May'09
●   Infrastructure in place, took
    less than a month to release
●   Seven languages so far
●   Evaluation hardly started
●   But loads of offers to help
●   All of this organization with a
    team of fewer than thirty...
Opportunities...
●   There are many.
●   Quality evaluation
●   Relevance
    improvement
●   More services...
More Opportunities...
●   Three banes of my life:
●   Spam, spam, spam
●   (Economics of) malware
●   Attacking pornography
Summing up
●   Life in Searchland is very
    different
●   And lots of fun!
●   As Patterson says in “Why
    Writing Your Own Search Engine
    Is Hard”, ACM Queue, 2004,
    “[...] once the search bug gets
    you, you'll be back. The problem
    isn't getting any easier, and it
    needs all the experience anyone
    can muster.”
And ever, as the story drained
  The wells of fancy dry,
And faintly strove that weary one
  To put the subject by,
“The rest next time--” “It is next time!”
  The happy voices cry.

                        Lewis Carroll -- Proem




                             Thank You!
PARC Forum 2009: Adventures in SearchLand

PARC Forum 2009: Adventures in SearchLand

  • 1.
  • 2. Adventures In SearchLand Valeria de Paiva July 2009 PARC
  • 3. Outline ● Personal background ● What is a search engine? ● How do they work? ● SearchLand? ● Cuil! ● Adventures... ● and Opportunities
  • 4. Yours truly... ● Pure mathematics in Cambridge ● Work on Category Theory ● Programming languages ● Natural language & KR in PARC ● Search... BRIDGE
  • 5. Search engines... ● Until last year my understanding of search engines was like my understanding of telephones or cars... ● I know when they're working and how to use them. ● I have no idea why or how they work... ● Assuming you're like this too, some tidbits...
  • 6. Search Engines are like Librarians ● Have to have loads of documents a pesky user might want to see. ● Need to know the contents of the documents, to give the appropriate document. ● Need to aggregate the records of the contents of the documents in the index. ● When the user asks for a document, the librarian has to consult its index, decide on the most appropriate answers (the hits), find and deliver them in a timely and pleasant manner
  • 7. Metaphor continued... ● There is a building up step: collecting and indexing documents ● There is a serving up process: reading the query in, massaging it, finding the results, ranking results and serving results. ● These correspond to the modules of the search engine: crawler, indexer, query analyzer, finding and ranking algorithms, webserver magic
  • 8. Metaphor gone too far... ● Books don't arrive at a library in tens of thousands every day Search engines crawl the web all the time (and freshness is a real problem) ● Libraries get rid of books once a year Search engines would re-index every five minutes if they could ● Libraries simply hand off their goods, search engines differentiate themselves by how they deliver their goods
  • 9. Search Engine Basics ● A search engine has modules: Crawler, Indexer, Query analyzer, Searcher, Ranking, Webserver ● References: "Why writing your own search engine is hard", Patterson, ACM Queue, 2004 ● "Building Nutch: Open Source", Cafarella and Cutting, 2004 ● "Search technologies for the internet", Henzinger, Science, 2007
  • 10. Search Engine Scheme [diagram: WEB (users), WEB (data), web server, crawler, mining, indexer, ranking, index server, query server]
  • 11. SearchLand... ● So far, so good. ● Like Alice in Wonderland, in the Oxford meadows with her sister ● Then she follows the rabbit into the hole and things begin to change...
  • 12. Getting there ● PARC: a big change from academia. There are things that you cannot tell your friends about your industrial research ● Timing is an art: you cannot publish too early, as IP has to be protected. Wait too long and there's nothing to publish. ● But PARC is still much closer to academia than I realized. It's research! It must become a product. Pretty soon. But it isn't one to begin with.
  • 13. Are we there yet? ● Start-up landscape is different: no offices, an open plan with individual desks and machines ● No book shelves, no work phones ● Not four All Hands per year, but one every week. ● Release of new code once a week, usually more ● Life moves fast...
  • 14. SearchLand: Cool Cuil! ● How did I get there? Anna Patterson and Tom Costello are friends of many years. How did they get there? ● They did a search start-up called Xift in 1999. Then Anna designed, wrote and sold Recall, the largest search engine in 2004, to Google. She was also the architect of Google’s TeraGoogle in early 2006. ● Tom worked at IBM on the prototype of WebFountain and on Storage Systems Strategy worldwide ● Then they decided to work together at Cuil
  • 15. The reasons for Cuil ● There are many search engines. But their results tend to be very similar. Are we seeing everything? ● Reports estimate we can see only 15% of the existing web. This is decreasing ● Probing the web is mostly popularity based. You're likely to see what others have seen before. But your seeing increases the popularity of what you saw, thereby reducing the pool of available stuff. ● Deep Web too?...
  • 16. The reasons for Cuil ● Much rubbish on the web. Some say all we don't see is web rot: web spam, porn, mindless duplication of non-content... ● Cuil says let's check it out, let's analyze contents of the pages. ● People want to find information important to them, even when it's not popular. [e.g. vanity search yields long lost brother]
  • 17. The reasons for Cuil ● Cost and natural resources ● Users don't pay directly for using search engines and their server farms ● But costs to the environment should be part of the equation ● Cuil can serve a bigger index using a small fraction of the number of machines ● Cheaper for the environment and for the company
  • 18. The reasons for Cuil ● Cuil doesn't need to know your search history and habits. ● So we don't. ● no names, no IP addresses, and no cookies ● Your search history is your business, not ours.
  • 19. The reasons for Cuil ● There is (too much) information on the web. ● Cuil 'organizes' the web so that you can find information that you didn't know you wanted..
  • 20. Organizing the web... ● Images can help. ● Longer snippets help. ● Tabs and categories show new stuff. ● Definitions – easier than going to a dictionary ● Timelines – show you the evolution of your concept ● Maplines – new connections ● Videos from Hulu, maps from Mapquest.
  • 21. Organization is fundamental ● Definitions – easier than going to a dictionary ● Timelines – show the evolution of your concept ● Maplines – new connections ● Videos from Hulu, maps from Mapquest.
  • 22. Adventures ● There are many. ● Talking about three: ● Launch! – And blogosphere... ● Timelines ● Languages
  • 23. Launching a product ● It's different from anything I had ever done before. ● Launched July 28th, less than three months from my start. ● Hoped for a “soft” launch in the middle of the summer.. ● Unbelievable “flood” of interest
  • 24. After the hype, the blogs... ● Hadn't realized how much the valley runs on blogs ● Didn't know about tech celebrities or Valleywag... ● Had no idea how many people make a living doing SEO ● Unbelievable that people went to the trouble of “faking” bad results.
  • 25. Timelines ● Launched in March '09 ● Dynamic timelines, not precomputed for a few subjects ● Project completed in less than six weeks ● Too many? Algorithm still needs improvement ● But a personal battle won...
  • 26. Multiple Languages ● Launched in May '09 ● Infrastructure in place, took less than a month to release ● Seven languages so far ● Evaluation hardly started ● But loads of offers to help ● All of this organization with a team of fewer than thirty...
  • 27. Opportunities... ● There are many. ● Quality evaluation ● Relevance improvement ● More services...
  • 28. More Opportunities... ● Three banes of my life: ● Spam, spam, spam ● (Economics of) malware ● Attacking pornography
  • 29. Summing up ● Life in SearchLand is very different ● And lots of fun! ● As Patterson says in “Why Writing your Own Search Engine is Hard”, ACM Queue, 2004, “[...] once the search bug gets you, you'll be back. The problem isn't getting any easier, and it needs all the experience anyone can muster.”
  • 30. And ever, as the story drained / The wells of fancy dry, / And faintly strove that weary one / To put the subject by, / “The rest next time--” “It is next time!” / The happy voices cry. Lewis Carroll -- Proem ● Thank You!
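
The modules listed on slides 7 and 9 (indexer, query analyzer, searcher, ranking) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-document corpus stands in for crawler output, the function names are hypothetical, and summed term frequency is a placeholder scoring rule, not the ranking used by Cuil or any other engine.

```python
import re
from collections import Counter, defaultdict


def analyze(text):
    """Query analyzer / tokenizer: lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexer: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(analyze(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Searcher + ranking: score documents by summed term frequency."""
    scores = Counter()
    for term in analyze(query):
        # .get avoids creating empty entries for unknown terms.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Stand-in for the crawler's output: a hypothetical, hand-written corpus.
docs = {
    "a": "Search engines crawl the web all the time",
    "b": "Libraries get rid of books once a year",
    "c": "A search engine has a crawler and an indexer; search is hard",
}
index = build_index(docs)
print(search(index, "search engine"))  # → ['c', 'a']
```

The split between `build_index` and `search` mirrors the "building up" and "serving up" steps from slide 7: the index is built once, then consulted for every query.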