Finding anything: Real-time search with IndexTank
Tim Spence
April 19, 2011
About the Presenter
Tim Spence
●   Senior Infrastructure Engineer at MedHelp
    ( http://www.medhelp.org/ )
●   Former .NET developer
●   Recently converted to Ruby
●   In love with Open Source Software
●   More at http://whyhello.im/tim
Agenda
●   State of search today
●   Quick survey: how much time/effort did
    YOU spend implementing search on your
    webapp?
●   Examples of services that need improved
    search
●   IndexTank to the rescue
●   Case study: reddit.com
Agenda, continued
●   How I found out about IndexTank
●   Two apps I built with IndexTank
●   Live Demo
The State of Search Today
●   Not well implemented at all
        –   Search works, but...
        –   Barely
●   How many pages of results do you typically
    browse through before finding what you
    were looking for?
●   Or do you give up and head for Google site
    search instead?
Survey Time!
●   How much time/effort did YOU spend
    implementing search on your webapp?
●   How many times have you iterated on your
    search feature?
●   When was the last time someone thanked
    you for building a powerful, reliable search
    feature for your webapp?
My Opinion
●   Search as an in-app feature is an
    afterthought
●   Minimal implementation is the norm
●   If it weren't for MySQL/MS SQL full-text
    indexing, most apps probably wouldn't
    even have a search feature
●   Most good web apps don't make it easy for
    users to find specific content outside of
    predetermined navigation
Let's pick on some apps!
●   These are companies with great products,
    but their search comes up short
●   Don't worry–they can take it!
App #1: GitHub
●   Interface is decent
        –   Search repos, code, users, or everything
        –   Search by language
●   However...
        –   Can't do much with results but browse
        –   Check out this example
App #1: GitHub
●   Why these results aren't so hot
        –   Can't search by most recently maintained
        –   Can't search by most popular (most
             watched)
        –   Are you ready to browse 1,297 results?
●   Advanced search capabilities exist, but the
    interface is not the best
        –   Recency/popularity are implemented, but
              require specific arguments
App #2: Amazon Web Services
●   "Hey, I bet I can find an AMI from the
    community for the exact EC2 setup I need"
●   Fact: probably not
App #2: Amazon Web Services
●   Notice something missing?
       –   No search
       –   Only sort by date, title
●   Ready to browse 934 results?
       –   I'd rather build my own AMI
●   Incredible missed opportunity
       –   OS search
       –   Stack search
       –   etc...
Fact: GitHub & Amazon aren't the only ones
●   Lots of good web services
●   Massive quantities of quality content
●   Unfortunately not discoverable in
    meaningful ways
Interlude: Sites with great search
●   Foodspotting
       –   Proximity
       –   Recency
       –   Rating
●   MedHelp
       –   Content category
       –   Promoted content
●   Other sites I overlooked? Whose search
    do you like?
What was the point of that last slide?
●   Search can be useful if it is valued as a
    feature
●   Any company willing to invest in the
    resources can build and host a high quality
    search engine
●   However, must you roll your own?
Enter Search as a Service
●   No need for you to invest in additional
    infrastructure
●   No need to reinvent the wheel
        –   Search is a solved problem
        –   Let the experts refine it
IndexTank to the rescue!
●   Hosted–no load on your infrastructure
●   Powerful
       –   We'll get into the details next
●   Always Improving
       –   Search IS their product
●   Freemium
●   Easy to implement
Let's talk features
●   Real-time search
       –   Real-time indexing–results immediately
            available
●   Custom scoring
●   Autocomplete
●   Faceting
●   Geo search
●   Advanced text search
Real-time search
●   Real-time indexing
       –   results immediately available
●   Index multiple docs/sec
●   Overwrite existing docs as you wish
       –   Changes also immediately available
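The real-time behavior described above can be illustrated with a toy in-memory inverted index. This is only a sketch of the idea, not IndexTank's implementation or API: the class name TinyIndex and its methods are hypothetical, and a hosted service would do this work over HTTP and at much larger scale.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index (hypothetical; not IndexTank's code)."""

    def __init__(self):
        self.docs = {}                    # docid -> original text
        self.postings = defaultdict(set)  # term -> set of docids

    def add_document(self, docid, text):
        # Overwrite semantics: remove the old postings before re-indexing.
        if docid in self.docs:
            self.delete_document(docid)
        self.docs[docid] = text
        for term in text.lower().split():
            self.postings[term].add(docid)

    def delete_document(self, docid):
        for term in self.docs.pop(docid).lower().split():
            self.postings[term].discard(docid)

    def search(self, query):
        # AND semantics: every query term must appear in the document.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add_document("post1", "Real-time search with IndexTank")
print(index.search("search"))   # the new document is visible right away
index.add_document("post1", "Hosted indexing for Rails apps")
print(index.search("search"))   # the overwrite is visible right away
print(index.search("rails"))
```

Because the postings are updated in the same call that stores the document, there is no separate reindexing step between a write and the next search.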
Custom Scoring
●   Implementer has full control over how
    results are returned
●   Choose which fields are searched
●   Use pre-written scoring functions
●   Or write your own
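A scoring function is just a rule that turns each matching document into a number to sort by. The sketch below assumes hypothetical documents carrying a text relevance value plus two numeric variables (votes and age in hours); the field names and formulas are illustrative only and are not IndexTank's scoring syntax.

```python
import math

# Hypothetical matches: text relevance plus two numeric variables per doc.
docs = [
    {"id": "a", "relevance": 0.9, "votes": 900, "age_hours": 48},
    {"id": "b", "relevance": 0.6, "votes": 250, "age_hours": 2},
    {"id": "c", "relevance": 0.8, "votes": 40,  "age_hours": 12},
]


def by_recency(doc):
    # Newer documents get higher scores.
    return -doc["age_hours"]


def by_popularity(doc):
    # Text relevance boosted by a damped vote count.
    return doc["relevance"] * math.log(1 + doc["votes"])


def rank(docs, scoring_function):
    """Return doc ids ordered from highest score to lowest."""
    ordered = sorted(docs, key=scoring_function, reverse=True)
    return [d["id"] for d in ordered]


print(rank(docs, by_recency))     # ['b', 'c', 'a']
print(rank(docs, by_popularity))  # ['a', 'b', 'c']
```

The same result set comes back in a different order depending on which function is chosen, which is the control the slide refers to.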
Everyone loves autocomplete
●   Saves users time
●   Potentially avoids spelling errors
        –   Not for hunt-and-peck typists
●   Adds a degree of intelligence to the search
    process
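One simple way to picture autocomplete is prefix lookup over a sorted list of indexed terms. The sketch below uses Python's standard bisect module; the term list is made up, and this is not how IndexTank implements the feature.

```python
import bisect


def build_terms(titles):
    """Lowercase, de-duplicate, and sort the terms once, up front."""
    return sorted({t.lower() for t in titles})


def autocomplete(terms, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first term >= prefix, then scan forward.
    start = bisect.bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


terms = build_terms(["Rails", "Ruby", "Rubygems", "Reddit", "Redis", "Heroku"])
print(autocomplete(terms, "ru"))   # ['ruby', 'rubygems']
print(autocomplete(terms, "red"))  # ['reddit', 'redis']
```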
Faceting
●   Does it make sense for you to categorize
    documents in your index?
       –   In all cases, YES
●   Consider your advanced users and the
    narrow results they seek
       –   Don't make anyone sift through irrelevant
            results
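Faceting boils down to two operations: counting how many results fall under each category value, and narrowing the results when a user picks a value. A minimal sketch, using made-up attendee records with company and city fields (names and data are hypothetical, and this is not IndexTank's API):

```python
from collections import Counter

# Hypothetical search results, each tagged with category fields.
attendees = [
    {"name": "Ann", "company": "Acme",    "city": "Austin"},
    {"name": "Bob", "company": "Acme",    "city": "Oakland"},
    {"name": "Cy",  "company": "Initech", "city": "Austin"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    return Counter(d[field] for d in docs)


def filter_by(docs, **selected):
    """Keep only documents matching every selected facet value."""
    return [d for d in docs
            if all(d[key] == value for key, value in selected.items())]


print(facet_counts(attendees, "company"))
narrowed = filter_by(attendees, company="Acme", city="Austin")
print([d["name"] for d in narrowed])  # ['Ann']
```

The counts are what a UI would show next to each facet link; the filter is what runs when a user clicks one.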
Geo
●   It's 2011
        –   Location is more relevant than ever before
        –   Mobile is skyrocketing–every client has a
             GPS
●   IndexTank has built-in geo proximity
    search capability
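Proximity search can be sketched as sorting documents by great-circle distance from the searcher. The example below uses the haversine formula with a mean Earth radius of 6371 km; the place list and coordinates are approximate and purely illustrative, and the deck does not say how IndexTank computes distance.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# (name, latitude, longitude), approximate city-center coordinates.
places = [
    ("Austin", 30.2672, -97.7431),
    ("San Francisco", 37.7749, -122.4194),
    ("Oakland", 37.8044, -122.2712),
]


def nearest(places, lat, lon):
    """Sort places from closest to farthest from (lat, lon)."""
    return sorted(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


here = (37.8716, -122.2727)  # roughly Berkeley, CA
print([name for name, _, _ in nearest(places, *here)])
```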
Advanced Text Search (Beta)
●   Fuzzy search (Did you mean...?)
●   Stemming
        –   Alternate word forms (tense, possession,
              etc...)
●   Alternate spellings
        –   Misspellings
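Both ideas can be approximated in a few lines. The sketch below uses difflib.get_close_matches from Python's standard library for "Did you mean...?" suggestions and a deliberately crude suffix stripper for stemming. The vocabulary, suffix list, and function names are assumptions for illustration; production stemmers are far more careful, and none of this reflects IndexTank's beta implementation.

```python
import difflib

SUFFIXES = ("'s", "ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def did_you_mean(word, vocabulary):
    """Suggest the closest known term, or None if nothing is similar."""
    matches = difflib.get_close_matches(word.lower(), vocabulary,
                                        n=1, cutoff=0.6)
    return matches[0] if matches else None


vocabulary = ["index", "indextank", "search", "heroku"]

print(naive_stem("searching"), naive_stem("searches"), naive_stem("indexed"))
print(did_you_mean("indextnak", vocabulary))
print(did_you_mean("herokuu", vocabulary))
```

Stemming is applied to both the indexed text and the query so that different word forms meet at the same term; the suggestion step only runs when a query term is not found.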
Other Benefits
●   Zero maintenance
●   Scalability included for free
●   Easy implementation
        –   Clients available in many languages
        –   Excellent documentation–Let's check it out
●   Excellent support
        –   Humans or bots? You decide
●   Dog food: their site search is done well
Case Study: reddit.com
●   High-traffic news aggregator (> 1 billion
    page views/month) with tons of content
●   Who remembers how bad reddit's search
    was?
        –   When it even worked
●   Can't blame them for trying
        –   Many attempts, but none worked
●   IndexTank excelled in all areas
●   Let's check it out now
My experience with IndexTank
●   Discovered through Heroku/IndexTank
    contest
●   Built my first irl Rails app in an
    afternoon/evening w/ fellow hacker Chris
    Saylor (@cwsaylor)
●   Didn't win the contest but learned how
    easy it is to quickly create highly targeted
    search
App #1: Toxosis
●   Searchable database of toxic release data
    supplied by U.S. E.P.A.
●   Hosted at http://toxosis.heroku.com/
●   Search enabled on many fields including
    city/state/zip, toxin
●   Additional fields can be added to index
        –   When I have time, of course...
More personal backstory
●   Still in the business of reinventing myself
    as a Rails developer
●   How to get a Rails gig? Develop multiple
    Rails apps and show them off
●   Opportunities are everywhere: contests,
    hackathons, and weekend hacks for the
    developer community
App #2: SXSWdex
●   Searchable database of 2011 SXSW
    attendees
●   Hosted at http://sxswdex.heroku.com/
●   Design goal: do a better job than the
    official SXSW site
●   Search within bio, company, location,
    name
●   Facets: company, city/state
The moment we've all been waiting for
●   Let's build an app!
Questions?
●   Q&A time with an IndexTank engineer

Indextank east bay ruby meetup slides
