This document provides an overview of IndexTank, a real-time search service. It discusses the current state of search implementation on many web applications, which is often lacking. IndexTank aims to improve search capabilities by offering search as a hosted service, allowing developers to focus on their applications instead of building their own search infrastructure. Key IndexTank features highlighted include real-time indexing and search, custom scoring, autocomplete, faceting, geo search, and advanced text search capabilities. The document also presents a case study of reddit.com improving its search with IndexTank.
2. About the Presenter
Tim Spence
● Senior Infrastructure Engineer at MedHelp
( http://www.medhelp.org/ )
● Former .NET developer
● Recently converted to Ruby
● In love with Open Source Software
● More at http://whyhello.im/tim
3. Agenda
● State of search today
● Quick survey: how much time/effort did
YOU spend implementing search on your
webapp?
● Examples of services that need improved
search
● IndexTank to the rescue
● Case study: reddit.com
4. Agenda, continued
● How I found out about IndexTank
● Two apps I built with IndexTank
● Live Demo
6. The State of Search Today
● Not well implemented at all
– Search works, but...
– Barely
● How many pages of results do you typically
browse through before finding what you
were looking for?
● Or do you give up and head for Google site
search instead?
7. Survey Time!
● How much time/effort did YOU spend
implementing search on your webapp?
● How many times have you iterated on your
search feature?
● When was the last time someone thanked
you for building a powerful, reliable search
feature for your webapp?
8. My Opinion
● Search as an in-app feature is an
afterthought
● Minimal implementation is the norm
● If it weren't for MySQL/MS-SQL full text
indexing, most apps probably wouldn't
even have a search feature
● Most good web apps don't make it easy for
users to find specific content outside of
predetermined navigation
9. Let's pick on some apps!
● These are companies with great products,
but their search comes up short
● Don't worry–they can take it!
12. App #1: GitHub
● Interface is decent
– Search repos, code, users, or everything
– Search by language
● However...
– Can't do much with results but browse
– Check out this example
14. App #1: GitHub
● Why these results aren't so hot
– Can't search by most recently maintained
– Can't search by most popular (most
watched)
– Are you ready to browse 1,297 results?
● Advanced search capabilities exist, but the
interface isn't the best
– Recency/popularity sorting is implemented,
but requires specific arguments
15. App #2: Amazon Web Services
● “Hey, I bet I can find an AMI from the
community for the exact EC2 setup I need”
● Fact: probably not
17. App #2: Amazon Web Services
● Notice something missing?
– No search
– Only sort by date, title
● Ready to browse 934 results?
– I'd rather build my own AMI
● Incredible missed opportunity
– OS search
– Stack search
– etc...
18. Fact: GitHub & Amazon aren't the
only ones
● Lots of good web services
● Massive quantities of quality content
● Unfortunately not discoverable in
meaningful ways
19. Interlude: Sites with great search
● Foodspotting
– Proximity
– Recency
– Rating
● MedHelp
– Content category
– Promoted content
● Other sites I overlooked? Whose search
do you like?
20. What was the point of that last
slide?
● Search can be useful if it is valued as a
feature
● Any company willing to invest the
resources can build and host a high quality
search engine
● However, must you roll your own?
21. Enter Search as a Service
● No need for you to invest in additional
infrastructure
● No need to reinvent the wheel
– Search is a solved problem
– Let the experts refine it
22. IndexTank to the rescue!
● Hosted–no load on your infrastructure
● Powerful
– We'll get into the details next
● Always Improving
– Search IS their product
● Freemium
● Easy to implement
23. Let's talk features
● Real-time search
– Real-time indexing–results immediately
available
● Custom scoring
● Autocomplete
● Faceting
● Geo search
● Advanced text search
24. Real-time search
● Real-time indexing
– results immediately available
● Index multiple docs/sec
● Overwrite existing docs as you wish
– Changes also immediately available
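The idea of real-time indexing can be sketched with a toy in-memory inverted index. This is only an illustration of the concept, not IndexTank's implementation or client API; the class name TinyIndex and its methods are made up for this example.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}                    # doc_id -> original text
        self.postings = defaultdict(set)  # term -> set of doc_ids

    def add_document(self, doc_id, text):
        # Overwrite: drop the old postings for this id before re-indexing.
        if doc_id in self.docs:
            for term in self.docs[doc_id].lower().split():
                self.postings[term].discard(doc_id)
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # Return matching doc ids in a stable (sorted) order.
        return sorted(self.postings.get(term.lower(), set()))


index = TinyIndex()
index.add_document("post1", "hosted search service")
hits_after_add = index.search("search")        # searchable right after the add
index.add_document("post1", "hosted indexing service")
hits_after_overwrite = index.search("search")  # old text no longer matches
```

There is no separate "rebuild the index" step here: a document is searchable as soon as add_document returns, and overwriting it changes the results of the very next query.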
25. Custom Scoring
● Implementer has full control over how
results are returned
● Choose which fields are searched
● Use pre-written scoring functions
● Or write your own
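A rough sketch of what "write your own scoring function" means in practice: the same result set, ordered three different ways. The documents, the field names (relevance, votes, age_hours), and the formulas are all assumptions made up for this example, and the functions are plain Python callables rather than IndexTank's own scoring syntax.

```python
import math

# Hypothetical documents: 'relevance' stands in for a text-match score;
# 'votes' and 'age_hours' are per-document variables invented for this sketch.
docs = [
    {"id": "a", "relevance": 0.9, "votes": 3, "age_hours": 200},
    {"id": "b", "relevance": 0.6, "votes": 250, "age_hours": 5},
    {"id": "c", "relevance": 0.8, "votes": 40, "age_hours": 48},
]


def by_relevance(doc):
    return doc["relevance"]


def by_recency(doc):
    # Newer documents (smaller age) get a higher score.
    return -doc["age_hours"]


def by_popularity(doc):
    # Text relevance boosted by a dampened vote count.
    return doc["relevance"] * math.log(1 + doc["votes"])


def rank(documents, scoring_function):
    ordered = sorted(documents, key=scoring_function, reverse=True)
    return [d["id"] for d in ordered]


relevance_order = rank(docs, by_relevance)
recency_order = rank(docs, by_recency)
popularity_order = rank(docs, by_popularity)
```

Swapping the scoring function changes the ordering without touching the documents or the query, which is the kind of control the slide is describing.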
27. Everyone loves autocomplete
● Saves users time
● Potentially avoids spelling errors
– Not for hunt-and-peck typists
● Adds a degree of intelligence to the search
process
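One simple way to picture autocomplete is prefix matching over a sorted list of known terms. The sketch below uses Python's standard bisect module; the term list and the helper names build_suggester and suggest are hypothetical, and a hosted service would do this differently at scale.

```python
from bisect import bisect_left


def build_suggester(terms):
    # Lowercase, de-duplicate, and sort so prefixes can be found by bisection.
    return sorted({t.lower() for t in terms})


def suggest(sorted_terms, prefix, limit=5):
    prefix = prefix.lower()
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


terms = build_suggester(["search", "searchable", "service", "faceting", "Seattle"])
suggestions = suggest(terms, "sea")
```

Because suggestions come from terms that are already indexed, the user is steered toward spellings that will actually return results.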
28. Faceting
● Does it make sense for you to categorize
documents in your index?
– In all cases, YES
● Consider your advanced users and the
narrow results they seek
– Don't make anyone sift through irrelevant
results
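Faceting boils down to two operations: counting how many results fall into each category, and narrowing the results to a chosen category. A minimal sketch, using a made-up result set with hypothetical language and kind categories:

```python
from collections import Counter

# Hypothetical result set; the 'language' and 'kind' categories are made up.
results = [
    {"id": 1, "language": "ruby", "kind": "repo"},
    {"id": 2, "language": "python", "kind": "repo"},
    {"id": 3, "language": "ruby", "kind": "code"},
    {"id": 4, "language": "ruby", "kind": "repo"},
]


def facet_counts(docs, field):
    # How many results fall under each value of the given category.
    return dict(Counter(doc[field] for doc in docs))


def filter_by(docs, **wanted):
    # Keep only results whose categories match every requested value.
    return [d for d in docs if all(d.get(k) == v for k, v in wanted.items())]


language_facet = facet_counts(results, "language")
ruby_repos = filter_by(results, language="ruby", kind="repo")
```

The counts are what you show next to each filter link ("ruby (3)"), and the filter is what runs when an advanced user clicks one.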
30. Geo
● It's 2011
– Location is more relevant than ever before
– Mobile is skyrocketing–every client has a
GPS
● IndexTank has built-in geo proximity
search capability
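Proximity search comes down to ranking documents by distance from the user. The sketch below uses the standard haversine formula for great-circle distance; the place list and the nearest helper are assumptions for illustration, and this says nothing about how IndexTank computes distance internally.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (latitude, longitude) points in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical indexed documents with approximate city-center coordinates.
places = {
    "San Francisco": (37.7749, -122.4194),
    "Oakland": (37.8044, -122.2712),
    "Los Angeles": (34.0522, -118.2437),
}


def nearest(lat, lon, limit=3):
    # Sort place names by distance from the given point, closest first.
    ranked = sorted(places, key=lambda name: haversine_km(lat, lon, *places[name]))
    return ranked[:limit]


# A user standing roughly in Berkeley, CA.
closest = nearest(37.8715, -122.2730)
```

With a GPS fix from the client, the same ranking can be blended with text relevance so that nearby matches float to the top.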
32. Advanced Text Search (Beta)
● Fuzzy search (Did you mean...?)
● Stemming
– Alternate word forms (tense, possession,
etc...)
● Alternate spellings
– Misspellings
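To make "Did you mean...?" and stemming concrete, here is a small sketch built on Python's standard library. The vocabulary is made up, difflib's similarity matching is a stand-in for whatever fuzzy matching a real search engine uses, and naive_stem is deliberately crude suffix stripping, not a real stemming algorithm.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of terms already present in the index.
vocabulary = ["search", "service", "faceting"]


def did_you_mean(word):
    # Return the closest known term, or None if nothing is similar enough.
    matches = get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else None


def naive_stem(word):
    # Strip one common English suffix, keeping at least three characters.
    word = word.lower()
    for suffix in ("'s", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


suggestion = did_you_mean("serach")
stems = [naive_stem(w) for w in ("searching", "searched", "searches", "reddit's")]
```

Indexing and querying on the stem rather than the raw word is what lets a query for "searching" match a document that only says "searched".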
33. Other Benefits
● Zero maintenance
● Scalability included for free
● Easy implementation
– Clients available in many languages
– Excellent documentation–Let's check it out
● Excellent support
– Humans or bots? You decide
● Dog food: their site search is done well
35. Case Study: reddit.com
● High traffic news aggregator (more than
1 billion page views/month) with tons of content
● Who remembers how bad reddit's search
was?
– When it even worked
● Can't blame them for trying
– Many attempts, but none worked
● IndexTank excelled in all areas
● Let's check it out now
36. My experience with IndexTank
● Discovered through Heroku/IndexTank
contest
● Built my first irl Rails app in an
afternoon/evening w/ fellow hacker Chris
Saylor (@cwsaylor)
● Didn't win the contest but learned how
easy it is to quickly create highly targeted
search
37. App #1: Toxosis
● Searchable database of toxic release data
supplied by U.S. E.P.A.
● Hosted at http://toxosis.heroku.com/
● Search enabled on many fields including
city/state/zip, toxin
● Additional fields can be added to index
– When I have time, of course...
38. More personal backstory
● Still in the business of reinventing myself
as a Rails developer
● How to get a Rails gig? Develop
multiple Rails apps and show them off
● Opportunities are everywhere–contests,
hackathons, and weekend hacks for the
developer community
39. App #2: SXSWdex
● Searchable database of 2011 SXSW
attendees
● Hosted at http://sxswdex.heroku.com/
● Design goal: do a better job than SXSW
official site
● Search within bio, company, location,
name
● Facets: company, city/state