1. Leveraging Publishers’ Search
Engines to Deliver Relevant
Results to Users
Presented by
Abe Lederman, President and CTO
Deep Web Technologies, LLC
28th Annual Scholarly Publishing Meeting – Virginia – June 9, 2006
2. Abe’s Background
• Earned B.S. and M.S. Computer Science degrees, MIT
• 18 years of experience developing sophisticated
information retrieval applications
• Cofounded Verity, 1988
• Consulted to LANL, 1994-2000
• Deployed first “federated search” portal in the Federal
government, 1999
• Founded Deep Web Technologies (DWT), 2002
DWT is a New Mexico-based company focused on providing
state-of-the-art software solutions which search, retrieve,
aggregate, and analyze content from web-based databases.
3. The Problem:
Searching a large number of sources can lead to a flood of results
4. Relevance ranking begins as soon as the user clicks the Search button
5. Ranking Recipe
INGREDIENTS
Source Selection
Query Language
Search Conductor
Ranking Algorithms
MIX WELL AND SERVE UP
RELEVANT RESULTS
7. Powerful Query Language
• Takes advantage of search capabilities of
each source
• Supports full Boolean operators where
possible
• Supports fielded search
• Translates natural language questions into
query syntax
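A minimal sketch of per-source query translation, assuming each source declares which capabilities it supports. The `Term` and `SourceSyntax` types, the `translate` function, and the syntax template are hypothetical and do not describe any real publisher's query syntax.

```python
from dataclasses import dataclass


@dataclass
class Term:
    field: str  # e.g. "title", "author", or "any" for an unfielded term
    text: str


@dataclass
class SourceSyntax:
    # Illustrative assumptions about one source's capabilities.
    supports_boolean: bool
    supports_fields: bool
    field_template: str  # how a fielded term is written for this source


def translate(terms, operator, syntax):
    """Render the terms, joined by a Boolean operator, in one source's syntax."""
    parts = []
    for term in terms:
        if syntax.supports_fields and term.field != "any":
            parts.append(syntax.field_template.format(field=term.field, text=term.text))
        else:
            # Fall back to an unfielded term when fielded search is unavailable.
            parts.append(term.text)
    # Fall back to plain whitespace when Boolean operators are unsupported.
    joiner = f" {operator} " if syntax.supports_boolean else " "
    return joiner.join(parts)


rich = SourceSyntax(supports_boolean=True, supports_fields=True,
                    field_template="{field}:({text})")
basic = SourceSyntax(supports_boolean=False, supports_fields=False,
                     field_template="")
query = [Term("title", "federated search"), Term("author", "Lederman")]

rich_query = translate(query, "AND", rich)
basic_query = translate(query, "AND", basic)
```

The same user query becomes a fielded Boolean expression for the capable source and degrades to plain keywords for the limited one.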
8. Search Conductor
• Select sources to search
• Perform Search
• Enough good results?
  • YES: Deliver results to user
  • NO: Can I get more results from “good” sources?
    • YES: Get Next Results
    • NO
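The conductor loop can be sketched roughly as follows. This is an illustration under assumptions, not DWT's implementation: each source is modeled as a hypothetical callable `fetch(query, page)`, a result is "good" when its `score` passes a threshold, and when no "good" source remains the sketch simply delivers what it has (the slide does not say what happens on that branch).

```python
def conduct_search(sources, query, enough=20, is_good=None):
    """Keep asking productive sources for more until enough good results arrive.

    sources: dict mapping a source name to fetch(query, page) -> list of dicts.
    """
    if is_good is None:
        is_good = lambda r: r["score"] >= 0.5  # assumed threshold
    good_results = []
    active = dict(sources)
    page = 0
    while active and len(good_results) < enough:
        productive = {}
        for name, fetch in active.items():
            good = [r for r in fetch(query, page) if is_good(r)]
            if good:
                good_results.extend(good)
                # Only sources that returned good results are asked again.
                productive[name] = fetch
        active = productive
        page += 1
    return sorted(good_results, key=lambda r: r["score"], reverse=True)


def make_source(pages):
    """Hypothetical in-memory source that serves fixed pages of results."""
    def fetch(query, page):
        return pages[page] if page < len(pages) else []
    return fetch


source_a = make_source([[{"id": "a1", "score": 0.9}], [{"id": "a2", "score": 0.8}]])
source_b = make_source([[{"id": "b1", "score": 0.1}]])
delivered = conduct_search({"A": source_a, "B": source_b}, "deep web", enough=2)
```

Here source B returns nothing good on the first pass, so only source A is asked for its next page.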
9. Challenges in Organizing and
Ranking Results
Multi-tier Relevance Ranking
User-driven Ranking
Clustering of Results
10. Multi-tier Relevance Ranking
• QuickRank – Ranks results based
on occurrence of search terms in
title, author, and snippet
• MetaRank – Ranks results utilizing
custom algorithms applied to meta-
data
• DeepRank – Downloads and indexes full-text documents
(HEAVY LIFTING REQUIRED!)
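As a rough illustration of the first tier, the sketch below scores each result by counting search-term occurrences in its title, author, and snippet. The field weights and the `quick_rank` name are assumptions made for this example; the slides do not say how QuickRank weighs the fields.

```python
import re


def quick_rank(results, query, weights=None):
    """Order results by weighted counts of query terms in title, author, snippet."""
    # Assumed weights: a title match counts more than a snippet match.
    weights = weights or {"title": 3.0, "author": 2.0, "snippet": 1.0}
    terms = [t.lower() for t in re.findall(r"\w+", query)]

    def score(result):
        total = 0.0
        for field, weight in weights.items():
            words = re.findall(r"\w+", result.get(field, "").lower())
            total += weight * sum(words.count(t) for t in terms)
        return total

    return sorted(results, key=score, reverse=True)


sample = [
    {"title": "Cooking basics", "author": "A. Smith",
     "snippet": "a search for flavor"},
    {"title": "Federated search engines", "author": "B. Jones",
     "snippet": "search across sources"},
]
ranked = quick_rank(sample, "federated search")
```

This tier needs only what the result list already contains, which is why it can run as soon as results arrive.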
11. User-driven Ranking
• Credibility of source
• Date range
• Document length
• Document type
• Geographic proximity
• Popularity of document
• Reading level
• Relevance
Desired: Blending (weighting) of above criteria
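One simple way to blend such criteria is a weighted average. The sketch below assumes each criterion has already been normalized to a value between 0 and 1 and that the user supplies the weights; the criterion names and numbers are illustrative only.

```python
def blended_score(criteria, weights):
    """Weighted average of normalized criteria.

    criteria: name -> value in [0, 1]; weights: name -> non-negative weight.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * criteria.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


document = {"relevance": 0.8, "credibility": 0.5, "popularity": 0.2}
user_weights = {"relevance": 2.0, "credibility": 1.0, "popularity": 1.0}
score = blended_score(document, user_weights)
```

With these weights, relevance counts twice as much as credibility or popularity, so the blended score is (1.6 + 0.5 + 0.2) / 4 = 0.575.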
13. Attributes of Successful
Federated Search
• Powerful query language that takes
advantage of publisher search capabilities
• Source selection optimizer that reduces
unnecessary searches
• Search conductor gets more results from
sources bringing back good results
• A tool that highlights the best search results
• Caching of search results
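Caching of search results might look like the following sketch: results are kept per source and query for a limited time, so a repeated search does not hit the source again. The `SearchCache` class, the five-minute lifetime, and the case-insensitive query normalization are all assumptions made for illustration.

```python
import time


class SearchCache:
    """Keep results per (source, query) for a limited time."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get_or_search(self, source, query, search_fn):
        # Assumes sources treat queries case-insensitively.
        key = (source, query.strip().lower())
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        results = search_fn(query)
        self._entries[key] = (now, results)
        return results


calls = []


def fake_search(query):
    """Stand-in for a real source; records each call it receives."""
    calls.append(query)
    return ["result 1"]


fake_now = [0.0]
cache = SearchCache(ttl_seconds=10.0, clock=lambda: fake_now[0])
first = cache.get_or_search("SourceA", "Deep Web", fake_search)
second = cache.get_or_search("SourceA", " deep web ", fake_search)
```

The second call is served from the cache, so the source is searched only once until the entry expires.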
14. Advice for Publishers
• Use good search engines with good
relevance ranking
• Return 100 or more results at a time
• Return meta-data (author, journal, snippet)
as part of result list
• Provide access to your content through
XML Gateway or Web Services
• Speed up search time
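To show why meta-data in an XML result list helps a federated search engine, here is a minimal parsing sketch. The XML layout, element names, and placeholder records are invented for this example and do not represent any actual publisher gateway.

```python
import xml.etree.ElementTree as ET

# Hypothetical gateway response; the element names are assumptions.
SAMPLE_RESPONSE = """\
<results>
  <result>
    <title>Example Title One</title>
    <author>A. Author</author>
    <journal>Example Journal</journal>
    <snippet>Placeholder snippet text.</snippet>
  </result>
  <result>
    <title>Example Title Two</title>
    <author>B. Author</author>
    <journal>Example Journal</journal>
  </result>
</results>
"""

FIELDS = ("title", "author", "journal", "snippet")


def parse_results(xml_text):
    """Turn each <result> element into a dict; missing fields become ""."""
    root = ET.fromstring(xml_text)
    return [
        {field: (node.findtext(field) or "").strip() for field in FIELDS}
        for node in root.iter("result")
    ]


records = parse_results(SAMPLE_RESPONSE)
```

With structured fields like these, the federated engine can rank on title, author, and snippet without scraping HTML result pages.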