The Case for Lucene/Solr:
A Manager’s Guide
to Real World
Open Source
Search Applications



By Lucid Imagination
Abstract
In today’s information-driven environment, search solves critical problems by slashing the time and
effort separating end users from the data they value. Search spans the range of
business models and use cases—from driving direct customer sales, to analytics and business
intelligence, employee productivity, and reduced administrative overhead. Making the best use of
search requires two perspectives: both a look at the business requirements for a search application
and a view to new business opportunities created by using search to leverage the organization’s
content resources.

Thousands of organizations across different sectors and business models have harnessed Apache
Lucene/Solr to search their rapidly growing and diversifying content resources. Underlying this
broad adoption is the extraordinary power, scalability, and versatility of open source search
technologies.

This paper provides an overview of both the requirements and the opportunities for search
applications. It then explores how real world organizations are successfully using Lucene/Solr
search applications to meet those opportunities, presenting how the technology is used for specific
business models and use cases across industries. In addition, it offers a baseline for setting search
requirements that managers and architects can use to adopt Lucene/Solr, and adapt this open
source search technology to the unique needs of their business.




© 2010, Lucid Imagination




The Case for Lucene/Solr: Real World Search Applications
A Lucid Imagination White Paper • January 2010                                                 Page ii
Table of Contents
Introduction
Understanding Search Opportunities and Requirements
        What Data and Documents Are You Searching?
        Who Needs the Results and Why?
        Where Is Search Integrated with IT Infrastructure?
        How Is the Search Interface Presented to the User?
The Real World: Applications and Case Studies
   Yellow Pages, Local Search, and Searching Classifieds
   Media
   E-commerce
   Job and Career Sites
   Libraries, Archives, and Museums (LAMs) Search
   Social Media Search
   Enterprise (Intranet) Search
Business Use Case Matrix
Appendix: Lucene/Solr Features and Benefits




Introduction
As fast as companies, communities, and consumers produce data—about each other, products,
opinions, research, and everything else imaginable—they need faster, more versatile search
capabilities to find the information they need to create opportunities for competitive advantage. In
today’s information-driven environment, search addresses the critical problems created by the
explosive growth of content by slashing the time and effort users expend in finding data they value.
Search spans the range of business models and use cases: from driving direct customer sales, to
analytics and business intelligence, employee productivity, and reduced administrative overhead.
Apache Lucene/Solr¹ open source search technology has been implemented across the broadest
range of applications and business models—and likely in ways that can fit the needs of your
organization. In successful operation today at thousands of enterprises, Lucene/Solr technology
scales from tens of thousands to hundreds of millions and billions of documents; searches data that is
structured, unstructured, and in combination; data inside and outside the firewall; and ranges in
use from a simple website search box through sophisticated faceted navigation. It addresses equally
diverse business processes and mission critical applications. Across the spectrum, Lucene/Solr
helps users find, make sense of, and act upon information quickly and efficiently.
In this white paper, we’ll review real-world case studies for Lucene/Solr functionality across
business sectors to demonstrate its versatility and varied applicability. The diversity of examples
provides strong evidence of Lucene/Solr’s flexibility and power as a search technology. The
examples also attest to the innovation and transparency inherent to the open source development
model. Our focus is on familiarizing the audience of business managers and application owners with
existing Lucene/Solr applications; the substantial technical advantages to developers are covered
elsewhere.
We’ll first survey the key requirements and business use cases of search and then look at where
they are built into search applications. Our objective is to provide business managers and
application owners with a broad perspective on how Lucene/Solr search technology is used to build
solutions to compelling business problems. In the Appendix, we provide an overview of
Lucene/Solr’s key features and benefits, with a basic outline of the capabilities offered to meet the
broadest range of business needs.




¹ Lucene and Solr are complementary technologies that offer very similar underlying capabilities; Solr is the Lucene
Search Server. Since Lucene serves as the core of Solr’s search capabilities, this paper refers to the two as
Lucene/Solr. For more information, see the Appendix.

Understanding Search
Opportunities and Requirements
Search technology has come a long way from its roots in matching keywords with appearance in
documents and obtaining undifferentiated results. Search today empowers users by delivering
actionable information quickly and efficiently, across multiple, diverse sources of data. The
business use cases range from executing mission critical commercial transactions (e.g., e-commerce
sites) to unlocking employee and end-user productivity in the search for a single relevant document
(e.g., enterprise search).
Given the breadth of the problem domain, it’s useful to look at search and ask two
fundamental questions: “How can it solve my business problems?” and “What new business
opportunities can search open up?”
In considering how search technology solves business problems, it is useful to start with an
elucidation of the requirements you’ll need to consider for your search application. At the same
time, be sure to look more broadly at the capabilities that Lucene/Solr offers, as it can help open up
new frontiers for incorporating search and leveraging more value from data repositories.
Starting with some basic questions—what, who, how, and where—you can clarify the high-level
business requirements specific to your business needs, which in turn allow you to make the best
decisions for your search application. The process of looking at the fundamentals also raises new
questions about how and where the search technology offered by Lucene and Solr can create new
business opportunities.
Let’s look at four fundamental questions you should address in understanding search opportunities
and requirements:
           •   What data and documents are you searching?
           •   Who needs the results and why?
           •   Where is search integrated with IT Infrastructure?
           •   How is the search interface presented to the user?


What Data and Documents Are You Searching?
Business today is driven more than ever by the end-users’ creation and consumption of real-time
information. A key differentiating capability of search technology is ingesting a broad range of
content types and processing large collections of diverse data in real time in order to deliver
actionable information. Two aspects to consider:




•    Types of Content
        Content comes in multiple formats: HTML pages, XML files, PDFs, images, PowerPoint
        presentations, Excel spreadsheets, Word documents, log files, multimedia content, and
        more. Content resides in various repositories, including databases, file servers, content
        management systems, archiving systems, collaboration applications, and employee
        desktops and laptops. Search technology must be able to locate, organize, and aggregate
        data whatever its form or location.
   •    Frequency of Updating Content
        Organizations update content at varying intervals, driven by differing business processes
        and models. Social media or news applications need content indexed in real time, whereas an
        e-commerce application might re-index in batches as new inventory arrives, and a research
        institution might add to its collection less often still. Search applications need to adapt
        to these differences in how often content changes.


Who Needs the Results and Why?
Business search puts a high priority on the end-user experience and on results tuned to the unique
needs of each user. After all, the human dimension (the usefulness of results and the efficacy of
interaction) is the acid test of a search application. Internet search applications like Google,
Yahoo, and Bing are now common and mature. They have raised user expectations about key qualities
of the search experience, but they solve a very different problem.
While Internet searches can produce millions of results in milliseconds, they rely on measures like
website popularity, URLs, and domain names, which are neither relevant nor generally applicable to
purpose-built business applications. What’s more, they generalize relevancy for a global population
of all Internet users, without being tied to business rules, business process logic, or the
opportunity cost of improved precision for a specific set of data or search users.
Business search applications cannot rely on such coarse, brute-force approaches to tune their
results. They need far more control and precision. They have to be able to deliver highly useful
results while matching, if not exceeding, the levels of user experience that people have come to
expect by virtue of their daily interactions with commercial search engines. Key points of
consideration from a business perspective are:




•    Relevance
       Relevance is entirely a function of the goals of the search application’s users. The
       application must have mechanisms to recognize the subjective needs of users and tune results
       accordingly. It must also provide easy ways to narrow search criteria without requiring
       users to come up with perfect query terms. Flexibility for drilling deeper makes results
       richer and more valuable. Mechanisms to apply filters, proximity values, and sorting
       parameters to narrow search scope also lead to more useful results with less time and
       effort.
  •    Cost of Relevance
       As business goals are driven by revenue opportunities and cost savings, it is critical to tie
       relevance to the economics of the business. For example, a public-facing retail site should
       focus on matching merchandise to search, site stickiness, and customer loyalty. It requires
       search technology that streamlines and simplifies the shopping experience with relevant
       results directly contributing to sales revenue. For knowledge workers, internal search
       applications should help make employees more productive by reducing the amount of time
       and effort to find documents they need to do their jobs. Multiple studies show that
       information workers can spend 20–30% of their time searching for information.
   •   Precision Ranking
       Accurate results that can be sorted by attributes such as relevance, date, field, or any
       other document property make the search process better. End users generally abandon a search
       before tackling the fine points of Boolean logic or scrolling for a result buried too far down.
  •    Query Response Speed
        Today, 5–7 seconds is the typical threshold for end-user patience. Longer waits for search
        results frustrate users and cause them to abandon pages. Fast, relevant results must not
        be hamstrung by data influx or query overload. Query response time should also work
        hand-in-hand with the refinement of multiple search attributes, so that increasingly
        complex queries do not exact a performance penalty.
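
To make the filtering and sorting mechanisms described under Relevance more concrete, here is a minimal sketch of how an application might assemble a Solr search request. The `q`, `fq`, `sort`, `rows`, and `wt` parameters are standard Solr request parameters; the host, core name, and field names (`category`, `price`) are assumptions made up for illustration, and the helper `build_select_url` is hypothetical rather than part of any Solr client library.

```python
from urllib.parse import urlencode


def build_select_url(base_url, text, filters=(), sort=None, rows=10):
    """Build a Solr /select URL from user text, filter queries, and a sort order."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each fq narrows the result set without changing the relevance score.
    params += [("fq", f) for f in filters]
    if sort:
        params.append(("sort", sort))
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/products",  # hypothetical host and core
    "running shoes",
    filters=["category:footwear", "price:[50 TO 100]"],  # hypothetical fields
    sort="price asc",
)
print(url)
```

The user types only "running shoes"; the filters and the sort order come from interface controls, which is how an application narrows scope without asking users for perfect query terms.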


Where Is Search Integrated with IT Infrastructure?
Useful, valuable search technology rarely exists in isolation. Searched data is transformed into
actionable information when it is integrated with the organization’s information infrastructure,
from business processes to business intelligence to content management systems. A robust search
technology must be customizable so that it integrates seamlessly with existing systems.




•    Application Integration
         A key requirement for a search application is extensibility: it must integrate with
         existing infrastructure and applications such as content management systems, databases,
         and the full range of business processes. It should have interfaces that support ingestion
         of data as well as delivery of results in readily consumable formats, because in many
         cases results are consumed by other applications rather than by a person.
    •    Scalability
         Data will change and grow, so scalability is a key factor for a search application.
         Applications should grow to address future needs without penalties for the breadth of
         data or for the count of documents indexed, and without needing large additional
         investments in hardware to match the pace of growth. Proprietary search vendors often
         charge for search by the number of documents indexed. In a world where constant
         content growth is the norm, such costs can be a real and substantial drag on the cost of
         ownership for search applications, in many cases resulting in a negative return.
    •    Security
         Every organization has its own security requirements and access controls. Search
         technologies need to comply with the security policies of the enterprise, withholding
         restricted results from users who are not authorized to see them. The search technology
         should also be able to make use of document-level security from other sources.
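
One common way to honor document-level security, sketched here under stated assumptions, is to store the groups allowed to read each document in an indexed field and attach a matching filter query to every search. The field name `acl` and the helper `acl_filter` are hypothetical; the paper does not describe how any particular deployment implements access control.

```python
def acl_filter(user_groups, field="acl"):
    """Build a filter query matching documents readable by any of the user's groups."""
    if not user_groups:
        # Refuse to build a filter that could accidentally match everything.
        raise ValueError("user has no groups; cannot build an access filter")
    quoted = " OR ".join(
        '"' + group.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for group in sorted(user_groups)
    )
    return f"{field}:({quoted})"


# The resulting string would be sent as an fq parameter alongside the user's query.
print(acl_filter({"sales", "hr"}))
```

Because the restriction is applied as a filter rather than folded into the user's query, it limits which documents can be returned without influencing how the permitted documents are ranked.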


How Is the Search Interface Presented to the User?
The user interface is where search delivers on findability and presents actionable results. The
search application is only as good as the convenience of submitting queries, reviewing and refining
results, and finding information. Key aspects to consider:
   •    Navigation
        Users benefit from guidance that makes their queries more productive. Techniques such as
        faceted search with result clustering, advance hinting (“did you mean”), “more like this,”
        and drop down menus for setting search scope help users achieve desired results faster,
        making a search application both user- and information-friendly. It is also important to
        allow users to draw associative connections between results—using the technology to
        uncover relationships and discover more about what they were seeking than they knew at
        the outset.




        The Netflix search application is powered by Solr; it adds a fuzzy dimension to search,
        with auto-completion of movie names, correction of misspelled actor names, and suggestions
        of the titles closest to the query. As a result, 85% of users have found the movie they
        were looking for ranked at the #1 spot in the results.




   •   Discovery
       Search application functionality should extend beyond the generic presentation of a result
       list of documents that contain a keyword. Highlighting keywords in searched results,
       expanding searches with synonyms and spell checking, and offering users ways to learn a
       bit more about documents in the results without having to load the document are great
       ways to significantly improve usability.

   •   Intuitive Intelligence
       Search applications must go beyond keyword search to help users retrieve accurate
       information even when they are not sure of the best keywords. Additionally, they should
       reduce misinterpretations where homonyms, spelling errors, and ambiguous keywords are
       involved (e.g., is “apple” a fruit or a computer company?).
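
The navigation and discovery aids listed above map onto request parameters in Solr. The following is a minimal sketch of switching on faceting, highlighting, and spell checking for one query. `facet`, `facet.field`, `hl`, `hl.fl`, and `spellcheck` are standard Solr parameters, though spell checking only works if the spellcheck component is configured on the request handler. The field names and the helper `discovery_params` are assumptions for illustration.

```python
from urllib.parse import urlencode


def discovery_params(text, facet_fields, highlight_field):
    """Return Solr request parameters enabling facets, highlighting, and spellcheck."""
    params = [
        ("q", text),
        ("facet", "true"),           # return counts per facet value
        ("hl", "true"),              # return highlighted snippets
        ("hl.fl", highlight_field),  # field to take snippets from
        ("spellcheck", "true"),      # request "did you mean" suggestions
    ]
    params += [("facet.field", field) for field in facet_fields]
    return params


# Hypothetical fields: brand, price_range, description.
params = discovery_params("aple laptop", ["brand", "price_range"], "description")
print(urlencode(params))
```

A front end would render the facet counts as drill-down links, the snippets beneath each result, and the spelling suggestion as a "did you mean" prompt.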




The Real World: Applications and Case Studies
With an understanding of the fundamentals of search business applications in hand, it is
helpful to gain additional context on business usage through a survey of organizations that
have successfully used Lucene/Solr for powerful search applications.
All of these cases were built on the capability of Lucene/Solr to provide innovative, high-
performance, cross-platform, feature-rich search technology suitable for nearly every
application. By powering diverse search applications for thousands of organizations such
as AT&T, Zappos, McClatchy, Smithsonian, MTV Networks, LinkedIn, MySpace, Comcast,
Monster, Netflix, and many more, Lucene/Solr has provided mission critical capability that
turns search into a robust competitive advantage.
For these organizations, Lucene/Solr solutions regularly index and search hundreds of
millions of documents with subsecond response time, unencumbered by costly licensing or
vendor lock-in. Together they represent a compelling argument for the broad applicability
of Lucene/Solr across the full range of business opportunities and search needs. Business
use case studies we’ll review include:
   •   Yellow Pages, Local Search, and Searching Classifieds
   •   Media
   •   E-commerce
   •   Job and Career Sites
   •   Libraries, Archives, and Museums (LAMs) Search
   •   Social Media Search
   •   Enterprise (Intranet) Search




Yellow Pages, Local Search, and Searching Classifieds
In the business of online local search, geographic-based (location) relevance generates competitive
advantage. Online directories need to provide a rich, interactive search experience to users to
increase site views and stickiness, which in turn translates into increased advertising revenue.
Simplified location-based search, intuitive faceted query response, and data mashups are a few
features that define search functionality for an online directory.
Lucene/Solr solutions offer accurate search results, factoring in location, users’ reviews, and
ratings, alongside paid advertising. By taking advantage of Solr’s open source model, with search
algorithms that are completely transparent, companies can invest in configuring their search
solutions to match their business logic, rather than trying to infer or pay for exposure of
proprietary back-end logic.

                               Internet Yellow Pages and local online search is forecast to
                               grow to $27.8 billion in 2011.
                                                      The Kelsey Report¹

Requirements
     •    Intelligent results going beyond keyword search
     •    Deeper, faceted navigation
     •    Seamless integration with latest Web 2.0 tools
     •    Lower IT-related costs
     •    Geocentric user experience
     •    Search numeric values

Solr Solution
     •    Customizable search index which can be tuned transparently to account for key
          findability drivers
     •    Drop down filters for narrowing or widening the scope of search
     •    Seamless integration with existing technologies
     •    Native numeric encoding and search capabilities
     •    Reduced server footprint for lower TCO than most commercial vendors

Success Stories
     •    YP.com, a division of AT&T Interactive
     •    Zvents.com, local event search service
     •    Yelp.com, the community local search site

¹ The Kelsey Group’s Global Print Yellow Pages, Internet Yellow Pages and Local Search Five Year Outlook




Case Study 1

yp.com by AT&T Interactive
AT&T Interactive is an online and mobile search and advertising company. Their leading-edge portal, yp.com—an
online business listing and advertising site—was originally implemented with a commercial proprietary search
application. It faced issues of scalability, vendor lock-in, and performance. With help from Lucid Imagination, AT&T
successfully migrated to a Solr-based search solution that leveraged the flexibility of open source without
compromising features and functionality. And they did so with a much smaller budget.
Business Needs

    •   Addressing the need to factor in location to support geographic search, and include relevant comments
    •   Striking a balance between organic search and advertised content
    •   Indexing highly unstructured content such as user comments
    •   Increasing relevancy of results and boosting paid search results for preferential placement of advertisers
    •   Providing linguistic support to improve the search experience, such as spellchecking, synonyms, find-similar, etc.
    •   Integrating with latest Web 2.0 tools
    •   Reducing server footprint

The Solr Solution

    •   Context-specific relevancy, geographic proximity, ad placement, and user comments
    •   Faceting, drop down filters to narrow/widen the scope of search
    •   Functional support for creating new features
    •   Spell-correction, and location-optimized search results to show users businesses nearest to them first
    •   Seamless integration with many Web 2.0 tools to create innovative features and mashups
    •   Lower TCO by reducing the number of search servers from 120 to two dozen
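
The case study's combination of text relevance, geographic proximity, and preferential placement for advertisers can be pictured with a small ranking sketch. This is an illustration only: the scoring formula, the `paid_boost` value, and the listing fields are assumptions, and none of it represents how yp.com or Solr actually computes scores. Distance uses the standard haversine formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def rank(listings, user_lat, user_lon, paid_boost=1.5):
    """Order listings by text score, discounted by distance and boosted if paid."""
    def score(listing):
        distance = haversine_km(user_lat, user_lon, listing["lat"], listing["lon"])
        base = listing["text_score"] / (1.0 + distance)  # nearer listings score higher
        return base * paid_boost if listing["paid"] else base
    return sorted(listings, key=score, reverse=True)


listings = [
    {"name": "B", "lat": 37.8044, "lon": -122.2712, "text_score": 1.0, "paid": True},
    {"name": "A", "lat": 37.7750, "lon": -122.4195, "text_score": 1.0, "paid": False},
]
ranked = rank(listings, 37.7749, -122.4194)
print([listing["name"] for listing in ranked])
```

Keeping the boost as an explicit, tunable number is the point of the exercise: the balance between organic and advertised results becomes a visible business decision rather than hidden vendor logic.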




Media
Brand reinforcement, premium content, and easy accessibility are the main business motivators for
online media and publishing companies. Relevant information improves time on the site and
encourages users to explore related content, boosting subscription rates and site views. These
translate into a virtuous cycle of additional revenue generation.
Given that content is the business, the need for a robust search application ties directly to
competitive advantage.
Lucene/Solr provides a customized, function-rich solution for the media and publishing industry. It
addresses dynamic challenges of content diversity, content freshness, and content acquisition, and
gives companies a platform on which to build a world-class innovative search experience to
differentiate themselves in a highly competitive marketplace.

                     “Solr has done wonders for us. It is easy to understand and
                     deploy, and has reduced our costs drastically.”
                                     Doug Steigerwald, McClatchy Interactive

Requirements
   •   Real-time indexing of petabytes of structured and unstructured data
   •   Deeper search capability
   •   Improved query response time
   •   Reduced infrastructure and customization costs

Solr Solution
   •   Reverse indexing
   •   Intelligent, faceted search to enable contextual and linguistic relevance
   •   Easy configuration for parsing structured and unstructured data
   •   Easy and seamless installation for lower TCO
   •   Customization with open source code


Success Stories
   •   McClatchy Newspapers
   •   Netflix
   •   Comcast Interactive
   •   MTV Networks, a division of Viacom
   •   The Motley Fool, fool.com
   •   Fanfeedr.com, personalized sports aggregator
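
The paper does not define the "reverse indexing" item in the Solr Solution list above. Assuming it refers to the inverted index at the core of Lucene, the idea can be sketched in a few lines: instead of scanning every document for a word, the index maps each word to the documents that contain it. The toy tokenizer (lowercase, split on whitespace) and the function names are simplifications for illustration, not Lucene's actual analysis chain or API.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "City council approves budget",
    2: "Council election results",
    3: "Budget cuts hit schools",
}
index = build_inverted_index(docs)
print(search(index, "council budget"))
```

Lookups touch only the entries for the query terms, which is why query time stays low as the collection of articles grows.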




Case Study 2
    McClatchy—Leading Newspaper Publisher
    The third largest newspaper publisher in the United States, McClatchy Company owns 30 daily
    newspapers in 29 markets across the country. To win online, McClatchy knew it had to have a robust
    search solution, to empower the McClatchy audience with the information they wanted and secure
    loyalty from readers and sponsorships from advertisers. Working with Lucid Imagination, McClatchy
    migrated from proprietary search software to open source and chose Solr for its high performance,
    comprehensive capabilities, and superior value.
    Requirements
       • Proliferating content and data sources (text, videos, audios, images), with real-time
           streaming
       • Empowering end users with ease of use
       • Supporting peak traffic and popular search spikes with consistent performance
       • Providing scalability for a database growing by orders of magnitude annually
       • Providing flexibility to support customization
       • Controlling IT costs while exceeding performance benchmarks of competition

    The Lucene/Solr Solution
       • Deeper content by indexing both structured and unstructured data in real time, effortlessly
       • Indexes millions of documents, with search results delivered in milliseconds
       • User-friendly navigation with drop down filters, faceted navigation, linguistic corrections,
           etc.
       • Excellent performance, even in peak hours, by load-balancing search requests across servers
       • Scalability without impact on performance
       • High degree of customization, since it’s open source
       • Integration with existing IT infrastructure, eliminating associated license fees to cut costs
       • 8-fold reduction in server footprint




E-commerce
E-commerce businesses must provide a compelling shopping experience in order to maintain brand
equity and thrive in a highly competitive market landscape. By reducing the time and effort
customers need to navigate available merchandise and find what they want, superior search
contributes directly to a satisfying buying experience. Search then translates directly into
higher revenues and customer loyalty. Instant results, intuitive organization, advanced faceting
for easy browsing, synchronization of results with images, and integration with user ratings are
among the must-have features of an e-commerce search application.

Lucene/Solr gives companies the ability to build their sites around the concept of
“searchendizing” (putting the desired merchandise at the top of the results list), which can make
the difference between sales made and sales lost. Faceting, database integration, real-time
indexing, and query monitoring all enable users to find products they want, driving conversion
rates and enabling a winning online experience. [2]

                      Online retail sales in the B2C market are expected
                      to reach $340 billion by 2013 [2]

                                     Forrester Research

Requirements
   •   Multidimensional, dynamic search
   •   Faster results
   •   Real-time indexing of products
   •   Faceting and browsing capabilities
   •   Seamless integration with existing IT infrastructure

Solr Solution
   •   Faceted search for deeper drill down and browsing
   •   Intuitive search capabilities for cross-channel shopping experience
   •   System administration tools for data loading, index replication, monitoring, logging,
       and cache management
   •   Query monitoring for better highlighting of popular products

Success Stories
   •   Buy.com
   •   Sears.com
   •   Macys.com
   •   Zappos.com
   •   Advanceautoparts.com
   •   Dollardays.com

[2] “Consumers will spend more than $340 billion online by 2013, says Forrester,”
Internet Retailer, 27 November 2009, http://www.internetretailer.com/dailyNews.asp?id=32630.



Case Study 3
 Zappos
 Zappos is the premier destination for online shoe shopping. At Zappos, the mission is excellent online customer
 service: customers should be able to browse shoe styles, sizes, shapes, and colors more easily than at any other shoe
 store, online or offline. To achieve this, Zappos wanted a robust, flexible, multifunctional search solution.
 After evaluating many commercial search technologies, Zappos zeroed in on Solr, working with Lucid Imagination to
 ensure continued, successful deployment.
 Requirements
     •   Simplified, attractive user experience that makes it easy to find and buy
     •   Relevant results, fast
     •   Navigation across attributes, such as size, color, and style for broader and deeper results
     •   Indexing products as they were entered in the catalogs
     •   Cross-functional navigation to give customers a realistic shopping experience
     •   Intuitive intelligence to provide alternate suggestions
     •   Analytical capabilities to drive business strategy
     •   Facilitating control on results
     •   Integration with existing IT infrastructure

 The Solr Solution
     •   Sub-second search results across categories
     •   Faceting, for easy browsing and discovery and a compelling user experience
     •   Real-time indexing of products
     •   Synchronization of visuals, specs, filters, and promotions to make the shopping experience true to life
     •   Information on user activity to help build strategy on product promotions
     •   Controls to rank popular or high-stock products in results where users are more likely to buy them
     •   Integration with a heterogeneous open source environment
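
 As a rough illustration of what faceted navigation across attributes such as size, color, and style
 involves, the Python sketch below counts how many products fall under each attribute value and
 narrows the result set when a shopper picks one. It is a conceptual sketch only, not Solr's
 implementation or Zappos's code; the catalog records and the function names facet_counts and
 apply_filter are hypothetical.

```python
from collections import Counter


def facet_counts(products, fields):
    """Count how many products carry each value of each facet field."""
    return {
        field: Counter(p[field] for p in products if field in p)
        for field in fields
    }


def apply_filter(products, field, value):
    """Keep only the products whose facet field matches the chosen value."""
    return [p for p in products if p.get(field) == value]


# Hypothetical catalog records, for illustration only.
catalog = [
    {"style": "sneaker", "color": "black", "size": 9},
    {"style": "sneaker", "color": "white", "size": 10},
    {"style": "boot", "color": "black", "size": 10},
    {"style": "sandal", "color": "brown", "size": 9},
]

# Facet counts shown next to the full result list.
counts = facet_counts(catalog, ["color", "size"])

# The shopper clicks "black"; the remaining facets are recomputed on the subset.
black = apply_filter(catalog, "color", "black")
black_counts = facet_counts(black, ["style"])
```

 In a search server the counts are computed from the index rather than by scanning records, but
 the user-facing behavior is the same: each click narrows the results and refreshes the counts.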




Job and Career Sites

Job portals are countercyclical to the economy. When the economy flourishes, posted jobs grow in
number; when it sags, candidates flock in to post their résumés. Success for an online job portal
is tied to the efficiency of its search capability (matching résumés to job listings and vice
versa), so that both employers and prospective employees can zero in on just the right opportunity.

For example, an employer may want to navigate through filters to narrow the scope of a candidate
search, such as education, previous employer, salary history, and skill sets; a job seeker may
want to expose these attributes but keep a current employer’s name confidential. A job seeker may
also want to apply to jobs within a particular geographic area.

Lucene/Solr not only provides such flexibility but also addresses other complexities of this
industry by enabling linguistic intelligence (handling identical acronyms that correspond to
different entities, variations in spelling, and imperfectly constructed search queries), indexing
unstructured data (résumés), and managing ever-growing data.

                           “I think the breakthrough was when we tried it, and we
                           realized, wow, this thing could really scale.”

                                     Peter Keegan, Monster.com

Requirements
   •   Linguistic intelligence for more relevant results
   •   Control search results to maintain privacy
   •   Deeper search capability
   •   Numeric search
   •   Faster query response
   •   Reduced infrastructure and customization costs

Solr Solution
   •   Intelligent, faceted search to enable contextual and linguistic relevance
   •   Easy configuration for parsing structured and unstructured data
   •   Easy and seamless installation for lower TCO
   •   Business process integration and customization with open source code

Success Stories
   •   Monster
   •   The Big Jobs
   •   eBharatJobs
   •   Careerjet


Case Study 4
 Monster.com
 Monster is the largest job search engine in the world, with over a million jobs posted at any one time. By 2008 it had
 150 million résumés in its database and served over 63 million job seekers per month; it now runs on average 300 to
 400 queries per second with an average response time of 40 milliseconds. To provide the highest level of service
 and support to its customers, both employers and job seekers, Monster offers an unmatched marketplace for
 employment opportunities, with Lucene-based search at the heart of its business model.

 The Requirements
     •   Managing high volumes of data, continually increasing by double digit percentages annually
     •   Maintaining constant inventory updates and providing faster results
     •   Removing technological barriers that limit the scope of information
     •   Enabling end users to refine search and drill deeper without any performance impact
     •   Providing security controls to ensure end user privacy
     •   Facilitating scalability and flexibility in tandem with company’s vision and growth plans

 The Lucene Solution
     •   Handling of high volumes of data, with clustering to reduce the index size
     •   Real-time indexing for fresher, faster query results
     •   Intuitive search to enable in-depth cross-functional job and résumé browsing
     •   Faceted search and ‘single click’ filters for search refinement
     •   Security controls to manage user information
     •   Unlimited scalability and customization leveraging open source licensing



Libraries, Archives, and Museums (LAMs) Search
The core asset of educational and research institutions is knowledge archived and accumulated
over decades. In the world of academic search, the diversity of information for any query (text,
illustration, audio/video media, or data in any other format) makes unstructured formats a key
aspect of the searchable archive.

Lucene/Solr gives academic and research institutions the power to turn information into knowledge
by going beyond keyword-driven search to expose a rich variety of results and exploration. Based
on the open source model, it not only integrates with the existing IT infrastructure but also
leverages the existing classification hierarchies to give structure to terabytes of information
spread across disparate collections, significantly reducing overhead and enabling flexible and
scalable deployment.

                      “With Solr, you can do so many things without writing a lick of code.
                      I hadn't realized how easy it is to extend our custom request handler,
                      response writer, and update handler. Just move it all to Solr and let it
                      do the heavy lifting.”

                                      Sjoerd Siebinga, Europeana

Requirements
   •   Management of multiple formats of data and documents
   •   Customization and scalability
   •   Linguistic support in queries
   •   Faster results

Solr Solution
   •   Optimized index infrastructure limits size without compromising speed or flexibility
   •   Easy customization for implementing taxonomy rules
   •   Faceted search to narrow results to a specific source across diverse sets of data
   •   Instant results
   •   Seamless integration with IT infrastructure for lower TCO

Success Stories
   •   Smithsonian Institution
   •   Europeana, the European Union online cultural archive
   •   The US Library of Congress and World Digital Library
   •   Stanford University Library
   •   University of Michigan Graduate Library




Case Study 5
  Smithsonian
  The Smithsonian Institution is the flagship museum collection of the United States, supporting a research institute
  that provides “one-stop” searching for 2 million records, including nearly a quarter of a million media files
  (images, online journals, and other resources) distributed across dozens of archives, databases, museums, and
  libraries. To make this treasure of information easily accessible to people, the Smithsonian needed an efficient
  search solution that could overcome the following challenges:
  The Challenges
      •   Managing a complicated taxonomy that could no longer accommodate a growing data index
      •   Indexing disparate types of content, including documents, videos, and images
      •   Making information available from a large database
      •   Providing access controls to restrict information
      •   Integrating with existing legacy tools

  Smithsonian chose Lucene/Solr, and worked with Lucid Imagination to create an optimized, well-designed solution.
  The Solr Solution
      •   Efficient index strategy to manage a mix of structured and unstructured data
      •   Holistic search, with configuration optimized to reduce the number of servers and handle query
          requests better
      •   Filtering information through faceted search
      •   Access controls to restrict information based on membership profiles
      •   Integration with the existing IT infrastructure
      •   Guidance and assistance on setting up a replicated search environment




Social Media Search
Search solutions must support differentiated business models matching Web 2.0 innovations,
including user-generated content and mashups, without compromising scalability, a challenge
given the virtually limitless content on the Internet. Success and differentiation are measured
by how well the site provides relevant results to grow its user base and keep its users engaged.

Increasingly, the technological factors driving Web 2.0 application paradigms are finding their
way into the enterprise, unlocking collaboration and productivity in new ways that challenge
conventional organizational bounds and that rely in equal measure on search to create the
connections between employees that enable discovery, cross-pollination, and more efficient
collective effort.

Lucene/Solr not only provides fast results but also facilitates flexible, intuitive navigation to
help end users connect with others. It boosts the reach and performance of search, while cutting
implementation costs and lowering barriers to innovation.

                             “With Solr, we really treat it as kind of a platform where
                             we can build other kind of things on top of it… We have
                             a very valuable set of data, and we really want to
                             explore new ways of building new features from
                             that data set.”

                                     Sammy Yu, Digg.com

Requirements
   •   Deliver search results as soon as content is available
   •   Deeper drill down capabilities
   •   Intuitive interface

Lucene/Solr Solution
   •   Near-instant results with segmentable indexing
   •   Intuitive search
   •   Data-driven spellchecking based on user search histories
   •   Linguistic support through “Did you mean” functionality
   •   Highlighting keywords
   •   Deeper drill down with faceting
   •   Real-time content updating

Success Stories
   •   Digg
   •   Myspace
   •   LinkedIn
   •   Reddit
   •   Technorati
   •   Scout Labs
   •   Xmarks.com





Case Study 6

 Digg.com
 Digg displays the wisdom of the crowds. By leveraging the mass collaboration of readers distributed across the
 Internet (everything on Digg is submitted by the public community for the public community), it builds on the easy
 findability of information valued by the marketplace of readers and consumers.
 Digg realized early on that to succeed in the business of information, they needed to make information available to
 their audience as effortlessly as possible. They saw the following challenges as roadblocks to implementing a base
 search application:
 Requirements
     •   Managing unstructured data (13 million documents and growing) in real time
     •   Providing results faster
     •   Facilitating smart navigation to provide information in digestible portions
     •   Recognizing and eliminating duplicate content
     •   Providing a semantically and linguistically smart application
     •   Facilitating scalability while containing costs

 Digg selected Solr for its unmatched flexibility and functionality.
 The Solr Solution
     •   Highly customizable and flexible
     •   Sub-second results, with simple-to-use pull-downs to refine results
     •   Fuzzy duplicate detection (by coding)
     •   Unlimited scalability and seamless integration with the heterogeneous environment
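
 To make the idea of fuzzy duplicate detection concrete, the Python sketch below compares two
 texts by the overlap of their three-word shingles (Jaccard similarity) and flags them as near
 duplicates above a threshold. This is one common textbook approach, shown under stated
 assumptions; the paper does not describe how Digg implemented it, and the function names, sample
 texts, and 0.5 threshold are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.5, k=3):
    """Flag two texts as near duplicates when their shingle overlap is high."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


# Hypothetical submissions, for illustration only.
first = "Apache Solr adds faceted search to the Lucene library"
second = "Apache Solr adds faceted search to the Lucene search library"
other = "Netflix recommends films based on viewing history"

print(is_near_duplicate(first, second))  # the two share 6 of 9 distinct shingles
print(is_near_duplicate(first, other))   # no shingles in common
```

 The threshold trades missed duplicates against false matches and would need tuning on real
 submissions.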




Case Study 7
 LinkedIn
 Connecting 50 million registered users from 200 countries across 170 industries and matching them to
 the right professional contacts is what LinkedIn is all about. LinkedIn’s business is premised on an
 intelligent search application that could overcome the following:
 The Challenges
     •  Managing an ever-growing database, with one new member joining and creating a profile every
        second
     •  Indexing unstructured data in real time
     •  Giving instant query responses, even in peak traffic hours
     •  Providing intuitive navigation and intelligent linguistic support
     •  Integrating with other Web 2.0 tools to build user profiles that integrate data from multiple
        sources
 They chose Lucene to implement the search function at the core of their business model.
 The Lucene Solution
     •   Used index segmentation for faster results and to limit the index base
     •   Provided faceted search and intelligence support features like changing the view of search
         results and auto-completion of contacts
     •   Calculated relative relevance, ranking results on the fly based on the relationship between the user’s
         profile and the other profiles being searched
     •   Integrated with the latest web tools; for example, incorporating videos in search results
     •   Provided a “scale as you grow” facility through the flexibility of the open source model




Enterprise (Intranet) Search
Enterprises today have a global footprint, which leads to the creation of multiple content types
and the use of disparate applications and content management systems across business centers. The
result is often silos of unmanaged data spread across the intranet of an enterprise, a situation
where information is omnipresent but cannot be used.

To achieve a competitive advantage, enable intelligent decision-making, eliminate duplication of
work, and lower the cost of ownership, enterprises need a search application that gives structure
to unstructured data and provides a single gateway to search across multiple enterprise
repositories, with speed, flexibility, and intuitive intelligence.

Lucene/Solr is a solid match for enterprise search. As a customizable and multifunctional search
application, Lucene/Solr provides robust search features at minimal cost. The open source
development model behind Lucene/Solr lets it integrate seamlessly with legacy tools and brings
down the total cost of ownership significantly.

Given the sensitive nature of enterprise content, Lucene/Solr facilitates document-level,
role-based security. And with transparent search algorithms and configurability for relevancy,
Lucene/Solr enables intranet search with the precise control enterprise content owners require,
ensuring that results consistently deliver the right documents to the right people.

                        “The search and discovery software market grew 19
                        percent in 2008 to $2.1 billion”

                                             Sue Feldman, IDC

Requirements
   •   Single interface to access enterprise data
   •   Faster results
   •   Control over search results
   •   Ready integration with existing content management software

Solr Solution
   •   Single gateway for all types of data
   •   Dynamic boosting of content
   •   Transparent search algorithms and relevancy tuning
   •   Customization and easy integration with open source code





Case Study 8
 Food and Drug Administration
 The Food and Drug Administration (FDA) is a U.S. government agency responsible for regulating
 and supervising the safety of foods, medications, veterinary products, tobacco, and cosmetics. The
 FDA has a large repository of information that dates back multiple decades and exists in formats
 ranging from early optical character recognition to recent electronic formats. To mine this
 knowledge base, the FDA is developing a semantic mining framework using open source tools such
 as Apache Lucene and Solr.
 Requirements
     •    Integrating petabytes of data highly distributed across the intranet of an enterprise
     •    Managing multiple indices for documents stored in distributed repositories
     •    Managing and maintaining archival data and evolving vocabularies
     •    Indexing unstructured data in real time
     •    Recognizing and eliminating duplicate content
     •    Handling concurrent queries and delivering fast and relevant results
     •    Restricting search results according to agency access control policies
     •    Integrating with existing infrastructure without additional overhead

 The Lucene Solution
      •    A single gateway to search across multiple enterprise repositories
      •    Duplicate detection
      •    Fast and relevant results with content analysis and query interpretation algorithms
      •    Filtering of results based on the access controls and security policies of the enterprise
      •    Integration with existing enterprise infrastructure to reduce TCO



Business Use Case Matrix
To simplify mapping your search needs to existing search applications in the real world, the matrix
below compares business use cases against key search requirements. While not an exhaustive list,
the matrix highlights the different business use cases across sectors and business models, reflecting
the adaptability of Lucene/Solr across the various domains of search applications and use cases.


                                  Users                  Content                 Content Update Frequency   Access
Verticals                         Internal   Customer    Original   Aggregated   High   Medium   Low        Control
                                             Facing

Enterprise (Intranet)             √          -           √          -            -      √        -          √
Education: Schools/Universities   √          √           √          √            -      √        -          √
Education: Libraries              √          √           -          √            -      √        -          √
Job Portals                       -          √           √          √            √      -        -          -
Social Networks                   -          √           √          √            √      -        -          √
Media: News                       -          √           √          √            √      -        -          -
Media: Media                      -          √           √          √            √      -        -          -
E-Commerce Sites                  -          √           √          √            √      √        -          √
Financial Services                √          √           √          -            √      -        -          √
Yellow Pages                      -          √           -          √            -      √        -          -
Horizontal Portals                -          √           √          √            √      -        -          -

(√ = applies; - = not marked)




Appendix: Lucene/Solr Features and Benefits
Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In
choosing a search solution that is best suited for your requirements, key factors to consider are
application scope, development environment, and software development preferences.
Lucene is a Java-based search library that offers speed, relevancy ranking, complete
query capabilities, portability, scalability, low-overhead indexes, and rapid incremental
indexing.
Solr is the Lucene Search Server. It is a web service layer built atop the Lucene search
library, extending it to provide application users with a ready-to-use search platform.
Solr brings with it operational and administrative capabilities like web services, faceting,
configurable schema, caching, replication, and administrative tools for configuration, data loading,
statistics, logging, cache management, and more.
Lucene presents a collection of directly callable Java libraries and requires coding and solid
information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-
ready search platform, eliminating the need for extensive programming.
Solr provides the starting point for most developers who are building a Lucene-based search
application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to
scale in a production Java environment.
With convenient REST-like web service interfaces callable over HTTP, and transparent XML-based
configuration files, Solr can greatly accelerate application development and maintenance. In fact,
Lucene programmers have often reported that they find Solr contains “the same features I was
going to build myself as a framework for Lucene, but already very well implemented.” Using Solr,
enterprises can customize the search application according to their requirements without
incurring the cost and risk of writing the code from scratch.
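
To make the HTTP interface concrete, here is a minimal sketch of how an application might assemble a Solr search request. The `/select` handler and the `q`, `rows`, `facet`, and `facet.field` parameters are standard Solr request parameters; the host, port, field names (`title`, `category`), and the helper class itself are assumptions made for illustration only. The sketch builds the URL and does not contact a server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative helper (not part of Solr): builds a search URL for Solr's
// HTTP interface from a set of request parameters.
final class SolrSelectUrlSketch {

    // baseUrl is a hypothetical Solr address, e.g. "http://localhost:8983/solr".
    public static String buildSelectUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode names and values so characters such as ':' are safe.
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "/select?" + query;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title:lucene");        // fielded query; "title" is an assumed field
        params.put("rows", "10");               // number of results to return
        params.put("facet", "true");            // turn on faceting
        params.put("facet.field", "category");  // "category" is an assumed field
        System.out.println(buildSelectUrl("http://localhost:8983/solr", params));
    }
}
```

Because the request is an ordinary HTTP GET, any language or tool that can fetch a URL can act as a Solr client, which is a large part of why little custom programming is needed.
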
Lucene provides greater control of your source code and works best in development environments
where resources need to be controlled exclusively by Java API calls. It is best suited to
embedding a state-of-the-art search engine directly in a native Java application, where
programmers assemble and compile it as part of their own code. Working with Lucene, programmers
have direct, low-level control over its large set of sophisticated features, data, and state.
Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it
provides ease of use and scalable search power out of the box.




As functional siblings, Lucene and Solr have become popular alternatives for search applications;
the two differ mainly in the style of application development used. Key benefits of search with
Lucene/Solr include:


   •   Search Quality: Speed, Relevance, and Precision. Lucene/Solr provides near-real-time
       search and strong relevance ranking to deliver contextually relevant and accurate results
       very quickly. Tailor-made coding for relevancy ranking and sophisticated search
       capabilities like faceted search help users in sorting, organizing, classifying, and structuring
       retrieved information to ensure that search delivers desired results. Search with
       Lucene/Solr also provides proximity operators, wildcards, fielded searching,
       term/field/document weights, find-similar functions, spell checking, multilingual search,
       and much more.
   •   Lower Cost and Greater Flexibility, Plug-and-Play Architecture. Lucene/Solr reduces
       recurring and nonrecurring costs, lowering your total cost of ownership (TCO). As open source software, it does not
       require purchase of a license and is freely available for use. The open source code can be
       used as is, modified, customized, and updated as appropriate to your needs. Solr is easily
       embedded in your enterprise’s existing infrastructure, reducing costs of installation,
       configuration, and management.
   •   Open Source Platform for Portability and Easy Deployment. Because Lucene/Solr is an
       open-source software solution, it is based on open standards and community-driven
       development processes. It is highly portable and can run on any platform that supports Java.
       For instance, you can build an index on Linux and copy it to a Microsoft Windows machine
       and search there. This unsurpassed portability enables you to keep your search application
       and your company’s evolving infrastructure in tandem. Lucene, in turn, has been
       implemented in other environments, including C#, C, Python, and PHP. At deployment time,
       Solr offers very flexible options; it can be easily deployed on a single server as well as on
       distributed, multiserver systems.
   •   Largest Installed Base of Applications, Increasing Customer Base. Lucene/Solr is the
       most widely used open source search system and is installed in around 4,000 organizations
       worldwide. Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn,
       Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple,
       HP, IBM, Iron Mountain, and Los Alamos National Laboratory.
   •   Large Developer Base and Adaptability. As community-developed software, Lucene/Solr
       provides transparent development and easy access to updates and releases. Developers can
       work with open source code and customize the software according to business-specific
       needs and objectives. Its open source paradigm lets Lucene/Solr provide developers with
       the freedom and flexibility to evolve the software with changing requirements, liberating
       them from the constraints of commercial vendors.
   •   Commercial-Grade Support for Mission-Critical Search Applications from Lucid
       Imagination. Lucid Imagination provides the expertise, resources, and services that are
       needed to help enterprises deploy and develop Lucene-based search solutions efficiently
       and cost-effectively. Lucid helps enterprises achieve optimal search performance and
       accuracy with its broad range of expertise, which includes indexing and metadata
       management, content analysis, business rule application, and natural language processing.
       Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade
       SLA-based support, training, high-level consulting and value-added software extensions to
       enable customers to create powerful and successful search applications.
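
Several of the query features named in the Search Quality item above (fielded searching, wildcards, proximity operators, and term weights) correspond to operators in Lucene's classic query syntax. The following is a minimal sketch of what such query strings look like. The helper methods are illustrative and are not part of the Lucene or Solr API, the field name `title` and the sample terms are assumptions, and the sketch does no escaping of special characters.

```java
// Illustrative helpers (not Lucene API): each method returns a query string
// written in Lucene's classic query syntax.
final class QuerySyntaxSketch {

    // Fielded search: restrict a term to one field, e.g. title:solr
    public static String fielded(String field, String term) {
        return field + ":" + term;
    }

    // Wildcard search: match any term starting with the prefix, e.g. luc*
    public static String prefixWildcard(String prefix) {
        return prefix + "*";
    }

    // Proximity search: words of the phrase within `slop` positions of
    // each other, e.g. "open source"~5
    public static String proximity(String phrase, int slop) {
        return "\"" + phrase + "\"~" + slop;
    }

    // Term weighting: boost a clause's contribution to the relevance
    // score, e.g. title:solr^2
    public static String boosted(String clause, int weight) {
        return clause + "^" + weight;
    }

    public static void main(String[] args) {
        System.out.println(fielded("title", "solr"));
        System.out.println(prefixWildcard("luc"));
        System.out.println(proximity("open source", 5));
        System.out.println(boosted(fielded("title", "solr"), 2));
    }
}
```

Strings like these can be passed as the `q` parameter of a Solr request or handed to Lucene's query parser, so end users and applications share one query language.
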




What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Dernier (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Whitepaper- Real World Search

  • 1. The Case for Lucene/Solr: A Manager’s Guide to Real World Open Source Search Applications By Lucid Imagination
  • 2. Abstract In today’s information-driven environment, search is a critical solution to problems when it slashes the time and effort separating end users from the data they value. Search spans the range of business models and use cases—from driving direct customer sales, to analytics and business intelligence, employee productivity, and reduced administrative overhead. Making the best use of search requires two perspectives: both a look at the business requirements for a search application and a view to new business opportunities created by using search to leverage the organization’s content resources. Thousands of organizations across different sectors and business models have harnessed Apache Lucene/Solr to search their rapidly growing and diversifying content resources. Underlying this broad adoption is the extraordinary power, scalability, and versatility of open source search technologies. This paper provides an overview of both the requirements and the opportunities for search applications. It then explores how real world organizations are successfully using Lucene/Solr search applications to meet those opportunities, presenting how the technology is used for specific business models and use cases across industries. In addition, it offers a baseline for setting search requirements that managers and architects can use to adopt Lucene/Solr, and adapt this open source search technology to the unique needs of their business. © 2010, Lucid Imagination The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010 Page ii
  • 3. Table of Contents
Introduction
Understanding Search Opportunities and Requirements
    What Data and Documents Are You Searching?
    Who Needs the Results and Why?
    Where Is Search Integrated with IT Infrastructure?
    How Is the Search Interface Presented to the User?
The Real World: Applications and Case Studies
    Yellow Pages, Local Search, and Searching Classifieds
    Media
    E-commerce
    Job and Career Sites
    Libraries, Archives, and Museums (LAMs) Search
    Social Media Search
    Enterprise (Intranet) Search
Business Use Case Matrix
Appendix: Lucene/Solr Features and Benefits
  • 4. Introduction
As fast as companies, communities, and consumers produce data (about each other, products, opinions, research, and everything else imaginable), they need faster, more versatile search capabilities to find the information that creates opportunities for competitive advantage. In today’s information-driven environment, search addresses the critical problems created by the explosive growth of content by slashing the time and effort users expend in finding data they value. Search spans the range of business models and use cases: from driving direct customer sales, to analytics and business intelligence, employee productivity, and reduced administrative overhead.

Apache Lucene/Solr [1] open source search technology has been implemented across the broadest range of applications and business models, and likely in ways that can fit the needs of your organization. In successful operation today at thousands of enterprises, Lucene/Solr technology scales from tens of thousands to billions of documents; searches data that is structured, unstructured, or a combination of the two; reaches data inside and outside the firewall; and ranges in use from a simple website search box to sophisticated faceted navigation. It addresses equally diverse business processes and mission critical applications. Across the spectrum, Lucene/Solr helps users find, make sense of, and act upon information quickly and efficiently.

In this white paper, we’ll review real-world case studies of Lucene/Solr functionality across business sectors to demonstrate its versatility and varied applicability. The diversity of examples provides strong evidence of Lucene/Solr’s flexibility and power as a search technology. The examples also attest to the innovation and transparency inherent to the open source development model.

Our focus is on familiarizing business managers and application owners with existing Lucene/Solr applications; the substantial technical advantages to developers are covered elsewhere. We’ll first survey the key requirements and business use cases of search and then look at where they are built into search applications. Our objective is to give business managers and application owners a broad perspective on how Lucene/Solr search technology is used to build solutions to compelling business problems. In the Appendix, we provide an overview of Lucene/Solr’s key features and benefits, with a basic outline of the capabilities offered to meet the broadest range of business needs.

[1] Lucene and Solr are complementary technologies that offer very similar underlying capabilities; Solr is the Lucene Search Server. Since Lucene serves as the core of Solr’s search capabilities, this paper refers to the two as Lucene/Solr. For more information, see the Appendix.
  • 5. Understanding Search Opportunities and Requirements
Search technology has come a long way from its roots in matching keywords against documents and returning undifferentiated results. Search today empowers users by delivering actionable information quickly and efficiently, across multiple, diverse sources of data. The business use cases range from executing mission critical commercial transactions (e.g., e-commerce sites) to unlocking employee and end-user productivity in the search for a single relevant document (e.g., enterprise search). Given the breadth of the problem domain, it’s useful to look at search and ask two fundamental questions: “How can it solve my business problems?” and “What new business opportunities can search create?”

In considering how search technology solves business problems, it is useful to start by spelling out the requirements for your search application. At the same time, be sure to look more broadly at the capabilities that Lucene/Solr offers, as they can open up new frontiers for incorporating search and drawing more value from data repositories.

Starting with some basic questions (what, who, where, and how), you can clarify the high-level requirements specific to your business needs, which in turn allows you to make the best decisions for your search application. The process of looking at the fundamentals also raises new questions about how and where the search technology offered by Lucene and Solr can create new business opportunities.

Let’s look at four fundamental questions you should address in understanding search opportunities and requirements:
• What data and documents are you searching?
• Who needs the results and why?
• Where is search integrated with IT infrastructure?
• How is the search interface presented to the user?

What Data and Documents Are You Searching?
Business today is driven more than ever by end users’ creation and consumption of real-time information. A key differentiating capability of search technology is ingesting a broad range of content types and processing large collections of diverse data in real time in order to deliver actionable information. Two aspects to consider:
  • 6. Types of Content
Content comes in multiple formats: HTML pages, XML files, PDFs, images, PowerPoint presentations, Excel spreadsheets, Word documents, log files, multimedia content, and more. Content resides in various repositories, including databases, file servers, content management systems, archiving systems, collaboration applications, and employee desktops and laptops. Search technology must be able to locate, organize, and aggregate data whatever its form or location.

• Frequency of Updating Content
Organizations update content at varying intervals, driven by differing business processes and models. Social media or news applications have real-time content needs, whereas an e-commerce application might re-index in batches in response to new inventory, and a research institution might add to its collection less often still. Search applications need to adapt to these differences in how often content changes.

Who Needs the Results and Why?
Business search puts a high priority on the end-user experience and on results tuned to the unique needs of each user. After all, the human dimension (the usefulness of results and the efficacy of interaction) is the acid test of a search application.

Internet search applications like Google, Yahoo, and Bing are now common and mature. They have raised user expectations about key qualities of the search experience, but they solve a very different problem. While Internet searches can produce millions of results in milliseconds, they rely on measures like website popularity or URLs and domain names, which are neither relevant nor generally applicable to purpose-built business applications. What’s more, they generalize relevancy for the global population of all Internet users, without being tied to business rules, business process logic, or the opportunity cost of improved precision for a specific set of data or search users.

Business search applications cannot rely on such coarse, brute-force approaches to tune their results. They need far more control and precision. They have to deliver highly useful results while matching, if not exceeding, the level of user experience that people have come to expect from their daily interactions with commercial search engines. Key points of consideration from a business perspective are:
  • 7. Relevance
Relevance depends entirely on the goals of the search application’s users. The application must have mechanisms to recognize the subjective needs of users and tune results accordingly. It must also provide easy ways to narrow search criteria without requiring users to come up with perfect query terms. Flexibility for drilling deeper makes results richer and more valuable. Mechanisms to apply filters, proximity values, and sorting parameters to narrow the search scope can also lead to a richer set of more useful results, with less time and effort.

• Cost of Relevance
As business goals are driven by revenue opportunities and cost savings, it is critical to tie relevance to the economics of the business. For example, a public-facing retail site should focus on matching merchandise to searches, site stickiness, and customer loyalty. It requires search technology that streamlines and simplifies the shopping experience, with relevant results directly contributing to sales revenue. For knowledge workers, internal search applications should make employees more productive by reducing the time and effort needed to find the documents they need to do their jobs. Multiple studies show that information workers can spend 20–30% of their time searching for information.

• Precision Ranking
Accurate results, sortable by attributes like relevance, date, field, or any other document property, make the search process better. End users generally abandon a search before tackling the fine points of Boolean logic or scrolling for a result buried too far down.

• Query Response Speed
Today, 5–7 seconds is the typical threshold for end-user patience. Too much wait time for search results frustrates users and causes them to abandon pages. Fast, relevant results cannot come from search technology hamstrung by data influx or query overload. Query response time should also work hand-in-hand with the refinement of multiple search attributes, so that increasingly complex queries do not exact a performance penalty.

Where Is Search Integrated with IT Infrastructure?
Useful, valuable search technology rarely exists in isolation. Searched data is transformed into actionable information when it is integrated with the organization’s information infrastructure, from business process to business intelligence to content management systems. A robust search technology must be customizable so that it integrates seamlessly with existing systems.
  • 8. Application Integration
A key requirement for a search application is its extensibility for integration with existing infrastructure and applications such as content management systems, databases, and the full range of business processes and applications. It should have interfaces that support ingestion of data as well as delivery of results in readily consumable formats, because in many cases results are consumed by other applications, not by a human.

• Scalability
We can assume that data will change and grow, so scalability is a key factor for a search application. Applications should grow to address future needs without penalties for the breadth of data or for the number of documents indexed. The search application should be able to grow with the requirements of the organization, without needing additional large investments in hardware to match the pace of growth. Proprietary search vendors often charge for search by the number of documents indexed. In a world where constantly expanding content is the norm, such costs can be a real and substantial drag on the cost of ownership for search applications, many times resulting in a negative return.

• Security
Every organization has its own security requirements and access controls. Search technologies need to comply with the security policies of the enterprise, controlling results that have restricted access. The search technology should also be able to make use of document-level security from other sources.

How Is the Search Interface Presented to the User?
The user interface is where search delivers on findability and presents actionable results. The search application is only as good as the convenience of submitting queries, reviewing and refining results, and finding information. Key aspects to consider:

• Navigation
Users benefit from guidance that makes their queries more productive. Techniques such as faceted search with result clustering, advance hinting (“did you mean”), “more like this,” and drop-down menus for setting search scope help users achieve desired results faster, making a search application both user- and information-friendly. It is also important to allow users to draw associative connections between results, using the technology to uncover relationships and discover more about what they were seeking than they knew at the outset.
  • 9. The Netflix search application is powered by Solr; it adds a fuzzy dimension to search, with auto-completion of movie names, correction of misspelled actor names, and suggestion of the titles closest to the query. As a result, 85% of users have found the movie they were looking for ranked at the #1 spot in the results.

• Discovery
Search application functionality should extend beyond the generic presentation of a result list of documents that contain a keyword. Highlighting keywords in search results, expanding searches with synonyms and spell checking, and offering users ways to learn a bit more about documents in the results without having to load them are great ways to significantly improve usability.

• Intuitive Intelligence
Search applications must go beyond keyword search to help users retrieve accurate information even when they are not sure of the best keywords. Additionally, they should reduce misinterpretations where homonyms, spelling errors, and ambiguous keywords are involved (e.g., is “apple” a fruit or a computer company?).
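The filtering, sorting, faceting, highlighting, and spell-checking capabilities described above are typically requested from Solr as HTTP query parameters. The following is a minimal sketch of how an application might assemble such a request. The parameter names (`q`, `fq`, `sort`, `facet`, `facet.field`, `hl`, `spellcheck`, `wt`, `rows`) are standard Solr request parameters, but the `/solr/select` path, the field names (`category`, `release_date`, `genre`, `decade`), and the helper function itself are illustrative assumptions, not taken from any of the deployments discussed in this paper. Spell checking in particular only works if the corresponding component is configured on the server.

```python
from urllib.parse import urlencode, parse_qs


def build_solr_query(user_text, category=None, sort_field=None, facet_fields=()):
    """Assemble a query string for a Solr select handler (hypothetical helper)."""
    params = [
        ("q", user_text),        # the user's keywords
        ("wt", "json"),          # machine-readable results for other applications
        ("rows", "10"),          # first page of results only
        ("hl", "true"),          # highlight matched keywords in results
        ("spellcheck", "true"),  # request "did you mean" suggestions
    ]
    if category:
        # A filter query narrows the scope without affecting relevance scoring.
        params.append(("fq", f'category:"{category}"'))
    if sort_field:
        params.append(("sort", f"{sort_field} desc"))
    if facet_fields:
        params.append(("facet", "true"))
        for field in facet_fields:
            # One facet.field entry per field to count values for navigation.
            params.append(("facet.field", field))
    return urlencode(params)


query_string = build_solr_query(
    "romantic comedy",
    category="movies",            # hypothetical field value
    sort_field="release_date",    # hypothetical field name
    facet_fields=("genre", "decade"),
)
print("/solr/select?" + query_string)
```

Because the response is requested as JSON, the same request can serve a results page for a human or feed another application, which is the integration scenario described under Application Integration.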
  • 10. The Real World: Applications and Case Studies
With an understanding of the fundamentals of search business applications in hand, it is helpful to gain additional context on business usage through a survey of organizations that have successfully used Lucene/Solr for powerful search applications.

All of these cases were built on the capability of Lucene/Solr to provide innovative, high-performance, cross-platform, feature-rich search technology suitable for nearly every application. By powering diverse search applications for thousands of organizations such as AT&T, Zappos, McClatchy, Smithsonian, MTV Networks, LinkedIn, MySpace, Comcast, Monster, Netflix, and many more, Lucene/Solr has provided mission critical capability that turns search into a robust competitive advantage. For these organizations, Lucene/Solr solutions regularly index and search hundreds of millions of documents with subsecond response time, unencumbered by costly licensing or vendor lock-in. Together they represent a compelling argument for the broad applicability of Lucene/Solr across the full range of business opportunities and search needs.

Business use case studies we’ll review include:
• Yellow Pages, Local Search, and Searching Classifieds
• Media
• E-commerce
• Job and Career Sites
• Libraries, Archives, and Museums (LAMs) Search
• Social Media Search
• Enterprise (Intranet) Search
  • 11. Yellow Pages, Local Search, and Searching Classifieds
In the business of online local search, geographic-based (location) relevance generates competitive advantage. Online directories need to provide a rich, interactive search experience to users to increase site views and stickiness, which in turn translates into increased advertising revenue. Simplified location-based search, intuitive faceted query response, and data mashups are a few features that define search functionality for an online directory.

Lucene/Solr solutions offer accurate search results, factoring in location, users’ reviews, and ratings, alongside paid advertising. By taking advantage of Solr’s open source model, with search algorithms that are completely transparent, companies can invest in configuring their search solutions to match their business logic, rather than trying to infer, or pay for exposure of, proprietary back-end logic.

“Internet Yellow pages and local online search is forecast to grow to $27.8 billion in 2011.” The Kelsey Report [1]

Requirements
• Intelligent results going beyond keyword search
• Deeper, faceted navigation
• Seamless integration with latest Web 2.0 tools
• Lower IT-related costs
• Geocentric user experience
• Search numeric values

Solr Solution
• Customizable search index which can be tuned transparently to account for key findability drivers
• Drop-down filters for narrowing or widening the scope of search
• Seamless integration with existing technologies
• Native numeric encoding and search capabilities
• Reduced server footprint for lower TCO than most commercial vendors

Success Stories
• YP.com, a division of AT&T Interactive
• Zvents.com, local event search service
• Yelp.com, the community local search site

[1] The Kelsey Group’s Global Print Yellow Pages, Internet Yellow Pages and Local Search Five Year Outlook
  • 12. Case Study 1: yp.com by AT&T Interactive
AT&T Interactive is an online and mobile search and advertising company. Their leading-edge portal, yp.com, an online business listing and advertising site, was originally implemented with a commercial proprietary search application. It faced issues of scalability, vendor lock-in, and performance. With help from Lucid Imagination, AT&T successfully migrated to a Solr-based search solution that leveraged the flexibility of open source without compromising features and functionality. And they did so with a much smaller budget.

Business Needs
• Addressing the need to factor in location to support geographic search, and include relevant comments
• Striking a balance between organic search and advertised content
• Indexing highly unstructured content such as user comments
• Increasing relevancy of results and boosting paid search results for preferential placement of advertisers
• Linguistic support to enrich the search experience, such as spellchecking, synonyms, find-similar, etc.
• Integrating with latest Web 2.0 tools
• Reducing server footprint

The Solr Solution
• Context-specific relevancy, geographic proximity, ad placement, and user comments
• Faceting and drop-down filters to narrow or widen the scope of search
• Functional support for creating new features
• Spell-correction, and location-optimized search results to show users businesses nearest to them first
• Seamless integration with many Web 2.0 tools to create innovative features and mashups
• Lower TCO by reducing the number of search servers from 120 to two dozen
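To make the idea of “location-optimized” results concrete, the following sketch orders business listings by great-circle distance from the user, nearest first. It is an application-side illustration using the standard haversine formula, not a description of how yp.com or Solr implements geographic search, which this paper does not detail. The listing names, coordinates, and function names are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_first(user_lat, user_lon, listings):
    """Return listings sorted by distance from the user, closest first."""
    return sorted(
        listings,
        key=lambda b: haversine_km(user_lat, user_lon, b["lat"], b["lon"]),
    )


# Hypothetical listings around the San Francisco Bay Area.
listings = [
    {"name": "Far Pizza", "lat": 37.3382, "lon": -121.8863},
    {"name": "Near Pizza", "lat": 37.7793, "lon": -122.4193},
    {"name": "Mid Pizza", "lat": 37.8044, "lon": -122.2712},
]

ranked = nearest_first(37.7749, -122.4194, listings)
for business in ranked:
    print(business["name"])
```

A production directory would usually blend distance with the other signals listed above, such as text relevance, ratings, and paid placement, rather than sorting on distance alone.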
  • 13. Media Brand reinforcement, premium content, and easy accessibility are the main business motivators for online media and publishing companies. Relevant information improves time on the site and encourages users to explore related content, boosting subscription rates and site views. These translate into a virtuous cycle of additional revenue generation. Given that content is the business, the need for a robust search application ties directly to competitive advantage. Lucene/Solr provides a customized, function rich solution for the media and publishing industry. It addresses dynamic challenges of content diversity, content freshness, and content acquisition, and gives companies a platform on which to build a world-class innovative search experience to differentiate themselves in a highly competitive marketplace. “Solr has done wonders for us. It is easy to understand and deploy, and has reduced our costs drastically.” Doug Steigerwald, McClatchy Interactive Requirements • Real-time indexing of petabytes of structured and unstructured data • Deeper search capability • Improved query response time • Reduced infrastructure and customization costs Solr Solution • Reverse indexing • Intelligent, faceted search to enable contextual and linguistic relevance • Easy configuration for parsing structured and unstructured data • Easy and seamless installation for lower TCO • Customization with open source code Success Stories • McClatchy Newspapers • Netflix • Comcast Interactive • MTV Networks, a division of Viacom • The Motley Fool, fool.com • Fanfeedr.com, personalized sports aggregator
  • 14. Case Study 2 McClatchy: Leading Newspaper Publisher The third largest newspaper publisher in the United States, McClatchy Company owns 30 daily newspapers in 29 markets across the country. To win online, McClatchy knew it had to have a robust search solution, to empower the McClatchy audience with the information they wanted and secure loyalty from readers and sponsorships from advertisers. Working with Lucid Imagination, McClatchy migrated from proprietary search software to open source and chose Solr for its high performance, comprehensive capabilities, and superior value. Requirements • Proliferating content and data sources (text, video, audio, images), with real-time streaming • Empowering end users with ease of use • Supporting peak traffic and popular search spikes with consistent performance • Providing scalability for a database growing by orders of magnitude annually • Providing flexibility to support customization • Controlling IT costs while exceeding performance benchmarks of competition The Lucene/Solr Solution • Deeper content by indexing both structured and unstructured data in real time, effortlessly • Indexes millions of documents, with search results delivered in milliseconds • User-friendly navigation with drop down filters, faceted navigation, linguistic corrections, etc. • Excellent performance, even in peak hours, by load-balancing search requests across servers • Scalability without impact on performance • High degree of customization, since it’s open source • Integrates with existing IT infrastructure and eliminates associated license fees to cut costs • 8-fold reduction in server footprint
  • 15. E-commerce E-commerce businesses must provide a compelling shopping experience in order to maintain brand equity and thrive in a highly competitive market landscape. By reducing the time and effort customers need to navigate available merchandise and find what they want, superior search contributes directly to a satisfying buying experience. Search then translates directly into higher revenues and customer loyalty. Instant results, intuitively organized, advanced faceting for easy browsing, synchronizing results with images, and integration with user ratings are among the must-have features of an e-commerce search application. Lucene/Solr gives companies the ability to build their sites around the concept of “searchendizing” (putting the desired merchandise at the top of the results list), which can make the difference between sales made and sales lost. Faceting, database integration, real-time indexing, and query monitoring all enable users to find products they want, driving conversion rates and enabling a winning online experience. “Online retail sales in the B2C market are expected to reach $340 billion by 2013.”² Forrester Research Requirements • Multidimensional, dynamic search • Faster results • Real-time indexing of products • Faceting and browsing capabilities • Seamless integration with existing IT infrastructure Solr Solution • Faceted search for deeper drill down and browsing • Intuitive search capabilities for cross-channel shopping experience • System administration tools for data loading, index replication, monitoring, logging, and cache management • Query monitoring for better highlighting of popular products Success Stories • Buy.com • Sears.com • Macys.com • Zappos.com • Advanceautoparts.com • Dollardays.com ² “Consumers will spend more than $340 billion online by 2013, says Forrester,” Internet Retailer, 27 November 2009, http://www.internetretailer.com/dailyNews.asp?id=32630.
  • 16. Case Study 3 Zappos Zappos is the premier destination for online shoe shopping. At Zappos, the mission is excellent online customer service—customers should be able to browse shoe styles, sizes, shapes, and colors more easily than any other shoe store, on or offline. To achieve this, Zappos wanted a robust, flexible, multifunctional search solution/application. After evaluating many commercial search technologies, Zappos zeroed in on Solr, working with Lucid Imagination to ensure continued, successful deployment. Requirements • Simplified, attractive user experience that makes it easy to find and buy • Relevant results, fast • Navigation across attributes, such as size, color, and style for broader and deeper results • Indexing products as they were entered in the catalogs • Cross-functional navigation to give customers a realistic shopping experience • Intuitive intelligence to provide alternate suggestions • Analytical capabilities to drive business strategy • Facilitating control on results • Integration with existing IT infrastructure The Solr Solution • Search results in subseconds, across categories • Faceting, for easy browsing and discovery and a compelling user experience • Real-time indexing of products • Synchronization of visuals, specs, filters, and promotions to make shopping experience true to life • Information on user activity to help build strategy on product promotions • Controls to rank popular or high-stock products in results where users are more likely to buy them • Facilitates integration with heterogeneous open source environment The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010 Page 13
  • 17. Job and Career Sites Job portals are countercyclical to the economy. When the economy flourishes, posted jobs grow in number; when it sags, candidates flock in to post their résumés. Success for an online job portal is tied to the efficiency of its search capability (matching résumés to job listings and vice versa), so both employers and prospective employees can zero in on just the right opportunity. For example, an employer may want to navigate through filters to narrow the scope of a candidate search, such as education, previous employer, salary history, skillsets, etc.; a job seeker may want to expose these attributes, but keep a current employer’s name confidential. A job seeker may want to apply to jobs within a particular geographic area. Lucene/Solr not only provides such flexibility but also addresses other complexities of this industry by enabling linguistic intelligence (such as identical acronyms that correspond to different entities; variations in spelling, imperfectly constructed search queries); indexing unstructured data (résumés); and managing ever-growing data. “I think the breakthrough was when we tried it, and we realized, wow, this thing could really scale.” Peter Keegan, Monster.com Requirements • Linguistic intelligence for more relevant results • Control search results to maintain privacy • Deeper search capability • Numeric search • Faster query response • Reduced infrastructure and customization costs Solr Solution • Intelligent, faceted search to enable contextual and linguistic relevance • Easy configuration for parsing structured and unstructured data • Easy and seamless installation for lower TCO • Business process integration and customization with open source code Success Stories • Monster • The Big Jobs • eBharatJobs • Careerjet
  • 18. Case Study 4 Monster.com Monster is the largest job search engine in the world, with over a million jobs posted at any one time. By 2008 it had 150 million résumés in its database, serving over 63 million job seekers per month, now running on average 300 to 400 queries per second with an average response time of 40 milliseconds. To provide the highest level of service and support to their customers—both employers and job seekers—Monster has an unmatched marketplace for employment opportunities, with Lucene-based search at the heart of its business model. The Requirements • Managing high volumes of data, continually increasing by double digit percentages annually • Maintaining constant inventory updates and providing faster results • Removing technological barriers that limit the scope of information • Enabling end users to refine search and drill deeper without any performance impact • Providing security controls to ensure end user privacy • Facilitating scalability and flexibility in tandem with company’s vision and growth plans The Lucene Solution • High volumes of data by clustering data to reduce the index size • Real-time indexing for fresher, faster query results • Intuitive search to enable in-depth cross-functional job and résumé browsing • Faceted search and ‘single click’ filters for search refinement • Security controls to manage user information • Unlimited scalability and customization leveraging open source licensing The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010 Page 15
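The throughput and latency figures quoted for Monster can be turned into a rough concurrency estimate. The sketch below applies Little's law (average requests in flight = arrival rate × average time in system), which is not mentioned in the paper and is used here only as an illustration. It assumes the quoted averages describe the same steady-state workload; the function name is hypothetical.

```python
def in_flight_queries(queries_per_second, avg_latency_ms):
    """Little's law: average number of queries being processed at once."""
    return queries_per_second * avg_latency_ms / 1000


# Figures quoted in the case study: 300 to 400 queries per second, 40 ms average.
low = in_flight_queries(300, 40)
high = in_flight_queries(400, 40)
print(low, high)
```

Under those assumptions, only about a dozen queries are being served at any instant, which helps explain how a short average response time lets a modest pool of search threads sustain several hundred queries per second.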
  • 19. Libraries, Archives, and Museums (LAMs) Search The core asset of educational and research institutions is knowledge archived and accumulated over decades. In the world of academic search, the diversity of information for any query (text, illustration, audio/video media, or data in any other format) makes unstructured formats a key aspect of the searchable archive. Lucene/Solr gives academic and research institutions the power to turn information into knowledge by going beyond keyword-driven search to expose a rich variety of results and exploration. Based on the open source model, it not only integrates with the existing IT infrastructure but also leverages the existing classification hierarchies to give structure to terabytes of information spread across disparate collections, significantly reducing overhead and enabling flexible and scalable deployment. “With Solr, you can do so many things without writing a lick of code. I hadn't realized how easy it is to extend our custom request handler, response writer, and update handler. Just move it all to Solr and let it do the heavy lifting.” Sjoerd Siebinga, Europeana Requirements • Management of multiple formats of data and documents • Customization and scalability • Linguistic support in queries • Faster results Solr Solution • Optimized index infrastructure limits size without compromising speed or flexibility • Easy customization for implementing taxonomy rules • Faceted search to narrow results to a specific source across diverse sets of data • Instant results • Seamless integration with IT infrastructure for lower TCO Success Stories • Smithsonian Institution • Europeana, the European Union online cultural archive • The US Library of Congress and World Digital Library • Stanford University Library • University of Michigan Graduate Library
  • 20. Case Study 5 Smithsonian The Smithsonian Institution is the flagship museum collection of the United States, supporting a research institute that provides “one-stop” searching for 2 million records, including nearly a quarter of a million media files (images, media files, online journals, and other resources) distributed across dozens of archives, databases, museums, and libraries. To make this treasure of information easily accessible to people, the Smithsonian needed an efficient search solution that could overcome the following challenges: The Challenges • Managing a complicated taxonomy that could no longer accommodate a growing data index • Indexing disparate types of content, including documents, videos, and images • Making information available from a large database • Providing access controls to restrict information • Integrating with existing legacy tools Smithsonian chose Lucene/Solr, and worked with Lucid Imagination to create an optimized, well-designed solution. The Solr Solution • Efficient index strategy to manage a mix of structured and unstructured data • Holistic search, by optimizing configuration to reduce the number of servers and better handling query requests • Filtering information through faceted search • Access controls to restrict information based on membership profiles • Integration with the existing IT infrastructure • Provides guidance and assistance on setting replicated search environment The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010 Page 17
  • 21. Social Media Search Search solutions must support differentiated business models matching Web 2.0 innovations, including user-generated content and mashups, without compromising scalability. This is a challenge, given the virtually limitless content on the Internet. Success and differentiation are measured by how well the site provides relevant results to grow its user base and keep users engaged. Increasingly, the technological factors driving Web 2.0 application paradigms are finding their way into the enterprise, unlocking collaboration and productivity in new ways that challenge conventional organizational bounds, and that rely in equal measure on search to create the connections between employees to enable discovery, cross-pollination, and more efficient collective effort. Lucene/Solr not only provides fast results but also facilitates flexible, intuitive navigation to help end users connect with others. It boosts the reach and performance of search, while cutting implementation costs and lowering barriers to innovation. “With Solr, we really treat it as kind of a platform where we can build other kind of things on top of it… We have a very valuable set of data, and we really want to explore new ways of building new features from that data set.” Sammy Yu, Digg.com Requirements • Deliver search results as soon as content is available • Deeper drill down capabilities • Intuitive interface Lucene/Solr Solution • Near-instant results with segmentable indexing • Intuitive search • Data-driven spellchecking based on user search histories • Linguistic support through ‘Did you mean’ functionality • Highlighting keywords • Deeper drill down with faceting • Real-time content updating Success Stories • Digg • Myspace • LinkedIn • Reddit • Technorati • Scout Labs • Xmarks.com
  • 22. Case Study 6 Digg.com Digg displays the wisdom of the crowds. By leveraging the mass collaboration of readers distributed across the Internet (everything on Digg is submitted by the public community for the public community), it builds on the easy community findability of information valued by the marketplace of readers and consumers. Digg realized early on that to succeed in the business of information, they needed to make information available to their audience as effortlessly as possible. They saw the following challenges as roadblocks for implementing a base search application: Requirements • Managing unstructured data (13 million documents and growing) in real time • Providing results faster • Facilitating smart navigation to provide information in digestible portions • Recognizing and eliminating duplicate content • Providing semantic and linguistic smart application • Facilitating scalability while containing costs Digg selected Solr for its unmatched flexibility and functionality. The Solr Solution • Highly customizable and flexible • Results in subseconds, with simple-to-use pull downs to refine results • Fuzzy duplicate detection (by coding) • Unlimited scalability and seamless integration with the heterogeneous environment
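The case study does not say how Digg coded its fuzzy duplicate detection. As one common way to approach the problem, the sketch below compares two texts by the overlap of their word shingles (Jaccard similarity). The function names, the shingle size, and the 0.5 threshold are all assumptions chosen for illustration, not details from Digg or Solr.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) that occur in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_near_duplicate(text_a, text_b, threshold=0.5):
    """Treat two texts as fuzzy duplicates when their shingle overlap is high."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


story = "apple unveils new phone at annual developer event today"
repost = "apple unveils new phone at annual developer event yesterday"
other = "city council approves budget for new public library"

print(is_near_duplicate(story, repost))
print(is_near_duplicate(story, other))
```

Comparing every pair of documents this way does not scale to millions of submissions; production systems usually hash the shingles into compact signatures first, which is beyond the scope of this sketch.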
  • 23. Case Study 7 LinkedIn Connecting 50 million registered users from 200 countries across 170 industries and matching them to the right professional contacts is what LinkedIn is all about. LinkedIn’s business is premised on an intelligent search application that could overcome the following: The Challenges • Managing an ever-growing database, with one new member joining and creating a profile every second • Indexing unstructured data in real time • Giving instant query responses, even in peak traffic hours • Providing intuitive navigation and intelligent linguistic support • Integrating with other Web 2.0 tools to build user profiles that integrate data from multiple sources They chose Lucene to implement the search function at the core of their business model. The Lucene Solution • Used index segmentation for faster results and to limit index base • Provided faceted search and intelligence support features like changing the view of search results and auto-completion of contacts • Calculated relative relevance, ranking results on the fly based on relationship between the user’s profile and the other profiles being searched • Integrated with the latest web tools; for example, incorporating videos in search results • Provided “scale as you grow” facility through the flexibility of the open source model
  • 24. Enterprise (Intranet) Search Enterprises today have a global footprint, which leads to the creation of multiple content types and the use of disparate applications and content management systems across business centers. The result is often silos of unmanaged data spread across the intranet of an enterprise, a situation where information is omnipresent but cannot be used. To achieve a competitive advantage, enable intelligent decision-making, eliminate duplication of work, and lower the cost of ownership, enterprises need a search application that gives structure to unstructured data and provides a single gateway to search across multiple enterprise repositories, with speed, flexibility, and intuitive intelligence. Lucene/Solr is a solid match for enterprise search. As a customizable and multifunctional search application, Lucene/Solr provides robust search features at minimal cost. The open source development model behind Lucene/Solr integrates seamlessly with legacy tools, and brings down the total cost of ownership significantly. Given the sensitive nature of enterprise content, Lucene/Solr facilitates document-level, role-based security. And with the transparent search algorithms and configurability for relevancy, Lucene/Solr enables intranet search with the precise control enterprise content owners require, ensuring that results consistently deliver the right documents to the right people. “The search and discovery software market grew 19 percent in 2008 to $2.1 billion” Sue Feldman, IDC Requirements • Single interface to access enterprise data • Faster results • Control over search results • Ready integration with existing content management software Solr Solution • Single gateway for all types of data • Dynamic boosting of content • Transparent search algorithms and relevancy tuning • Customization and easy integration with open source code
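One common way to approximate the document-level, role-based security mentioned above is to index each document with the roles allowed to read it and to attach a filter to every query. The sketch below builds such a filter in Lucene query syntax, suitable for Solr's `fq` (filter query) parameter. The field name `acl_roles` and the helper `role_filter` are hypothetical, and the paper does not state that any of the organizations it describes use this approach.

```python
def role_filter(field, roles):
    """Build a filter query matching documents tagged with any of the user's roles."""
    if not roles:
        # With no roles there is nothing the user may see; let the caller deny access.
        raise ValueError("user has no roles")
    quoted = ['"{}"'.format(role.replace('"', '\\"')) for role in sorted(roles)]
    return "{}:({})".format(field, " OR ".join(quoted))


# Hypothetical field name; the value would be sent as the fq parameter of a search request.
fq = role_filter("acl_roles", {"managers", "finance"})
print(fq)
```

Because the filter is applied on the server for every request, users never receive hits for documents outside their roles, and the restriction stays separate from the relevance-ranked part of the query.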
  • 25. Case Study 8 Food and Drug Administration The Food and Drug Administration (FDA) is a U.S. government agency responsible for regulating and supervising the safety of foods, medications, veterinary products, tobacco, and cosmetics. The FDA has a large repository of information that dates back multiple decades, and exists in formats ranging from early optical character recognition to recent electronic formats. To mine this knowledge base, the FDA is developing a semantic mining framework using open source tools such as Apache Lucene and Solr. Requirements • Integrating petabytes of data highly distributed across the intranet of an enterprise • Managing multiple indices for documents stored in distributed repositories • Managing and maintaining archival data and evolving vocabularies • Indexing unstructured data in real time • Recognizing and eliminating duplicate content • Handling concurrent queries and delivering fast and relevant results • Restricting search results according to agency access control policies • Integrating with existing infrastructure without additional overhead The Lucene Solution • A single gateway to search across multiple enterprise repositories • Duplicate detection • Fast and relevant results with content analysis and query interpretation algorithms • Filters results based on access controls and security policies of an enterprise • Facilitates integration with existing enterprise infrastructure to reduce TCO
  • 26. Business Use Case Matrix To simplify mapping your search needs to existing search applications in the real world, the matrix below compares business use cases against key search requirements. While not an exhaustive list, the matrix highlights the different business use cases across sectors and business models, reflecting the adaptability of Lucene/Solr across the various domains of search applications and use cases. Users Content Content Update Frequency Access Verticals Customer Control Internal Original Aggregated High Medium Low Facing Enterprise (Intranet) √ √ √ √ Schools/ √ √ √ √ √ √ Universities Education Libraries √ √ √ √ √ Job Portals √ √ √ √ Social Networks √ √ √ √ √ News √ √ √ √ Media Media √ √ √ √ E-Commerce Sites √ √ √ √ √ √ Financial Services √ √ √ √ √ Yellow Pages √ √ √ Horizontal Portals √ √ √ √ The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010 Page 23
  • 27. Appendix: Lucene/Solr Features and Benefits Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In choosing a search solution that is best suited for your requirements, key factors to consider are application scope, development environment, and software development preferences. Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, and low overhead indexes and rapid incremental indexing. Solr is the Lucene Search Server. It presents a web service layer built atop Lucene using the Lucene search library and extending it to provide application users with a ready-to-use search platform. Solr brings with it operational and administrative capabilities like web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more. Lucene presents a collection of directly callable Java libraries and requires coding and solid information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise- ready search platform, eliminating the need for extensive programming. Solr provides the starting point for most developers who are building a Lucene-based search application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a production Java environment. With convenient ReST-like/web-service interfaces callable over HTTP, and transparent XML-based configuration files, Solr can greatly accelerate application development and maintenance. 
In fact, Lucene programmers have often reported that they find Solr contains “the same features I was going to build myself as a framework for Lucene, but already very well implemented.” Using Solr, enterprises can customize the search application according to their requirements, without incurring the cost and risk of writing the code from scratch. Lucene provides greater control of your source code and works best in development environments where resources need to be controlled exclusively by Java API calls. It works best when constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile inside a native Java application. While working with Lucene, programmers can directly control the large set of sophisticated features with low-level access, data, or state manipulation. Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it provides ease of use and scalable search power out of the box.
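To make the point about Solr's HTTP interface concrete, the sketch below assembles the URL for a faceted search request using only the Python standard library. The parameter names (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`) are Solr's standard request parameters; the host, path, and field names (`brand`, `color`, `in_stock`) are assumptions for illustration and will differ by installation and schema. The helper `build_select_url` is hypothetical.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_select_url(base_url, query, facet_fields=(), filters=(), rows=10):
    """Return a Solr search URL for a keyword query with optional facets and filters."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each fq parameter narrows the result set without changing relevance scores.
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", name) for name in facet_fields]
    return base_url + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/select",  # assumed host and path
    "running shoes",
    facet_fields=["brand", "color"],      # hypothetical schema fields
    filters=["in_stock:true"],            # hypothetical schema field
)
print(url)

# Decoding the query string again shows what the server would receive.
sent = parse_qs(urlsplit(url).query)
print(sent["facet.field"])
```

An application in any language can issue such a request over HTTP and parse the response, which is why no Java code is needed on the client side; the equivalent with the Lucene library alone would mean writing the query, filter, and facet-counting logic against the Java API.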
  • 28. As functional siblings, Lucene and Solr have become popular alternatives for search applications; the two differ mainly in the style of application development used. Key benefits of search with Lucene/Solr include: • Search Quality: Speed, Relevance, and Precision Lucene/Solr provides near-real-time search and strong relevance ranking to deliver contextually relevant and accurate results very quickly. Tailor-made coding for relevancy ranking and sophisticated search capabilities like faceted search help users in sorting, organizing, classifying, and structuring retrieved information to ensure that search delivers desired results. Search with Lucene/Solr also provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking, multilingual search, and much more. • Lower Cost and Greater Flexibility, Plug and Play Architecture Lucene/Solr reduces recurring and nonrecurring costs, lowering your TCO. As open source software, it does not require purchase of a license and is freely available for use. The open source code can be used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in your enterprise’s existing infrastructure, reducing costs of installation, configuration, and management. • Open Source Platform for Portability and Easy Deployment Because Lucene/Solr is an open-source software solution, it is based on open standards and community-driven development processes. It is highly portable and can run on any platform that supports Java. For instance, you can build an index on Linux and copy it to a Microsoft Windows machine and search there. This unsurpassed portability enables you to keep your search application and your company’s evolving infrastructure in tandem. Lucene, in turn, has been implemented in other environments, including C#, C, Python, and PHP. 
At deployment time, Solr offers very flexible options; it can be easily deployed on a single server as well as on distributed, multiserver systems. • Largest Installed Base of Applications, Increasing Customer Base Lucene/Solr is the most widely used open source search system and is installed in around 4,000 organizations worldwide. Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratories. The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010 Page 25
  • 29. Large Developer Base and Adaptability As community developed software, Lucene/Solr provides transparent development and easy access to updates and releases. Developers can work with open source code and customize the software according to business-specific needs and objectives. Its open source paradigm lets Lucene/Solr provide developers with the freedom and flexibility to evolve the software with changing requirements, liberating them from the constraints of commercial vendors. • Commercial-Grade Support for Mission Critical Search Applications from Lucid Imagination Lucid Imagination provides the expertise, resources, and services that are needed to help enterprises deploy and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its broad range of expertise, which includes indexing and metadata management, content analysis, business rule application, and natural language processing. Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level consulting and value-added software extensions to enable customers to create powerful and successful search applications. The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010 Page 26
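Several of the query features named under Search Quality above (fielded searching, wildcards, proximity operators, and term weights) correspond to constructs in Lucene's classic query parser syntax. The sketch below composes a few such query strings; the field names `title` and `body` and the helper functions are hypothetical, and the strings are only built here, not executed against an index.

```python
def phrase_near(field, words, slop):
    """Proximity query: the words must occur within `slop` positions of each other."""
    return '{}:"{}"~{}'.format(field, " ".join(words), slop)


def boosted(field, term, weight):
    """Term weight: matches on this term count `weight` times as much in scoring."""
    return "{}:{}^{}".format(field, term, weight)


queries = {
    "fielded": "title:lucene",                                # search a single field
    "wildcard": "title:sea*",                                 # any term starting with "sea"
    "proximity": phrase_near("body", ["open", "source"], 5),
    "weighted": boosted("title", "solr", 2),
}

for name, query in queries.items():
    print(name, "->", query)
```

The same strings can be passed to Lucene's query parser in Java or sent to Solr as the `q` parameter of a search request.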