Bringing Reusability to Enterprise Search
Using Solr to build a reusable enterprise search engine
A Collabor Labs Technology Paper, May 2011
This whitepaper discusses the high-level technical aspects of using Solr to bring
reusability to enterprise search implementations.
Brahmaji Pusuluri
Sr. Software Engineer
www.collabor.com info@collabor.com
Using Lucene for enterprise search
Apache Lucene(TM) is a high-performance, full-featured text search engine library written
entirely in Java. It is suitable for nearly any application that requires full-text search.
Solr - the reusable enterprise search engine
Solr is the popular, fast, open source enterprise search platform from the Apache
Lucene project. It provides HTTP request processing for indexing and querying documents.
Thus, an application anywhere can index and query documents over the Internet via XML
over HTTP using the URL of your Solr search server. It is also a highly optimized search
server with caching and replication to other Solr search servers. It can also index
rich-text documents (e.g., Word, PDF).
Working with Solr
Once Solr is installed successfully, we need to modify the following files as per the project
requirements.
solrconfig.xml:
solrconfig.xml is the file that contains most of the parameters for configuring
Solr itself.
schema.xml:
The schema.xml file contains all of the details about which fields your
documents can contain, and how those fields should be dealt with when adding
documents to the index or when querying those fields.
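To make the role of schema.xml concrete, here is a minimal sketch of a schema in the format used by Solr releases of this paper's era. The field names (solr_id, solr_name, solr_desc, solr_details) are borrowed from the DataImportHandler mapping shown later in this paper; the field types, the analyzer chain, and the choice of solr_name as default search field are assumptions made for illustration, not the author's actual schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative schema.xml sketch; types and options are assumptions. -->
<schema name="example" version="1.2">
  <types>
    <!-- Untokenized string, suitable for identifiers -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <!-- Tokenized, lowercased text for full-text search -->
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="solr_id" type="string" indexed="true" stored="true" required="true"/>
    <field name="solr_name" type="text" indexed="true" stored="true"/>
    <field name="solr_desc" type="text" indexed="true" stored="true"/>
    <!-- multiValued because a nested entity may return several rows per document -->
    <field name="solr_details" type="text" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <!-- Field that uniquely identifies a document -->
  <uniqueKey>solr_id</uniqueKey>
  <!-- Field searched when a query such as q=searchterm names no field -->
  <defaultSearchField>solr_name</defaultSearchField>
</schema>
```

A field that is indexed can be searched, and a field that is stored can be returned in results; which fields need which flag depends on the application.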
Once the settings are done, you can send an XML file to Solr to index the data by using
curl:
curl http://localhost:8983/solr/update -F stream.file=/tmp/example.xml
The example.xml file contains documents whose fields follow the format defined in
schema.xml.
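As an illustration of what such a file might contain, the following is a minimal sketch of an example.xml in Solr's XML update format. The field names and values are hypothetical; they must match fields declared in your own schema.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example.xml: one <doc> element per document to index. -->
<add>
  <doc>
    <field name="solr_id">1</field>
    <field name="solr_name">Sample item</field>
    <field name="solr_desc">A short description used for full-text search.</field>
  </doc>
  <doc>
    <field name="solr_id">2</field>
    <field name="solr_name">Another item</field>
    <field name="solr_desc">A second document in the same update message.</field>
  </doc>
</add>
```

Two practical points are worth checking in your setup: reading a server-side file through the stream.file parameter requires remote streaming to be enabled in solrconfig.xml (enableRemoteStreaming="true" on the requestParsers element), and added documents become searchable only after a commit, which can be sent to the same update URL as a `<commit/>` message.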
Research by: Collabor Labs Page 2
May 2011 All trademarks belong to their respective owners
Index a DB table directly into Solr using DataImportHandler
Most applications store data in relational databases or XML files and searching over such
data is a common use-case. The DataImportHandler is a Solr contrib module that provides a
configuration-driven way to import this data into Solr, both as "full builds" and as
incremental delta imports.
Edit your solrconfig.xml to add the request handler:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
The data-config.xml file contains the following.
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/dbname" user="user-name" password="password"/>
  <document>
    <!-- Outer entity: one Solr document per row of mytable.
         desc is a reserved word in MySQL, so it is quoted with backticks. -->
    <entity name="outer" query="select id, name, `desc` from mytable">
      <field column="id" name="solr_id"/>
      <field column="name" name="solr_name"/>
      <field column="desc" name="solr_desc"/>
      <!-- Inner entity: ${outer.id} refers to the id column of the entity named "outer". -->
      <entity name="inner"
              query="select details from another_table where id = '${outer.id}'">
        <field column="details" name="solr_details"/>
      </entity>
    </entity>
  </document>
</dataConfig>
Run the full-import command to index the entire database:
http://localhost:8983/solr/dataimport?command=full-import
Run the delta-import command to index only the rows that changed since the last import:
http://localhost:8983/solr/dataimport?command=delta-import
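The delta-import command only has work to do when the entity tells the DataImportHandler how to find changed rows. The following sketch shows one way this could be configured. It assumes that mytable has a last_modified timestamp column, which is not part of the original example; the entity name is arbitrary. The variables ${dataimporter.last_index_time} and ${dataimporter.delta.id} are supplied by the DataImportHandler at import time.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/dbname" user="user-name" password="password"/>
  <document>
    <!-- ASSUMPTION: mytable has a last_modified timestamp column (hypothetical). -->
    <!-- query:            used by full-import, reads every row -->
    <!-- deltaQuery:       returns the primary keys of rows changed since the last import -->
    <!-- deltaImportQuery: fetches each changed row by its primary key -->
    <entity name="outer" pk="id"
            query="select id, name, `desc` from mytable"
            deltaQuery="select id from mytable
                        where last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="select id, name, `desc` from mytable
                              where id = '${dataimporter.delta.id}'">
      <field column="id" name="solr_id"/>
      <field column="name" name="solr_name"/>
      <field column="desc" name="solr_desc"/>
    </entity>
  </document>
</dataConfig>
```

The nested entity from the full configuration is omitted here to keep the sketch short; it can be kept unchanged inside the outer entity.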
Building a reusable search engine with Solr using multi-core
Multiple cores let you have a single Solr instance with separate configurations and
indexes, with their own configuration and schema for very different applications, but still
have the convenience of unified administration. Individual indexes are still fairly isolated,
but you can manage them as a single application, create new indexes on the fly by
spinning up new SolrCores, and even make one SolrCore replace another SolrCore
without ever restarting your Servlet Container.
Edit solr.xml to declare the cores, for example:
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="application1" instanceDir="app1">
      <property name="dataDir" value="/app1/data" />
      <property name="configName" value="/app1/config.xml" />
      <property name="schemaName" value="/app1/schema.xml" />
    </core>
    <core name="application2" instanceDir="app2" />
  </cores>
</solr>
Run the full-import command to index the entire database in application1:
http://localhost:8983/solr/application1/dataimport?command=full-import
Run the delta-import command to index incremental changes in application1:
http://localhost:8983/solr/application1/dataimport?command=delta-import
Run the full-import command to index the entire database in application2:
http://localhost:8983/solr/application2/dataimport?command=full-import
Run the delta-import command to index incremental changes in application2:
http://localhost:8983/solr/application2/dataimport?command=delta-import
Searching the indexes
A request to http://localhost:8983/solr/application1/select/?q=searchterm returns an XML
document containing the results.
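For orientation, the general shape of Solr's XML response is sketched below. All values are placeholders rather than real output, and the field names assume the hypothetical solr_id and solr_name fields used in the import mapping earlier in this paper.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative response shape only; the values are placeholders, not real results. -->
<response>
  <lst name="responseHeader">
    <int name="status">0</int>      <!-- 0 indicates success -->
    <int name="QTime">1</int>       <!-- query time in milliseconds -->
    <lst name="params">
      <str name="q">searchterm</str>
    </lst>
  </lst>
  <!-- numFound is the total number of matches; start is the offset of the first result -->
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="solr_id">1</str>
      <str name="solr_name">placeholder name containing searchterm</str>
    </doc>
  </result>
</response>
```

A client application would typically read numFound for the total hit count and iterate over the doc elements, each of which carries the stored fields of one matching document.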
In this way, a single Solr installation can be reused across multiple enterprise search
implementations, one core per application.
References:
1. http://lucene.apache.org/solr/
2. Wikipedia: Apache Solr
For more information, contact: info@collabor.com