Bringing Reusability to Enterprise Search
       Using Solr to build a reusable enterprise search engine

       A Collabor Labs Technology Paper, May 2011




       This whitepaper discusses the high-level technical aspects of using Solr to bring
       reusability to enterprise search implementations.




                                                                          Brahmaji Pusuluri
                                                                       Sr. Software Engineer


www.collabor.com                                                           info@collabor.com
Using Lucene for enterprise search

Apache Lucene(TM) is a high-performance, full-featured text search engine library written
entirely in Java. It is suitable for nearly any application that requires full-text search.




Solr - the reusable enterprise search engine

Solr is the popular, blazing-fast open source enterprise search platform from the Apache
Lucene project. It exposes indexing and querying through HTTP request processing, so an
application anywhere can query and index documents over the Internet by sending XML over
HTTP to the URL of your Solr search server. It is also a highly optimized search server,
with caching and replication to other Solr search servers, and it can index rich-text
documents (e.g., Word and PDF files).
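As a minimal sketch of the "XML over HTTP" model described above, the Python snippet below builds (but does not send) a POST request that carries an XML update message. It assumes a Solr server at http://localhost:8983/solr exposing the /update handler used later in this paper; the function name build_update_request is hypothetical.

```python
import urllib.request


def build_update_request(solr_url: str, xml_body: str) -> urllib.request.Request:
    """Build, but do not send, a POST of an XML update message to Solr.

    solr_url is assumed to be the server's base URL, for example
    http://localhost:8983/solr (an assumption made for this sketch).
    """
    return urllib.request.Request(
        solr_url.rstrip("/") + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request("http://localhost:8983/solr", "<commit/>")
    print(req.get_method(), req.full_url)
    # Actually sending it requires a running Solr server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the message easy to inspect before anything reaches the server.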




Working with Solr

Once Solr is installed, modify the following files to suit the project's requirements.

solrconfig.xml:
The file that contains most of the parameters for configuring Solr itself.

schema.xml:
This file contains all of the details about which fields your documents can contain, and
how those fields should be dealt with when adding documents to the index or when querying
those fields.

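Once these settings are in place, you can index data by sending an XML file to Solr with
curl:

curl http://localhost:8983/solr/update -F stream.file=/tmp/example.xml

Here example.xml contains documents in Solr's XML update format
(<add><doc><field name="...">...</field></doc></add>), using the field names defined in
schema.xml. Reading a server-side file through stream.file requires remote streaming to be
enabled in solrconfig.xml (enableRemoteStreaming="true"), and the new documents become
searchable only after a commit (for example, by adding commit=true to the update URL).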
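For illustration, the sketch below generates such an update message from Python dictionaries using only the standard library. The function name to_add_message and the field names are hypothetical; in practice the field names must match those declared in schema.xml.

```python
import xml.etree.ElementTree as ET


def to_add_message(docs):
    """Serialize a list of dicts into Solr's XML <add> update message.

    Field names are hypothetical here and must match schema.xml in practice.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes repeated <field> elements (multi-valued field).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_add_message([{"solr_id": "1", "solr_name": "Solr & Lucene"}]))
```

The printed string can be saved as /tmp/example.xml and posted with the curl command above.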




Research by: Collabor Labs                                                                         Page 2
May 2011                                                   All trademarks belong to their respective owners
Index a DB table directly into Solr using DataImportHandler


Most applications store data in relational databases or XML files, and searching over such
data is a common use case. The DataImportHandler is a Solr contrib module that provides a
configuration-driven way to import this data into Solr, both as full builds and as
incremental delta imports.

Edit your solrconfig.xml to add the request handler:

<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
 <str name="config">data-config.xml</str>
</lst>
</requestHandler>



The data-config.xml file contains the following.

<dataConfig>
 <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/dbname" user="user-name" password="password"/>
 <document>
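  <!-- The outer entity is named "outer" so that ${outer.id} below resolves.
       `desc` is back-quoted because DESC is a reserved word in MySQL.
       pk, deltaQuery and deltaImportQuery are needed by delta-import; they
       assume mytable has a last_modified timestamp column. -->
  <entity name="outer" pk="id"
          query="select id, name, `desc` from mytable"
          deltaQuery="select id from mytable where last_modified > '${dataimporter.last_index_time}'"
          deltaImportQuery="select id, name, `desc` from mytable where id = '${dataimporter.delta.id}'">
    <field column="id" name="solr_id"/>
    <field column="name" name="solr_name"/>
    <field column="desc" name="solr_desc"/>
    <entity name="inner"
         query="select details from another_table where id ='${outer.id}'">
        <field column="details" name="solr_details"/>
    </entity>
  </entity>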
 </document>
</dataConfig>
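To make the nested-entity behaviour concrete, here is a simplified Python emulation (not the DataImportHandler itself) that uses an in-memory SQLite database in place of MySQL. The outer query runs once, and the inner query runs once per outer row with that row's id substituted for ${outer.id}. Table and column names follow the example above; import_nested is a hypothetical name.

```python
import sqlite3


def import_nested(conn: sqlite3.Connection) -> list:
    """Emulate a nested DataImportHandler entity (simplified sketch)."""
    docs = []
    # Outer entity: one query for all parent rows ("desc" is quoted; it is a keyword).
    outer_rows = conn.execute('SELECT id, name, "desc" FROM mytable').fetchall()
    for row_id, name, desc in outer_rows:
        doc = {"solr_id": row_id, "solr_name": name, "solr_desc": desc}
        # Inner entity: one query per parent row; the bound parameter
        # stands in for the ${outer.id} substitution.
        details = conn.execute(
            "SELECT details FROM another_table WHERE id = ?", (row_id,)
        ).fetchall()
        if details:
            # Each inner row contributes one value to a multi-valued field.
            doc["solr_details"] = [d for (d,) in details]
        docs.append(doc)
    return docs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, "desc" TEXT);
        CREATE TABLE another_table (id INTEGER, details TEXT);
        INSERT INTO mytable VALUES (1, 'widget', 'a small widget');
        INSERT INTO another_table VALUES (1, 'blue');
        INSERT INTO another_table VALUES (1, 'steel');
    """)
    print(import_nested(conn))
```

Because the inner query runs once per outer row, deeply nested entities can be slow on large tables; the DataImportHandler's CachedSqlEntityProcessor exists to reduce that cost.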


Run the full-import command to index the entire database.
http://localhost:8983/solr/dataimport?command=full-import

Run the delta-import command to import only the rows added or changed since the last import
(this requires the entity to define a pk and a deltaQuery):
http://localhost:8983/solr/dataimport?command=delta-import
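Delta imports work in two steps. The DataImportHandler (DIH) runs the entity's deltaQuery to find the primary keys of rows changed since the last import (it records that time as last_index_time in dataimport.properties), then runs deltaImportQuery for each key to rebuild the document. The sketch below emulates those two steps with SQLite. It assumes a hypothetical last_modified column holding 'YYYY-MM-DD HH:MM:SS' timestamps, which compare correctly as strings; delta_import is also a hypothetical name.

```python
import sqlite3


def delta_import(conn: sqlite3.Connection, last_index_time: str) -> list:
    """Emulate DIH's two-step delta logic (simplified sketch)."""
    # Step 1, like deltaQuery: keys of rows changed since the last import.
    # last_modified is a hypothetical column assumed for this sketch.
    changed_ids = [
        row_id
        for (row_id,) in conn.execute(
            "SELECT id FROM mytable WHERE last_modified > ? ORDER BY id",
            (last_index_time,),
        )
    ]
    docs = []
    for row_id in changed_ids:
        # Step 2, like deltaImportQuery: re-fetch one changed row by its key.
        (name,) = conn.execute(
            "SELECT name FROM mytable WHERE id = ?", (row_id,)
        ).fetchone()
        docs.append({"solr_id": row_id, "solr_name": name})
    return docs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT);
        INSERT INTO mytable VALUES (1, 'unchanged', '2011-04-01 10:00:00');
        INSERT INTO mytable VALUES (2, 'edited', '2011-05-02 09:30:00');
    """)
    print(delta_import(conn, "2011-05-01 00:00:00"))
    # [{'solr_id': 2, 'solr_name': 'edited'}]
```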


Building a reusable search engine with Solr using multiple cores

Multiple cores let a single Solr instance host separate indexes, each with its own
configuration and schema, for very different applications, while still offering the
convenience of unified administration. Individual indexes remain fairly isolated, but you
can manage them as a single application, create new indexes on the fly by spinning up new
SolrCores, and even make one SolrCore replace another without restarting your servlet
container.

Declare the cores in solr.xml, as in the example below:
<solr persistent="true" sharedLib="lib">
<cores adminPath="/admin/cores">
 <core name="application1" instanceDir="app1">
  <property name="dataDir" value="/app1/data" />
  <property name="configName" value="/app1/config.xml" />
  <property name="schemaName" value="/app1/schema.xml" />
 </core>
 <core name="application2" instanceDir="app2" />
</cores>
</solr>
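An administration script can read this file to discover which cores exist. The sketch below parses a copy of the example above with Python's standard ElementTree module and returns each core's instanceDir and <property> values. list_cores is a hypothetical helper, and it assumes the <cores>/<core> layout shown here.

```python
import xml.etree.ElementTree as ET

# A copy of the solr.xml example above.
SOLR_XML = """<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
  <core name="application1" instanceDir="app1">
   <property name="dataDir" value="/app1/data" />
   <property name="configName" value="/app1/config.xml" />
   <property name="schemaName" value="/app1/schema.xml" />
  </core>
  <core name="application2" instanceDir="app2" />
 </cores>
</solr>"""


def list_cores(solr_xml: str) -> dict:
    """Return {core name: {"instanceDir": ..., "properties": {...}}}."""
    root = ET.fromstring(solr_xml)
    cores = {}
    for core in root.iter("core"):
        # Collect the <property name="..." value="..."/> children, if any.
        props = {p.get("name"): p.get("value") for p in core.findall("property")}
        cores[core.get("name")] = {
            "instanceDir": core.get("instanceDir"),
            "properties": props,
        }
    return cores


if __name__ == "__main__":
    for name, info in list_cores(SOLR_XML).items():
        print(name, info["instanceDir"], f"http://localhost:8983/solr/{name}/")
```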


Each core needs the /dataimport handler in its own solrconfig.xml and its own
data-config.xml. With those in place, run the full-import command to index the entire
database into application1:
http://localhost:8983/solr/application1/dataimport?command=full-import

Run the delta-import command to import only the rows changed since the last import:
http://localhost:8983/solr/application1/dataimport?command=delta-import

The same commands apply to application2:
http://localhost:8983/solr/application2/dataimport?command=full-import
http://localhost:8983/solr/application2/dataimport?command=delta-import
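Since only the core name changes between these URLs, a small helper can build them for any core. This is a sketch: dataimport_url is a hypothetical name, and it accepts only the two commands used in this paper plus status, which reports the progress of a running import.

```python
from urllib.parse import urlencode

# Commands used in this paper, plus "status" to check on a running import.
ALLOWED_COMMANDS = ("full-import", "delta-import", "status")


def dataimport_url(base: str, core: str, command: str) -> str:
    """Build the /dataimport URL for one core of a multi-core Solr server."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported DataImportHandler command: {command}")
    query = urlencode({"command": command})
    return f"{base.rstrip('/')}/{core}/dataimport?{query}"


if __name__ == "__main__":
    for core in ("application1", "application2"):
        for command in ("full-import", "delta-import"):
            print(dataimport_url("http://localhost:8983/solr", core, command))
```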

Searching the indexes

A query such as
http://localhost:8983/solr/application1/select/?q=searchterm
returns the matching documents from the application1 core as an XML response.
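A client can read that XML response with a standard parser. The sketch below assumes the default XML response layout, in which a <result> element carries numFound and each <doc> holds typed elements (<str>, <int>, <arr> for multi-valued fields) whose name attribute is the field name. Values are returned as strings, the sample response is made up for illustration, and parse_select_response is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Illustrative response body; real responses carry more header fields.
SAMPLE = """<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<result name="response" numFound="1" start="0">
 <doc><str name="solr_id">1</str><str name="solr_name">widget</str>
  <arr name="solr_details"><str>blue</str><str>steel</str></arr></doc>
</result>
</response>"""


def parse_select_response(xml_text: str):
    """Return (numFound, docs) from an XML select response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for field in doc:
            if field.tag == "arr":
                # Multi-valued field: collect every child value.
                fields[field.get("name")] = [v.text for v in field]
            else:
                fields[field.get("name")] = field.text
        docs.append(fields)
    return int(result.get("numFound")), docs


if __name__ == "__main__":
    print(parse_select_response(SAMPLE))
    # Against a live server (requires Solr running):
    # import urllib.request
    # url = "http://localhost:8983/solr/application1/select/?q=searchterm"
    # print(parse_select_response(urllib.request.urlopen(url).read().decode("utf-8")))
```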

In this way, a single Solr installation can be reused for multiple enterprise search
implementations.






For more information, contact: info@collabor.com



