Apache Solr Technical Document
Contents
Requirements
Solution - Solr
  Features
  Typical Solr Setup Diagram
  Basic Solr Concepts
    1. Indexing
    2. How Solr represents data
  Installing Solr
  Starting Solr
  Indexing Data
  Searching
    Faceting
    Highlighting
    Spell Checking
    Relevance
  Shutdown
  Screen Shots
Apache SolrCloud
  Features
  Simple two shard cluster
  Dealing with high volume of data
  Dealing with failure
  Synchronization of data (added/updated in DB) with Solr
  Limitations
  Screen Shots
  Integration with .Net using SolrNet
Requirements
a. Fast full-text search capabilities
b. Optimized for high-volume web traffic
c. Highly and linearly scalable on demand
d. Pluggable into any platform
e. Near real-time search and indexing
f. Flexible and adaptable configuration using XML, JSON, and CSV
Solution - Solr
Solr is a standalone enterprise search server with a REST-like API. You put documents in it
(called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and
receive XML, JSON, CSV or binary results.
Features
• Advanced Full-Text Search Capabilities
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces - XML, JSON and HTTP
• Comprehensive HTML Administration Interfaces
• Linearly scalable, auto index replication, auto failover and recovery
• Near Real-time indexing
• Flexible and Adaptable with XML configuration
• Extensible Plugin Architecture
• Easily manage multilingual support
Typical Solr Setup Diagram
Figure 1 Typical Solr Setup Diagram
Basic Solr Concepts
In this document, we'll cover the basics of what you need to know about Solr in order to use it.
1. Indexing
Solr is able to achieve fast search responses because it searches an index rather than
scanning the text directly.
This is like retrieving pages in a book related to a keyword by scanning the index at the back of
a book, as opposed to searching every word of every page of the book.
This type of index is called an inverted index, because it inverts a page-centric data structure
(page->words) to a keyword-centric data structure (word->pages).
Solr stores this index in a directory called index in the data directory.
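The page-to-keyword inversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr or Lucene store their index on disk; the function name and the sample pages are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the sorted list of page numbers that contain it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        # Naive whitespace tokenization; real analysis is more involved.
        for word in text.lower().split():
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Hypothetical "book": page number -> page text.
pages = {
    1: "solr is a search server",
    2: "an index makes search fast",
    3: "solr stores an index on disk",
}
index = build_inverted_index(pages)
print(index["search"])  # [1, 2]
print(index["index"])   # [2, 3]
```

Looking up a word is now a single dictionary access instead of a scan over every page.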
2. How Solr represents data
In Solr, a Document is the unit of search and index.
An index consists of one or more Documents, and a Document consists of one or more Fields.
Schema
Before adding documents to Solr, you need to specify the schema, represented in a file
called schema.xml. It is not advisable to change the schema after documents have been added
to the index.
The schema declares:
o what kinds of fields there are
o which field should be used as the unique/primary key
o which fields are required
o how to index and search each field
Field Types
In Solr, every field has a type.
Examples of basic field types available in Solr include:
o float
o long
o double
o date
o text
Defining a field
Here's what a field declaration looks like:
<field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
o name: the name of the field
o type: the field type
o indexed: whether this field is added to the inverted index
o stored: whether the original value of this field is stored
o multiValued: whether this field can hold multiple values
The indexed and stored attributes are important.
Analysis
When data is added to Solr, it goes through a series of transformations before being added to
the index. This is called the analysis phase. Examples of transformations include lower-casing
and stemming (reducing words to their root form). The end result of the analysis is a series of tokens which are then
added to the index. Tokens, not the original text, are what are searched when you perform a
search query.
Indexed fields are fields which undergo an analysis phase, and are added to the index.
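As a rough illustration of the analysis phase, the sketch below tokenizes text, lower-cases it, and strips a few common suffixes. The suffix rule is a deliberately naive stand-in for a real stemmer, and the function names are hypothetical; in Solr the actual analyzers are configured per field type in the schema.

```python
import re


def naive_stem(token):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Toy analysis chain: tokenize, lower-case, then stem each token."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    tokens = [t.lower() for t in tokens]
    return [naive_stem(t) for t in tokens]


print(analyze("Searching the Indexes"))  # ['search', 'the', 'index']
```

The tokens in the output, not the original string, are what would be added to the index and matched at query time.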
Term Storage
When we display search results to users, they generally expect to see the original document,
not the machine-processed tokens.
That is the purpose of the stored attribute: it tells Solr to store the original text in the
index.
Sometimes there are fields which aren't searched but need to be displayed in the search results.
You accomplish that by setting the field attributes to stored=true and indexed=false.
So, why wouldn't you store all the fields all the time?
Because storing fields increases the size of the index, and the larger the index, the slower the
search. In terms of physical computing, we'd say that a larger index requires more disk seeks to
get to the same amount of data.
Installing Solr
You should have JDK 6 or above installed (Solr 4.x requires Java 1.6 or later).
Begin by unzipping the Solr release and changing your working directory to the "example"
directory.
unzip -q apache-solr-4.1.0.zip
cd apache-solr-4.1.0/example/
Starting Solr
Solr comes with an example directory which contains some sample files we can use.
From the example directory (the working directory after the install step), we start this
example server with java -jar start.jar.
java -jar start.jar
You should see something like this in the terminal.
2011-10-02 05:20:27.120:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-10-02 05:20:27.212:INFO::jetty-6.1-SNAPSHOT
....
2011-10-02 05:18:27.645:INFO::Started SocketConnector@0.0.0.0:8983
Solr is now running! You can now access the Solr Admin webapp by loading
http://localhost:8983/solr/admin/ in your web browser.
Indexing Data
We're now going to add some sample data to our Solr instance.
The exampledocs folder contains some XML files that we can post from the command line:
cd exampledocs
java -jar post.jar solr.xml monitor.xml
That produces:
SimplePostTool: POSTing files to http://localhost:8983/solr/update.
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file monitor.xml
SimplePostTool: COMMITting Solr index changes.
This response tells us that the POST operation was successful.
You can also index all of the sample data, using the following command (assuming your
command line shell supports the *.xml notation):
cd exampledocs
java -jar post.jar *.xml
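The post.jar tool sends the files to Solr's update handler over HTTP, so the same thing can be done from any HTTP client. Here is a minimal sketch using only the Python standard library. It assumes the update URL shown in the SimplePostTool output above and the id and name fields seen in the example data; the document itself is made up. The request is only built here, not sent.

```python
import urllib.request

# Update URL as printed by SimplePostTool above.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body, commit=True):
    """Build (but do not send) a POST request carrying a Solr XML update message."""
    url = SOLR_UPDATE_URL + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical document using the "id" and "name" fields from the example data.
doc_xml = (
    '<add><doc><field name="id">DEMO-1</field>'
    '<field name="name">Demo document</field></doc></add>'
)
request = build_update_request(doc_xml)
print(request.get_method(), request.full_url)

# To send it to a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Passing commit=true makes the change visible to searches right away, which mirrors the COMMIT step that post.jar performs at the end.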
Searching
Let's see if we can retrieve the document we just added.
Since Solr accepts HTTP requests, you can use your web browser to communicate with
Solr by opening the following URL: http://localhost:8983/solr/select?q=*:*&wt=json
This returns the following JSON result:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "wt": "json",
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "3007WFP",
        "name": "Dell Widescreen UltraSharp 3007WFP",
        "manu": "Dell, Inc.",
        "includes": "USB cable",
        "weight": 401.6,
        "price": 2199,
        "popularity": 6,
        "inStock": true,
        "store": "43.17614,-90.57341",
        "cat": [
          "electronics",
          "monitor"
        ],
        "features": [
          "30\" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast"
        ]
      }
    ]
  }
}
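A client program would build the same select URL and read the JSON response. The sketch below uses only the Python standard library and assumes the select URL shown above. The helper names are hypothetical, and the sample response is a trimmed copy of the output above, so nothing is fetched over the network.

```python
import json
from urllib.parse import urlencode


def build_select_url(query, rows=10, base="http://localhost:8983/solr/select"):
    """Build a select URL that asks Solr for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows}
    return base + "?" + urlencode(params)


def extract_docs(response_text):
    """Return (numFound, docs) from a Solr JSON response body."""
    body = json.loads(response_text)
    return body["response"]["numFound"], body["response"]["docs"]


url = build_select_url("*:*")
print(url)

# Trimmed copy of the response shown above.
sample = (
    '{"responseHeader": {"status": 0}, '
    '"response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "3007WFP", "price": 2199}]}}'
)
found, docs = extract_docs(sample)
print(found, docs[0]["id"])
```

urlencode percent-encodes the query, so the special characters in *:* reach Solr intact.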
Faceting
Faceting is the arrangement of search results into categories based on indexed terms. Searchers
are presented with the indexed terms along with numerical counts of how many matching
documents were found for each term. Faceting makes it easy for users to explore search
results, narrowing in on exactly the results they are looking for.
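In Solr, facets are requested with query parameters such as facet=true and facet.field=cat. The sketch below shows only what the resulting counts mean, by computing them by hand over a few made-up documents that use the cat field from the example data. It is not how Solr computes facets internally.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many matching documents carry each value of the given field."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value like a one-element list
        counts.update(values)
    return counts.most_common()


# Hypothetical search results.
matching_docs = [
    {"id": "1", "cat": ["electronics", "monitor"]},
    {"id": "2", "cat": ["electronics", "memory"]},
    {"id": "3", "cat": "electronics"},
]
facets = facet_counts(matching_docs, "cat")
print(facets[0])  # ('electronics', 3)
```

A user interface would typically show each value with its count and let the user click one to narrow the search.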
Highlighting
Highlighting in Solr allows fragments of documents that match the user's query to be included
with the query response. The fragments are included in a special section of the response
(the highlighting section), and the client uses the formatting clues also included to determine
how to present the snippets to users.
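Highlighting is switched on with query parameters such as hl=true and hl.fl=name, and matched terms come back wrapped in marker tags (by default <em> and </em>). The following sketch imitates that effect on the client side for illustration only. It is a simple regular-expression substitution, not Solr's highlighter, and the function name is hypothetical.

```python
import re


def highlight(text, query_terms, pre="<em>", post="</em>"):
    """Wrap whole-word, case-insensitive matches of the query terms in marker tags."""
    if not query_terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in query_terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: pre + match.group(0) + post, text)


snippet = highlight("Dell Widescreen UltraSharp 3007WFP", ["dell", "widescreen"])
print(snippet)  # <em>Dell</em> <em>Widescreen</em> UltraSharp 3007WFP
```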
Spell Checking
The Spellcheck component is designed to provide inline query suggestions based on other,
similar, terms.
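The idea of suggesting similar terms can be sketched with Python's difflib module. This is a stand-in technique chosen for brevity, not the algorithm used by Solr's Spellcheck component; the term list and the cutoff value are assumptions made for the example.

```python
import difflib


def suggest(term, indexed_terms, max_suggestions=3):
    """Suggest indexed terms that are textually close to a possibly misspelled term."""
    return difflib.get_close_matches(
        term.lower(), indexed_terms, n=max_suggestions, cutoff=0.7
    )


# Hypothetical terms taken from an index.
indexed_terms = ["monitor", "memory", "electronics", "widescreen"]
print(suggest("moniter", indexed_terms))  # ['monitor']
```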
Relevance
Relevance is the degree to which a query response satisfies a user who is searching for
information.
The relevance of a query response depends on the context in which the query was performed.
A single search application may be used in different contexts by users with different needs and
expectations. For example, a search engine of climate data might be used by a university
researcher studying long-term climate trends, a farmer interested in calculating the likely date
of the last frost of spring, a civil engineer interested in rainfall patterns and the frequency of
floods, and a college student planning a vacation to a region and wondering what to pack.
Because the motivations of these users vary, the relevance of any particular response to a
query will vary as well.
Shutdown
To shut down Solr, from the terminal where you launched Solr, hit Ctrl+C. This will shut down
Solr cleanly.
Link: http://lucene.apache.org/solr/3_6_2/doc-files/tutorial.html
http://www.solrtutorial.com/
https://cwiki.apache.org/confluence/display/solr/
Screen Shots
Figure 2 Solr Admin UI-Dashboard Screen
Figure 3 Solr Admin UI-Collection Detail Screen
Figure 4 Solr Admin UI-Query Result Screen
Figure 5 Solr Admin UI-Fetching Data from Database Using DataImportHandler
Figure 6 Solr Admin UI-Schema.xml Screen
Figure 7 Solr Admin UI-SolrConfig.xml Screen
Figure 8 Solr Admin UI-Core Admin Detail Screen
Figure 9 Solr Admin UI-Java Properties Screen
Apache SolrCloud
SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to
enable these capabilities will enable you to set up a highly available, fault tolerant cluster of
Solr servers. Use SolrCloud when you want high scale, fault tolerant, distributed indexing and
search capabilities.
Solr embeds and uses Zookeeper as a repository for cluster configuration and coordination -
think of it as a distributed filesystem that contains information about all of the Solr servers.
Note: reset all configurations and remove documents from the tutorial before going through
the cloud features.
Features
• Centralized Apache ZooKeeper based configuration
• Automated distributed indexing/sharding - send documents to any node and it will be
forwarded to correct shard
• Near Real-Time indexing
• Transaction log ensures no updates are lost even if the documents are not yet indexed to
disk
• Automated query failover, index leader election and recovery in case of failure
• No single point of failure
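To make the automated sharding feature more concrete, the sketch below routes a document to a shard by hashing its id. This is a simplified stand-in: it uses a CRC32 hash modulo the shard count purely for illustration, whereas SolrCloud assigns hash ranges to shards with its own hash function. The function name is hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard name for a document id (illustrative CRC32-modulo routing)."""
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    return "shard{}".format(hashed % num_shards + 1)


# The same id always lands on the same shard, whichever node receives it.
print(shard_for("3007WFP", 2))
print(shard_for("3007WFP", 2) == shard_for("3007WFP", 2))  # True
```

Because the shard is derived from the id alone, any node can work out where a document belongs and forward it there.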
Simple two shard cluster
Figure 10 Simple Two Shard Cluster Image
This example simply creates a cluster consisting of two solr servers representing two different
shards of a collection.
Since we'll need two solr servers for this example, simply make a copy of the example directory
for the second server -- making sure you don't have any data already indexed.
rm -r example/solr/collection1/data/*
cp -r example example2
This command starts up a Solr server and bootstraps a new solr cluster.
cd example
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
• -DzkRun causes an embedded ZooKeeper server to be run as part of this Solr server.
• -Dbootstrap_confdir=./solr/collection1/conf causes the local configuration directory
./solr/collection1/conf to be uploaded as the "myconf" config. The name "myconf" is taken
from the collection.configName parameter.
• -Dcollection.configName=myconf sets the config to use for the new collection.
• -DnumShards=2 sets the number of logical partitions we plan on splitting the index into.
Browse to http://localhost:8983/solr/#/~cloud to see the state of the cluster (the zookeeper
distributed filesystem).
You can see from the zookeeper browser that the Solr configuration files were uploaded under
"myconf", and that a new document collection called "collection1" was created. Under
collection1 is a list of shards, the pieces that make up the complete collection.
Next we start the second server. It will automatically be assigned to shard2 because
we don't explicitly set the shard id.
Start it from the example2 directory, pointing it at the cluster:
cd example2
java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
• -Djetty.port=7574 is just one way to tell the Jetty servlet container to use a different
port.
• -DzkHost=localhost:9983 points to the ZooKeeper ensemble containing the cluster
state. In this example we're running a single ZooKeeper server embedded in the first Solr
server. By default, an embedded ZooKeeper server runs at the Solr port plus 1000, so
9983.
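The port rule above is simple arithmetic, sketched here for clarity. The helper names are hypothetical; only the "Solr port plus 1000" rule and the -DzkHost flag come from the text.

```python
def embedded_zookeeper_port(solr_port):
    """Default port of the embedded ZooKeeper: the Solr port plus 1000."""
    return solr_port + 1000


def zk_host_argument(host, solr_port):
    """Build the -DzkHost argument for a node joining the first server's cluster."""
    return "-DzkHost={}:{}".format(host, embedded_zookeeper_port(solr_port))


print(zk_host_argument("localhost", 8983))  # -DzkHost=localhost:9983
```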
If you refresh the zookeeper browser, you should now see both shard1 and shard2 in
collection1. View http://localhost:8983/solr/#/~cloud.
Next, index some documents.
cd exampledocs
java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar ipod_video.xml
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor.xml
java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar mem.xml
And now, a request to either server results in a distributed search that covers the entire
collection:
http://localhost:8983/solr/collection1/select?q=*:*
If at any point you wish to start over fresh or experiment with different configurations, you can
delete all of the cloud state contained within zookeeper by simply deleting the solr/zoo_data
directory after shutting down the servers.
Dealing with high volume of data
Solution: If the data volume grows, create more shards or split existing shards, adding
physical memory and storage to the existing SolrCloud cluster as needed.
Figure 11 Creating Shard and Replica when volume goes high
Link: http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-from-500000-volumes-5-million-volumes-and-beyond
Dealing with failure
Solution:
a. Failure of ZooKeeper: ZooKeeper maintains all of the cluster state and
configuration information, so run it as an ensemble on separate servers rather
than as a single instance. An ensemble stays available only while a majority of
its servers are up, so use at least three servers to survive the loss of one.
b. Failure of a Solr shard: Create a replica of each shard so that if one copy goes
down, a replica can continue to serve requests.
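SolrCloud handles query failover inside the cluster, but the principle of falling back to a replica can be sketched on the client side as follows. The host names, the helper names, and the fake fetch function are all hypothetical; a real client would pass a function that performs the HTTP request.

```python
def query_with_failover(replica_urls, fetch):
    """Try each replica in turn and return the first successful response."""
    failed = []
    for url in replica_urls:
        try:
            return fetch(url)
        except OSError:
            # Network-level failures (refused connection, timeout) land here.
            failed.append(url)
    raise ConnectionError("all replicas failed: {}".format(failed))


def fake_fetch(url):
    """Stand-in for an HTTP call: pretends the first host is down."""
    if "solr-a.example" in url:
        raise ConnectionRefusedError("node is down")
    return "response from " + url


# Hypothetical hosts holding two copies of the same shard.
replicas = [
    "http://solr-a.example:8983/solr/collection1/select?q=*:*",
    "http://solr-b.example:8983/solr/collection1/select?q=*:*",
]
result = query_with_failover(replicas, fake_fetch)
print(result)
```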
Figure 12 Diagram of the failure-handling scenario
Link: https://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
Synchronization of data (added/updated in DB) with Solr
Solution:
a. Create a cron job that fetches data from the database and updates the index in
Solr.
b. Alternatively, whenever data is added or updated in the front end, the business
layer can update Solr immediately after inserting or updating the data in the
database, using Solr's update APIs (since we integrate with .Net, the SolrNet
library provides such add/update APIs).
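Option (a) amounts to selecting the rows changed since the last run and sending them to Solr. The sketch below shows the selection step against an in-memory SQLite database. The table name, column names, and timestamps are assumptions for illustration; the step that posts the documents to Solr is left out.

```python
import sqlite3


def rows_changed_since(conn, last_sync):
    """Return documents for rows modified after the last synchronization time."""
    cursor = conn.execute(
        "SELECT id, name FROM products WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    )
    return [{"id": row[0], "name": row[1]} for row in cursor]


# Hypothetical table with an updated_at column stored as ISO 8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("A1", "Old monitor", "2024-01-01T00:00:00"),
        ("B2", "New monitor", "2024-01-03T00:00:00"),
    ],
)

changed = rows_changed_since(conn, "2024-01-02T00:00:00")
print(changed)  # [{'id': 'B2', 'name': 'New monitor'}]
```

A scheduled job would record the time of each run and pass it as last_sync on the next one, so only new or updated rows are re-indexed.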
Link: http://wiki.apache.org/solr/DataImportHandler#Scheduling
http://stackoverflow.com/questions/6463844/how-to-index-data-in-solr-from-database-automatically
Limitations
The following are rules of thumb for a well-performing deployment, not hard limits enforced by Solr:
1. No more than 50 to 100 million documents per node.
2. No more than 250 fields per document.
3. No more than 250K characters per document.
4. No more than 25 faceted fields.
5. No more than 32 nodes in your SolrCloud cluster.
6. No more than 250 results returned per query.
A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two
separate things: One is the Java heap, the other is "free" memory for the OS disk cache.
It is strongly recommended that Solr runs on a 64-bit Java. A 64-bit Java requires a 64-bit
operating system, and a 64-bit operating system requires a 64-bit CPU. There's nothing wrong
with 32-bit software or hardware, but a 32-bit Java is limited to a 2GB heap, which can result in
artificial limitations that don't exist with a larger heap.
Link: http://lucene.472066.n3.nabble.com/Solr-limitations-td4076250.html
https://wiki.apache.org/solr/SolrPerformanceProblems
Screen Shots
Figure 13 Solr Admin UI-Cloud Screen
Figure 14 Solr Admin UI-Zookeeper maintains Cluster State Information that is shown in Tree Screen
Figure 15 Solr Admin UI-Cloud Graph Screen
Figure 16 Solr Admin UI-Cluster Information Screen
Integration with .Net using SolrNet
Solr exposes REST-like APIs for interacting with it; however, client code must deserialize the
documents returned as search results into its own objects. SolrNet is a .Net library for
interacting with Solr. It provides convenient APIs to search, add, and update data in Solr. Further
information on SolrNet is available at https://github.com/mausch/SolrNet
Figure 17 Integration with .Net

 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Dernier (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Apache Solr tech doc

  • 2. Contents
    Requirements (p. 3)
    Solution - Solr (p. 3)
    Features (p. 3)
    Typical Solr Setup Diagram (p. 4)
    Basic Solr Concepts (p. 4)
    1. Indexing (p. 4)
    2. How Solr represents data (p. 5)
    Installing Solr (p. 7)
    Starting Solr (p. 7)
    Indexing Data (p. 7)
    Searching (p. 8)
    Faceting (p. 9)
    Highlighting (p. 10)
    Spell Checking (p. 10)
    Relevance (p. 10)
    Shutdown (p. 10)
    Screen Shots (p. 11)
    Apache SolrCloud (p. 15)
    Features (p. 15)
    Simple two shard cluster (p. 15)
    Dealing with high volume of data (p. 18)
    Dealing with failure (p. 19)
    Synchronization of data (added/updated in DB) with Solr (p. 20)
    Limitations (p. 20)
    Screen Shots (p. 21)
    Integration with .Net using SolrNet (p. 23)
  • 3. Requirements
    a. Fast, full-text search capabilities
    b. Optimized for high-volume web traffic
    c. Highly and linearly scalable on demand
    d. Pluggable into any platform
    e. Near real-time search and indexing
    f. Flexible and adaptable, with XML, JSON and CSV configuration

    Solution - Solr
    Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.

    Features
    o Advanced full-text search capabilities
    o Optimized for high-volume web traffic
    o Standards-based open interfaces: XML, JSON and HTTP
    o Comprehensive HTML administration interfaces
    o Linearly scalable, with automatic index replication, failover and recovery
    o Near real-time indexing
    o Flexible and adaptable with XML configuration
    o Extensible plugin architecture
    o Easily managed multilingual support
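The HTTP interface described under "Solution - Solr" can be sketched in a few lines of Python. This is a minimal illustration, not an official client: it assumes the default example server at http://localhost:8983/solr, and it only builds the requests rather than sending them. The function names are hypothetical.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape, quoteattr

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default example server


def build_add_request(doc: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST that indexes one document as XML."""
    fields = "".join(
        f"<field name={quoteattr(str(name))}>{escape(str(value))}</field>"
        for name, value in doc.items()
    )
    body = f"<add><doc>{fields}</doc></add>".encode("utf-8")
    return urllib.request.Request(
        f"{SOLR_BASE}/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def build_select_url(query: str, wt: str = "json") -> str:
    """Build the HTTP GET URL for a search; wt selects the response format."""
    params = urllib.parse.urlencode({"q": query, "wt": wt})
    return f"{SOLR_BASE}/select?{params}"


request = build_add_request({"id": "1", "name": "A & B"})
print(request.get_method(), request.full_url)
print(build_select_url("*:*"))
```

With a Solr server running, the request object could be passed to urllib.request.urlopen, and the select URL opened the same way or pasted into a browser.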
  • 4. Typical Solr Setup Diagram
    Figure 1 Typical Solr Setup Diagram

    Basic Solr Concepts
    In this document, we'll cover the basics of what you need to know about Solr in order to use it.

    1. Indexing
    Solr achieves fast search responses because it searches an index rather than the text itself. This is like finding the pages of a book related to a keyword by scanning the index at the back of the book, as opposed to reading every word of every page. This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) into a keyword-centric data structure (word->pages). Solr stores this index in a directory called index in the data directory.
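The page->words to word->pages inversion can be illustrated with a toy example. This is only a sketch of the idea; the sample pages are made up, and Lucene's on-disk index is far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(pages: dict) -> dict:
    """Map each word to the sorted list of page numbers it appears on."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Hypothetical page-centric data: page number -> text on that page.
pages = {
    1: "solr is a search server",
    2: "an index makes search fast",
    3: "solr stores the index on disk",
}
index = build_inverted_index(pages)
print(index["search"])  # pages that mention "search"
print(index["index"])   # pages that mention "index"
```

Looking up a keyword is now a single dictionary access instead of a scan over every page.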
  • 5. 2. How Solr represents data
    In Solr, a Document is the unit of search and index. An index consists of one or more Documents, and a Document consists of one or more Fields.

    Schema
    Before adding documents to Solr, you need to specify the schema, represented in a file called schema.xml. It is not advisable to change the schema after documents have been added to the index. The schema declares:
    o what kinds of fields there are
    o which field should be used as the unique/primary key
    o which fields are required
    o how to index and search each field

    Field Types
    In Solr, every field has a type. Examples of basic field types available in Solr include:
    o float
    o long
    o double
    o date
    o text

    Defining a field
    Here's what a field declaration looks like:
        <field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
    o name: the name of the field
    o type: the field type
    o indexed: whether this field should be added to the inverted index
  • 6. o stored: whether the original value of this field should be stored
    o multiValued: whether this field can have multiple values
    The indexed and stored attributes are important.

    Analysis
    When data is added to Solr, it goes through a series of transformations before being added to the index. This is called the analysis phase. Examples of transformations include lower-casing and stemming (reducing words to their stems). The end result of the analysis is a series of tokens which are then added to the index. Tokens, not the original text, are what is searched when you perform a search query. Indexed fields are fields which undergo an analysis phase and are added to the index.

    Term Storage
    When we display search results to users, they generally expect to see the original document, not the machine-processed tokens. That's the purpose of the stored attribute: to tell Solr to store the original text in the index as well. Sometimes there are fields which aren't searched but need to be displayed in the search results. You accomplish that by setting the field attributes to stored=true and indexed=false. So, why wouldn't you store all the fields all the time? Because storing fields increases the size of the index, and the larger the index, the slower the search. In terms of physical computing, we'd say that a larger index requires more disk seeks to get to the same amount of data.
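The analysis phase can be pictured as a small pipeline that turns text into tokens. The sketch below is a toy stand-in, not one of Solr's analyzers: the suffix list and the function names are assumptions chosen only to show lower-casing followed by stemming.

```python
import re

# Assumption: a deliberately crude suffix list, only for illustration.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(token: str) -> str:
    """Strip one common English suffix; real stemmers are far more careful."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list:
    """Lower-case the text, split it into word tokens, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(token) for token in tokens]


print(analyze("Searching Indexed Documents"))
```

Because the same analysis is applied to queries, a search for "search" can match a document that originally said "Searching", which is why the tokens rather than the original text are what gets searched.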
  • 7. Installing Solr
    You should have JDK 6 or above installed (Solr 4.1 requires Java 1.6 or later). Begin by unzipping the Solr release and changing your working directory to the "example" directory.
        unzip -q apache-solr-4.1.0.zip
        cd apache-solr-4.1.0/example/

    Starting Solr
    Solr comes with an example directory which contains some sample files we can use. From the example directory, we start this example server with java -jar start.jar.
        java -jar start.jar
    You should see something like this in the terminal.
        2011-10-02 05:20:27.120:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
        2011-10-02 05:20:27.212:INFO::jetty-6.1-SNAPSHOT
        ....
        2011-10-02 05:18:27.645:INFO::Started SocketConnector@0.0.0.0:8983
    Solr is now running! You can now access the Solr Admin webapp by loading http://localhost:8983/solr/admin/ in your web browser.

    Indexing Data
    We're now going to add some sample data to our Solr instance. The exampledocs folder contains some sample XML files, which we can post from the command line:
        cd exampledocs
        java -jar post.jar solr.xml monitor.xml
  • 8. That produces:
        SimplePostTool: POSTing files to http://localhost:8983/solr/update.
        SimplePostTool: POSTing file solr.xml
        SimplePostTool: POSTing file monitor.xml
        SimplePostTool: COMMITting Solr index changes.
    This response tells us that the POST operation was successful. You can also index all of the sample data using the following command (assuming your command line shell supports the *.xml notation):
        cd exampledocs
        java -jar post.jar *.xml

    Searching
    Let's see if we can retrieve the document we just added. Since Solr accepts HTTP requests, you can use your web browser to communicate with Solr by opening the URL below:
        http://localhost:8983/solr/select?q=*:*&wt=json
    This returns the following JSON result:
        {
          "responseHeader": {
            "status": 0,
            "QTime": 0,
            "params": {
              "wt": "json",
              "q": "*:*"
            }
          },
          "response": {
  • 9.
            "numFound": 1,
            "start": 0,
            "docs": [
              {
                "id": "3007WFP",
                "name": "Dell Widescreen UltraSharp 3007WFP",
                "manu": "Dell, Inc.",
                "includes": "USB cable",
                "weight": 401.6,
                "price": 2199,
                "popularity": 6,
                "inStock": true,
                "store": "43.17614,-90.57341",
                "cat": [
                  "electronics",
                  "monitor"
                ],
                "features": [
                  "30\" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast"
                ]
              }
            ]
          }
        }

    Faceting
    Faceting is the arrangement of search results into categories based on indexed terms. Searchers are presented with the indexed terms along with numerical counts of how many matching documents were found for each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.
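To make the idea of facet counts concrete, here is a small Python sketch that computes them by hand over a list of result documents. The sample documents are hypothetical (only the first id comes from the response above), and the helper name is an assumption; in practice Solr computes these counts itself when the request carries the facet=true and facet.field parameters.

```python
from collections import Counter
from urllib.parse import urlencode


def facet_counts(docs: list, field: str) -> Counter:
    """Count how many documents carry each value of the given field."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat single-valued fields like one-item lists
        counts.update(set(values))  # count each document once per value
    return counts


# Hypothetical matching documents, shaped like entries of a "docs" array.
docs = [
    {"id": "3007WFP", "cat": ["electronics", "monitor"], "inStock": True},
    {"id": "doc-2", "cat": ["electronics", "monitor"], "inStock": True},
    {"id": "doc-3", "cat": ["electronics", "music"], "inStock": True},
]
print(facet_counts(docs, "cat"))

# Request parameters that ask Solr itself to compute counts for the cat field.
print(urlencode({"q": "*:*", "facet": "true", "facet.field": "cat", "wt": "json"}))
```

A user interface would typically show each term with its count and add a filter on the chosen term when the user clicks it.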
  • 10. Highlighting
    Highlighting in Solr allows fragments of documents that match the user's query to be included with the query response. The fragments are included in a special section of the response (the highlighting section), and the client uses the formatting clues also included to determine how to present the snippets to users.

    Spell Checking
    The Spellcheck component is designed to provide inline query suggestions based on other, similar, terms.

    Relevance
    Relevance is the degree to which a query response satisfies a user who is searching for information. The relevance of a query response depends on the context in which the query was performed. A single search application may be used in different contexts by users with different needs and expectations. For example, a search engine of climate data might be used by a university researcher studying long-term climate trends, a farmer interested in calculating the likely date of the last frost of spring, a civil engineer interested in rainfall patterns and the frequency of floods, and a college student planning a vacation to a region and wondering what to pack. Because the motivations of these users vary, the relevance of any particular response to a query will vary as well.

    Shutdown
    To shut down Solr, from the terminal where you launched Solr, hit Ctrl+C. This will shut down Solr cleanly.

    Link:
    http://lucene.apache.org/solr/3_6_2/doc-files/tutorial.html
    http://www.solrtutorial.com/
    https://cwiki.apache.org/confluence/display/solr/
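As a rough picture of what the highlighting section contains, the following Python sketch wraps query terms in marker tags. It is a toy illustration under stated assumptions, not Solr's highlighter: the function name is hypothetical and the <em> markers are simply the formatting clue chosen for this example.

```python
import re


def highlight(text: str, query_terms: list, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every whole-word, case-insensitive match of a query term in markers."""
    if not query_terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in query_terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: pre + match.group(0) + post, text)


print(highlight("Dell Widescreen UltraSharp 3007WFP", ["dell", "ultrasharp"]))
```

The client decides how to render the markers, for example as bold text in a results page.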
  • 11. Screen Shots Figure 2 Solr Admin UI-Dashboard Screen Figure 3 Solr Admin UI-Collection Detail Screen
  • 12. Figure 4 Solr Admin UI-Query Result Screen Figure 5 Solr Admin UI-Fetching Data from Database Using DataImportHandler
  • 13. Figure 6 Solr Admin UI-Schema.xml Screen Figure 7 Solr Admin UI-SolrConfig.xml Screen
  • 14. Figure 8 Solr Admin UI-Core Admin Detail Screen Figure 9 Solr Admin UI-Java Properties Screen
  • 15. Apache SolrCloud
    SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities will enable you to set up a highly available, fault-tolerant cluster of Solr servers. Use SolrCloud when you want high-scale, fault-tolerant, distributed indexing and search capabilities.
    Solr embeds and uses ZooKeeper as a repository for cluster configuration and coordination. Think of it as a distributed filesystem that contains information about all of the Solr servers.
    Note: reset all configurations and remove documents from the tutorial before going through the cloud features.

    Features
    o Centralized Apache ZooKeeper based configuration
    o Automated distributed indexing/sharding: send documents to any node and they will be forwarded to the correct shard
    o Near real-time indexing
    o Transaction log ensures no updates are lost even if the documents are not yet indexed to disk
    o Automated query failover, index leader election and recovery in case of failure
    o No single point of failure

    Simple two shard cluster
    Figure 10 Simple Two Shard Cluster Image
  • 16. This example simply creates a cluster consisting of two Solr servers representing two different shards of a collection. Since we'll need two Solr servers for this example, simply make a copy of the example directory for the second server, making sure you don't have any data already indexed.
        rm -r example/solr/collection1/data/*
        cp -r example example2
    This command starts up a Solr server and bootstraps a new Solr cluster.
        cd example
        java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
    o -DzkRun causes an embedded ZooKeeper server to be run as part of this Solr server.
    o -Dbootstrap_confdir=./solr/collection1/conf causes the local configuration directory ./solr/collection1/conf to be uploaded as the "myconf" config. The name "myconf" is taken from the "collection.configName" param below.
    o -Dcollection.configName=myconf sets the config to use for the new collection.
    o -DnumShards=2 sets the number of logical partitions we plan on splitting the index into.
    Browse to http://localhost:8983/solr/#/~cloud to see the state of the cluster (the ZooKeeper distributed filesystem). You can see from the ZooKeeper browser that the Solr configuration files were uploaded under "myconf", and that a new document collection called "collection1" was created. Under collection1 is a list of shards, the pieces that make up the complete collection.
    Now we want to start up our second server. It will automatically be assigned to shard2 because we don't explicitly set the shard id. Start the second server, pointing it at the cluster:
        cd example2
        java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
    o -Djetty.port=7574 is just one way to tell the Jetty servlet container to use a different port.
  • 17. o -DzkHost=localhost:9983 points to the ZooKeeper ensemble containing the cluster state. In this example we're running a single ZooKeeper server embedded in the first Solr server. By default, an embedded ZooKeeper server runs at the Solr port plus 1000, so 9983.
    If you refresh the ZooKeeper browser at http://localhost:8983/solr/#/~cloud, you should now see both shard1 and shard2 in collection1.
    Next, index some documents.
        cd exampledocs
        java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar ipod_video.xml
        java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor.xml
        java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar mem.xml
    And now, a request to either server results in a distributed search that covers the entire collection:
        http://localhost:8983/solr/collection1/select?q=*:*
    If at any point you wish to start over fresh or experiment with different configurations, you can delete all of the cloud state contained within ZooKeeper by simply deleting the solr/zoo_data directory after shutting down the servers.
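The two-shard walkthrough relies on documents being forwarded to the correct shard no matter which node receives them. The Python sketch below shows the general idea with a stable hash of the document id. It is an assumption-laden toy: CRC32 modulo the shard count stands in for Solr's own router, so the shard it picks will not necessarily match a real cluster, and both function names are hypothetical.

```python
import zlib


def pick_shard(doc_id: str, num_shards: int) -> str:
    """Choose a shard name from a stable hash of the document id."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    bucket = zlib.crc32(doc_id.encode("utf-8")) % num_shards
    return f"shard{bucket + 1}"


def embedded_zk_port(solr_port: int) -> int:
    """Embedded ZooKeeper listens on the Solr port plus 1000 by default."""
    return solr_port + 1000


# Hypothetical document ids; the same id always maps to the same shard.
for doc_id in ["ipod_video", "monitor", "mem"]:
    print(doc_id, "->", pick_shard(doc_id, 2))
print(embedded_zk_port(8983))
```

Because the mapping depends only on the id, any node can compute it and forward the document, which is why the post.jar commands above can target either port.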
  • 18. Dealing with high volume of data
    Solution: If the data volume grows, create more shards or split existing shards, adding physical memory and storage to the existing cloud cluster as needed.
    Figure 11 Creating Shard and Replica when volume goes high
    Link: http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-from-500000-volumes-5-million-volumes-and-beyond
  • 19. Dealing with failure
    Solution:
    a. Failure of ZooKeeper: ZooKeeper maintains all of the cluster state and configuration information, so run it as an ensemble on separate servers rather than as a single instance. Use an odd number of servers, at least three, so that a majority is still available if one goes down.
    b. Failure of a Solr shard: Create a replica of each shard, so that if one shard goes down its replica can take over.
    Figure 12 Diagram of the failure-handling scenario
    Link: https://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
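ZooKeeper stays available only while a strict majority of its servers is running, which is worth checking when sizing the ensemble. The small Python sketch below does that arithmetic; the function name is hypothetical.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a strict majority of the ensemble survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


for size in (1, 2, 3, 5):
    print(size, "servers tolerate", tolerated_failures(size), "failure(s)")
```

The numbers show why ensembles are usually built with an odd number of servers: a second server adds no fault tolerance over a single one, while a third allows one server to fail.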
  • 20. Synchronization of data (added/updated in DB) with Solr
    Solution:
    a. Create a cron job which fetches data from the database and updates the index in Solr.
    b. Alternatively, whenever data is added or updated in the frontend, the business layer can, after inserting or updating the data in the database, call the Solr update APIs to add or update the same data in the index. As we integrate with .Net, we can use the SolrNet library, which provides such add and update APIs.
    Link:
    http://wiki.apache.org/solr/DataImportHandler#Scheduling
    http://stackoverflow.com/questions/6463844/how-to-index-data-in-solr-from-database-automatically

    Limitations
    1. No more than 50 to 100 million documents per node.
    2. No more than 250 fields per document.
    3. No more than 250K characters per document.
    4. No more than 25 faceted fields.
    5. No more than 32 nodes in your SolrCloud cluster.
    6. Don't return more than 250 results on a query.
    A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two separate things: one is the Java heap, the other is "free" memory for the OS disk cache.
    It is strongly recommended that Solr runs on a 64-bit Java. A 64-bit Java requires a 64-bit operating system, and a 64-bit operating system requires a 64-bit CPU. There's nothing wrong with 32-bit software or hardware, but a 32-bit Java is limited to a 2GB heap, which can result in artificial limitations that don't exist with a larger heap.
    Link:
    http://lucene.472066.n3.nabble.com/Solr-limitations-td4076250.html
    https://wiki.apache.org/solr/SolrPerformanceProblems
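Returning to the scheduled-job option under "Synchronization of data", the core of such a job is picking the rows that changed since the previous run and turning them into an update payload. The Python sketch below shows only that step. The row layout, the updated_at column and the function names are all hypothetical, and the JSON array-of-documents payload shape is an assumption to verify against the update handler of your Solr version.

```python
import json
from datetime import datetime


def rows_changed_since(rows: list, last_sync: datetime) -> list:
    """Keep only the rows added or updated after the previous sync."""
    return [row for row in rows if row["updated_at"] > last_sync]


def to_update_payload(rows: list) -> str:
    """Serialize rows as a JSON array of documents, dropping the timestamp."""
    docs = [
        {key: value for key, value in row.items() if key != "updated_at"}
        for row in rows
    ]
    return json.dumps(docs)


# Hypothetical rows, as a scheduled job might read them from the database.
rows = [
    {"id": "1", "name": "old item", "updated_at": datetime(2014, 1, 1)},
    {"id": "2", "name": "new item", "updated_at": datetime(2014, 3, 1)},
]
last_sync = datetime(2014, 2, 1)
payload = to_update_payload(rows_changed_since(rows, last_sync))
print(payload)
```

A real job would also record the time of each successful run so the next run knows where to resume.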
  • 21. Screen Shots Figure 13 Solr Admin UI-Cloud Screen Figure 14 Solr Admin UI-Zookeeper maintains Cluster State Information that is shown in Tree Screen
  • 22. Figure 15 Solr Admin UI-Cloud Graph Screen Figure 16 Solr Admin UI-Cluster Information Screen
  • 23. Integration with .Net using SolrNet
    Solr exposes REST APIs which can be used for interacting with Solr. However, the documents returned as search results must be deserialized to fill the application's own object containers. SolrNet is a .Net library for interacting with Solr. It provides convenient, easy-to-use APIs to search, add and update data in Solr. Further information on SolrNet is available at https://github.com/mausch/SolrNet
    Figure 17 Integration with .Net
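The mapping step that a client library takes care of, turning a result document into a typed object, can be sketched as follows. This is written in Python rather than .Net purely for illustration, it does not use or mirror SolrNet's API, and the Product type and to_product helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical container type for documents from the example index."""
    id: str
    name: str = ""
    price: float = 0.0
    cat: list = field(default_factory=list)


def to_product(doc: dict) -> Product:
    """Copy the known fields of one result document into a Product."""
    return Product(
        id=doc["id"],
        name=doc.get("name", ""),
        price=float(doc.get("price", 0.0)),
        cat=list(doc.get("cat", [])),
    )


# One document shaped like the JSON search result shown earlier.
doc = {
    "id": "3007WFP",
    "name": "Dell Widescreen UltraSharp 3007WFP",
    "price": 2199,
    "cat": ["electronics", "monitor"],
    "inStock": True,
}
product = to_product(doc)
print(product.name, product.price)
```

Fields the type does not declare, such as inStock here, are simply ignored, and missing optional fields fall back to defaults.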