Open Source Search
What’s New
in Apache Solr 1.4
A Lucid Imagination
Technical White Paper
© 2009 by Lucid Imagination, Inc. under the terms of Creative Commons license, as detailed at
http://www.lucidimagination.com/Copyrights-and-Disclaimers/. Version 1.02, published 26 October 2009.
Solr, Lucene, Apachecon and their logos are trademarks of the Apache Software Foundation.




What’s New in Solr 1.4
A Lucid Imagination Technical White Paper • October 2009                                        Page ii
Abstract
Apache Solr is the definitive application development implementation for Lucene, and it is the
leading open source search platform.

Solr 1.3 set a high bar for functionality, extensibility, and performance. As time marches on, Solr
committers and contributors have been hard at work engineering to make a good thing even
better.

This white paper describes the new features and improvements in the latest version, Apache Solr
1.4. In the simplest terms, Solr is now faster and better than before. Central components of Solr
have been improved to cut the time needed for processing queries and indexing documents. The
goal: to provide a powerful, versatile search application server with ever better scalability,
performance and relevancy. New features include streamlined caching, smarter handling of index
changes, faster faceting, enhanced data import capabilities, speedier numeric range queries,
duplicate detection and more.




Table of Contents
Introduction
Performance Improvements
   Streamlined Caching
   Scalable Concurrent File Access
   Smarter Handling of Index Changes
   Faster Faceting
   Streaming Updates for SolrJ
   What Else Is New for Solr 1.4 Performance
Feature Improvements
   Solr Becomes an Omnivore
   DataImportHandler Enhancements
   Smoother Replication
   More Choices for Logging
   Multiselect Faceting
   Speedier Range Queries
   Duplicate Detection
   New Request Handler Components
   What Else Is New with Solr 1.4 Features
Get Started & Resources
Next Steps
APPENDIX: Choosing Lucene or Solr




Introduction
Apache Solr is the definitive application development implementation for Apache Lucene,
and it is the leading open source search platform. If you imagine Lucene as a high-
performance race car engine, then Solr is all the things that make that engine usable, such
as a chassis, gas pedal, steering wheel, seat, and much more.

Solr makes it easy to develop sophisticated, fast search applications with advanced features
such as faceting. Solr builds on another open source search technology, Lucene, which
provides indexing and search technology, as well as spellchecking, hit highlighting, and
advanced processing capabilities. Both Solr and Lucene are developed at the Apache
Software Foundation.

Lucene currently ranks among the top 15 open source projects and is one of the top 5
Apache projects, with installations at over 4,000 companies. Lucene and Solr downloads
have grown nearly tenfold over the past three years; Solr is the fastest-growing Lucene
subproject. Lucene and Solr offer an attractive alternative to proprietary licensed search
and discovery software vendors.1

Solr 1.3 set a high bar for functionality, extensibility, and performance. As time marches on,
Solr engineers have been hard at work making a good thing even better. This white paper
describes the new features and improvements in the latest version, Solr 1.4. In the simplest
terms, Solr is now faster and better than before. Central components of Solr have been
improved to cut the time needed for processing queries and indexing documents. Many new
features have been added, all with the goal of providing users with the information they
want as fast as possible.

1   See the Appendix for a discussion of when to choose Lucene or Solr.


Performance Improvements
Solr 1.4 increases Solr’s speed with numerous improvements in key areas. Some of these
enhancements are high-performance replacements for standard off-the-shelf Java platform
components. Much as a car hobbyist replaces stock parts of an engine, the architects and
programmers working on Solr have replaced crucial components to make Solr 1.4 run
faster than ever for many common operations.


Streamlined Caching
Solr caches data from its index as an optimization, because reading from memory is always
faster than reading from the file system. Over the duration of a single faceting request, the
cache might be accessed hundreds or even thousands of times. Previously, the cache
implementation was a synchronized LinkedHashMap from the Java platform API.
Solr 1.4 uses a new class, ConcurrentLRUCache, which is specifically designed to
minimize the overhead of synchronization. Anecdotal evidence suggests that this
implementation can double query throughput in some circumstances.
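
To make the cost of the older approach concrete, here is a minimal sketch of an LRU cache built the way the previous implementation is described: a LinkedHashMap behind a single lock. It is not Solr's code; the class and method names (SynchronizedLruCacheSketch, newCache) are made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SynchronizedLruCacheSketch {

    /** Builds an LRU cache in the older style: one lock around a LinkedHashMap. */
    public static <K, V> Map<K, V> newCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered by recency of access,
        // so the "eldest" entry is the least recently used one.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        // Every get() and put() takes the same monitor, so concurrent
        // readers queue up behind one another.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

In an access-ordered LinkedHashMap even get() reorders the internal list, which is why reads need the lock too. A request that touches the cache thousands of times pays that locking cost on every access.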


Scalable Concurrent File Access
In the past, Solr used the Java platform’s RandomAccessFile to read data from index
files. Reading a portion of a file involves calling seek() to find the right part of the file, and
read() to actually retrieve the data.
Multithreaded access to the same file has meant that the seek() and read() pairs must
be synchronized. If the data to be read isn’t already in the operating system cache, things
get worse: the synchronization causes all other reading threads to wait while the data is
retrieved from disk.




The Java New I/O (NIO) API offers a much better solution. NIO’s
FileChannel includes a read() method that, in essence, performs a seek() and a
read() in a single operation.
    public int read(ByteBuffer dst, long position)
Solr 1.4 uses this NIO method (via Lucene’s NIOFSDirectory) to read index files.2
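
The positional read can be tried outside Solr with nothing but the Java platform API. The sketch below is an illustration, not Lucene's NIOFSDirectory; the class name, helper names, and temporary file are assumptions. It reads two slices of one file through a single shared FileChannel without ever calling seek().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadSketch {

    /** Reads up to length bytes starting at position; the channel's own position is not moved. */
    public static String readAt(FileChannel channel, long position, int length) {
        try {
            ByteBuffer dst = ByteBuffer.allocate(length);
            long offset = position;
            // A single call may return fewer bytes than requested, so loop.
            while (dst.hasRemaining()) {
                int n = channel.read(dst, offset);
                if (n < 0) {
                    break; // end of file
                }
                offset += n;
            }
            dst.flip();
            return StandardCharsets.US_ASCII.decode(dst).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temporary file and reads two slices of it through one shared channel. */
    public static String[] demo() {
        try {
            Path file = Files.createTempFile("segment", ".bin");
            try {
                Files.write(file, "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                    return new String[] { readAt(channel, 10, 6), readAt(channel, 0, 4) };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String slice : demo()) {
            System.out.println(slice); // prints ABCDEF, then 0123
        }
    }
}
```

Because each call carries its own offset, no thread depends on a shared file position, so callers do not need to wrap a seek() and read() pair in a lock.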


Smarter Handling of Index Changes
Solr generally keeps a big pile of documents in an existing index. New documents are
periodically added, but usually the number of new documents is small compared with the
size of the index. Solr (via Lucene) stores the index as a collection of segments; as new
documents are added, most of the segments will remain unchanged.
Solr 1.4 is very much aware that, for the most part, index segments don’t change.
Consequently, Solr is much smarter about reusing unchanged segments, which results in
less memory churn, less disk access, and better performance.
[Figure: reopen(), showing Index and New index over the index segments on disk]




2       On Windows, the older RandomAccessFile implementation is used because of a bug in the Windows
NIO implementation.

One example is reloading an index. Previously, the entire index was loaded again, which is
expensive in time and resources. Now, Solr 1.4 is smart enough to reuse index segments
that haven’t changed, resulting in a much more efficient reload of a modified index.
This means that adding new documents to an index and making them available comes at a
lower resource cost. The figure above illustrates the mechanism.
Many other optimizations have been made with respect to index segments. The field cache,
for example, is now split so there is one field cache per segment. Again, this results in much
more efficient processing of index updates, because the field caches for every unchanged
segment do not need to be touched.
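
The idea of one cache per segment can be sketched in a few lines. This is a toy model under stated assumptions, not Lucene or Solr code: segments are just names, and loadSegmentCache is a hypothetical stand-in for the expensive work of building a field cache.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SegmentReuseSketch {

    /** Counts how many times a per-segment cache had to be built. */
    public static int loads = 0;

    /** Hypothetical stand-in for the cost of loading one segment's field cache. */
    static String loadSegmentCache(String segmentName) {
        loads++;
        return "cache-for-" + segmentName;
    }

    /**
     * Returns the caches for the current list of segments, reusing the entry
     * from the previous view for every segment that is still present.
     */
    public static Map<String, String> reopen(Map<String, String> previous,
                                             List<String> currentSegments) {
        Map<String, String> next = new LinkedHashMap<>();
        for (String name : currentSegments) {
            String cached = previous.get(name);
            next.put(name, cached != null ? cached : loadSegmentCache(name));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> first =
                reopen(new HashMap<String, String>(), Arrays.asList("_0", "_1", "_2"));
        System.out.println("loads after first open: " + loads); // 3
        // New documents arrive and are written to one new segment.
        reopen(first, Arrays.asList("_0", "_1", "_2", "_3"));
        System.out.println("loads after reopen: " + loads);     // 4
    }
}
```

Only the new segment costs anything on the second call, which is the effect the per-segment field cache is described as having.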


Faster Faceting
One of Solr’s killer features is faceting, the ability to quickly narrow and drill down into
search results by categories. Solr uses UnInvertedField to keep a mapping between
documents and field values so it can provide faceting information in response to queries.
For multivalued fields, Solr 1.4 includes a new implementation of UnInvertedField that
can be 50 times faster and 5 times smaller than its predecessor. Single-valued fields still use
either the enum or fieldcache method.


Streaming Updates for SolrJ
SolrJ is the API that Java client applications use to work with Solr. The Solr 1.4 version of
SolrJ includes an optimized implementation, StreamingUpdateSolrServer, which is
useful for indexing many documents at a time.


For bulk updates, consider switching to the new implementation. In one simple test, the
number of documents indexed per second jumped from 231 to 25,000 when using the new
implementation.


What Else Is New for Solr 1.4 Performance
In addition to these important performance enhancements in Solr 1.4, there are several
more, including:
   •   Binary format for updates, much more compact than XML, now available for SolrJ.
   •   omitTermFreqAndPositions can be set on a field so that Solr does not record
       term frequencies and positions for that field, which saves time and space for
       nontext fields.
   •   Queries that don’t sort by score can skip scoring, which speeds them up.
   •   Filters now apply before the main query, which makes queries 300% faster in some
       cases.
   •   New filter implementation for small result sets, which is smaller and faster.


Feature Improvements
Aside from performance improvements, Solr 1.4 sports a variety of great new features. As
an open source project, Solr 1.4 is largely created by the people who use it, so the new
features are the ones that the community cares about most passionately.


Solr Becomes an Omnivore
Solr can’t give you good results unless you give it good data. Normally you feed Solr XML
documents corresponding to the structure of your schema. This works fine, and if all your
data consists of XML documents, they can be fed directly to Solr or easily transformed to
the correct input.
Of course, reality is always messy. Chances are that many documents you want to include in
your Solr index are in other file formats, like PDF or Microsoft Word. Fortunately, Solr 1.4
knows how to deal with the mess.



Solr 1.4 can now ingest these other types of documents using a feature called Solr Cell.3 Solr
Cell uses another open source project, Tika, to read documents in a variety of formats and
convert them to an XHTML stream. Solr parses the stream to produce a document, which is
then indexed.
Here are a few of the formats that Tika understands:
        •    PDF
        •    OpenDocument (OpenOffice formats)
        •    Microsoft OLE 2 Compound Document (Word, PowerPoint, Excel, Visio, etc.)
        •    HTML
        •    RTF
        •    gzip
        •    ZIP
        •    Java Archive (JAR) files


DataImportHandler Enhancements
DataImportHandler knows how to index data pulled from relational databases or XML
files. The details of what is indexed and how it happens are set in a data configuration
file that is referenced from solrconfig.xml. Solr 1.4 contains some extremely useful
upgrades to DataImportHandler.
The first is the ability to push data into DataImportHandler. In Solr 1.3,
DataImportHandler was pull-only. This meant that the only possible way to push data
to Solr was to use the update XML or CSV format, which meant you couldn’t take advantage
of any of DataImportHandler’s capabilities. In the Solr 1.4 world, a new component
called ContentStreamDataSource allows you to use DataImportHandler’s features
for indexing content.
Another powerful enhancement in Solr 1.4 is the ability to listen for import events. All you
need to do is provide an implementation of the EventListener interface and let Solr
know about it in the DataImportHandler configuration. When importing begins and ends,
your listener will be notified.

3        The name is based on the acronym Content Extraction Library (CEL). This feature is also known by its
more technical name ExtractingRequestHandler.

Solr 1.4 also brings the ability to control error handling in DataImportHandler. For
each entity, you can control what happens when an error occurs by setting its onError
attribute. The choices for error handling are as follows:
   •   abort: The import is stopped and all changes are rolled back.
   •   skip: The current document is skipped.
   •   continue: Import continues as if the error did not occur.
DataImportHandler contains many more enhancements and optimizations in Solr 1.4,
including new data sources, new entity processors, and new transformers.
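
The three error-handling choices can be compared side by side with a small simulation. This is a sketch, not DataImportHandler code: the row format, the OnError enum, and runImport are hypothetical, and treating continue as "keep the document without the failed field" is this sketch's reading of "as if the error did not occur".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OnErrorSketch {

    public enum OnError { ABORT, SKIP, CONTINUE }

    /**
     * Imports rows of the form "id,price". A row whose price is not a number
     * counts as an error. Returns the documents that end up in the index.
     */
    public static List<String> runImport(List<String> rows, OnError policy) {
        List<String> pending = new ArrayList<>();
        for (String row : rows) {
            String[] parts = row.split(",", 2);
            String id = parts[0];
            String rawPrice = parts.length > 1 ? parts[1] : "";
            try {
                double price = Double.parseDouble(rawPrice);
                pending.add(id + " price=" + price);
            } catch (NumberFormatException e) {
                switch (policy) {
                    case ABORT:
                        // Stop and roll back: nothing from this import is kept.
                        return new ArrayList<>();
                    case SKIP:
                        // Drop the current document and move on to the next row.
                        break;
                    case CONTINUE:
                        // Assumption: keep the document without the failed field.
                        pending.add(id);
                        break;
                }
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1,9.5", "2,oops", "3,12.0");
        for (OnError policy : OnError.values()) {
            System.out.println(policy + " -> " + runImport(rows, policy));
        }
    }
}
```

With the sample rows, ABORT leaves nothing indexed, SKIP indexes documents 1 and 3, and CONTINUE indexes all three with document 2 missing its price.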


Smoother Replication
Replication is a fancy name for making a copy of a Solr index, which at its heart is just a
matter of copying files. Making copies of an index is useful for two reasons. The first is
simply to create a backup. The second reason is to place the same index on multiple Solr
servers, which is necessary if you want to distribute incoming requests to improve
performance.
Prior to Solr 1.4, replication was implemented with shell scripts and consequently worked
well only on platforms with a Unix-style shell, such as Linux. It relied on the Unix rsync
utility and on the operating system providing hard links, which could require cumbersome
scripting and effectively ruled out tiered deployments on Windows platforms.
In Solr 1.4, replication has been abstracted and implemented entirely at the Java platform
layer, which means it will work (and work the same) wherever the Java platform runs. This
is great news for anyone using Solr because it means that backups can be performed in the
same way on a Solr instance, regardless of hardware or operating system, and it means that
configuring replication across multiple Solr instances is similarly uniform. Replication does
not require a backup; the index is copied directly from one live index to another.
Replication and backups are configured in solrconfig.xml. Add a couple of lines if you just
want to make a backup. You can choose to back up upon Solr startup or after every commit
or optimize. In addition, you can use an HTTP command to request a backup at any time.




If you need to replicate an index across multiple servers, the configuration is pretty simple.
Set it up on the master server’s solrconfig.xml like this:
   <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
         <str name="replicateAfter">commit</str>
         <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
   </requestHandler>
You can choose to replicate on startup, after commits, or after optimization. The
confFiles element specifies configuration files you want to replicate to slaves.
Once the server configuration is done, point the slaves at the master, something like this:
   <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
         <str name="masterUrl">http://masterhostname:8983/solr/replication</str>
         <str name="pollInterval">00:00:60</str>
      </lst>
   </requestHandler>
The slaves periodically query the master to see if the index has changed. If so, they pull
down the changes and apply them. That’s all!


More Choices for Logging
Logging is a crucial capability in a server application. Administrators examine logs to
monitor Solr instances and figure out how to make them run optimally. Up until now, Solr
used the logging facility included with the Java Development Kit (JDK).
Solr 1.4 uses a more flexible logging framework, SLF4J. SLF4J can bind to several logging
implementations, including log4j, Jakarta Commons Logging (JCL), and JDK logging. This
binding is chosen at deployment time and can be changed simply by swapping JAR files on
the classpath.



This is the best possible kind of upgrade. The default configuration, binding SLF4J to JDK
logging, provides the same functionality as previous releases of Solr. However, you now
have the option of easily plugging in log4j or JCL if you prefer.


Multiselect Faceting
Faceting is the ability to group search results by certain fields. Solr 1.4 adds support for
multiselect faceting, which lets users select several values of a facet while the facet
still shows counts for the values they have not selected.
Solr’s support is generic and includes the ability to tag filters and to exclude filters by tag
when faceting. A sample query string might look like this:
   q=index replication&facet=true
      &fq={!tag=proj}project:(lucene OR solr)
      &facet.field={!ex=proj}project
      &facet.field={!ex=src}source
To see this in action, check out the search facility that Lucid Imagination provides to search
technical knowledge resources on Solr along with Lucene and all its subprojects:
http://search.lucidimagination.com/.
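
The tag and exclude mechanism in the sample query can be modeled in memory. The sketch below is an illustration, not Solr's implementation: documents are plain maps, and the TaggedFilter class, facet method, and sample data are made up. It shows why the project facet still lists values outside the selected ones when the filter tagged proj is excluded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MultiselectFacetSketch {

    /** A filter on one field, carrying the tag assigned with {!tag=...}. */
    public static final class TaggedFilter {
        final String tag;
        final String field;
        final Set<String> allowed;

        public TaggedFilter(String tag, String field, String... allowedValues) {
            this.tag = tag;
            this.field = field;
            this.allowed = new HashSet<>(Arrays.asList(allowedValues));
        }

        boolean matches(Map<String, String> doc) {
            return allowed.contains(doc.get(field));
        }
    }

    /**
     * Counts the values of facetField over the documents that pass every filter
     * except those whose tag equals excludeTag (the {!ex=...} part).
     */
    public static Map<String, Integer> facet(List<Map<String, String>> docs,
                                             List<TaggedFilter> filters,
                                             String facetField,
                                             String excludeTag) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            boolean keep = true;
            for (TaggedFilter f : filters) {
                if (f.tag.equals(excludeTag)) {
                    continue; // ignore the excluded filter for this facet
                }
                if (!f.matches(doc)) {
                    keep = false;
                    break;
                }
            }
            String value = doc.get(facetField);
            if (keep && value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    static Map<String, String> doc(String project, String source) {
        Map<String, String> d = new HashMap<>();
        d.put("project", project);
        d.put("source", source);
        return d;
    }

    /** Made-up sample data: five documents with a project and a source field. */
    public static List<Map<String, String>> sampleDocs() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("lucene", "wiki"));
        docs.add(doc("lucene", "email"));
        docs.add(doc("solr", "wiki"));
        docs.add(doc("nutch", "wiki"));
        docs.add(doc("tika", "email"));
        return docs;
    }

    public static void main(String[] args) {
        List<TaggedFilter> filters = Collections.singletonList(
                new TaggedFilter("proj", "project", "lucene", "solr"));
        // Project facet with the proj filter excluded: every project is counted.
        System.out.println(facet(sampleDocs(), filters, "project", "proj"));
        // Source facet: no filter is tagged src, so the project filter still applies.
        System.out.println(facet(sampleDocs(), filters, "source", "src"));
    }
}
```

The first line printed is {lucene=2, nutch=1, solr=1, tika=1} and the second is {email=1, wiki=2}. Without the exclusion, the project facet would collapse to just lucene and solr, leaving the user nothing else to select.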


Speedier Range Queries
Solr can process queries that include numeric ranges, which means it can answer questions
like “Which hats are between size 56 and 64?” and “Which swimming pools are less than 10
meters long?”
In Solr 1.4, range queries can use a prefix tree, or trie. Numbers are placed into
the tree based on their digits, which makes range queries faster than comparing each
complete number. Thus, for example, 175 is indexed as hundreds:1 tens:17 ones:175.
Such queries have been observed to run up to 40 times faster than standard range queries.
To take advantage of fast range queries, use the TrieField type in your schema. The
implementation takes care of the details, and you will notice that range queries are
significantly faster.
The illustration below shows an example of a prefix tree, where the leaves of the tree hold
the actual term values and all the descendants of a node have a common prefix associated
with the node. Bold circles mark all relevant nodes to retrieve a range from 215 to 977.
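
The decimal example can be turned into a small working model. The sketch below follows this paper's simplified digit-based description and is limited to values from 0 to 999. It is not the Lucene implementation, and the class and method names are made up. It produces the terms indexed for one number and the set of terms needed to cover a range.

```java
import java.util.ArrayList;
import java.util.List;

public class DecimalTrieSketch {

    private static final String[] LEVELS = { "ones", "tens", "hundreds" };

    /** Terms indexed for one value in 0..999, coarsest prefix first. */
    public static List<String> indexTerms(int value) {
        List<String> terms = new ArrayList<>();
        int divisor = 100;
        for (int level = LEVELS.length - 1; level >= 0; level--) {
            terms.add(LEVELS[level] + ":" + (value / divisor));
            divisor /= 10;
        }
        return terms;
    }

    /** Terms whose union matches exactly the values lo..hi (inclusive, both in 0..999). */
    public static List<String> rangeTerms(int lo, int hi) {
        List<String> terms = new ArrayList<>();
        split(lo, hi, 0, terms);
        return terms;
    }

    // lo and hi are expressed in units of the current level (ones, tens, hundreds).
    private static void split(int lo, int hi, int level, List<String> out) {
        if (lo > hi) {
            return;
        }
        int innerLo = ((lo + 9) / 10) * 10;     // first multiple of 10 at or above lo
        int innerHi = ((hi + 1) / 10) * 10 - 1; // end of the last full block of 10 at or below hi
        if (level == LEVELS.length - 1 || innerLo > innerHi) {
            emit(lo, hi, level, out);           // no full block inside: list values one by one
            return;
        }
        emit(lo, innerLo - 1, level, out);                  // ragged left edge
        split(innerLo / 10, innerHi / 10, level + 1, out);  // full blocks, one level coarser
        emit(innerHi + 1, hi, level, out);                  // ragged right edge
    }

    private static void emit(int from, int to, int level, List<String> out) {
        for (int v = from; v <= to; v++) {
            out.add(LEVELS[level] + ":" + v);
        }
    }

    public static void main(String[] args) {
        System.out.println(indexTerms(175));            // [hundreds:1, tens:17, ones:175]
        System.out.println(rangeTerms(215, 977).size()); // 34
    }
}
```

For the range 215 to 977, the sketch needs 34 terms (ones:215 to ones:219, tens:22 to tens:29, hundreds:3 to hundreds:8, tens:90 to tens:96, ones:970 to ones:977) instead of one term for each of the 763 values in the range.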

Let’s look at another example, this time in the schema. The type attribute in the schema’s
field type declaration tells Solr which numeric type you will represent with TrieField.
Here are a few declarations that show how to use TrieField for various numeric types:
   <fieldType name="tint" class="solr.TrieField" type="integer"
       omitNorms="true"
       positionIncrementGap="0" indexed="true" stored="false" />
   <fieldType name="tlong" class="solr.TrieField" type="long"
       omitNorms="true"
       positionIncrementGap="0" indexed="true" stored="false" />
   <fieldType name="tdouble" class="solr.TrieField" type="double"
       omitNorms="true"
       positionIncrementGap="0" indexed="true" stored="false" />


Duplicate Detection
With large sets of documents to be indexed, it is important to detect documents that are
identical or nearly identical so that the document only gets added to the index once.
Solr 1.4 offers this capability, named document duplicate detection or deduplication. The
more technical name is SignatureUpdateProcessor.
SignatureUpdateProcessor creates a message digest or hash value from some or all of
the fields of a document. The hash value acts like a fingerprint for the document and can be
quickly compared to the hash values for other documents.




Several hashing algorithms are available: MD5Signature and Lookup3Signature are
both useful for exact matching, while TextProfileSignature (from the Apache Nutch
project) is a fuzzy hashing implementation to detect documents that are nearly equivalent.
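
The fingerprint idea behind exact-match deduplication can be sketched with the Java platform's MessageDigest. This is an illustration only, not the SignatureUpdateProcessor or MD5Signature code; the class name, the choice of fields, and the zero-byte separator are assumptions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class SignatureSketch {

    /** Returns a 32-character hex MD5 fingerprint computed over the chosen fields. */
    public static String signature(Map<String, String> doc, String... fields) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String field : fields) {
                String value = doc.get(field);
                if (value == null) {
                    continue; // a missing field contributes nothing
                }
                md5.update(field.getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0); // separator so "ab"+"c" differs from "a"+"bc"
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0);
            }
            return String.format("%032x", new BigInteger(1, md5.digest()));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put("id", "1");
        a.put("title", "What's New in Solr 1.4");
        a.put("body", "Solr is now faster and better than before.");

        Map<String, String> b = new HashMap<>(a);
        b.put("id", "2"); // same content, different id

        // The id is left out of the signature, so the two documents collide.
        System.out.println(signature(a, "title", "body").equals(signature(b, "title", "body"))); // true
    }
}
```

Which fields go into the signature decides what counts as a duplicate: leaving out the id, as here, treats two documents with the same title and body as the same document.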


New Request Handler Components
New request handler components are now available in Solr 1.4:
   •   ClusteringComponent uses Carrot2 to dynamically cluster the top N search
       results, something like dynamically discovered facets.
   •   TermsComponent returns indexed terms and document frequency in a field, useful
       for auto-suggest, etc.
   •   TermVectorComponent returns term information per document (term
       frequency, positions).
   •   StatsComponent computes statistics on numeric fields: min, max, sum,
       sumOfSquares, count, missing, mean, stddev.
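
The statistics listed for StatsComponent can all be derived from a single pass over the values. The sketch below is an illustration of that arithmetic, not Solr's code; the class names are made up, null stands in for a document with no value, and using the sample standard deviation (dividing by count - 1) is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class StatsSketch {

    public static final class Stats {
        public double min = Double.POSITIVE_INFINITY;
        public double max = Double.NEGATIVE_INFINITY;
        public double sum;
        public double sumOfSquares;
        public long count;
        public long missing;

        public double mean() {
            return count == 0 ? 0.0 : sum / count;
        }

        /** Sample standard deviation, computed from the running sums. */
        public double stddev() {
            if (count <= 1) {
                return 0.0;
            }
            return Math.sqrt((count * sumOfSquares - sum * sum) / (count * (count - 1.0)));
        }
    }

    /** Accumulates the statistics in one pass; a null entry counts as missing. */
    public static Stats compute(List<Double> values) {
        Stats s = new Stats();
        for (Double v : values) {
            if (v == null) {
                s.missing++;
                continue;
            }
            s.count++;
            s.sum += v;
            s.sumOfSquares += v * v;
            s.min = Math.min(s.min, v);
            s.max = Math.max(s.max, v);
        }
        return s;
    }

    public static void main(String[] args) {
        Stats s = compute(Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, null));
        System.out.println("min=" + s.min + " max=" + s.max + " sum=" + s.sum
                + " sumOfSquares=" + s.sumOfSquares + " count=" + s.count
                + " missing=" + s.missing + " mean=" + s.mean() + " stddev=" + s.stddev());
    }
}
```

Keeping sum and sumOfSquares as running totals means mean and stddev never need a second pass, and partial results from several index segments or servers could be combined by adding the totals.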


What Else Is New with Solr 1.4 Features
Solr 1.4 has many other new features. A few of them are listed here:
   •   Ranges over arbitrary functions: {!frange l=1 u=2}sqrt(sum(a,b))
   •   Nested queries, for function queries too
   •   solrjs: JavaScript client library
   •   commitWithin: doc must be committed within x milliseconds
   •   Binary field type
   •   Merge one index into another
   •   SolrJ client for load balancing and failover
   •   Field globbing for some params: hl.fl=*_text
   •   Doublemetaphone, Arabic stemmer, etc.
   •   VelocityResponseWriter: template responses using Velocity




Get Started & Resources
http://www.lucidimagination.com/blog/2009/02/05/looking-forward-to-new-features-in-solr-14/
http://wiki.apache.org/solr/SolrReplication
http://wiki.apache.org/solr/ExtractingRequestHandler
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika
http://www.lucidimagination.com/blog/tag/range-queries/
http://www.slf4j.org/manual.html
http://wiki.apache.org/solr/Deduplication
http://shalinsays.blogspot.com/2009/09/whats-new-in-dataimporthandler-in-solr.html


Next Steps
For more information on how Lucid Imagination can help your employees, customers, and
partners find the information they need more quickly, effectively, and at lower cost, please
visit http://www.lucidimagination.com/ to access blog posts, articles, and reviews of
dozens of successful implementations.
Certified Distributions from Lucid Imagination are complete, supported bundles of
software that include additional bug fixes and performance enhancements, along with our
free 30-day Get Started program. Coupled with one of our support subscriptions, a Certified
Distribution can provide a complete environment to develop, deploy, and maintain
commercial-grade search applications. Certified Distributions are available at
www.lucidimagination.com/Downloads.
Please e-mail specific questions to:
Support and Service: support@lucidimagination.com
Sales and Commercial: sales@lucidimagination.com
Consulting: consulting@lucidimagination.com
Or call: 1.650.353.4057

APPENDIX: Choosing Lucene or Solr
The great improvements in the capabilities of Lucene and Solr open source search
technology have created rapidly growing interest in using them as alternatives to other
search applications. As is often the case with open-source technology, online community
documentation provides rich details on features and variations, but does little to provide
explicit direction on which technologies would be the best choice. So when is Lucene
preferable to Solr and vice versa?
There is in fact no single answer, as Lucene and Solr bring very similar underlying
technology to bear on somewhat distinct problems. Solr is versatile and powerful, a full-
featured, production-ready search application server requiring little formal software
programming. Lucene presents a collection of directly callable Java libraries, with fine-
grained control of machine functions and independence from higher-level protocols.
In choosing which might be best for your search solution, the key questions to consider are
application scope, deployment environment, and software development preferences.
If you are new to developing search applications, you should start with Solr. Solr provides
scalable search power out of the box, whereas Lucene requires solid information retrieval
experience and some meaningful heavy lifting in Java to take advantage of its capabilities.
In many instances, Solr doesn’t even require any real programming.
Solr is essentially the “serverization” of Lucene, and many of its abstract functions are
highly similar, if not just the same. If you are building an app for the enterprise sector, for
instance, you will find Solr an almost 100% match to your business requirements: it comes
ready to run in a servlet container such as Tomcat or Jetty, and ready to scale in a
production Java environment. Its RESTful interfaces and XML-based configuration files can
greatly accelerate application development and maintenance. In fact, Lucene programmers
have often reported that they find Solr to contain “the same features I was going to build
myself as a framework for Lucene, but already very-well implemented.” Once you start
with Solr, and you find yourself using a lot of the features Solr provides out of the box, you
will likely be better off using Solr’s well-organized extension mechanisms instead of
starting from scratch using Apache Lucene.
If, on the other hand, you don’t want to make any calls via HTTP, and want to have all of
your resources controlled exclusively by Java API calls that you write, Lucene may be a
better choice. Lucene works best when constructing and embedding a state-of-the-art
search engine, allowing programmers to assemble and compile inside a native Java
application. Some programmers set aside the convenience of Solr in order to more directly
control the large set of sophisticated features with low-level access, data, or state
manipulation, and choose Lucene instead, for example with byte-level manipulation of
segments or intervention in data I/O. Investment at the lower level enables development of
extremely sophisticated, cutting edge text search and retrieval capabilities.
As for features, the latest version of Solr generally encapsulates the latest version of
Lucene. As the two are in many ways functional siblings, spending time on gaining a solid
understanding of how Lucene works internally can help you understand Apache Solr and its
extension of Lucene's workings.
No matter which you choose, the power of open source search is yours to harness. More
information on both Lucene and Solr can be found at http://www.lucidimagination.com.




What’s New in Solr 1.4
A Lucid Imagination Technical White Paper • October 2009                           Page 14

  • 5. Introduction Apache Solr is the definitive application development implementation for Apache Lucene, and it is the leading open source search platform. If you imagine Lucene as a high- performance race car engine, then Solr is all the things that make that engine usable, such as a chassis, gas pedal, steering wheel, seat, and much more. Solr makes it easy to develop sophisticated, fast search applications with advanced features such as faceting. Solr builds on another open source search technology, Lucene, which provides indexing and search technology, as well as spellchecking, hit highlighting, and advanced processing capabilities. Both Solr and Lucene are developed at the Apache Software Foundation. Lucene currently ranks among the top 15 open source projects and is one of the top 5 Apache projects, with installations at over 4,000 companies. Lucene and Solr downloads have grown nearly tenfold over the past three years; Solr is the fastest-growing Lucene subproject. Lucene and Solr offer an attractive alternative to proprietary licensed search and discovery software vendors.1. Solr 1.3 set a high bar for functionality, extensibility, and performance. As time marches on, Solr engineers have been hard at work making a good thing even better. This white paper describes the new features and improvements in the latest version, Solr 1.4. In the simplest terms, Solr is now faster and better than before. Central components of Solr have been improved to cut the time needed for processing queries and indexing documents. Many new features 1 See the Appendix for a discussion of when to choose Lucene or Solr. What’s New in Solr 1.4 A Lucid Imagination Technical White Paper • October 2009 Page 1
have been added, all with the goal of providing users with the information they want as fast as possible.

Performance Improvements

Solr 1.4 increases Solr’s speed with numerous improvements in key areas. Some of these enhancements are high-performance replacements for standard off-the-shelf Java platform components. Much as a car hobbyist replaces stock parts of an engine, the architects and programmers working on Solr have replaced crucial components to make Solr 1.4 run faster than ever for many common operations.

Streamlined Caching

Solr caches data from its index as an optimization, because reading from memory is always faster than reading from the file system. Over the duration of a single faceting request, the cache might be accessed hundreds or even thousands of times.

Previously, the cache implementation was a synchronized LinkedHashMap from the Java platform API. Solr 1.4 uses a new class, ConcurrentLRUCache, which is specifically designed to minimize the overhead of synchronization. Anecdotal evidence suggests that this implementation can double query throughput in some circumstances.

Scalable Concurrent File Access

In the past, Solr used the Java platform’s RandomAccessFile to read data from index files. Reading a portion of a file involves calling seek() to find the right part of the file, and read() to actually retrieve the data.

Multithreaded access to the same file has meant that the seek() and read() pairs must be synchronized. If the data to be read isn’t already in the operating system cache, things get worse: the synchronization causes all other reading threads to wait while the data is retrieved from disk.
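As a rough illustration of the synchronization cost described above, the sketch below shows a seek()/read() pair guarded by a lock next to a positional read through java.nio's FileChannel, which takes the file offset as an argument. The class and method names (FileReadSketch, readWithSeek, readPositional) are hypothetical and chosen for this example only; this is not Solr or Lucene code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch, not Solr/Lucene source code.
final class FileReadSketch {

    /** Older pattern: the file pointer is shared state, so seek() + read() need one lock. */
    static byte[] readWithSeek(RandomAccessFile file, long position, int length) throws IOException {
        byte[] buffer = new byte[length];
        synchronized (file) {
            // Every other reader of this file waits here, even while the disk is busy.
            file.seek(position);
            file.readFully(buffer); // throws EOFException if fewer than `length` bytes remain
        }
        return buffer;
    }

    /** Positional read: the offset is an argument, so there is no shared pointer to protect. */
    static byte[] readPositional(FileChannel channel, long position, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long offset = position;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, offset);
            if (n < 0) {
                break; // reached end of file
            }
            offset += n;
        }
        // Return only the bytes that were actually read.
        return Arrays.copyOf(buffer.array(), buffer.position());
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("segment", ".bin");
        try {
            Files.write(path, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r");
                 FileChannel channel = file.getChannel()) {
                // Both calls read the same four bytes starting at offset 3.
                System.out.println(new String(readWithSeek(file, 3, 4), StandardCharsets.US_ASCII));
                System.out.println(new String(readPositional(channel, 3, 4), StandardCharsets.US_ASCII));
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

In the first method every thread queues on the same lock; in the second, concurrent callers can pass different offsets without coordinating with each other.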
The Java New I/O (NIO) API offers a much better solution. NIO’s FileChannel includes a read() method that, in essence, performs a seek() and a read() in a single operation.

   public int read(ByteBuffer dst, long position)

Solr 1.4 uses this NIO method (via Lucene’s NIOFSDirectory) to read index files.[2]

[2] On Windows, the older RandomAccessFile implementation is used because of a bug in the Windows NIO implementation.

Smarter Handling of Index Changes

Solr generally keeps a big pile of documents in an existing index. New documents are periodically added, but usually the number of new documents is small compared with the size of the index. Solr (via Lucene) stores the index as a collection of segments; as new documents are added, most of the segments will remain unchanged.

Solr 1.4 is very much aware that, for the most part, index segments don’t change. Consequently, Solr is much smarter about reusing unchanged segments, which results in less memory churn, less disk access, and better performance.

[Figure: reopen() on an Index yields a New index; both refer to index segments on disk.]
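The reuse of unchanged segments can be pictured with a toy model. The sketch below is only an illustration under simplifying assumptions: segments are identified by name, a "reader" is a plain object, and the names SegmentReuseSketch, SegmentReader, and reopen are hypothetical stand-ins rather than Lucene's actual classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of segment reuse; not Lucene's real reader classes.
final class SegmentReuseSketch {

    /** Stand-in for a per-segment reader that is expensive to create. */
    static final class SegmentReader {
        final String segmentName;

        SegmentReader(String segmentName) {
            this.segmentName = segmentName;
        }
    }

    /**
     * Builds the reader set for the segments now on disk, reusing any reader
     * that is already open and creating readers only for new segments.
     * Segments that no longer exist are simply not carried over.
     */
    static Map<String, SegmentReader> reopen(Map<String, SegmentReader> current,
                                             List<String> segmentsOnDisk) {
        Map<String, SegmentReader> next = new LinkedHashMap<>();
        for (String name : segmentsOnDisk) {
            SegmentReader existing = current.get(name);
            next.put(name, existing != null ? existing : new SegmentReader(name));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, SegmentReader> first =
                reopen(new LinkedHashMap<>(), Arrays.asList("_0", "_1"));
        // A commit adds one small segment; the two existing segments are untouched.
        Map<String, SegmentReader> second =
                reopen(first, Arrays.asList("_0", "_1", "_2"));

        int reused = 0;
        for (Map.Entry<String, SegmentReader> e : second.entrySet()) {
            if (first.get(e.getKey()) == e.getValue()) {
                reused++;
            }
        }
        System.out.println("reused " + reused + " of " + second.size() + " readers");
    }
}
```

Any per-segment state, such as a cache keyed by reader, stays valid for the reused entries in this model; only the new segment has to be loaded.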
One example is reloading an index. Previously, the entire index was loaded again, which is expensive in time and resources. Now, Solr 1.4 is smart enough to reuse index segments that haven’t changed, resulting in a much more efficient reload of a modified index. This means that adding new documents to an index and making them available comes at a lower resource cost. The figure above illustrates the mechanism.

Many other optimizations have been made with respect to index segments. The field cache, for example, is now split so there is one field cache per segment. Again, this results in much more efficient processing of index updates, because the field caches for unchanged segments do not need to be touched.

Faster Faceting

One of Solr’s killer features is faceting, the ability to quickly narrow and drill down into search results by categories. Solr uses UnInvertedField to maintain a mapping between documents and field values so it can provide faceting information in response to queries.

For multivalued fields, Solr 1.4 includes a new implementation of UnInvertedField that can be 50 times faster and 5 times smaller than its predecessor. Single-valued fields still use either the enum or fieldcache method.

Streaming Updates for SolrJ

SolrJ is the API that Java client applications use to work with Solr. The Solr 1.4 version of SolrJ includes an optimized implementation, StreamingUpdateSolrServer, which is useful for indexing many documents at a time. In one simple test, the number of documents indexed per second jumped from 231 to 25,000 using the new implementation.
For bulk updates, consider switching to the new implementation.

What Else Is New for Solr 1.4 Performance

In addition to these important performance enhancements in Solr 1.4, there are several more, including:

• Binary format for updates, much more compact than XML, now available for SolrJ.
• omitTermFreqAndPositions can be applied to a field so that Solr does not record term frequencies or term positions for that field, which saves time and space for nontext fields.
• Queries that don’t sort by score can eliminate scoring, which speeds up queries.
• Filters now apply before the main query, which makes queries 300% faster in some cases.
• New filter implementation for small result sets, which uses less memory and runs faster.

Feature Improvements

Aside from performance improvements, Solr 1.4 sports a variety of great new features. As an open source project, Solr 1.4 is largely created by the people who use it, so the new features are the ones that the community cares about most passionately.

Solr Becomes an Omnivore

Solr can’t give you good results unless you give it good data. Normally you feed Solr XML documents corresponding to the structure of your schema. This works fine, and if all your data consists of XML documents, they can be fed directly to Solr or easily transformed to the correct input.

Of course, reality is always messy. Chances are that many documents you want to include in your Solr index are in other file formats, like PDF or Microsoft Word. Fortunately, Solr 1.4 knows how to deal with the mess.
Solr 1.4 can now ingest these other types of documents using a feature called Solr Cell.[3] Solr Cell uses another open source project, Tika, to read documents in a variety of formats and convert them to an XHTML stream. Solr parses the stream to produce a document, which is then indexed.

[3] The name is based on the acronym Content Extraction Library (CEL). This feature is also known by its more technical name, ExtractingRequestHandler.

Here are a few of the formats that Tika understands:

• PDF
• OpenDocument (OpenOffice formats)
• Microsoft OLE 2 Compound Document (Word, PowerPoint, Excel, Visio, etc.)
• HTML
• RTF
• gzip
• ZIP
• Java Archive (JAR) files

DataImportHandler Enhancements

DataImportHandler knows how to index data pulled from relational databases or XML files. The details of what is indexed and how it happens are configured in solrconfig.xml. Solr 1.4 contains some extremely useful upgrades to DataImportHandler.

The first is the ability to push data into DataImportHandler. In Solr 1.3, DataImportHandler was pull-only. This meant that the only possible way to push data to Solr was to use the update XML or CSV format, which meant you couldn’t take advantage of any of DataImportHandler’s capabilities. In the Solr 1.4 world, a new component called ContentStreamDataSource allows you to use DataImportHandler’s features for indexing content.

Another powerful enhancement in Solr 1.4 is the ability to listen for import events. All you need to do is provide an implementation of the EventListener interface and let Solr
know about it in solrconfig.xml. When importing begins and ends, your listener will be notified.

Solr 1.4 also brings the ability to control error handling in DataImportHandler. For each entity, you can control what happens when an error occurs via solrconfig.xml. The choices for error handling are as follows:

• abort: The import is stopped and all changes are rolled back.
• skip: The current document is skipped.
• continue: Import continues as if the error did not occur.

DataImportHandler contains many more enhancements and optimizations in Solr 1.4, including new data sources, new entity processors, and new transformers.

Smoother Replication

Replication is a fancy name for making a copy of a Solr index, which at its heart is just a matter of copying files. Making copies of an index is useful for two reasons. The first is simply to create a backup. The second reason is to place the same index on multiple Solr servers, which is necessary if you want to distribute incoming requests to improve performance.

Prior to Solr 1.4, replication was implemented with shell scripts, so it worked effectively only on platforms with a shell, such as Linux. It relied on the Unix rsync file utility and on the operating system providing hard links, which could require cumbersome scripting and ruled out tiered deployments on Windows platforms.

In Solr 1.4, replication has been abstracted and implemented entirely at the Java platform layer, which means it will work (and work the same) wherever the Java platform runs. This is great news for anyone using Solr because it means that backups can be performed in the same way on a Solr instance, regardless of hardware or operating system, and it means that configuring replication across multiple Solr instances is similarly uniform. Replication does not require a backup; the index is copied directly from one live index to another.

Replication and backups are configured in solrconfig.xml.
Add a couple of lines if you just want to make a backup; you can choose to back up on Solr startup or after every commit or optimize. In addition, you can use an HTTP command to request a backup at any time.
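As a small illustration of the on-demand backup request, the sketch below builds the URL for it. It assumes the replication handler is registered under /replication and that the backup is triggered with the command=backup parameter, as described on the Solr replication wiki page; the host name masterhostname, the port, and the class name BackupRequestSketch are placeholders for this example.

```java
import java.net.URI;

// Hypothetical helper; only builds the request URL, it does not send it.
final class BackupRequestSketch {

    /** Returns the backup request URL for a Solr base URL such as http://host:8983/solr. */
    static URI backupUri(String solrBaseUrl) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        // Assumption: the handler is named "/replication" in solrconfig.xml.
        return URI.create(base + "/replication?command=backup");
    }

    public static void main(String[] args) {
        // Placeholder host and port; substitute the address of your own Solr instance.
        System.out.println(backupUri("http://masterhostname:8983/solr"));
    }
}
```

The resulting URL can be requested with any HTTP client, including a browser or a scheduled job.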
If you need to replicate an index across multiple servers, the configuration is pretty simple. Set it up on the master server’s solrconfig.xml like this:

   <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="master">
       <str name="replicateAfter">commit</str>
       <str name="confFiles">schema.xml,stopwords.txt</str>
     </lst>
   </requestHandler>

You can choose to replicate on startup, after commits, or after optimization. The confFiles element specifies configuration files you want to replicate to slaves.

Once the server configuration is done, point the slaves at the master, something like this:

   <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="slave">
       <str name="masterUrl">http://masterhostname:8983/solr/replication</str>
       <str name="pollInterval">00:00:60</str>
     </lst>
   </requestHandler>

The slaves periodically query the master to see if the index has changed. If so, they pull down the changes and apply them. That’s all!

More Choices for Logging

Logging is a crucial capability in a server application. Administrators examine logs to monitor Solr instances and figure out how to make them run optimally.

Up until now, Solr used the logging facility included with the Java Development Kit (JDK). Solr 1.4 uses a more flexible logging framework, SLF4J. SLF4J can bind to several logging implementations, including log4j, Jakarta Commons Logging (JCL), and JDK logging. This binding can be changed at deployment time simply by switching JAR files around.
This is the best possible kind of upgrade. The default configuration, binding SLF4J to JDK logging, provides the same functionality as previous releases of Solr. However, you now have the option of easily plugging in log4j or JCL if you prefer.

Multiselect Faceting

Faceting is the ability to group search results by certain fields. Solr 1.4 adds support for multiselect faceting, which is the ability to narrow search results by multiple facets.

Solr’s support is generic and includes the ability to tag filters and to exclude filters by tag when faceting. A sample query string might look like this:

   q=index replication&facet=true
   &fq={!tag=proj}project:(lucene OR solr)
   &facet.field={!ex=proj}project
   &facet.field={!ex=src}source

To see this in action, check out the search facility that Lucid Imagination provides to search technical knowledge resources on Solr along with Lucene and all its subprojects: http://search.lucidimagination.com/.

Speedier Range Queries

Solr can process queries that include numeric ranges, which means it can answer questions like “Which hats are between size 56 and 64?” and “Which swimming pools are less than 10 meters long?”

In Solr 1.4, standard range queries now use a prefix tree or trie. Numbers are placed into the tree based on their digits, which makes range queries faster than comparing each complete number. Thus, for example, 175 is indexed as hundreds:1 tens:17 ones:175. Trie-based range queries have been observed to run up to 40 times faster than standard range queries.

To take advantage of fast range queries, use the TrieField type in your schema. The implementation takes care of the details, and you will notice that range queries are significantly faster.

The illustration below shows an example of a prefix tree, where the leaves of the tree hold the actual term values and all the descendants of a node have a common prefix associated with the node. Bold circles mark all relevant nodes to retrieve a range from 215 to 977.

[Figure: Example of a Prefix Tree]
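To make the prefix-tree idea concrete, here is a minimal sketch that works on decimal digits, mirroring the hundreds/tens/ones example above. It is an illustration only: Lucene's real trie encoding works on bits with a configurable precision step, and the names PrefixRangeSketch, indexTerms, splitRange, and matches, as well as the `*` wildcard notation, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Decimal toy version of a trie range query; not Lucene's encoding.
final class PrefixRangeSketch {

    /** Formats `value` at `level`: level 0 is the full number, level 1 the tens prefix, and so on. */
    static String term(int value, int level, int width) {
        StringBuilder sb = new StringBuilder(
                String.format(Locale.ROOT, "%0" + (width - level) + "d", value));
        for (int i = 0; i < level; i++) {
            sb.append('*');
        }
        return sb.toString();
    }

    /** Terms indexed for one number: the exact value plus each shorter prefix. */
    static List<String> indexTerms(int value, int width) {
        List<String> terms = new ArrayList<>();
        for (int level = 0; level < width; level++) {
            terms.add(term(value, level, width));
            value /= 10;
        }
        return terms;
    }

    /**
     * Splits the inclusive range [lo, hi] into the fewest prefix terms.
     * Assumes 0 <= lo, hi < 10^width.
     */
    static List<String> splitRange(int lo, int hi, int width) {
        List<String> terms = new ArrayList<>();
        int level = 0;
        while (lo <= hi) {
            boolean top = level == width - 1;
            // Peel values off the lower end until it aligns with the next level.
            while (lo <= hi && (top || lo % 10 != 0)) {
                terms.add(term(lo++, level, width));
            }
            // Peel values off the upper end until it aligns with the next level.
            while (lo <= hi && hi % 10 != 9) {
                terms.add(term(hi--, level, width));
            }
            if (lo > hi) {
                break;
            }
            lo /= 10;
            hi /= 10;
            level++;
        }
        return terms;
    }

    /** True if the zero-padded value starts with the term's digits. */
    static boolean matches(String term, int value, int width) {
        int star = term.indexOf('*');
        String prefix = star < 0 ? term : term.substring(0, star);
        return String.format(Locale.ROOT, "%0" + width + "d", value).startsWith(prefix);
    }

    public static void main(String[] args) {
        System.out.println(indexTerms(175, 3));             // prints [175, 17*, 1**]
        System.out.println(splitRange(215, 977, 3).size()); // prints 34
    }
}
```

In this model the range 215 to 977 is answered with 34 prefix terms (such as 22* or 3**) instead of matching 763 distinct values one by one, which is the source of the speedup.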
Let’s look at another example, this time in the schema. The type attribute in the schema’s field type declaration tells Solr which numeric type you will represent with TrieField. Here are a few declarations that show how to use TrieField for various numeric types:

   <fieldType name="tint" class="solr.TrieField" type="integer"
              omitNorms="true" positionIncrementGap="0"
              indexed="true" stored="false" />
   <fieldType name="tlong" class="solr.TrieField" type="long"
              omitNorms="true" positionIncrementGap="0"
              indexed="true" stored="false" />
   <fieldType name="tdouble" class="solr.TrieField" type="double"
              omitNorms="true" positionIncrementGap="0"
              indexed="true" stored="false" />

Duplicate Detection

With large sets of documents to be indexed, it is important to detect documents that are identical or nearly identical so that the document only gets added to the index once. Solr 1.4 offers this capability, named document duplicate detection or deduplication. The more technical name is SignatureUpdateProcessor.

SignatureUpdateProcessor creates a message digest or hash value from some or all of the fields of a document. The hash value acts like a fingerprint for the document and can be quickly compared to the hash values for other documents.
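The fingerprint idea can be sketched with the JDK's own MD5 digest. This is an illustration of exact-match signatures in general, not the code of SignatureUpdateProcessor; the class name SignatureSketch, the field names, and the choice to separate field values with a zero byte are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an exact-match document signature.
final class SignatureSketch {

    /** Hashes the chosen fields of a document into a 32-character hex fingerprint. */
    static String signature(Map<String, String> doc, List<String> fields) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        for (String field : fields) {
            String value = doc.get(field);
            if (value != null) {
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0); // separator, so "ab"+"c" differs from "a"+"bc"
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("name", "features");

        Map<String, String> a = new HashMap<>();
        a.put("id", "1");
        a.put("name", "Solr in a Nutshell");
        a.put("features", "search, faceting");

        Map<String, String> b = new HashMap<>(a);
        b.put("id", "2"); // different id, same content in the signature fields

        // Equal signatures flag the two documents as duplicates.
        System.out.println(signature(a, fields).equals(signature(b, fields)));
    }
}
```

Because only the selected fields feed the digest, two documents with different ids but identical content fields produce the same fingerprint.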
Several hashing algorithms are available: MD5Signature and Lookup3Signature are both useful for exact matching, while TextProfileSignature (from the Apache Nutch project) is a fuzzy hashing implementation to detect documents that are nearly equivalent.

New Request Handler Components

New request handler components are now available in Solr 1.4:

• ClusteringComponent uses Carrot2 to dynamically cluster the top N search results, something like dynamically discovered facets.
• TermsComponent returns indexed terms and document frequency in a field, useful for auto-suggest, etc.
• TermVectorComponent returns term information per document (term frequency, positions).
• StatsComponent computes statistics on numeric fields: min, max, sum, sumOfSquares, count, missing, mean, stddev.

What Else Is New with Solr 1.4 Features

Solr 1.4 has many other new features. A few of them are listed here:

• Ranges over arbitrary functions: {!frange l=1 u=2}sqrt(sum(a,b))
• Nested queries, for function queries too
• solrjs: JavaScript client library
• commitWithin: doc must be committed within x milliseconds
• Binary field type
• Merge one index into another
• SolrJ client for load balancing and failover
• Field globbing for some params: hl.fl=*_text
• Doublemetaphone, Arabic stemmer, etc.
• VelocityResponseWriter: template responses using Velocity
Get Started & Resources

http://www.lucidimagination.com/blog/2009/02/05/looking-forward-to-new-features-in-solr-14/
http://wiki.apache.org/solr/SolrReplication
http://wiki.apache.org/solr/ExtractingRequestHandler
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika
http://www.lucidimagination.com/blog/tag/range-queries/
http://www.slf4j.org/manual.html
http://wiki.apache.org/solr/Deduplication
http://shalinsays.blogspot.com/2009/09/whats-new-in-dataimporthandler-in-solr.html

Next Steps

For more information on how Lucid Imagination can help your employees, customers, and partners find the information they need more quickly, effectively, and at lower cost, please visit http://www.lucidimagination.com/ to access blog posts, articles, and reviews of dozens of successful implementations.

Certified Distributions from Lucid Imagination are complete, supported bundles of software which include additional bug fixes and performance enhancements, along with our free 30-day Get Started program. Coupled with one of our support subscriptions, a Certified Distribution can provide a complete environment to develop, deploy, and maintain commercial-grade search applications. Certified Distributions are available at www.lucidimagination.com/Downloads.

Please e-mail specific questions to:

Support and Service: support@lucidimagination.com
Sales and Commercial: sales@lucidimagination.com
Consulting: consulting@lucidimagination.com

Or call: 1.650.353.4057
APPENDIX: Choosing Lucene or Solr

The great improvements in the capabilities of Lucene and Solr open source search technology have created rapidly growing interest in using them as alternatives to other search applications. As is often the case with open-source technology, online community documentation provides rich details on features and variations, but does little to provide explicit direction on which technologies would be the best choice. So when is Lucene preferable to Solr and vice versa?

There is in fact no single answer, as Lucene and Solr bring very similar underlying technology to bear on somewhat distinct problems. Solr is versatile and powerful, a full-featured, production-ready search application server requiring little formal software programming. Lucene presents a collection of directly callable Java libraries, with fine-grained control of machine functions and independence from higher-level protocols.

In choosing which might be best for your search solution, the key questions to consider are application scope, deployment environment, and software development preferences.

If you are new to developing search applications, you should start with Solr. Solr provides scalable search power out of the box, whereas Lucene requires solid information retrieval experience and some meaningful heavy lifting in Java to take advantage of its capabilities. In many instances, Solr doesn’t even require any real programming.

Solr is essentially the “serverization” of Lucene, and many of its abstract functions are highly similar, if not just the same. If you are building an app for the enterprise sector, for instance, you will find Solr an almost 100% match to your business requirements: it comes ready to run in a servlet container such as Tomcat or Jetty, and ready to scale in a production Java environment. Its RESTful interfaces and XML-based configuration files can greatly accelerate application development and maintenance.
In fact, Lucene programmers have often reported that they find Solr to contain “the same features I was going to build myself as a framework for Lucene, but already very-well implemented.” Once you start with Solr, and you find yourself using a lot of the features Solr provides out of the box, you will likely be better off using Solr’s well-organized extension mechanisms instead of starting from scratch using Apache Lucene.

If, on the other hand, you don’t want to make any calls via HTTP, and want to have all of your resources controlled exclusively by Java API calls that you write, Lucene may be a better choice. Lucene works best when constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile inside a native Java
application. Some programmers set aside the convenience of Solr in order to more directly control the large set of sophisticated features with low-level access, data, or state manipulation, and choose Lucene instead, for example with byte-level manipulation of segments or intervention in data I/O. Investment at the lower level enables development of extremely sophisticated, cutting-edge text search and retrieval capabilities.

As for features, the latest version of Solr generally encapsulates the latest version of Lucene. As the two are in many ways functional siblings, spending time on gaining a solid understanding of how Lucene works internally can help you understand Apache Solr and its extension of Lucene's workings.

No matter which you choose, the power of open source search is yours to harness. More information on both Lucene and Solr can be found at http://www.lucidimagination.com.