Search Platform
 Features & Use Cases
SOLR


●
    SOLR is a standalone search server that can scale separately from
    the application that uses it
    ●
        e.g. it avoids the case where an e-commerce server is slowed down by
        users searching its product catalog
●
    SOLR is accessed through REST-like HTTP APIs using XML or JSON
    ●
        Multi-platform, multi-language and client-independent
    ●
        Results in XML, CSV, or JSON (with custom variations for
        Ruby, Python, PHP)
●
    100% open source, written in Java, runs on the JVM
●
    Apache Foundation top-level project
●
    One of the most widely used search servers in industry
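
As a minimal sketch of what HTTP access looks like from a client, the Python snippet below builds a select URL and decodes a JSON response body. The host, port and core name (`localhost:8983`, `collection1`) are borrowed from the curl example later in this deck. The helper names `build_select_url` and `parse_docs` are hypothetical, and the sample body is hand-written for illustration, not captured from a server.

```python
import json
from urllib.parse import urlencode

# Assumed base URL: same host, port and core as the curl example in this deck.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_select_url(query, **params):
    """Return a /select URL that asks Solr for a JSON response (wt=json)."""
    all_params = {"q": query, "wt": "json"}
    all_params.update(params)
    return SOLR_BASE + "/select?" + urlencode(all_params)


def parse_docs(raw_json):
    """Extract (numFound, docs) from a Solr JSON response body."""
    body = json.loads(raw_json)["response"]
    return body["numFound"], body["docs"]


# Hand-written sample body in the shape Solr returns for wt=json.
SAMPLE = '{"response": {"numFound": 1, "start": 0, "docs": [{"name": "iPod"}]}}'

print(build_select_url("video", fl="name,price"))
print(parse_docs(SAMPLE))
```

Any HTTP client in any language can fetch the generated URL, which is the point of the "client-independent" bullet above.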
SOLR : A Lucene server

●
     Solr is a search platform that provides all the features of the Lucene search engine *
        ●
            high-performance indexing
        ●
            Incremental and batch indexing
        ●
            Small footprint (RAM and disk)




●
    And exposes all of Lucene's search features
    ●
        Ranked searching
    ●
        Many query types (phrase, wildcard, regexp, range, geospatial proximity)
    ●
        Many field types, meaningful sorting
    ●
        Multi-index search and merge of results
    ●   Faceting
    ●
        Language analysis (stemming)
    ●   Suggestions



                                             * (development of the two projects merged in March 2010; SOLR 3.1, released in March 2011, was the first release after the merge)
Simple SOLR Example


●
    Index a product catalog (e.g. iPod Video)
●
    Data in XML format
    <doc>
      <field   name="id">MA147LL/A</field>
      <field   name="name">Apple 60 GB iPod with Video Playback Black</field>
      <field   name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
      <field   name="features">Up to 20 hours of battery life</field>
      <field   name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
      <field   name="price">399.00</field>
      <field   name="inStock">true</field>
      <field   name="store">37.7752,-100.0232</field>   <!-- Dodge City store -->
    </doc>




●
    Schema configuration
    <field   name="id" type="string" indexed="true" stored="true"/>
    <field   name="name" type="text" indexed="true" stored="true"/>
    <field   name="features" type="text" indexed="true" stored="true" multiValued="true"/>
    <field   name="price" type="float" indexed="true" stored="true"/>
    <field   name="inStock" type="boolean" indexed="true" stored="true" />
    <field   name="store" type="location" indexed="true" stored="true"/>
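
To show how a document like the one above could be produced programmatically, here is a small Python sketch that serializes a dictionary into Solr's XML update format (`<add><doc>...</doc></add>`). The function name `build_add_xml` is hypothetical. Posting the payload to the core's `/update` handler is left out, since it needs a running server.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialize a dict into Solr's <add><doc>...</doc></add> update format.

    List values become repeated <field> elements, which is how
    multiValued fields such as "features" are sent.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Subset of the iPod document shown on the slide.
ipod = {
    "id": "MA147LL/A",
    "name": "Apple 60 GB iPod with Video Playback Black",
    "features": ["Up to 20 hours of battery life"],
    "price": "399.00",
    "inStock": "true",
}
payload = build_add_xml(ipod)
print(payload)
```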
Simple SOLR Example

●
     Query
       ●
              Return all products with « video » in any field, sorted by descending
              price, showing just name, price and inStock
curl "http://localhost:8983/solr/collection1/select?q=video&sort=price+desc&fl=name,price,inStock&indent=true"
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="fl">name,price,inStock</str>
    <str name="sort">price desc</str>
    <str name="indent">true</str>
    <str name="q">video</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
  <doc>
    <str name="name">ATI Radeon X1900 XTX 512 MB PCIE Video Card</str>
    <float name="price">649.99</float>
    <bool name="inStock">false</bool></doc>
  <doc>
    <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
    <float name="price">479.95</float>
    <bool name="inStock">false</bool></doc>
  <doc>
    <str name="name">Apple 60 GB iPod with Video Playback Black</str>
    <float name="price">399.0</float>
    <bool name="inStock">true</bool></doc>
</result>
</response>
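
A client usually turns this XML into native values. The Python sketch below parses the `<result>` element of the response above with the standard library. The `CONVERTERS` table and the `parse_result` name are assumptions of this example, and it covers only the value types that appear on the slide.

```python
import xml.etree.ElementTree as ET

# The <result> part of the response shown above.
SAMPLE_XML = """<response>
<result name="response" numFound="3" start="0">
  <doc>
    <str name="name">ATI Radeon X1900 XTX 512 MB PCIE Video Card</str>
    <float name="price">649.99</float>
    <bool name="inStock">false</bool></doc>
  <doc>
    <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
    <float name="price">479.95</float>
    <bool name="inStock">false</bool></doc>
  <doc>
    <str name="name">Apple 60 GB iPod with Video Playback Black</str>
    <float name="price">399.0</float>
    <bool name="inStock">true</bool></doc>
</result>
</response>"""

# Converters keyed by the element tag used for each value type.
CONVERTERS = {
    "str": str,
    "float": float,
    "int": int,
    "bool": lambda text: text == "true",
}


def parse_result(xml_text):
    """Return (numFound, list of dicts) from a Solr XML response."""
    result = ET.fromstring(xml_text).find("result")
    docs = []
    for doc_el in result.findall("doc"):
        doc = {}
        for child in doc_el:
            convert = CONVERTERS.get(child.tag, str)
            doc[child.get("name")] = convert(child.text)
        docs.append(doc)
    return int(result.get("numFound")), docs


num_found, docs = parse_result(SAMPLE_XML)
print(num_found, [d["price"] for d in docs])
```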
Simple SOLR Example

●
    Query Facets
    ●
         Add the facet options and the field to facet on

        Facet : inStock
        q=video&sort=price+desc&facet=true&facet.field=inStock

        <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields">
        <lst name="inStock">
        <int name="false">2</int>
        <int name="true">1</int>
        </lst>
        </lst>
        <lst name="facet_dates"/>
        <lst name="facet_ranges"/>
        </lst>

        Facet : price, from 0 to 1000$, in 100$ gaps
        q=video&sort=price+desc&facet=true&facet.range=price&facet.range.gap=100&facet.range.start=0.0&facet.range.end=1000

        <lst name="counts">
        <int name="0.0">0</int>
        <int name="100.0">0</int>
        <int name="200.0">0</int>
        <int name="300.0">1</int> (Apple iPod 399$)
        <int name="400.0">1</int> (Asus Extreme 479$)
        <int name="500.0">0</int>
        <int name="600.0">1</int> (ATI Radeon 649$)
        <int name="700.0">0</int>
        <int name="800.0">0</int>
        <int name="900.0">0</int>
        </lst>
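
The range facet counts can be reproduced by hand. The Python sketch below buckets the three prices from the query example into 100-unit ranges, assuming each range includes its lower bound and excludes its upper bound. It illustrates the arithmetic only and is not Solr's implementation; the name `range_facet` is hypothetical.

```python
import math


def range_facet(values, start, end, gap):
    """Count values per [lower, lower + gap) bucket between start and end."""
    counts = {}
    lower = start
    while lower < end:
        counts[lower] = 0
        lower += gap
    for value in values:
        if start <= value < end:
            # Snap the value down to the lower bound of its bucket.
            bucket = start + math.floor((value - start) / gap) * gap
            counts[bucket] += 1
    return counts


# Prices of the three products matching q=video in the query example.
prices = [649.99, 479.95, 399.0]
counts = range_facet(prices, 0.0, 1000.0, 100.0)
for lower, count in counts.items():
    print(lower, count)
```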
Simple SOLR Example

●
    Filter Query
    ●
        Uses a different cache (the filter cache) than the query result cache; useful for large result sets

    Filter Query : all products priced from 300 to 499 USD
    q=*&fl=name,price&fq=price:[300 TO 499]

    <result name="response" numFound="4" start="0">
    <doc>
      <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB – SATA-300</str>
      <float name="price">350.0</float>
    </doc>
    <doc>
      <str name="name">Apple 60 GB iPod with Video Playback Black</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">Canon PowerShot SD500</str>
      <float name="price">329.95</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
      <float name="price">479.95</float>
    </doc>
    </result>
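
Because filter values contain characters such as `:`, `[` and spaces, they need URL encoding when sent over HTTP. The Python sketch below builds the query string for a request with one or more `fq` parameters. The function name `build_filtered_query` is hypothetical, and the example uses the match-all query `*:*` for `q`, which is an assumption of this sketch.

```python
from urllib.parse import urlencode, parse_qs


def build_filtered_query(query, filters, fields):
    """Build a URL-encoded query string with one fq parameter per filter."""
    params = [("q", query), ("fl", ",".join(fields))]
    # Each filter is sent as its own fq parameter.
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


qs = build_filtered_query("*:*", ["price:[300 TO 499]"], ["name", "price"])
print(qs)
# Decoding the string recovers the original parameter values.
print(parse_qs(qs))
```

Sending each filter as a separate `fq` parameter keeps the filters independent of the main query, which is what allows them to be cached on their own.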
Simple SOLR Example

●
    Spatial Query
    ●
        Store data:
          –   <field name="store">45.17614,-93.87341</field>           <!-- Buffalo store -->
          –   <field name="store">40.7143,-74.006</field>              <!-- NYC store -->
          –   <field name="store">37.7752,-122.4232</field>            <!-- San Francisco store -->

    ●
        We are at 45.15,-93.85 (at 3.437 km from the Buffalo store)
    ●
        Find all products in a store within 5km of our position:
    QUERY : &fl=name,store&q=*:*&fq={!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5}

    "response":{"numFound":3,"start":0,"docs":[
          {
            "name":"Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300",
            "store":"45.17614,-93.87341"},
          {
            "name":"Belkin Mobile Power Cord for iPod w/ Dock",
            "store":"45.18014,-93.87741"},
          {
            "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
            "store":"45.18414,-93.88141"}]
      }
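
The distance quoted above can be checked with the haversine formula. The Python sketch below computes great-circle distances from our position to the three stores and keeps those within 5 km. The Earth radius constant and the names `haversine_km` and `within` are assumptions of this example; it only mirrors the idea of a radius filter, not how Solr evaluates `geofilt` internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(point, stores, radius_km):
    """Return the names of stores no farther than radius_km from point."""
    return [name for name, (lat, lon) in stores.items()
            if haversine_km(point[0], point[1], lat, lon) <= radius_km]


# Store coordinates from the slide.
STORES = {
    "Buffalo": (45.17614, -93.87341),
    "NYC": (40.7143, -74.006),
    "San Francisco": (37.7752, -122.4232),
}

here = (45.15, -93.85)
print(round(haversine_km(*here, *STORES["Buffalo"]), 3))
print(within(here, STORES, 5))
```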
SOLR Features


●
    SOLR Cloud
    ●
        Cluster configuration using ZooKeeper
    ●
        Easy sharding and failover management
    ●
        Self-healing, no single point of failure
●
    SOLR Cell (aka ExtractingRequestHandler)
    ●
        TIKA integration for binary document parsing
    ●
        Parses DOC, PDF, XLS, MIME, etc
●
    DataImportHandler
    ●
        Automatically fetches and indexes SQL databases, e-mails, RSS feeds,
        files in a folder, etc.
SOLR Features


●
    Multiple Solr cores
    ●
        Many index collections in the same server
    ●
        Different schema definitions for each collection
    ●
        Different configurations for storage, replication, etc
●
    Caching
    ●
        Repeated searches are cached, which improves speed
    ●
        Advanced warming techniques
    ●
        Adding content triggers just a partial cache update
●   Advanced
    ●
        Language detection
    ●
        Natural Language Processing
    ●
        Clustering to scale both search and document retrieval
SOLR Cloud
SOLR TIKA integration


●
    SOLRCell embeds TIKA for binary file parsing
●
    TIKA parses DOC, PDF, XLSX, HTML... and represents their content
    using XHTML, JSON or CSV
    ●
        Full list of accepted formats :
        http://tika.apache.org/1.3/formats.html
    ●
        For some files, it can index only the metadata (MP3, JPG, AVI)
●
    SOLR Cell takes the TIKA output and indexes it so
    we can search it
●
    SOLR does not store the original binary file
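
As a hedged sketch of how a binary file might be submitted, the Python snippet below builds the request URL for the extracting handler. It assumes the handler is mapped to `/update/extract` on the core used elsewhere in this deck, and that the unique key is supplied through a `literal.id` parameter. The function name `build_extract_url` is hypothetical, and the file upload itself is omitted because it needs a running server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL: same host, port and core as the curl example in this deck.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_extract_url(doc_id, commit=True):
    """Build the URL a client would POST a binary file to for extraction."""
    params = {"literal.id": doc_id}  # value stored in the document's id field
    if commit:
        params["commit"] = "true"  # make the document searchable right away
    return SOLR_BASE + "/update/extract?" + urlencode(params)


url = build_extract_url("doc1")
print(url)
```

Since SOLR keeps only the extracted text and metadata, the `doc_id` is what ties a search hit back to the original file stored elsewhere.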
SOLR Addons


●
    Admin Interface
SOLR Addons


●
    Web Interface (SOLRitas)
SOLR Use Cases

●
    Liferay Search
    ●
        As Liferay already uses Lucene, we can connect it to a SOLR server
    ●
        Offloads the Liferay server by letting the SOLR cluster handle all the
        user searches in the portal
●
    Magento E-Commerce
    ●
        Avoids using MySQL for searching
    ●
        Better search results
    ●
        Better overall performance
●
    Alfresco Search
    ●
        Currently, Alfresco recommends setting up SOLR from the start
    ●
        By default, Lucene+Tika is used internally
