Small wins in a small time with Apache Solr
Who am I?


    My (Buddhist) name is Upayavira

    Consultant with Sourcesense, specialising in
    search and operational technologies

    A member of the Apache Software Foundation
Who are Sourcesense?


    Open Source integrator, specialising in:
    
        Search
    
        Business Intelligence
    
        Content Management
    
        Application Lifecycle Management

    Offices in London, Amsterdam, Milan and Rome
Committers and Contributors

     Search:
     
            Lucene/Solr – contributor
     
            Hibernate Search – committer
     
            Lucene Infinispan integration – lead developer
     
            Apache UIMA – committer

     CMS:
     
            Apache Chemistry – contributor
     
            Apache Jackrabbit – contributor
     
            JBoss GateIn Portal – committer
     
            OpenSSO-Alfresco - contributor
What is Lucene?


    Lucene is a Java information retrieval library

    Provides free text search facilities

    Started in 2000, by Doug Cutting

    A project of the Apache Software Foundation

    It is designed to be embedded in Java apps
What is Solr?


    Solr is an enterprise search server based on
    Lucene

    Wraps Lucene with a RESTful web interface

    Provides configurable schema

    Provides replication functionality
Solr Design
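
     [Diagram: inside a Solr instance, user queries go to the
     SearchHandler, which reads the Lucene index; a content
     application sends documents to the UpdateRequestHandler,
     which writes to the same index.]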
Prerequisites



    Java, preferably Java 6

    Apache Solr 1.4.1

    http://www.sourcesense.com/dev8d-solr.zip
Prerequisites

    Extract your Solr distribution

    At a command prompt:
    – cd into the unzipped distribution directory
    – cd into the example directory
    – Enter: java -jar start.jar

    Visit http://localhost:8983/solr/ in a browser. If you see a
    welcome message, your Solr works

    Unpack your dev8d-solr.zip file

    At another command prompt, cd into your dev8d-solr
    directory
Checking Solr Works


    Visit http://localhost:8983/solr/admin/

    You should see the Solr admin page.

    Click the Statistics link

    You'll see NumDocs: 0

    There's nothing in the index, so searches won't show
    much

    So we need to index some sample content
Indexing Sample Content



    In your dev8d-solr directory (extracted from the zip), at
    a command prompt:

    java -jar post.jar wikipedia-basic.xml
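
    For readers who would rather script this step, here is a minimal Python sketch of what posting a file amounts to: an HTTP POST of the XML to Solr's update handler, followed by a commit. It assumes the update handler of the example server is at http://localhost:8983/solr/update; the function names are hypothetical and this is not the post.jar implementation.

```python
import urllib.request

# Assumed update endpoint of the example Solr started with start.jar.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body, url=UPDATE_URL):
    """Return a POST request carrying a Solr XML update message."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


def post_file(path, url=UPDATE_URL):
    """Send one XML file to Solr, then commit so the documents become searchable."""
    with open(path, encoding="utf-8") as handle:
        add_request = build_update_request(handle.read(), url)
    commit_request = build_update_request("<commit/>", url)
    for request in (add_request, commit_request):
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the reply; Solr answers with a short XML status
```

    With Solr running, calling post_file("wikipedia-basic.xml") from the dev8d-solr directory should have roughly the same effect as the command above.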
Searching




    http://localhost:8983/solr/select?q=*:*
Searching




    http://localhost:8983/solr/select?q=computers
Searching




    http://localhost:8983/solr/select?q=computer systems
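
    A browser will usually encode the space in a query like this for you, but a script has to do it itself. The sketch below builds /select URLs with Python's standard library; the helper name select_url is hypothetical, and the base URL is the example server used throughout these slides.

```python
from urllib.parse import urlencode

# Base URL of the example server's search handler.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(query, extra=None):
    """Build a /select URL, percent-encoding the query and any extra parameters."""
    pairs = [("q", query)] + list((extra or {}).items())
    return SELECT_URL + "?" + urlencode(pairs)


# The searches on the surrounding slides, expressed as calls:
print(select_url("computer systems"))
print(select_url("computers AND systems"))
print(select_url("computers", {"fq": "author:yobot", "fl": "title,author"}))
```

    Passing the extra parameters as a dictionary keeps names such as facet.field usable, which keyword arguments would not allow.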
Searching




     http://localhost:8983/solr/select?q=computers OR systems
Searching




     http://localhost:8983/solr/select?q=computers AND systems
Searching




     http://localhost:8983/solr/select?q="computer systems"
Searching




     http://localhost:8983/solr/select?q="computer systems"~10
Searching




     http://localhost:8983/solr/select?q=computers NOT data
Searching




     http://localhost:8983/solr/select?q=computers -data
Searching




     http://localhost:8983/solr/select/?q=computers&fl=title
Searching




     http://localhost:8983/solr/select/?q=computers&fq=author:yobot
Searching



     http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
Searching



     http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
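
    The rows and start parameters are how paging is done: rows is the page size and start is the zero-based offset of the first result. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into Solr's rows/start parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    # start counts documents, not pages, and begins at 0.
    return {"rows": page_size, "start": (page - 1) * page_size}
```

    The URL above, with rows=10 and start=10, corresponds to page_params(2): the second page of ten results.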
Searching




     http://localhost:8983/solr/select/?q=title:system&fl=title
Searching



     http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
Searching




     http://localhost:8983/solr/select?q=computer&wt=json
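
    JSON output is the convenient format for scripts. The sketch below pulls the hit count, titles and facet counts out of a response. The sample response is invented for illustration (the titles, counts and the author "smackbot" are made up), and it assumes title is returned as a single string. By default Solr's JSON writer returns each facet field as a flat list alternating value and count, which the code pairs up.

```python
import json

# Invented sample response; real field values will differ.
SAMPLE = """
{
  "responseHeader": {"status": 0, "QTime": 2},
  "response": {"numFound": 2, "start": 0, "docs": [
    {"title": "Computer", "author": "yobot"},
    {"title": "Operating system", "author": "smackbot"}
  ]},
  "facet_counts": {"facet_queries": {},
                   "facet_fields": {"author": ["yobot", 3, "smackbot", 1]},
                   "facet_dates": {}}
}
"""


def summarise(raw):
    """Return (numFound, titles, facets) from a Solr JSON response string."""
    data = json.loads(raw)
    titles = [doc.get("title") for doc in data["response"]["docs"]]
    fields = data.get("facet_counts", {}).get("facet_fields", {})
    # Pair up the flat [value, count, value, count, ...] lists.
    facets = {name: list(zip(flat[0::2], flat[1::2]))
              for name, flat in fields.items()}
    return data["response"]["numFound"], titles, facets
```

    A response without faceting simply yields an empty facets dictionary.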
Searching




     http://localhost:8983/solr/select?q=computer&wt=javabin
Indexing
Indexing



     Load wikipedia-basic.xml into a text editor or web browser

     Load wikipedia-enhanced.xml into a text editor or browser

     Load example/solr/conf/schema.xml into a text editor
Indexing



     schema.xml defines field types and fields used in Solr

     Equivalent to your database schema in a RDBMS
Indexing


     Change the links and category fields in schema.xml to be of
     type "string" and add multiValued="true" to each:
      <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
      <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
Indexing


     Now add this to the <fields> section of schema.xml:

     <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>

     <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>

     Now search for the “textgen” field type definition, further up
     in the file.
Indexing



     At the bottom of schema.xml, before the closing </schema>
     tag, add the following:
     <copyField source="text" dest="textgen"/>
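
    A copyField only works if both the field it names as source and the field it names as dest are defined in the schema. The sketch below is a hypothetical sanity check, not part of Solr: it parses a schema and reports any copyField that refers to an undefined field. The sample schema is a cut-down illustration, and the check ignores dynamic fields and wildcard sources for simplicity.

```python
import xml.etree.ElementTree as ET

# Cut-down illustrative schema; a real schema.xml also defines field types.
SAMPLE_SCHEMA = """
<schema name="example" version="1.2">
  <fields>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
    <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <copyField source="text" dest="textgen"/>
</schema>
"""


def undefined_copy_fields(schema_xml):
    """Return (attribute, field name) pairs for copyField references with no <field>."""
    root = ET.fromstring(schema_xml)
    defined = {field.get("name") for field in root.iter("field")}
    missing = []
    for copy in root.iter("copyField"):
        for attr in ("source", "dest"):
            if copy.get(attr) not in defined:
                missing.append((attr, copy.get(attr)))
    return missing
```

    An empty list means every copyField in the schema points at fields that exist.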
Indexing



     Restart Solr so that it picks up the schema changes. Then, at
     your command prompt, in the dev8d-solr directory, execute:

     java -jar post.jar wikipedia-enhanced.xml
More Advanced Searching



     http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
More Advanced Searching



     http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
More Advanced Searching



     http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20
More Advanced Searching



     http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
thank you
upayavira@sourcesense.com
Solr Host Configuration

       shard 1



       shard 2   searches



       shard 3
Solr Host Configuration

        shard 1



        shard 2



        shard 3




      co-ordinator
Solr Host Configuration

        shard 1



        shard 2



        shard 3




      co-ordinator




                     load balancer
Solr Host Configuration

        shard 1                      shard 1



        shard 2                      shard 2



        shard 3                      shard 3




      co-ordinator               co-ordinator




                     load balancer
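
    The slides do not say how a co-ordinator reaches its shards. One way Solr supports this is distributed search, where the query sent to the co-ordinator carries a shards parameter listing each shard's host, port and path. The sketch below builds such a URL; the host names are hypothetical placeholders, and this is an illustration of the parameter rather than the configuration behind these diagrams.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query):
    """Build a /select URL that asks the co-ordinator to search every listed shard."""
    params = [("q", query), ("shards", ",".join(shards))]
    # Keep ':', '/' and ',' readable in the shards list instead of percent-encoding them.
    return "http://" + coordinator + "/select?" + urlencode(params, safe=":/,")


# Hypothetical host names standing in for the boxes in the diagram.
print(distributed_select_url(
    "coordinator:8983/solr",
    ["shard1:8983/solr", "shard2:8983/solr", "shard3:8983/solr"],
    "computers",
))
```

    In the layouts with a load balancer, clients would send this request to the load balancer's address, which forwards it to one of the co-ordinators.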

Contenu connexe

Tendances

Deep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line InterfaceDeep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line InterfaceAmazon Web Services
 
DRUPAL Search API Solr
DRUPAL Search API SolrDRUPAL Search API Solr
DRUPAL Search API SolrAndrew Siz
 
Deep Dive into AWS CLI - the command line interface
Deep Dive into AWS CLI - the command line interfaceDeep Dive into AWS CLI - the command line interface
Deep Dive into AWS CLI - the command line interfaceJohn Varghese
 
Lightweight Webservices with Sinatra and RestClient
Lightweight Webservices with Sinatra and RestClientLightweight Webservices with Sinatra and RestClient
Lightweight Webservices with Sinatra and RestClientAdam Wiggins
 
Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...
Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...
Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...Puppet
 
Deep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line InterfaceDeep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line InterfaceAmazon Web Services
 
(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014
(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014
(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014Amazon Web Services
 
Django - 次の一歩 gumiStudy#3
Django - 次の一歩 gumiStudy#3Django - 次の一歩 gumiStudy#3
Django - 次の一歩 gumiStudy#3makoto tsuyuki
 
Ethiopian multiplication in Perl6
Ethiopian multiplication in Perl6Ethiopian multiplication in Perl6
Ethiopian multiplication in Perl6Workhorse Computing
 
Building Modern and Secure PHP Applications – Codementor Office Hours with Be...
Building Modern and Secure PHP Applications – Codementor Office Hours with Be...Building Modern and Secure PHP Applications – Codementor Office Hours with Be...
Building Modern and Secure PHP Applications – Codementor Office Hours with Be...Arc & Codementor
 
用Tornado开发RESTful API运用
用Tornado开发RESTful API运用用Tornado开发RESTful API运用
用Tornado开发RESTful API运用Felinx Lee
 
Terraform infraestructura como código
Terraform infraestructura como códigoTerraform infraestructura como código
Terraform infraestructura como códigoVictor Adsuar
 
Puppet Camp Portland 2015: Introduction to Hiera (Beginner)
Puppet Camp Portland 2015: Introduction to Hiera (Beginner)Puppet Camp Portland 2015: Introduction to Hiera (Beginner)
Puppet Camp Portland 2015: Introduction to Hiera (Beginner)Puppet
 
Refactor Dance - Puppet Labs 'Best Practices'
Refactor Dance - Puppet Labs 'Best Practices'Refactor Dance - Puppet Labs 'Best Practices'
Refactor Dance - Puppet Labs 'Best Practices'Gary Larizza
 
Real time server
Real time serverReal time server
Real time serverthepian
 
Keeping it Small: Getting to know the Slim Micro Framework
Keeping it Small: Getting to know the Slim Micro FrameworkKeeping it Small: Getting to know the Slim Micro Framework
Keeping it Small: Getting to know the Slim Micro FrameworkJeremy Kendall
 
To Batch Or Not To Batch
To Batch Or Not To BatchTo Batch Or Not To Batch
To Batch Or Not To BatchLuca Mearelli
 
Controlling The Cloud With Python
Controlling The Cloud With PythonControlling The Cloud With Python
Controlling The Cloud With PythonLuca Mearelli
 

Tendances (20)

Apache Hacks
Apache HacksApache Hacks
Apache Hacks
 
Deep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line InterfaceDeep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line Interface
 
DRUPAL Search API Solr
DRUPAL Search API SolrDRUPAL Search API Solr
DRUPAL Search API Solr
 
Deep Dive into AWS CLI - the command line interface
Deep Dive into AWS CLI - the command line interfaceDeep Dive into AWS CLI - the command line interface
Deep Dive into AWS CLI - the command line interface
 
Lightweight Webservices with Sinatra and RestClient
Lightweight Webservices with Sinatra and RestClientLightweight Webservices with Sinatra and RestClient
Lightweight Webservices with Sinatra and RestClient
 
Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...
Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...
Puppet Camp DC 2015: Stop Writing Puppet Modules: A Guide to Best Practices i...
 
Deep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line InterfaceDeep Dive: AWS Command Line Interface
Deep Dive: AWS Command Line Interface
 
(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014
(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014
(DEV301) Advanced Usage of the AWS CLI | AWS re:Invent 2014
 
Django - 次の一歩 gumiStudy#3
Django - 次の一歩 gumiStudy#3Django - 次の一歩 gumiStudy#3
Django - 次の一歩 gumiStudy#3
 
Ethiopian multiplication in Perl6
Ethiopian multiplication in Perl6Ethiopian multiplication in Perl6
Ethiopian multiplication in Perl6
 
Building Modern and Secure PHP Applications – Codementor Office Hours with Be...
Building Modern and Secure PHP Applications – Codementor Office Hours with Be...Building Modern and Secure PHP Applications – Codementor Office Hours with Be...
Building Modern and Secure PHP Applications – Codementor Office Hours with Be...
 
用Tornado开发RESTful API运用
用Tornado开发RESTful API运用用Tornado开发RESTful API运用
用Tornado开发RESTful API运用
 
Terraform infraestructura como código
Terraform infraestructura como códigoTerraform infraestructura como código
Terraform infraestructura como código
 
Puppet Camp Portland 2015: Introduction to Hiera (Beginner)
Puppet Camp Portland 2015: Introduction to Hiera (Beginner)Puppet Camp Portland 2015: Introduction to Hiera (Beginner)
Puppet Camp Portland 2015: Introduction to Hiera (Beginner)
 
Refactor Dance - Puppet Labs 'Best Practices'
Refactor Dance - Puppet Labs 'Best Practices'Refactor Dance - Puppet Labs 'Best Practices'
Refactor Dance - Puppet Labs 'Best Practices'
 
CodeIgniter 3.0
CodeIgniter 3.0CodeIgniter 3.0
CodeIgniter 3.0
 
Real time server
Real time serverReal time server
Real time server
 
Keeping it Small: Getting to know the Slim Micro Framework
Keeping it Small: Getting to know the Slim Micro FrameworkKeeping it Small: Getting to know the Slim Micro Framework
Keeping it Small: Getting to know the Slim Micro Framework
 
To Batch Or Not To Batch
To Batch Or Not To BatchTo Batch Or Not To Batch
To Batch Or Not To Batch
 
Controlling The Cloud With Python
Controlling The Cloud With PythonControlling The Cloud With Python
Controlling The Cloud With Python
 

Similaire à Small wins in a small time with Apache Solr

Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Murshed Ahmmad Khan
 
Apache Solr + ajax solr
Apache Solr + ajax solrApache Solr + ajax solr
Apache Solr + ajax solrNet7
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineDavid Keener
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher lucenerevolution
 
Enterprise search with apache solr
Enterprise search with apache solrEnterprise search with apache solr
Enterprise search with apache solrsenthil0809
 
Solr中国8月4日答疑交流v2
Solr中国8月4日答疑交流v2Solr中国8月4日答疑交流v2
Solr中国8月4日答疑交流v2longkeyy
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesRahul Singh
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesAnant Corporation
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrLucidworks (Archived)
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in ScalaAlex Payne
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 

Similaire à Small wins in a small time with Apache Solr (20)

Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
 
Apache Solr + ajax solr
Apache Solr + ajax solrApache Solr + ajax solr
Apache Solr + ajax solr
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search Engine
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
Enterprise search with apache solr
Enterprise search with apache solrEnterprise search with apache solr
Enterprise search with apache solr
 
Solr中国8月4日答疑交流v2
Solr中国8月4日答疑交流v2Solr中国8月4日答疑交流v2
Solr中国8月4日答疑交流v2
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for Solr
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Solr 8 interview
Solr 8 interview Solr 8 interview
Solr 8 interview
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Laravel 4 presentation
Laravel 4 presentationLaravel 4 presentation
Laravel 4 presentation
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Apache solr liferay
Apache solr liferayApache solr liferay
Apache solr liferay
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 

Plus de Sourcesense

Atlassian Roadshow 2016 - Vlad Cavalcanti
Atlassian Roadshow 2016 - Vlad CavalcantiAtlassian Roadshow 2016 - Vlad Cavalcanti
Atlassian Roadshow 2016 - Vlad CavalcantiSourcesense
 
Atlassian Roadshow 2016 - DevOps Session
Atlassian Roadshow 2016 - DevOps SessionAtlassian Roadshow 2016 - DevOps Session
Atlassian Roadshow 2016 - DevOps SessionSourcesense
 
Atlassian Roadshow 2016 - Sourcesense References
Atlassian Roadshow 2016 - Sourcesense ReferencesAtlassian Roadshow 2016 - Sourcesense References
Atlassian Roadshow 2016 - Sourcesense ReferencesSourcesense
 
Atlassian Roadshow 2016 intro
Atlassian Roadshow 2016 introAtlassian Roadshow 2016 intro
Atlassian Roadshow 2016 introSourcesense
 
Liferay Symposium – Italy 2015
Liferay Symposium – Italy 2015Liferay Symposium – Italy 2015
Liferay Symposium – Italy 2015Sourcesense
 
Sourcesense - Alfresco Day Roma 2015
Sourcesense - Alfresco Day Roma 2015Sourcesense - Alfresco Day Roma 2015
Sourcesense - Alfresco Day Roma 2015Sourcesense
 
Sharded Solr setup with master
Sharded Solr setup with masterSharded Solr setup with master
Sharded Solr setup with masterSourcesense
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StorySourcesense
 

Plus de Sourcesense (8)

Atlassian Roadshow 2016 - Vlad Cavalcanti
Atlassian Roadshow 2016 - Vlad CavalcantiAtlassian Roadshow 2016 - Vlad Cavalcanti
Atlassian Roadshow 2016 - Vlad Cavalcanti
 
Atlassian Roadshow 2016 - DevOps Session
Atlassian Roadshow 2016 - DevOps SessionAtlassian Roadshow 2016 - DevOps Session
Atlassian Roadshow 2016 - DevOps Session
 
Atlassian Roadshow 2016 - Sourcesense References
Atlassian Roadshow 2016 - Sourcesense ReferencesAtlassian Roadshow 2016 - Sourcesense References
Atlassian Roadshow 2016 - Sourcesense References
 
Atlassian Roadshow 2016 intro
Atlassian Roadshow 2016 introAtlassian Roadshow 2016 intro
Atlassian Roadshow 2016 intro
 
Liferay Symposium – Italy 2015
Liferay Symposium – Italy 2015Liferay Symposium – Italy 2015
Liferay Symposium – Italy 2015
 
Sourcesense - Alfresco Day Roma 2015
Sourcesense - Alfresco Day Roma 2015Sourcesense - Alfresco Day Roma 2015
Sourcesense - Alfresco Day Roma 2015
 
Sharded Solr setup with master
Sharded Solr setup with masterSharded Solr setup with master
Sharded Solr setup with master
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents Story
 

Dernier

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 

Dernier (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Platformless Horizons for Digital Adaptability

Small wins in a small time with Apache Solr

  • 1. Small wins in a small time with Apache Solr
  • 2. Who am I? My (Buddhist) name is Upayavira; consultant with Sourcesense, specialising in search and operational technologies; a member of the Apache Software Foundation
  • 3. Who are Sourcesense? Open Source integrator, specialising in: Search; Business Intelligence; Content Management; Application Lifecycle Management. Offices in London, Amsterdam, Milan and Rome
  • 4. Committers and Contributors. Search: Lucene/Solr (contributor); Hibernate Search (committer); Lucene Infinispan integration (lead developer); Apache UIMA (committer). CMS: Apache Chemistry (contributor); Apache Jackrabbit (contributor); JBoss GateIn Portal (committer); OpenSSO-Alfresco (contributor)
  • 5. What is Lucene? Lucene is a Java information retrieval library; provides free text search facilities; started in 2000, by Doug Cutting; a project of the Apache Software Foundation; designed to be embedded in Java apps
  • 6. What is Solr? Solr is an enterprise search server based on Lucene; wraps Lucene with a RESTful web interface; provides configurable schema; provides replication functionality
  • 7. Solr Design (diagram): user queries go to the SearchHandler of a Solr instance, which reads the Lucene index; a content application feeds the index through the UpdateRequestHandler
  • 8. Prerequisites: Java, preferably Java 6; Apache Solr 1.4.1; http://www.sourcesense.com/dev8d-solr.zip
  • 9. Prerequisites: Extract your Solr distribution. At a command prompt: cd into the unzipped distribution directory, cd into the example directory, then enter: java -jar start.jar. Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works. Unpack your dev8d-solr.zip file. At another command prompt, cd into your dev8d-solr directory
  • 10. Checking Solr Works: Visit http://localhost:8983/solr/admin/ and you should see the Solr admin page. Click the statistics link and you'll see NumDocs: 0. There's nothing in the index, so searches won't show much, so we need to index some sample content
  • 11. Indexing Sample Content: In your dev8d-solr directory (extracted from the zip), at a command prompt, enter: java -jar post.jar wikipedia-basic.xml
  • 12. Searching: http://localhost:8983/solr/select?q=*:*
  • 13. Searching: http://localhost:8983/solr/select?q=computers
  • 14. Searching: http://localhost:8983/solr/select?q=computer systems
  • 15. Searching: http://localhost:8983/solr/select?q=computers OR systems
  • 16. Searching: http://localhost:8983/solr/select?q=computers AND systems
  • 17. Searching: http://localhost:8983/solr/select?q="computer systems"
  • 18. Searching: http://localhost:8983/solr/select?q="computer systems"~10
  • 19. Searching: http://localhost:8983/solr/select?q=computers NOT data
  • 20. Searching: http://localhost:8983/solr/select?q=computers -data
  • 21. Searching: http://localhost:8983/solr/select/?q=computers&fl=title
  • 22. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot
  • 23. Searching: http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
  • 24. Searching: http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
  • 25. Searching: http://localhost:8983/solr/select/?q=title:system&fl=title
  • 26. Searching: http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
  • 27. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
  • 28. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
  • 29. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
  • 30. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
  • 31. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
  • 32. Searching: http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
  • 33. Searching: http://localhost:8983/solr/select?q=computer&wt=json
  • 34. Searching: http://localhost:8983/solr/select?q=computer&wt=javabin
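The search URLs on slides 12 to 34 all share one shape: the select handler, a q parameter, and optional extras such as fl, fq, rows, start, sort, wt and the facet.* settings. As a rough sketch (not part of the original slides), the same URLs can be assembled in Python so that spaces, quotes and colons are escaped for you. The helper name select_url is hypothetical; the host, port and parameter names are the ones used on the slides.

```python
from urllib.parse import urlencode

# Host, port and path as used on the slides.
SOLR_SELECT = "http://localhost:8983/solr/select"


def select_url(q, params=None):
    """Build a Solr select URL (hypothetical helper, not from the slides).

    q      -- the query string, e.g. 'computers AND systems'
    params -- optional dict of extra parameters (fl, fq, rows, facet.field, ...)
    """
    query = {"q": q}
    query.update(params or {})
    # urlencode escapes spaces as '+' and percent-encodes quotes, colons, etc.
    return SOLR_SELECT + "?" + urlencode(query)


if __name__ == "__main__":
    # Phrase query with a slop of 10 (slide 18).
    print(select_url('"computer systems"~10'))
    # Facet counts only, top three authors (slide 31).
    print(select_url("computers", {
        "facet": "true",
        "facet.field": "author",
        "rows": 0,
        "facet.sort": "count",
        "facet.limit": 3,
    }))
```

Pasting the printed URLs into a browser should give the same results as the hand-written versions, since Solr decodes the escaped characters before parsing the query.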
  • 36. Indexing: Load wikipedia-basic.xml into a text editor or web browser; load wikipedia-enhanced.xml into a text editor or browser; load example/solr/conf/schema.xml into a text editor
  • 37. Indexing: schema.xml defines the field types and fields used in Solr; it is the equivalent of your database schema in an RDBMS
  • 38. Indexing: Change these two fields in schema.xml to be of type "string" and add multiValued="true" for each: <field name="links" type="string" indexed="true" stored="true" multiValued="true"/> <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
  • 39. Indexing: Now add these to the <fields> section of schema.xml: <field name="source" type="string" indexed="true" stored="true" multiValued="false"/> <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/> Then search for the "textgen" field type definition, further up in the file.
  • 40. Indexing: At the bottom of schema.xml, before the closing </schema> tag, add the following: <copyField source="text" dest="textgen"/>
  • 41. Indexing: At your command prompt, in the dev8d-solr directory, execute: java -jar post.jar wikipedia-enhanced.xml
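The files that post.jar sends are Solr update messages: an <add> element containing <doc> elements, each made up of <field name="..."> entries. A multiValued field such as links or category simply appears several times within one document. The sketch below, which is not from the slides, generates such a message in Python. The helper name solr_add_xml and the sample values are made up for illustration; the real contents of wikipedia-enhanced.xml may differ.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(docs):
    """Render a list of dicts as a Solr <add> update message.

    Hypothetical helper: a list value becomes one <field> element per item,
    which is how a multiValued field is expressed in the update XML.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Sample values are invented; field names follow the slides.
    print(solr_add_xml([{
        "title": "Charles Babbage",
        "source": "wikipedia",
        "category": ["Mathematicians", "Computer pioneers"],
    }]))
```

Saved to a file, output of this shape could be posted with java -jar post.jar in the same way as the sample files, assuming every field name it uses is defined in schema.xml.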
  • 42. More Advanced Searching: http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
  • 43. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
  • 44. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20
  • 45. More Advanced Searching: http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
  • 47. Solr Host Configuration (diagram): searches; shard 1, shard 2, shard 3
  • 48. Solr Host Configuration (diagram): co-ordinator; shard 1, shard 2, shard 3
  • 49. Solr Host Configuration (diagram): load balancer; co-ordinator; shard 1, shard 2, shard 3
  • 50. Solr Host Configuration (diagram): load balancer; two co-ordinators; two copies each of shard 1, shard 2 and shard 3
  • 51. Solr Host Configuration (diagram): load balancer; two co-ordinators; two copies each of shard 1, shard 2 and shard 3
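The slides do not show what a query through a co-ordinator looks like. In Solr's distributed search, a request can carry a shards parameter listing the hosts to query, and the instance receiving it merges the results. The sketch below builds such a URL under stated assumptions: the hostnames are placeholders, the helper name distributed_select_url is hypothetical, and the actual dev8d setup may have been configured differently.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, q):
    """Build a select URL that asks the co-ordinator to query several shards.

    coordinator -- 'host:port/solr' of the instance that merges results
    shards      -- list of 'host:port/solr' strings, one per shard
    All hostnames passed in here are placeholders, not from the slides.
    """
    params = {"q": q, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return "http://" + coordinator + "/select?" + urlencode(params, safe=":/,")


if __name__ == "__main__":
    print(distributed_select_url(
        "coordinator.example:8983/solr",
        [
            "shard1.example:8983/solr",
            "shard2.example:8983/solr",
            "shard3.example:8983/solr",
        ],
        "computers",
    ))
```

With a load balancer in front, as on the later slides, the client would send this request to the balancer's address instead, and each co-ordinator would hold the same shard list.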