Solr 4.1



      Abhey Gupta
Software Engineer (Java)
Value First Digital Pvt Ltd
Outline
This presentation works through a series of questions to assess the
    usability of Solr in MIS 3

–   What is Solr ?
–   Why use Solr ?
–   Current Scenario
–   Scope of Improvement
–   Indexing Data
–   Import MIS Data
–   Challenges
–   Demo
–   Query Example
What is Solr?
Solr is the popular, blazing fast open source enterprise search platform
  from the Apache Lucene project.

     •    Its major features include powerful full-text search, hit
         highlighting, faceted search, dynamic clustering, database
         integration, rich document (e.g., Word, PDF) handling, and
         geospatial search.

     •   Solr is highly scalable, providing distributed search and index
         replication, and it powers the search and navigation features
         of many of the world's largest internet sites.
What is Lucene?
Apache Lucene is a high-performance, full-featured text search engine library
written entirely in Java. It is a technology suitable for nearly any application
that requires full-text search, especially cross-platform.

     –   An open source Java-based IR library with best practice indexing
         and query capabilities, fast and lightweight search and indexing.
     –   100% Java (ports to .NET, Perl and other languages exist too).
     –   Stable, mature API.
     –   Continuously improved and tuned over more than 10 years.
     –   Cleanly implemented, easy to embed in an application.
     –   Compact, portable index representation.
     –   Programmable text analyzers, spell checking and highlighting.
     –   Not a crawler or a text extraction tool.
Who uses Lucene/Solr?
Here are five noteworthy public sites that use Solr to handle search:

–   WhiteHouse.gov – The Obama administration's keystone web site is
    Drupal and Solr!
–   Netflix – Solr powers basic movie searching on this extremely busy
    site.
–   Internet Archive – Search this vast repository of music, documents
    and video using Solr.
–   StubHub.com – This ticket reseller uses Solr to help visitors search
    for concerts and sporting events.
–   The Smithsonian Institution – Search the Smithsonian’s collection of
    over 4 million items.
Solr indexing options
Why use Solr?
 Assuming the user has a relational DB, why use Solr? If your use case
requires a person to type words into a search box, you want a text search
engine like Solr.

Databases and Solr have complementary strengths and weaknesses.

SQL supports very simple wildcard-based text search with some simple
normalization like matching upper case to lower case. The problem is that
these are full table scans. In Solr all searchable words are stored in an
"inverted index", which searches orders of magnitude faster.

For details, consult the link below:
                   –   http://wiki.apache.org/solr/WhyUseSolr
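
To make the contrast concrete, here is a minimal Python sketch of a full scan versus an inverted-index lookup. It is an illustration only, not how Solr or MySQL are actually implemented; the sample rows and function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical sample rows (id, text); not real MIS data.
ROWS = [
    (1, "Your order has shipped"),
    (2, "Payment received for order"),
    (3, "Welcome to the service"),
]

def scan_search(rows, word):
    """Full scan: every row's text is inspected for each query."""
    word = word.lower()
    return [row_id for row_id, text in rows if word in text.lower().split()]

def build_inverted_index(rows):
    """Map each lowercased token to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in rows:
        for token in text.lower().split():
            index[token].add(row_id)
    return index

def index_search(index, word):
    """One dictionary lookup instead of touching every row."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_inverted_index(ROWS)
```

Both functions return the same ids for a given word; the difference is that the scan does work proportional to the number of rows on every query, while the index pays that cost once at build time.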
Current Scenario
Currently, MIS 3 uses MySQL FULLTEXT search for text-based search, which
lags behind Solr in query speed and text-search capability.



     USER --(1. Full-text search)--> MIS UI (80) --(2. Full-text search query)--> MYSQL
                                     MIS UI (80) <--(4. Result)------------------ MYSQL

     3. MYSQL performs a full table scan for the text search.
Scope of Improvement
Instead of querying MySQL for text search, we can deploy Solr in between to
return the results. Because Solr searches an inverted index, these queries
are fast and efficient.


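     USER --(1. Full-text search)--> MIS UI (80) --(2. Full-text search query)--> SOLR
                                     MIS UI (80) <--(4. Result)------------------ SOLR

     3. SOLR scans its inverted index for the tokenized search text.
     MYSQL remains in place but is not queried for text search.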
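
As a rough sketch of step 2, the MIS UI could build a request to Solr's `/select` handler like this. The host, the core name `mis`, and the field name `text` are assumptions made for illustration; only the `q`, `rows` and `wt` parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode

def build_select_url(base_url, core, query, rows=10):
    """Build a Solr /select URL. base_url and core are placeholders."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"

# Hypothetical core "mis" and field "text"; 8983 is Solr's default port.
URL = build_select_url("http://localhost:8983/solr", "mis", "text:hello")
```

The resulting URL can be fetched with any HTTP client; `urlencode` takes care of escaping characters such as `:` in the query string.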
Indexing Data in Solr
A Solr index can accept data from many different sources, including XML
files, comma-separated value (CSV) files, data extracted from tables in a
database, and files in common file formats such as Microsoft Word or PDF.

Here are the most common ways of loading data into a Solr index:

    –    Importing data from structured data stores, such as databases, with
         the Data Import Handler

    –    Using the Solr Cell framework built on Apache Tika for ingesting
         binary files or structured files such as Office, Word, PDF, and other
         proprietary formats.

    –    Uploading XML files by sending HTTP requests to the Solr server
         from any environment where such requests can be generated.

    –    Writing a custom Java application to ingest data through Solr's Java
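
The XML upload option in the list above can be sketched as follows. This builds an `<add>` message in Solr's XML update format; the field names `id` and `text` are assumptions and would have to match the fields defined in the actual schema.

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Serialize a list of dicts into a Solr XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Hypothetical document; field names depend on the schema in use.
MESSAGE = build_add_message([{"id": "1", "text": "hello world"}])
```

The message would then be sent as the body of an HTTP POST to the core's `/update` handler with an XML content type, followed by a commit so that the documents become searchable.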
Indexing MIS Data
MIS keeps structured data on the MIS servers and structured files on the
service servers, so we can index the data in two ways:

    –    Data Import Handler on MIS Database
           • This is easier to manage, because it only needs to be
               deployed on the MIS servers, of which there are very few.
           • Data can be imported incrementally with delta imports.

    –    Script to import CSV files from services
           • This increases the effort of managing and deploying scripts
               on the service servers.
           • Partial import needs to be implemented for DLRLOG data.
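
For the Data Import Handler option, a full or delta import is triggered with an HTTP request. The sketch below assumes the handler is registered at the conventional `/dataimport` path and that the core is called `mis`; both are placeholders that depend on the actual solrconfig.xml.

```python
from urllib.parse import urlencode

def dih_command_url(base_url, core, command, clean=False, commit=True):
    """Build a Data Import Handler URL for 'full-import' or 'delta-import'."""
    if command not in ("full-import", "delta-import"):
        raise ValueError(f"unsupported DIH command: {command}")
    params = {
        "command": command,
        "clean": str(clean).lower(),    # clean=true wipes the index first
        "commit": str(commit).lower(),  # commit=true makes changes searchable
    }
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"

# Hypothetical host and core name.
DELTA_URL = dih_command_url("http://localhost:8983/solr", "mis", "delta-import")
```

A scheduled job could request the delta URL periodically and the full-import URL only for the initial load or a rebuild.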
Indexing Bean
Solr can also index bean objects. The services build a bean for every MT
and DLR, and these could be imported directly into Solr.

This could create unnecessary load, because the API would index one bean per
message.


      API 15 --(index data call per MT and DLR)--> SOLR
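
One possible way to soften the per-message load is to buffer documents and send them in batches. This is not something the slides propose; it is a sketch under that assumption, with `send` standing in for whatever call actually posts documents to Solr.

```python
class BatchingIndexer:
    """Buffer documents and pass them to `send` in batches.

    `send` is a hypothetical callable that receives a list of documents.
    """

    def __init__(self, send, batch_size=100):
        self.send = send
        self.batch_size = batch_size
        self.buffer = []

    def add(self, doc):
        """Queue one document; flush when the batch is full."""
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered documents and empty the buffer."""
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()
```

The trade-off is latency: a message only becomes searchable after its batch has been flushed, so a real deployment would also flush on a timer.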
Data Import Handler
We can import data from MySQL into the Solr index in two ways:
 distributed or central.
     Distributed (one SOLR per MYSQL)        Central (one shared SOLR)

     MYSQL --> SOLR                          MYSQL --+
     MYSQL --> SOLR                          MYSQL --+--> SOLR
     MYSQL --> SOLR                          MYSQL --+
     MYSQL --> SOLR                          MYSQL --+
Import CSV
We can import data from each service into the Solr index. This can also be
 done in two ways: distributed or central.
     Service 1 --> SCRIPT --> SOLR
     Service 1 --> SCRIPT --> SOLR
     Service 1 --> SCRIPT --> SOLR
     …..........
     Service 1 --> SCRIPT --> SOLR
Challenges
Every import scenario trades its advantages off against some disadvantages
and challenges.

For example:

    –   DIH: the Data Import Handler requires joins in the SQL query to import
        data from mttextlog, mtlog and dlrlog.

           •   Alternatively, we can get the messageid from Solr and then query
               MySQL again for the complete data, using an IN clause on the
               message ids.

    –   CSV import: it requires scripts to be deployed on every service
        server, and a lot of effort to track which files have been processed
        and which have not.

    –   BEAN Import : It requires changes at API level and could result into
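
The alternative mentioned under DIH (get message ids from Solr, then fetch the complete rows from MySQL with an IN clause) could look roughly like this. The sketch uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere; the `status` column and the sample rows are invented, and a MySQL driver would typically use `%s` placeholders instead of `?`.

```python
import sqlite3

def fetch_by_message_ids(conn, message_ids):
    """Fetch full rows for the ids returned by a Solr search."""
    if not message_ids:
        return []
    # One placeholder per id keeps the query parameterized.
    placeholders = ", ".join("?" for _ in message_ids)
    sql = (
        "SELECT messageid, status FROM dlrlog "
        f"WHERE messageid IN ({placeholders}) ORDER BY messageid"
    )
    return conn.execute(sql, list(message_ids)).fetchall()

def demo():
    """Stand-in database with a hypothetical dlrlog layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dlrlog (messageid TEXT PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO dlrlog VALUES (?, ?)",
        [("m1", "DELIVERED"), ("m2", "FAILED"), ("m3", "DELIVERED")],
    )
    # Pretend Solr returned these ids for a text query.
    return fetch_by_message_ids(conn, ["m1", "m3"])
```

With this approach Solr only needs to index the searchable text and the messageid, at the cost of a second round trip to the database per search.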
Thank You
