SlideShare une entreprise Scribd logo
1  sur  11
Apache Lucene
Full text search
Marcelo
What’s that?
 API created on 00’s
 Apache owns that
 Indexing
 Searching
 Available on Java, .NET, C++
Why is that so good?
 Enhance user experience
 More inteligent products
 Speed processing
 Relevance
 Efficient
 Suggestions
Indexing
 IndexWritter
1. Directory implementation
2. Analizer
 Create documents
 Add these document to IndexWritter
 Optimize (merge segments)
 Close writter
Indexing
Searching
 Directory
 IndexSearcher
 QueryParser
 Query(“my search”)
 TopDocs
Searching
How does that work?
 Inverted Index
 Term Normalization
1. Similar words (merge)
2. Stop words (remove)
3. +relevance –size on disk
Term Document Ids
And 1,2,3
Big 2,4,7
Fire 1
Keep 7,8
keeper 3,4
the 1,8
Analyzers
“@Andy52 went to school yesterday!”
 StandardAnalyzer
[@Andy52] [went] [school] [yesterday!]
 StopAnalyzer
[Andy] [went] [school] [yesterday]
 SimpleAnalyzer
[andy] [went] [to] [school] [yesterday]
 WhitespaceAnalyzer
[@Andy52] [went] [to] [school] [yesterday]
 KeywordAnalyzer
[@Andy52 went to school yesterday!]
What known apps use that?
 Twitter
 Linked In
 My Space
That’s all, thanks!

Contenu connexe

Similaire à Apache lucene - full text search

Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1
GokulD
 
BigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearchBigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearch
TO THE NEW | Technology
 
Search enabled applications with lucene.net
Search enabled applications with lucene.netSearch enabled applications with lucene.net
Search enabled applications with lucene.net
Willem Meints
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
WO Community
 

Similaire à Apache lucene - full text search (20)

Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Apache lucene
Apache luceneApache lucene
Apache lucene
 
Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1
 
BigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearchBigData Search Simplified with ElasticSearch
BigData Search Simplified with ElasticSearch
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013Enterprise Search in SharePoint 2013
Enterprise Search in SharePoint 2013
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
 
ProjectHub
ProjectHubProjectHub
ProjectHub
 
Episerver and search engines
Episerver and search enginesEpiserver and search engines
Episerver and search engines
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Elastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case EviraElastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case Evira
 
Search enabled applications with lucene.net
Search enabled applications with lucene.netSearch enabled applications with lucene.net
Search enabled applications with lucene.net
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Intro to Apache Lucene and Solr
Intro to Apache Lucene and SolrIntro to Apache Lucene and Solr
Intro to Apache Lucene and Solr
 
In search of: A meetup about Liferay and Search 2016-04-20
In search of: A meetup about Liferay and Search   2016-04-20In search of: A meetup about Liferay and Search   2016-04-20
In search of: A meetup about Liferay and Search 2016-04-20
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 

Plus de Marcelo Cure

SciPy - Scientific Computing Tool
SciPy - Scientific Computing ToolSciPy - Scientific Computing Tool
SciPy - Scientific Computing Tool
Marcelo Cure
 
Test driven development
Test driven developmentTest driven development
Test driven development
Marcelo Cure
 

Plus de Marcelo Cure (16)

Api design
Api designApi design
Api design
 
Zero mq
Zero mqZero mq
Zero mq
 
Dev ops engineering and chatbots
Dev ops engineering and chatbotsDev ops engineering and chatbots
Dev ops engineering and chatbots
 
Versioning APIs
Versioning APIsVersioning APIs
Versioning APIs
 
Building restful ap is with harvester js
Building restful ap is with harvester jsBuilding restful ap is with harvester js
Building restful ap is with harvester js
 
Cqrs, event sourcing and microservices
Cqrs, event sourcing and microservicesCqrs, event sourcing and microservices
Cqrs, event sourcing and microservices
 
Immutability and immutable js
Immutability and immutable jsImmutability and immutable js
Immutability and immutable js
 
Functional programming with python
Functional programming with pythonFunctional programming with python
Functional programming with python
 
Polymer
PolymerPolymer
Polymer
 
Hexagonal Architecture
Hexagonal ArchitectureHexagonal Architecture
Hexagonal Architecture
 
What's the value of the metrics
What's the value of the metricsWhat's the value of the metrics
What's the value of the metrics
 
Scala
ScalaScala
Scala
 
SciPy - Scientific Computing Tool
SciPy - Scientific Computing ToolSciPy - Scientific Computing Tool
SciPy - Scientific Computing Tool
 
Test driven development
Test driven developmentTest driven development
Test driven development
 
Usability testing
Usability testingUsability testing
Usability testing
 
Corona
CoronaCorona
Corona
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Apache lucene - full text search

  • 1. Apache Lucene Full text search Marcelo
  • 2. What’s that?  API created on 00’s  Apache owns that  Indexing  Searching  Available on Java, .NET, C++
  • 3. Why is that so good?  Enhance user experience  More inteligent products  Speed processing  Relevance  Efficient  Suggestions
  • 4. Indexing  IndexWritter 1. Directory implementation 2. Analizer  Create documents  Add these document to IndexWritter  Optimize (merge segments)  Close writter
  • 6. Searching  Directory  IndexSearcher  QueryParser  Query(“my search”)  TopDocs
  • 8. How does that work?  Inverted Index  Term Normalization 1. Similar words (merge) 2. Stop words (remove) 3. +relevance –size on disk Term Document Ids And 1,2,3 Big 2,4,7 Fire 1 Keep 7,8 keeper 3,4 the 1,8
  • 9. Analyzers “@Andy52 went to school yesterday!”  StandardAnalyzer [@Andy52] [went] [school] [yesterday!]  StopAnalyzer [Andy] [went] [school] [yesterday]  SimpleAnalyzer [andy] [went] [to] [school] [yesterday]  WhitespaceAnalyzer [@Andy52] [went] [to] [school] [yesterday]  KeywordAnalyzer [@Andy52 went to school yesterday!]
  • 10. What known apps use that?  Twitter  Linked In  My Space