Introduction to Apache Lucene
by Shrikrishna Parab
AGENDA
What is Apache Lucene?
Focus of Apache Lucene
Lucene Architecture
Analyzers
Analysis Example
Demo
WHAT IS APACHE LUCENE?
• Apache Lucene is an open-source, Java-based full-text search engine.
• Lucene is not a web application, but rather a code library and API that can easily be used to add search capabilities to applications.
• It is also described as an information retrieval (IR) library.
• Lucene is independent of the file format. Text from PDF, HTML, and Word documents can be indexed as long as the textual information can be extracted.
FOCUS
• Indexing Documents
• Searching Documents
INDEXING DOCUMENTS
• What is Indexing?
1. Conversion to plain text (for PDF, HTML files, etc.)
2. Analysis (convert the text into tokens)
3. Indexing (add the tokens to the index, which maps each token to the documents that contain it)
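The analysis and indexing steps above can be sketched in plain Java. This is a minimal illustration that assumes the text has already been extracted. The class `TinyIndexer` and its methods are hypothetical names, not part of the Lucene API, and a real Lucene index stores far more than this (positions, frequencies, stored fields).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the Lucene API.
class TinyIndexer {
    // Step 2 (analysis): split on non-letters and lowercase each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Step 3 (indexing): map each token to the sorted set of ids of the
    // documents that contain it. A document id is its position in the list.
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : analyze(docs.get(docId))) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Lucene is a search library", "Search the index");
        // Prints each token with the ids of the documents containing it.
        index(docs).forEach((token, ids) -> System.out.println(token + " -> " + ids));
    }
}
```

A token-to-documents map of this kind is usually called an inverted index. It lets a search look up a term directly instead of scanning every document.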
SEARCHING DOCUMENTS
• What is Searching?
1. Take the user input
2. Create a query
3. Query the index
4. Return the results
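The four search steps can be sketched as follows, again in plain Java rather than with Lucene itself. `TinySearcher` is a hypothetical class. As a simplifying assumption it requires every query term to match (AND semantics) and does no relevance scoring, whereas Lucene parses richer queries and ranks the hits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the Lucene API.
class TinySearcher {
    private final List<String> docs;
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Builds a term -> document-id map over the given documents.
    TinySearcher(List<String> docs) {
        this.docs = docs;
        for (int id = 0; id < docs.size(); id++) {
            for (String term : terms(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Shared analysis for documents and queries: split on non-letters, lowercase.
    static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    List<String> search(String userInput) {          // step 1: take the user input
        List<String> query = terms(userInput);       // step 2: create a query (a list of terms)
        Set<Integer> hits = null;
        for (String term : query) {                  // step 3: query the index
            Set<Integer> docsWithTerm = postings.getOrDefault(term, Set.of());
            if (hits == null) {
                hits = new TreeSet<>(docsWithTerm);
            } else {
                hits.retainAll(docsWithTerm);        // keep only docs matching every term
            }
        }
        List<String> results = new ArrayList<>();    // step 4: return the results
        if (hits != null) {
            for (int id : hits) {
                results.add(docs.get(id));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TinySearcher searcher = new TinySearcher(List.of(
                "Apache Lucene is a search library",
                "Lucene indexes documents",
                "Searching documents"));
        System.out.println(searcher.search("Lucene documents"));
    }
}
```

The query is passed through the same analysis as the documents, so "Lucene" in the query matches the lowercased term "lucene" stored in the index.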
LUCENE ARCHITECTURE
ANALYZER
• Tokenizes the input text
• Common Analyzers
1. WhitespaceAnalyzer
Splits tokens on whitespace
2. SimpleAnalyzer
Splits tokens on non-letters, then lowercases them
3. StopAnalyzer
Same as SimpleAnalyzer, but also removes stop words
4. StandardAnalyzer
The most sophisticated of the four: recognizes certain token types, lowercases, and removes stop words (by default in older Lucene releases)
ANALYSIS EXAMPLES
“Boost is the Secret of our Energy”
• WhitespaceAnalyzer
[Boost][is][the][Secret][of][our][Energy]
• SimpleAnalyzer
[boost][is][the][secret][of][our][energy]
• StopAnalyzer
[boost][secret][our][energy]
• StandardAnalyzer
[boost][secret][our][energy]
(“our” is kept because it is not in Lucene's default English stop-word set; “is”, “the”, and “of” are.)
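The first three analyzers can be imitated in a few lines of plain Java to show how their outputs differ. This is only a sketch. `AnalyzerSketch` and its methods are hypothetical stand-ins, not Lucene classes, and `STOP_WORDS` is a small assumed list, so which words are dropped depends entirely on that list. A different sample sentence is used here so that the split on non-letters is visible too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins for the analyzers above; this is not Lucene code.
class AnalyzerSketch {
    // Assumed, deliberately tiny stop list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the", "to");

    // Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like SimpleAnalyzer: split on non-letters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple analysis, minus the stop words.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Lucene is the Heart of full-text Search";
        System.out.println(whitespace(text)); // [Lucene, is, the, Heart, of, full-text, Search]
        System.out.println(simple(text));     // [lucene, is, the, heart, of, full, text, search]
        System.out.println(stop(text));       // [lucene, heart, full, text, search]
    }
}
```

Note how "full-text" survives as one token under whitespace splitting but becomes "full" and "text" once tokens are split on non-letters.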
DEMO OF SIMPLE INDEXING AND SEARCHING
USING APACHE LUCENE
Thank You
