SlideShare une entreprise Scribd logo
1  sur  15
Lucene in Action
Применение Lucene для
построения
высокопроизволительных систем
Гавриленко Евгений
Ведущий разработчик Artezio
Lucene
• Что же это такое?
• Twitter 1млрд запросов в день
• hh.ru 400 запросов в секунду
• LinkedIn, FedEx…
Основные компоненты индексации
• IndexWriter
• Directory (FSDirectory, RAMDirectory)
• Analyzer
• Document
• Field / Multivalued fields
Построение индекса
var directory = new RAMDirectory();
//var directory = FSDirectory.Open("/tmp/testindex");
var analyzer = new RussianAnalyzer(Version.LUCENE_30);
using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
for (var i = 0; i < 1000000; i++)
{
var doc = new Document();
doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc.Add(new Field("text",string.Format("{0} строка.", i),Field.Store.YES,Field.Index.ANALYZED));
doc.Add(new Field("text",string.Format("{0} строка 2.", i),Field.Store.YES,Field.Index.ANALYZED));
writer.AddDocument(doc);
if (i%100000 == 0)
Console.WriteLine("[{1}]: {0} документов сохранено.",i,DateTime.Now);
}
writer.Optimize();
}
Схема данных
var doc1 = new Document();
doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc1.Add(new Field("text",string.Format("{0} строка.", i),Field.Store.YES,Field.Index.ANALYZED));
var field = new NumericField(“numericField1”, Field.Store.NO, true);
doc1.Add(field.SetDoubleValue(value));
var doc2 = new Document();
doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc2.Add(new Field("text",string.Format("{0} строка.", i),Field.Store.YES,Field.Index.ANALYZED));
doc2.Add(new Field(“blablaFild1", “blabla-body",Field.Store.YES,Field.Index.ANALYZED));
Основные компоненты поиска
• IndexSearcher/MultiSearcher/ParallelMultiSearcher
• Term
• Query
• TermQuery
• TopDocs
Query
• TermQuery
• MultiFieldQueryParser
• BooleanQuery
• NumericRangeQuery
• SpanQuery
• …
• QueryParser
Поиск
var reader = IndexReader.Open(directory, true);
var searcher = new IndexSearcher(reader);
var parser = new QueryParser(Version.LUCENE_30, "text", analyzer);
var query = parser.Parse("20 строку");
var hits = searcher.Search(query, 100);
Console.WriteLine("total hits: {0}", hits.TotalHits);
if (hits.TotalHits == 0) return;
var rdoc = reader.Document(hits.ScoreDocs[0].Doc);
Console.WriteLine("value:{0}", rdoc.Get("text"));
Поиск с сортировкой
switch (sl)
{
case "barcode":
case "code":
indexSort = new Sort(new SortField(sl, SortField.STRING,indexDir));
break;
case "price":
indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
break;
default:
indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
break;
}
...
searcher.SetDefaultFieldSortScoring(true,false);
var hits = searcher.Search(query, filter, count, indexSort);
Paging
Анализаторы
• StandardAnalyzer
• SnowballAnalyzer
• KeywordAnalyzer
• WhitespaceAnalyzer
• RussianAnalyzer ()
Применение в E-Commerce
Ecommerce
DB
Service/
Daemon
Lucene
Index
search
service
Search
backend
Linq to Lucene
public class Article
{
[Field(Analyzer = typeof(StandardAnalyzer))]
public string Author { get; set; }
[Field(Analyzer = typeof(StandardAnalyzer))]
public string Title { get; set; }
public DateTimeOffset PublishDate { get; set; }
[NumericField]
public long Id { get; set; }
[Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
public string BodyText { get; set; }
[Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
public string SearchText
{
get { return string.Join(" ", new[] {Author, Title, BodyText}); }
}
}
Linq to Lucene
var directory = new RAMDirectory();
var provider = new LuceneDataProvider(directory, Version.LUCENE_30);
using (var session = provider.OpenSession<Article>())
{
session.Add(new Article {Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow});
}
var articles = provider.AsQueryable<Article>();
var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));
var articlesByJohn = from a in articles
where a.Author == "John Doe" && a.PublishDate > threshold
orderby a.Title
select a;
Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());
var searchResults = from a in articles
where a.SearchText == "some search query"
select a;
Console.WriteLine("Search Results: " + searchResults.Count());
Полезные ресурсы
• Lucene http://lucene.apache.org/
• Lucene.Net http://lucenenet.apache.org
• Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq
• “Lucene in Action” http://it-ebooks.info/book/2112

Contenu connexe

Tendances

Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptLucidworks
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkFlink Forward
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Lucidworks
 
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx
SH 2 - SES 3 -  MongoDB Aggregation Framework.pptxSH 2 - SES 3 -  MongoDB Aggregation Framework.pptx
SH 2 - SES 3 - MongoDB Aggregation Framework.pptxMongoDB
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Lucidworks
 
Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015Roy Russo
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)MongoDB
 
Tapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and FlinkTapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and FlinkMichael Häusler
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014Roy Russo
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutesDavid Pilato
 
ElasticSearch Basics
ElasticSearch BasicsElasticSearch Basics
ElasticSearch BasicsAmresh Singh
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query OptimizationMongoDB
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsOpenSource Connections
 

Tendances (20)

Dapper performance
Dapper performanceDapper performance
Dapper performance
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
 
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx
SH 2 - SES 3 -  MongoDB Aggregation Framework.pptxSH 2 - SES 3 -  MongoDB Aggregation Framework.pptx
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
Avro introduction
Avro introductionAvro introduction
Avro introduction
 
Tapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and FlinkTapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and Flink
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
ElasticSearch Basics
ElasticSearch BasicsElasticSearch Basics
ElasticSearch Basics
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query Optimization
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
 

En vedette

Database reverse engineering
Database reverse engineeringDatabase reverse engineering
Database reverse engineeringDevOWL Meetup
 
SEO basics for developers
SEO basics for developersSEO basics for developers
SEO basics for developersDevOWL Meetup
 
Startup tactics for developers: A, B, C
Startup tactics for developers: A, B, CStartup tactics for developers: A, B, C
Startup tactics for developers: A, B, CDevOWL Meetup
 
Easily create apps using Phonegap
Easily create apps using PhonegapEasily create apps using Phonegap
Easily create apps using PhonegapDevOWL Meetup
 
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.jsTrainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.jsDevOWL Meetup
 
Как и зачем мы тестируем UI
Как и зачем мы тестируем UIКак и зачем мы тестируем UI
Как и зачем мы тестируем UIVyacheslav Lyalkin
 
ECMAScript 5 Features
ECMAScript 5 FeaturesECMAScript 5 Features
ECMAScript 5 FeaturesDevOWL Meetup
 
Потоковая репликация PostgreSQL
Потоковая репликация PostgreSQLПотоковая репликация PostgreSQL
Потоковая репликация PostgreSQLDevOWL Meetup
 
Async Module Definition via RequireJS
Async Module Definition via RequireJSAsync Module Definition via RequireJS
Async Module Definition via RequireJSDevOWL Meetup
 
AngularJS basics & theory
AngularJS basics & theoryAngularJS basics & theory
AngularJS basics & theoryDevOWL Meetup
 
Miscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentationMiscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentationVasilii Diachenko
 
Reactивная тяга
Reactивная тягаReactивная тяга
Reactивная тягаVitebsk Miniq
 
Как оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead SoftengiКак оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead SoftengiSoftengi
 

En vedette (17)

Database reverse engineering
Database reverse engineeringDatabase reverse engineering
Database reverse engineering
 
devOWL coffee-break
devOWL coffee-breakdevOWL coffee-break
devOWL coffee-break
 
SEO basics for developers
SEO basics for developersSEO basics for developers
SEO basics for developers
 
Startup tactics for developers: A, B, C
Startup tactics for developers: A, B, CStartup tactics for developers: A, B, C
Startup tactics for developers: A, B, C
 
HR VS DEV
HR VS DEVHR VS DEV
HR VS DEV
 
Bootstrap3 basics
Bootstrap3 basicsBootstrap3 basics
Bootstrap3 basics
 
Testing is coming
Testing is comingTesting is coming
Testing is coming
 
Easily create apps using Phonegap
Easily create apps using PhonegapEasily create apps using Phonegap
Easily create apps using Phonegap
 
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.jsTrainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
Trainspotting Transporting: RabbitMQ, Akka.NET, Rx, MVI, Cycle.js
 
Как и зачем мы тестируем UI
Как и зачем мы тестируем UIКак и зачем мы тестируем UI
Как и зачем мы тестируем UI
 
ECMAScript 5 Features
ECMAScript 5 FeaturesECMAScript 5 Features
ECMAScript 5 Features
 
Потоковая репликация PostgreSQL
Потоковая репликация PostgreSQLПотоковая репликация PostgreSQL
Потоковая репликация PostgreSQL
 
Async Module Definition via RequireJS
Async Module Definition via RequireJSAsync Module Definition via RequireJS
Async Module Definition via RequireJS
 
AngularJS basics & theory
AngularJS basics & theoryAngularJS basics & theory
AngularJS basics & theory
 
Miscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentationMiscosoft Singularity - konkurs presentation
Miscosoft Singularity - konkurs presentation
 
Reactивная тяга
Reactивная тягаReactивная тяга
Reactивная тяга
 
Как оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead SoftengiКак оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
Как оценить время на тестирование. Александр Зиновьев, Test Lead Softengi
 

Similaire à Lucene in Action

Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introductionotisg
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenchesIsmail Mayat
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)Kira
 
Java Search Engine Framework
Java Search Engine FrameworkJava Search Engine Framework
Java Search Engine FrameworkAppsterdam Milan
 
DIY Percolator
DIY PercolatorDIY Percolator
DIY Percolatorjdhok
 
Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010Rob Windsor
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-stepsMatteo Moci
 
Infinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGMInfinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGMJBug Italy
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginsearchbox-com
 
Open Source Search: An Analysis
Open Source Search: An AnalysisOpen Source Search: An Analysis
Open Source Search: An AnalysisJustin Finkelstein
 
Coherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherenceCoherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherencearagozin
 
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...SPTechCon
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with LuceneWO Community
 

Similaire à Lucene in Action (20)

Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Fast track to lucene
Fast track to luceneFast track to lucene
Fast track to lucene
 
Java Search Engine Framework
Java Search Engine FrameworkJava Search Engine Framework
Java Search Engine Framework
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
DIY Percolator
DIY PercolatorDIY Percolator
DIY Percolator
 
Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Infinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGMInfinispan,Lucene,Hibername OGM
Infinispan,Lucene,Hibername OGM
 
CouchDB-Lucene
CouchDB-LuceneCouchDB-Lucene
CouchDB-Lucene
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
 
Open Source Search: An Analysis
Open Source Search: An AnalysisOpen Source Search: An Analysis
Open Source Search: An Analysis
 
Coherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherenceCoherence SIG: Advanced usage of indexes in coherence
Coherence SIG: Advanced usage of indexes in coherence
 
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 

Plus de DevOWL Meetup

Что такое современная Frontend разработка
Что такое современная Frontend разработкаЧто такое современная Frontend разработка
Что такое современная Frontend разработкаDevOWL Meetup
 
CQRS and EventSourcing
CQRS and EventSourcingCQRS and EventSourcing
CQRS and EventSourcingDevOWL Meetup
 
Cага о сагах
Cага о сагахCага о сагах
Cага о сагахDevOWL Meetup
 
MeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июляMeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июляDevOWL Meetup
 
Обзор Haxe & OpenFl
Обзор Haxe & OpenFlОбзор Haxe & OpenFl
Обзор Haxe & OpenFlDevOWL Meetup
 
Recommerce изнутри
Recommerce изнутриRecommerce изнутри
Recommerce изнутриDevOWL Meetup
 
Google map markers with Symfony2
Google map markers with Symfony2Google map markers with Symfony2
Google map markers with Symfony2DevOWL Meetup
 

Plus de DevOWL Meetup (7)

Что такое современная Frontend разработка
Что такое современная Frontend разработкаЧто такое современная Frontend разработка
Что такое современная Frontend разработка
 
CQRS and EventSourcing
CQRS and EventSourcingCQRS and EventSourcing
CQRS and EventSourcing
 
Cага о сагах
Cага о сагахCага о сагах
Cага о сагах
 
MeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июляMeetupCamp Витебский летний митап 5-6 июля
MeetupCamp Витебский летний митап 5-6 июля
 
Обзор Haxe & OpenFl
Обзор Haxe & OpenFlОбзор Haxe & OpenFl
Обзор Haxe & OpenFl
 
Recommerce изнутри
Recommerce изнутриRecommerce изнутри
Recommerce изнутри
 
Google map markers with Symfony2
Google map markers with Symfony2Google map markers with Symfony2
Google map markers with Symfony2
 

Dernier

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Dernier (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Lucene in Action

  • 1. Lucene in Action Применение Lucene для построения высокопроизволительных систем Гавриленко Евгений Ведущий разработчик Artezio
  • 2. Lucene • Что же это такое? • Twitter 1млрд запросов в день • hh.ru 400 запросов в секунду • LinkedIn, FedEx…
  • 3. Основные компоненты индексации • IndexWriter • Directory (FSDirectory, RAMDirectory) • Analyzer • Document • Field / Multivalued fields
  • 4. Построение индекса var directory = new RAMDirectory(); //var directory = FSDirectory.Open("/tmp/testindex"); var analyzer = new RussianAnalyzer(Version.LUCENE_30); using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED)) { for (var i = 0; i < 1000000; i++) { var doc = new Document(); doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)); doc.Add(new Field("text",string.Format("{0} строка.", i),Field.Store.YES,Field.Index.ANALYZED)); doc.Add(new Field("text",string.Format("{0} строка 2.", i),Field.Store.YES,Field.Index.ANALYZED)); writer.AddDocument(doc); if (i%100000 == 0) Console.WriteLine("[{1}]: {0} документов сохранено.",i,DateTime.Now); } writer.Optimize(); }
  • 5. Схема данных var doc1 = new Document(); doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)); doc1.Add(new Field("text",string.Format("{0} строка.", i),Field.Store.YES,Field.Index.ANALYZED)); var field = new NumericField(“numericField1”, Field.Store.NO, true); doc1.Add(field.SetDoubleValue(value)); var doc2 = new Document(); doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)); doc2.Add(new Field("text",string.Format("{0} строка.", i),Field.Store.YES,Field.Index.ANALYZED)); doc2.Add(new Field(“blablaFild1", “blabla-body",Field.Store.YES,Field.Index.ANALYZED));
  • 6. Основные компоненты поиска • IndexSearcher/MultiSearcher/ParallelMultiSearcher • Term • Query • TermQuery • TopDocs
  • 7. Query • TermQuery • MultiFieldQueryParser • BooleanQuery • NumericRangeQuery • SpanQuery • … • QueryParser
  • 8. Поиск var reader = IndexReader.Open(directory, true); var searcher = new IndexSearcher(reader); var parser = new QueryParser(Version.LUCENE_30, "text", analyzer); var query = parser.Parse("20 строку"); var hits = searcher.Search(query, 100); Console.WriteLine("total hits: {0}", hits.TotalHits); if (hits.TotalHits == 0) return; var rdoc = reader.Document(hits.ScoreDocs[0].Doc); Console.WriteLine("value:{0}", rdoc.Get("text"));
  • 9. Поиск с сортировкой switch (sl) { case "barcode": case "code": indexSort = new Sort(new SortField(sl, SortField.STRING,indexDir)); break; case "price": indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir)); break; default: indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir)); break; } ... searcher.SetDefaultFieldSortScoring(true,false); var hits = searcher.Search(query, filter, count, indexSort);
  • 11. Анализаторы • StandardAnalyzer • SnowballAnalyzer • KeywordAnalyzer • WhitespaceAnalyzer • RussianAnalyzer ()
  • 13. Linq to Lucene public class Article { [Field(Analyzer = typeof(StandardAnalyzer))] public string Author { get; set; } [Field(Analyzer = typeof(StandardAnalyzer))] public string Title { get; set; } public DateTimeOffset PublishDate { get; set; } [NumericField] public long Id { get; set; } [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)] public string BodyText { get; set; } [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))] public string SearchText { get { return string.Join(" ", new[] {Author, Title, BodyText}); } } }
  • 14. Linq to Lucene var directory = new RAMDirectory(); var provider = new LuceneDataProvider(directory, Version.LUCENE_30); using (var session = provider.OpenSession<Article>()) { session.Add(new Article {Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow}); } var articles = provider.AsQueryable<Article>(); var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30)); var articlesByJohn = from a in articles where a.Author == "John Doe" && a.PublishDate > threshold orderby a.Title select a; Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count()); var searchResults = from a in articles where a.SearchText == "some search query" select a; Console.WriteLine("Search Results: " + searchResults.Count());
  • 15. Полезные ресурсы • Lucene http://lucene.apache.org/ • Lucene.Net http://lucenenet.apache.org • Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq • “Lucene in Action” http://it-ebooks.info/book/2112