Angel Borroy
Software Engineer
March 2020
A Practical
Introduction to
Apache SOLR
CODELAB
2
Requirements
Java Runtime Environment 1.8+
$ java -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
Supported Operating Systems
• Linux
• MacOS
• Windows
https://lucene.apache.org/solr/downloads.html
https://www.slideshare.net/angelborroy/a-practical-introduction-to-apache-solr
3
A Practical Introduction to Apache SOLR
• Open Source
• What is SOLR
• Key SOLR Concepts
• SOLR Lab
• Quick References
NEOCOM 2020
4
5
Why should you use Open Source?
• State of the Art Technologies
• Community Support
• Vast Documentation
• Code is accessible
• Customizable
• Mostly free licensing
6
Why should you contribute to Open Source?
• Share Knowledge and Ideas
• Improve established Technologies
• Become part of a Community
• Not only code, all your skills are relevant
• Be useful to the World
7
8
What is SOLR
• A Search Engine
• A REST-like API
• Built on Lucene
• Open Source
• Blazing-fast
• Scalable
• Fault tolerant
9
Why SOLR
Scalable
Solr scales by distributing work (indexing and query processing) to multiple servers in a cluster.
Ready to deploy
Solr is open source, is easy to install and configure, and provides a preconfigured example to help you get
started.
Optimized for search
Solr is fast and can execute complex queries in under a second, often in only tens of milliseconds.
Large volumes of documents
Solr is designed to deal with indexes containing many millions of documents.
Text-centric
Solr is optimized for searching natural-language text, like emails, web pages, resumes, PDF documents,
and social messages such as tweets or blogs.
Results sorted by relevance
Solr returns documents in ranked order based on how relevant each document is to the user’s query.
10
Lucene based Search Engines
Amazon Elasticsearch Service
11
Features overview
• Pagination and sorting
• Faceting
• Autosuggest
• Spell-checking
• Highlighting
• Geospatial search
• More Like This
12
Features overview
• Flexible query support
• Document clustering
• Import rich document formats (PDF, Office…)
• Import data from databases (DIH, Data Import Handler)
• Multilingual support
13
Companies using SOLR
14
Key SOLR Concepts
15
Key SOLR Concepts
• Documents
• Searching
• Relevancy
• Precision and Recall
• Searching at Scale
Diagram labels: STORAGE, RETRIEVAL, Tracking, Indexing, Query
16
Lucene Document
• Documents are the unit of information for
indexing and search
• A Document is a set of fields
• Each field has a name and a value
• All field types must be defined, and all field
names (or dynamic field-naming patterns)
should be specified in Solr’s schema.xml
Seminars
Schema Configuration
• Per collection/index
• Xml file
• Define how the inverted Index will be built
• Fields/Field Types definition
DOCUMENT
FIELD
17
Lucene Document – Search problem
The Beginner’s Guide to Buying a House
How to Buy Your First House
Purchasing a Home
Becoming a New Home owner
Buying a New Home
Decorating Your Home
A Fun Guide to Cooking
How to Raise a Child
Buying a New Car
SELECT * FROM Books WHERE Name = 'buying a new home';
0 results
SELECT * FROM Books
WHERE Name LIKE '%buying%'
AND Name LIKE '%a%'
AND Name LIKE '%home%';
1 result
Buying a New Home
SELECT * FROM Books
WHERE Name LIKE '%buying%'
OR Name LIKE '%a%'
OR Name LIKE '%home%';
8 results
A Fun Guide to Cooking, Decorating Your Home, How to Raise a Child, Buying a New Car,
Buying a New Home, The Beginner’s Guide to Buying a House, Purchasing a Home,
Becoming a New Home Owner
Unimportant words
Synonyms
Linguistic variations
Ordering
18
Lucene Document – Inverted Index
Doc #   Content field
1       A Fun Guide to Cooking
2       Decorating Your Home
3       How to Raise a Child
4       Buying a New Car
5       Buying a New Home
6       The Beginner’s Guide to Buying a House
7       Purchasing a Home
8       Becoming a New Home Owner
9       How to Buy Your First House

INVERTED INDEX
Term          Doc #
a             1,3,4,5,6,7,8
becoming      8
beginner’s    6
buy           9
buying        4,5,6
child         3
cooking       1
decorating    2
home          2,5,7,8
house         6,9
how           3,9
new           4,5,8
purchasing    7
your          2,9
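A minimal Python sketch of how an inverted index like the one on this slide could be built. It assumes plain whitespace tokenisation and lowercasing only; the analysis chain Lucene really applies is configurable and does considerably more. The function name build_inverted_index is illustrative, not a Lucene or Solr API.

```python
from collections import defaultdict

# Titles from the slide, keyed by document number.
DOCS = {
    1: "A Fun Guide to Cooking",
    2: "Decorating Your Home",
    3: "How to Raise a Child",
    4: "Buying a New Car",
    5: "Buying a New Home",
    6: "The Beginner's Guide to Buying a House",
    7: "Purchasing a Home",
    8: "Becoming a New Home Owner",
    9: "How to Buy Your First House",
}


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


index = build_inverted_index(DOCS)
print(index["buying"])  # [4, 5, 6]
print(index["home"])    # [2, 5, 7, 8]
```

Looking a term up is now a single dictionary access, instead of a scan over every title as in the SQL LIKE queries.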
19
Searching
TERM DOCS
buying 4,5,6,7,9
home 2,5,6,7,8,9
Unimportant word “a” is skipped
Synonyms purchasing ~ buying
Linguistic variations buy ~ buying
Synonyms house ~ home
(AND) = 5,6,7,9
Buying a New Home
The Beginner’s Guide to Buying a House
Purchasing a Home
How to Buy Your First House
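The search on this slide can be sketched in Python as follows. The stop-word set and the NORMALIZE table are hypothetical stand-ins for the stop-word, stemming and synonym filters Solr would be configured with; they are chosen only to reproduce the numbers on the slide.

```python
# Hypothetical stand-ins for Solr's stop-word, stemming and synonym filters.
STOPWORDS = {"a", "the", "to"}
NORMALIZE = {"buy": "buying", "purchasing": "buying", "house": "home"}

DOCS = {
    1: "A Fun Guide to Cooking",
    2: "Decorating Your Home",
    3: "How to Raise a Child",
    4: "Buying a New Car",
    5: "Buying a New Home",
    6: "The Beginner's Guide to Buying a House",
    7: "Purchasing a Home",
    8: "Becoming a New Home Owner",
    9: "How to Buy Your First House",
}


def analyze(text):
    """Lowercase, drop stop words, and map variants and synonyms to one term."""
    terms = []
    for token in text.lower().split():
        if token in STOPWORDS:
            continue
        terms.append(NORMALIZE.get(token, token))
    return terms


def build_index(docs):
    """Build term -> set of document numbers, using the same analysis as queries."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all(index, query):
    """Return documents containing every analysed query term (AND semantics)."""
    postings = [index.get(term, set()) for term in analyze(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_index(DOCS)
print(sorted(index["buying"]))             # [4, 5, 6, 7, 9]
print(sorted(index["home"]))               # [2, 5, 6, 7, 8, 9]
print(search_all(index, "buying a home"))  # [5, 6, 7, 9]
```

Applying the same analysis to documents and to queries is what lets "buy", "purchasing" and "house" match a query for "buying a home".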
20
Searching operators
• Required terms: buying AND home
• Optional terms: buying OR home
• Negated terms: buying NOT home
• Phrases: "buying a home"
• Grouped expressions: (buying OR renting) AND home
• Fuzzy matching
  • Wildcard: offi* off*r off?r
  • Range: yearsOld:[18 TO 21]
  • Distance: administrator~
  • Proximity: "chief officer"~1
21
Relevancy before SOLR 6 (TF/IDF)
A relevancy score is calculated for each document, and the search results are sorted from the highest score to the lowest.
Similarity
Term frequency
• A document is more relevant for a particular term if the term appears in it multiple times
Inverse document frequency
• A measure of how “rare” a search term is, derived from its document frequency (the number of documents
the term appears in); rarer terms count for more
Boosting
• A query-time multiplier that adjusts the weight of a field
• title:solr^2.5 description:solr
Normalization factors for fields, queries and coordination (coord)
Ordering
22
Relevancy from SOLR 6 (BM25)
BM25 improves upon TF/IDF
BM25 stands for “Best Matching 25” (25th iteration on TF/IDF)
Includes different factors
• Frequency of a term in all Documents
• Term Frequency in a Document
• Document Length
BM25 limits the influence of term frequency:
• less influence of common words
With TF/IDF: short fields (title,...) are automatically scored higher
BM25 scales field length against the average field length:
• field length treatment does not automatically boost short fields
Ordering
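To see what "limits the influence of term frequency" means, the sketch below compares the two term-frequency factors. It assumes the textbook Okapi BM25 formula with the commonly used parameters k1 = 1.2 and b = 0.75, and the square-root term frequency of classic TF/IDF scoring. It is an illustration of the formulas, not Solr's scoring code.

```python
import math


def classic_tf(freq):
    """Classic TF/IDF term-frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency factor; it saturates and never reaches k1 + 1."""
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + length_norm)


def bm25_idf(num_docs, doc_freq):
    """BM25 inverse document frequency: rare terms weigh more than common ones."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# The classic factor keeps growing with the count; the BM25 factor flattens out.
for freq in (1, 5, 25, 100):
    print(freq, round(classic_tf(freq), 2), round(bm25_tf(freq, 10, 10), 2))
```

Passing a doc_len below avg_doc_len raises the BM25 factor and a longer one lowers it, which is the "scales field length with average" behaviour described on the slide.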
23
Precision and Recall
Precision is a measure of how “good” each of the results of a query is. A query that returns one single
correct document out of a million other correct documents is still considered perfectly precise.
Recall is a measure of how many of the correct documents are returned. A query that returns one
single correct document out of a million other correct documents has very poor recall.
>> Balancing Precision and Recall will improve the quality of your search results.
20 correct documents
Search results containing 10 documents
(8 correct and 2 incorrect)
Precision = 80% (8 / 10)
Recall = 40% (8 / 20)
What is the precision and
recall for the
previous ”buying a home”
sample?
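The worked example on this slide (20 correct documents, 10 results of which 8 are correct) can be checked with a few lines of Python. The document ids are made up for the illustration.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a result set against the correct documents."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = set(range(1, 21))              # 20 correct documents, ids 1..20
returned = set(range(1, 9)) | {101, 102}  # 8 correct and 2 incorrect results
print(precision_recall(returned, relevant))  # (0.8, 0.4)
```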
24
Searching at Scale
Scaling SOLR
Solr can scale to handle
billions of documents and a
growing query load
by adding servers.
Some limitations
• You can insert, delete, and update documents, but not single fields (easily)
• Solr is not optimized for processing very long queries (thousands of terms) or returning very
large result sets to users.
25
Lab
26
Requirements
• Java Runtime Environment 1.8+
$ java -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
• Supported Operating Systems
• Linux
• MacOS
• Windows
https://lucene.apache.org/solr/downloads.html
27
Directory layout
bin/
• solr | solr.cmd : start SOLR
• post : posting content to SOLR
• solr.in.sh | solr.in.cmd : configuration
contrib/
• Add-on plugins
dist/
• SOLR Jar files
docs/
• JavaDocs
example/
• CSV, XML and JSON
• DIH for databases
• Word and PDF files
licenses/
• 3rd party libraries
server/
• SOLR Admin UI
• Jetty Libraries
• Log files
• Sample configsets
28
Starting SOLR
• Use the command line interface tool called bin/solr (Linux) or bin\solr.cmd (Windows)
$ bin/solr start -p 8983
Waiting up to 180 seconds to see Solr running on port 8983 []
Started Solr server on port 8983 (pid=4521). Happy searching!
• Check if Solr is Running
$ bin/solr status
Found 1 Solr nodes:
Solr process 4521 running on port 8983
{
"solr_home":"/Users/aborroy/Downloads/solr-introduction-university/solr-8.4.1/server/solr",
"version":"8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28",
"startTime":"2020-03-08T08:13:49.969Z",
"uptime":"0 days, 0 hours, 17 minutes, 56 seconds",
"memory":"91.6 MB (%17.9) of 512 MB"}
29
The SOLR Admin Web Interface
http://127.0.0.1:8983/solr/#/
30
Creating a new Core
$ bin/solr create -c films
• -c indicates the name of the core (or collection) to create
Check default fields added by SOLR to the Schema
Check JSON Data to be posted in example/films/films.json
{
"id": "/en/45_2006",
"directed_by": [
"Gary Lennon"
],
"initial_release_date": "2006-11-30",
"genre": [
"Black comedy",
"Thriller"
],
"name": ".45"
}
31
Posting data
$ bin/post -c films example/films/films.json
Posting files to [base] url http://localhost:8983/solr/films/update...
POSTing file films.json (application/json) to [base]/json/docs
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/films/update/json/docs
SimplePostTool: WARNING: Response: {
"responseHeader":{
"status":400,
"QTime":120},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","java.lang.NumberFormatException"],
"msg":"ERROR: [doc=/en/quien_es_el_senor_lopez] Error adding field 'name'='¿Quién es el señor López?' msg=For input string: \"¿Quién es el señor López?\"",
"code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for
URL: http://localhost:8983/solr/films/update/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:00.323
32
How many results were posted?
http://127.0.0.1:8983/solr/films/select?indent=on&q=*:*&wt=json
• q: main query
• fq: filter queries
• sort: field and direction (asc or desc)
• start, rows: offset and number of rows
• fl: list of fields to return
• wt: response format (XML or JSON)
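As a sketch, the parameters above can be assembled into a select URL with the Python standard library. The helper name select_url is illustrative, and the host, port and core name are the ones used in this lab.

```python
from urllib.parse import urlencode


def select_url(core, **params):
    """Build a /select URL for a core; parameter values are URL-encoded."""
    base = f"http://127.0.0.1:8983/solr/{core}/select"
    return base + "?" + urlencode(params)


url = select_url(
    "films",
    q="*:*",                           # main query: match every document
    fq="genre:Fantasy",                # filter query
    sort="initial_release_date desc",  # field and direction
    start=0,                           # offset of the first row
    rows=10,                           # number of rows to return
    fl="id,name",                      # fields to return
    wt="json",                         # response format
)
print(url)
```

Encoding through urlencode keeps characters such as spaces, colons and asterisks safe to paste into a browser or pass to an HTTP client.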
33
What was wrong?
Check carefully JSON Data to be posted in example/films/films.json
{
"id": "/en/quien_es_el_senor_lopez",
"directed_by": [
"Luis Mandoki"
],
"genre": [
"Documentary film"
],
"name": "\u00bfQui\u00e9n es el se\u00f1or L\u00f3pez?"
},
http://127.0.0.1:8983/solr/#/films/schema?field=name
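One way to picture the failure: the first film in films.json is named ".45", which parses as a number, so a field type guessed from that first value cannot hold later text names. The Python sketch below is only an analogy for that behaviour under this assumption; guess_type and accepts are hypothetical helpers, not Solr code.

```python
def guess_type(value):
    """Guess 'number' if the text parses as a float, otherwise 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def accepts(field_type, value):
    """A numeric field rejects anything that does not parse as a number."""
    return field_type == "text" or guess_type(value) == "number"


field_type = guess_type(".45")  # type fixed by the first value seen
print(field_type)                                        # number
print(accepts(field_type, "¿Quién es el señor López?"))  # False
```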
34
Auto-Generated SOLR Schema
http://127.0.0.1:8983/solr/#/films/files?file=managed-schema
• multiValued: a single document might contain multiple values for this field type
• indexed: the value of the field can be used in queries to retrieve matching documents (true by default)
• required: SOLR rejects any attempts to add a document which does not have a value for this field
• stored: the actual value of the field can be retrieved by queries
name can contain text!
35
Re-Creating the Core
Deleting core “films”
$ bin/solr delete -c films
Deleting core 'films' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=films&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
Creating core “films”
$ bin/solr create -c films
Created new core 'films'
Creating the field “name” for the core “films”
http://127.0.0.1:8983/solr/#/films/schema
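The slide creates the field through the Admin UI. As an alternative sketch, the same field could be declared with Solr's Schema API by POSTing an add-field command to the core's /schema endpoint. The field type text_general and the helper name add_field_request are assumptions made for this example; the code only builds the request and does not send it.

```python
import json
from urllib.request import Request


def add_field_request(core, name, field_type, multi_valued=False, stored=True):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "multiValued": multi_valued,
            "stored": stored,
        }
    }
    return Request(
        f"http://localhost:8983/solr/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = add_field_request("films", "name", "text_general")
print(request.full_url)
print(request.data.decode("utf-8"))
# With Solr running, the request could be sent with urllib.request.urlopen(request).
```

Declaring the field before posting the data means Solr no longer has to guess its type from the first document.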
36
Posting Data 2
$ bin/post -c films example/films/films.json
Posting files to [base] url http://localhost:8983/solr/films/update...
POSTing file films.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:00.417
http://127.0.0.1:8983/solr/films/select?indent=on&q=*:*&wt=json
37
Exploring SOLR Analyzers
• Solr analyzes both index content and query input before matching the results
• The live analysis can be observed by using the “Analysis” option in the Solr Admin UI
38
Exploring SOLR Analyzers
• Using the right locale will produce better results
39
Searching
q = genre:Fantasy directed_by:"Robert Zemeckis"
• This query matches films whose genre is Fantasy or that were directed by Robert Zemeckis (OR is the default operator)
40
Filtering
q = genre:Fantasy
fq = initial_release_date:[NOW-12YEAR TO *]
• This query searches for Fantasy films released in the last 12 years
41
Sorting
q = *:*
sort = initial_release_date desc
• This query orders all the films by release date in descending order
42
Fuzzy Edit
q = directed_by:Zemeckis
q = directed_by:Zemekis~1
q = directed_by:Zemequis~2
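The number after the tilde is the maximum edit distance allowed between the query term and an indexed term. The sketch below uses plain Levenshtein distance (insertions, deletions, substitutions) to show why ~1 is enough for "Zemekis" while "Zemequis" needs ~2. It is a simplification for illustration and not the matching code Lucene runs.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("zemeckis", "zemekis"))   # 1
print(edit_distance("zemeckis", "zemequis"))  # 2
```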
43
Faceting
q = *:*
fq = genre:epic
facet = on
facet.field = directed_by_str
http://127.0.0.1:8983/solr/films/select?facet.field=directed_by_str&facet=on&facet.mincount=1&fq=genre:epic&indent=on&q=*:*&wt=json
44
Faceting
Multiple fields for faceting
http://127.0.0.1:8983/solr/films/select?facet.field=directed_by_str&facet.field=genre&facet=on&indent=on&q=*:*&wt=json
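Conceptually, a field facet counts how many matching documents carry each value of a field. A small Python sketch of that idea follows; the sample films and director names are hypothetical and only stand in for the data in example/films/films.json.

```python
from collections import Counter

# Hypothetical sample documents standing in for the films core.
FILMS = [
    {"name": "Film A", "genre": ["Epic", "Drama"], "directed_by": ["Director X"]},
    {"name": "Film B", "genre": ["Epic"], "directed_by": ["Director X"]},
    {"name": "Film C", "genre": ["Comedy"], "directed_by": ["Director Y"]},
    {"name": "Film D", "genre": ["Epic", "War"], "directed_by": ["Director Z"]},
]


def facet_counts(docs, field, mincount=1):
    """Count documents per value of a multi-valued field, most frequent first."""
    counts = Counter(value for doc in docs for value in doc.get(field, []))
    return {value: n for value, n in counts.most_common() if n >= mincount}


# Equivalent in spirit to fq = genre:epic with a facet on the director field.
epic_films = [doc for doc in FILMS if "Epic" in doc["genre"]]
print(facet_counts(epic_films, "directed_by"))
# {'Director X': 2, 'Director Z': 1}
```

Calling facet_counts once per field mirrors passing several facet.field parameters in the same request.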
45
Highlighting
q = genre:epic
hl = on
hl.fl = genre
46
Indexing Documents
Create a new collection
$ bin/solr create -c files -d example/files/conf
Posting Word and PDF Documents
$ bin/post -c files ../Documents
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory ../Documents (3 files, depth=0)
POSTing file Non-text-searchable.pdf (application/pdf) to [base]/extract
POSTing file Sample-Document.pdf (application/pdf) to [base]/extract
POSTing file Sample-Document-scoring.docx (application/vnd.openxmlformats-officedocument.wordprocessingml.document) to [base]/extract
3 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/files/update...
Time spent: 0:00:06.338
47
Searching documents
q = video
48
Documents: ExtractingRequestHandler
The magic happens in files/conf/solrconfig.xml
<!-- Solr Cell Update Request Handler
http://wiki.apache.org/solr/ExtractingRequestHandler
-->
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="xpath">/xhtml:html/xhtml:body/descendant:node()</str>
<str name="capture">content</str>
<str name="fmap.meta">attr_meta_</str>
<str name="uprefix">attr_</str>
<str name="lowernames">true</str>
</lst>
</requestHandler>
49
Alfresco using Apache SOLR
50
Alfresco uses an Angular app to get results from SOLR
ADF
Angular App
Repository
REST API
SOLR
Indexes, Files, DB
User
51
Alfresco Content Application
52
References
53
Quick References
SOLR
• https://lucene.apache.org/solr/resources.html#documentation
• https://www.manning.com/books/solr-in-action
• https://github.com/treygrainger/solr-in-action
“Let’s Build an Inverted Index: Introduction to Apache Lucene/Solr” by Sease
• https://www.slideshare.net/SeaseLtd/lets-build-an-inverted-index-introduction-to-apache-lucenesolr
Source code
• https://github.com/apache/lucene-solr
• https://cwiki.apache.org/confluence/display/solr/HowToContribute
This presentation
• https://www.slideshare.net/angelborroy/a-practical-introduction-to-apache-solr
Alfresco Certificates
Angel Borroy López
 

Plus de Angel Borroy López (20)

Transitioning from Customized Solr to Out-of-the-Box OpenSearch
Transitioning from Customized Solr to Out-of-the-Box OpenSearchTransitioning from Customized Solr to Out-of-the-Box OpenSearch
Transitioning from Customized Solr to Out-of-the-Box OpenSearch
 
Alfresco integration with OpenSearch - OpenSearchCon 2024 Europe
Alfresco integration with OpenSearch - OpenSearchCon 2024 EuropeAlfresco integration with OpenSearch - OpenSearchCon 2024 Europe
Alfresco integration with OpenSearch - OpenSearchCon 2024 Europe
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Using Generative AI and Content Service Platforms together
Using Generative AI and Content Service Platforms togetherUsing Generative AI and Content Service Platforms together
Using Generative AI and Content Service Platforms together
 
Enhancing Document-Centric Features with On-Premise Generative AI for Alfresc...
Enhancing Document-Centric Features with On-Premise Generative AI for Alfresc...Enhancing Document-Centric Features with On-Premise Generative AI for Alfresc...
Enhancing Document-Centric Features with On-Premise Generative AI for Alfresc...
 
La Guía Definitiva para una Actualización Exitosa a Alfresco 23.1
La Guía Definitiva para una Actualización Exitosa a Alfresco 23.1La Guía Definitiva para una Actualización Exitosa a Alfresco 23.1
La Guía Definitiva para una Actualización Exitosa a Alfresco 23.1
 
Docker Init with Templates for Alfresco
Docker Init with Templates for AlfrescoDocker Init with Templates for Alfresco
Docker Init with Templates for Alfresco
 
Before & After Docker Init
Before & After Docker InitBefore & After Docker Init
Before & After Docker Init
 
Alfresco Transform Services 4.0.0
Alfresco Transform Services 4.0.0Alfresco Transform Services 4.0.0
Alfresco Transform Services 4.0.0
 
How to migrate from Alfresco Search Services to Alfresco SearchEnterprise
How to migrate from Alfresco Search Services to Alfresco SearchEnterpriseHow to migrate from Alfresco Search Services to Alfresco SearchEnterprise
How to migrate from Alfresco Search Services to Alfresco SearchEnterprise
 
Using Podman with Alfresco
Using Podman with AlfrescoUsing Podman with Alfresco
Using Podman with Alfresco
 
CSP: Evolución de servicios de código abierto en un mundo Cloud Native
CSP: Evolución de servicios de código abierto en un mundo Cloud NativeCSP: Evolución de servicios de código abierto en un mundo Cloud Native
CSP: Evolución de servicios de código abierto en un mundo Cloud Native
 
Alfresco Embedded Activiti Engine
Alfresco Embedded Activiti EngineAlfresco Embedded Activiti Engine
Alfresco Embedded Activiti Engine
 
Alfresco Transform Core 3.0.0
Alfresco Transform Core 3.0.0Alfresco Transform Core 3.0.0
Alfresco Transform Core 3.0.0
 
Collaborative Editing Tools for Alfresco
Collaborative Editing Tools for AlfrescoCollaborative Editing Tools for Alfresco
Collaborative Editing Tools for Alfresco
 
Desarrollando una Extensión para Docker
Desarrollando una Extensión para DockerDesarrollando una Extensión para Docker
Desarrollando una Extensión para Docker
 
DockerCon 2022 Spanish Room-ONBOARDING.pdf
DockerCon 2022 Spanish Room-ONBOARDING.pdfDockerCon 2022 Spanish Room-ONBOARDING.pdf
DockerCon 2022 Spanish Room-ONBOARDING.pdf
 
Deploying Containerised Open-Source CSP Platforms
Deploying Containerised Open-Source CSP PlatformsDeploying Containerised Open-Source CSP Platforms
Deploying Containerised Open-Source CSP Platforms
 
Introduction to AWS
Introduction to AWSIntroduction to AWS
Introduction to AWS
 
Alfresco Certificates
Alfresco Certificates Alfresco Certificates
Alfresco Certificates
 

Dernier

一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
Peter Muessig
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
TaghreedAltamimi
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
Requirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional SafetyRequirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional Safety
Ayan Halder
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Envertis Software Solutions
 

Dernier (20)

一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
Requirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional SafetyRequirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional Safety
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
 

A Practical Introduction to Apache Solr

  • 1. Angel Borroy Software Engineer March 2020 A Practical Introduction to Apache SOLR CODELAB
  • 2. Requirements Java Runtime Environment 1.8+ $ java -version openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment 18.9 (build 11.0.2+9) OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode) Supported Operating Systems • Linux • MacOS • Windows https://lucene.apache.org/solr/downloads.html https://www.slideshare.net/angelborroy/a-practical-introduction-to-apache-solr
  • 3. A Practical Introduction to Apache SOLR • Open Source • What is SOLR • Key SOLR Concepts • SOLR Lab • Quick References NEOCOM 2020
  • 5. Why should you use Open Source? • State of the Art Technologies • Community Support • Vast Documentation • Code is accessible • Customizable • Mostly free licensing
  • 6. Why should you contribute to Open Source? • Share Knowledge and Ideas • Improve established Technologies • Become part of a Community • Not only code, all your skills are relevant • Be useful to the World
  • 8. What is SOLR • A Search Engine • A REST-like API • Built on Lucene • Open Source • Blazing-fast • Scalable • Fault tolerant
  • 9. Why SOLR • Scalable: Solr scales by distributing work (indexing and query processing) to multiple servers in a cluster. • Ready to deploy: Solr is open source, is easy to install and configure, and provides a preconfigured example to help you get started. • Optimized for search: Solr is fast and can execute complex queries in under a second, often in only tens of milliseconds. • Large volumes of documents: Solr is designed to deal with indexes containing many millions of documents. • Text-centric: Solr is optimized for searching natural-language text, like emails, web pages, resumes, PDF documents, and social messages such as tweets or blogs. • Results sorted by relevance: Solr returns documents in ranked order based on how relevant each document is to the user’s query.
  • 10. Lucene-based Search Engines Amazon Elasticsearch Service
  • 11. Features overview • Pagination and sorting • Faceting • Autosuggest • Spell-checking • Highlighting • Geospatial search • More Like This
  • 12. Features overview • Flexible query support • Document clustering • Import rich document formats (PDF, Office…) • Import data from databases (DIH, Data Import Handler) • Multilingual support
  • 15. Key SOLR Concepts • Documents • Searching • Relevancy • Precision and Recall • Searching at Scale Diagram labels: STORAGE • RETRIEVAL • Tracking • Indexing • Query
  • 16. Lucene Document • Documents are the unit of information for indexing and search • A Document is a set of fields • Each field has a name and a value • All field types must be defined, and all field names (or dynamic field-naming patterns) should be specified in Solr’s schema.xml Seminars Schema Configuration • Per collection/index • XML file • Defines how the inverted index will be built • Fields/Field Types definition Diagram labels: DOCUMENT • FIELD
  • 17. Lucene Document – Search problem Book titles: The Beginner’s Guide to Buying a House • How to Buy Your First House • Purchasing a Home • Becoming a New Home owner • Buying a New Home • Decorating Your Home • A Fun Guide to Cooking • How to Raise a Child • Buying a New Car SELECT * FROM Books WHERE Name = 'buying a new home'; 0 results SELECT * FROM Books WHERE Name LIKE '%buying%' AND Name LIKE '%a%' AND Name LIKE '%home%'; 1 result: Buying a New Home SELECT * FROM Books WHERE Name LIKE '%buying%' OR Name LIKE '%a%' OR Name LIKE '%home%'; 8 results: A Fun Guide to Cooking, Decorating Your Home, How to Raise a Child, Buying a New Car, Buying a New Home, The Beginner’s Guide to Buying a House, Purchasing a Home, Becoming a New Home owner Issues not handled: Unimportant words • Synonyms • Linguistic variations • Ordering
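The three SQL queries on this slide can be reproduced with a small in-memory database. This is only a sketch: it assumes SQLite (whose `=` is case-sensitive and whose `LIKE` is case-insensitive for ASCII text), and the `run_queries` helper is a name made up for illustration.

```python
import sqlite3

# Book titles from the slide.
TITLES = [
    "The Beginner's Guide to Buying a House",
    "How to Buy Your First House",
    "Purchasing a Home",
    "Becoming a New Home owner",
    "Buying a New Home",
    "Decorating Your Home",
    "A Fun Guide to Cooking",
    "How to Raise a Child",
    "Buying a New Car",
]


def run_queries():
    """Run the exact-match, all-terms and any-term queries from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Books (Name TEXT)")
    conn.executemany("INSERT INTO Books (Name) VALUES (?)", [(t,) for t in TITLES])

    def names(sql):
        return [row[0] for row in conn.execute(sql)]

    results = {
        # Exact comparison: the stored title differs in case, so nothing matches.
        "exact": names("SELECT Name FROM Books WHERE Name = 'buying a new home'"),
        # Every term must appear as a substring.
        "all_terms": names(
            "SELECT Name FROM Books WHERE Name LIKE '%buying%' "
            "AND Name LIKE '%a%' AND Name LIKE '%home%'"
        ),
        # Any term may appear; '%a%' matches almost every title.
        "any_term": names(
            "SELECT Name FROM Books WHERE Name LIKE '%buying%' "
            "OR Name LIKE '%a%' OR Name LIKE '%home%'"
        ),
    }
    conn.close()
    return results


RESULTS = run_queries()
```

Neither query shape ranks the results or recognises that "Purchasing a Home" is a good answer, which is the gap the inverted index and text analysis address.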
  • 18. Lucene Document – Inverted Index Documents (Doc #: Content field): 1: A Fun Guide to Cooking • 2: Decorating Your Home • 3: How to Raise a Child • 4: Buying a New Car • 5: Buying a New Home • 6: The Beginner’s Guide to Buying a House • 7: Purchasing a Home • 8: Becoming a New Home Owner • 9: How to Buy Your First House Inverted index (Term: Doc #): a: 1,3,4,5,6,7,8 • becoming: 8 • beginner’s: 6 • buy: 9 • buying: 4,5,6 • child: 3 • cooking: 1 • decorating: 2 • home: 2,5,7,8 • house: 6,9 • how: 3,9 • new: 4,5,8 • purchasing: 7 • your: 2,9
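A minimal sketch of how such an inverted index could be built, assuming a naive analyzer that only lowercases and splits on whitespace (Lucene's real analyzers do considerably more). `build_inverted_index` is an illustrative name, not a Lucene or Solr API.

```python
from collections import defaultdict

# Documents from the slide, keyed by document number.
DOCS = {
    1: "A Fun Guide to Cooking",
    2: "Decorating Your Home",
    3: "How to Raise a Child",
    4: "Buying a New Car",
    5: "Buying a New Home",
    6: "The Beginner's Guide to Buying a House",
    7: "Purchasing a Home",
    8: "Becoming a New Home Owner",
    9: "How to Buy Your First House",
}


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


INDEX = build_inverted_index(DOCS)
```

Looking up a term is then a single dictionary access, for example `INDEX["home"]`, instead of a scan over every title.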
  • 19. Searching • Unimportant word “a” is skipped • Synonyms: purchasing ~ buying, house ~ home • Linguistic variations: buy ~ buying • Term: Docs after analysis: buying: 4,5,6,7,9 • home: 2,5,6,7,8,9 • buying AND home = 5,6,7,9: Buying a New Home • The Beginner’s Guide to Buying a House • Purchasing a Home • How to Buy Your First House
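The effect of this analysis can be sketched by normalising terms at both index time and query time. The stop-word set and the `NORMALIZE` table below are hand-written stand-ins for Solr's stop-word, synonym and stemming filters, and `analyze`, `build_index` and `search_all` are hypothetical helpers.

```python
from collections import defaultdict

DOCS = {
    1: "A Fun Guide to Cooking",
    2: "Decorating Your Home",
    3: "How to Raise a Child",
    4: "Buying a New Car",
    5: "Buying a New Home",
    6: "The Beginner's Guide to Buying a House",
    7: "Purchasing a Home",
    8: "Becoming a New Home Owner",
    9: "How to Buy Your First House",
}

STOP_WORDS = {"a"}  # unimportant words that are skipped
NORMALIZE = {
    "buy": "buying",         # linguistic variation
    "purchasing": "buying",  # synonym
    "house": "home",         # synonym
}


def analyze(text):
    """Lowercase, drop stop words and map variants to a single term."""
    terms = []
    for raw in text.lower().split():
        if raw in STOP_WORDS:
            continue
        terms.append(NORMALIZE.get(raw, raw))
    return terms


def build_index(docs):
    """Build term -> set of document numbers using the same analysis chain."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return documents containing every analyzed query term (AND semantics)."""
    postings = [index.get(term, set()) for term in analyze(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


INDEX = build_index(DOCS)
HITS = search_all(INDEX, "buying a home")
```

Applying the same analysis on both sides is what lets "Purchasing a Home" match a query for "buying a home".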
  • 20. Searching operators • Required terms • Optional terms • Negated terms • Phrases • Grouped expressions • Fuzzy matching • Wildcard • Range • Distance • Proximity Examples: buying AND home • buying OR home • buying NOT home • "buying a home" • (buying OR renting) AND home • offi* off*r off?r • yearsOld:[18 TO 21] • administrator~ • "chief officer"~1
  • 21. Relevancy till SOLR 4 (TF/IDF) A relevancy score is calculated for each document, and the search results are sorted from the highest score to the lowest. Similarity Term frequency • A document is more relevant for a particular term if the term appears in it multiple times Inverse document frequency • A measure of how “rare” a search term is, derived from its document frequency (the total number of documents the term appears in) Boosting • A query-time multiplier that adjusts the weight of a field • title:solr^2.5 description:solr Normalization factors for fields, queries and coord Ordering
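The interplay of term frequency, inverse document frequency and boosting can be illustrated with the simplified textbook form `boost * tf * log(N / df)`. This is an assumption made for readability: Lucene's classic similarity uses a dampened term frequency and extra normalization factors, and the tiny corpus below is made up.

```python
import math

# Made-up, already tokenized documents used only for illustration.
DOCS = [
    ["solr", "search"],
    ["solr", "solr", "lucene"],
    ["cooking", "guide"],
]


def tf_idf(term, doc_terms, all_docs, boost=1.0):
    """Simplified TF/IDF: boost * term frequency * log(N / document frequency)."""
    tf = doc_terms.count(term)
    df = sum(1 for doc in all_docs if term in doc)
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(all_docs) / df)
    return boost * tf * idf


solr_once = tf_idf("solr", DOCS[0], DOCS)
solr_twice = tf_idf("solr", DOCS[1], DOCS)
rare_term = tf_idf("lucene", DOCS[1], DOCS)
boosted = tf_idf("solr", DOCS[0], DOCS, boost=2.5)
```

A term that appears twice scores higher than one that appears once, a rarer term outweighs a common one, and a boost such as `^2.5` simply multiplies the result.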
  • 22. Relevancy from SOLR 6 (BM25) BM25 improves upon TF/IDF BM25 stands for “Best Match 25” (25th iteration on TF/IDF) Includes different factors • Frequency of a term in all Documents • Term Frequency in a Document • Document Length BM25 limits the influence of term frequency: • less influence of common words With TF/IDF: short fields (title,...) are automatically scored higher BM25: scales field length by the average field length • field length treatment does not automatically boost short fields Ordering
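A sketch of the per-term BM25 score in its commonly published form, using `k1 = 1.2` and `b = 0.75` (the usual defaults). The corpus numbers in the example are invented, and `bm25_term_score` is an illustrative function, not Solr's implementation.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Score one term in one document with the standard BM25 formula."""
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Field length is compared with the average length instead of used raw.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: each extra occurrence adds less than the last.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Invented corpus statistics: 1000 documents, the term occurs in 50 of them.
scores = {
    tf: bm25_term_score(tf, doc_len=10, avg_doc_len=10, num_docs=1000, doc_freq=50)
    for tf in (1, 2, 10, 11)
}
short_field = bm25_term_score(1, doc_len=5, avg_doc_len=10, num_docs=1000, doc_freq=50)
long_field = bm25_term_score(1, doc_len=50, avg_doc_len=10, num_docs=1000, doc_freq=50)
```

The score still grows with term frequency, but the gain flattens out, and a field is only favoured or penalised relative to the average field length.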
  • 23. Precision and Recall Precision is a measure of how “good” each of the results of a query is. A query that returns one single correct document out of a million other correct documents is still considered perfectly precise. Recall is a measure of how many of the correct documents are returned. A query that returns one single correct document out of a million other correct documents has very poor recall. >> Balancing Precision and Recall will improve the quality of your search results. Example: there are 20 correct documents, and the search results contain 10 documents (8 correct and 2 incorrect) Precision = 80% (8 / 10) Recall = 40% (8 / 20) What is the precision and recall for the previous “buying a home” sample?
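The worked example on this slide can be checked with a few lines of set arithmetic. The document identifiers below are arbitrary placeholders, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a result set against the known correct set."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 20 correct documents exist; the query returns 8 of them plus 2 incorrect ones.
relevant_docs = set(range(1, 21))
returned_docs = set(range(1, 9)) | {101, 102}

precision, recall = precision_recall(returned_docs, relevant_docs)
```

Here `precision` is 8 / 10 and `recall` is 8 / 20, matching the 80% and 40% on the slide.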
  • 24. Searching at Scale Scaling SOLR Solr is able to scale to handle billions of documents and a growing volume of queries by adding servers. Some limitations • You can insert, delete, and update documents, but not single fields (easily) • Solr is not optimized for processing very long queries (thousands of terms) or returning very large result sets to users.
  • 26. Requirements • Java Runtime Environment 1.8+ $ java -version openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment 18.9 (build 11.0.2+9) OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode) • Supported Operating Systems • Linux • MacOS • Windows https://lucene.apache.org/solr/downloads.html
  • 27. Directory layout bin/ • solr | solr.cmd : start SOLR • post : posting content to SOLR • solr.in.sh | solr.in.cmd : configuration contrib/ • add-on plugins dist/ • SOLR Jar files docs/ • JavaDocs example/ • CSV, XML and JSON • DIH for databases • Word and PDF files licenses/ • 3rd party libraries server/ • SOLR Admin UI • Jetty Libraries • Log files • Sample configsets
  • 28. Starting SOLR • Use the command line interface tool called bin/solr (Linux) or bin\solr.cmd (Windows) $ bin/solr start -p 8983 Waiting up to 180 seconds to see Solr running on port 8983 [] Started Solr server on port 8983 (pid=4521). Happy searching! • Check if Solr is Running $ bin/solr status Found 1 Solr nodes: Solr process 4521 running on port 8983 { "solr_home":"/Users/aborroy/Downloads/solr-introduction-university/solr-8.4.1/server/solr", "version":"8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28", "startTime":"2020-03-08T08:13:49.969Z", "uptime":"0 days, 0 hours, 17 minutes, 56 seconds", "memory":"91.6 MB (%17.9) of 512 MB"}
  • 29. The SOLR Admin Web Interface http://127.0.0.1:8983/solr/#/
  • 30. Creating a new Core $ bin/solr create -c films • -c indicates the collection name • Check default fields added by SOLR to the Schema • Check JSON Data to be posted in example/films/films.json { "id": "/en/45_2006", "directed_by": [ "Gary Lennon" ], "initial_release_date": "2006-11-30", "genre": [ "Black comedy", "Thriller" ], "name": ".45" }
  • 31. Posting data $ bin/post -c films example/films/films.json Posting files to [base] url http://localhost:8983/solr/films/update... POSTing file films.json (application/json) to [base]/json/docs SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/films/update/json/docs SimplePostTool: WARNING: Response: { "responseHeader":{ "status":400, "QTime":120}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.lang.NumberFormatException"], "msg":"ERROR: [doc=/en/quien_es_el_senor_lopez] Error adding field 'name'='¿Quién es el señor López?' msg=For input string: \"¿Quién es el señor López?\"", "code":400}} SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/films/update/json/docs 1 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/films/update... Time spent: 0:00:00.323
  • 32. How many results were posted? http://127.0.0.1:8983/solr/films/select?indent=on&q=*:*&wt=json • q: main query • fq: filter queries • sort: field and direction (asc or desc) • start, rows: offset and number of rows • fl: list of fields to return • wt: response format (XML or JSON)
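When these parameters are sent from code rather than typed into a browser, the values need URL encoding. A minimal sketch using only the Python standard library; the host, port and core name are taken from the slides, while `select_url` and the chosen parameter values are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(core, **params):
    """Build a /select URL for a core, URL-encoding every parameter value."""
    base = f"http://127.0.0.1:8983/solr/{core}/select"
    return f"{base}?{urlencode(params)}"


url = select_url(
    "films",
    q="*:*",                          # match every document
    fq="genre:Fantasy",               # filter query
    sort="initial_release_date desc",
    start=0,                          # offset of the first row
    rows=10,                          # number of rows to return
    fl="id,name",                     # fields to return
    wt="json",                        # response format
)

# Decoding the query string again shows the parameters round-trip unchanged.
decoded = parse_qs(urlsplit(url).query)
```

The resulting `url` can be opened in a browser or fetched with any HTTP client against a running Solr instance.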
  • 33. What was wrong? Check carefully JSON Data to be posted in example/films/films.json { "id": "/en/quien_es_el_senor_lopez", "directed_by": [ "Luis Mandoki" ], "genre": [ "Documentary film" ], "name": "\u00bfQui\u00e9n es el se\u00f1or L\u00f3pez?" }, http://127.0.0.1:8983/solr/#/films/schema?field=name
  • 34. Auto-Generated SOLR Schema http://127.0.0.1:8983/solr/#/films/files?file=managed-schema • multiValued: a single document might contain multiple values for this field type • indexed: the value of the field can be used in queries to retrieve matching documents (true by default) • required: SOLR rejects any attempts to add a document which does not have a value for this field • stored: the actual value of the field can be retrieved by queries • name can contain text!
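A likely reading of the error, given the NumberFormatException in the response: the auto-generated schema picked a numeric type for name because the first value posted was ".45", so later titles that are plain text were rejected. The sketch below only mimics that guessing behaviour in plain Python; `guess_field_type` is a hypothetical helper, not Solr code.

```python
def guess_field_type(first_value):
    """Guess a field type from the first value seen, as a simplified analogy."""
    try:
        float(first_value)
        return "number"
    except ValueError:
        return "text"


# The first film in films.json is named ".45", which parses as a number.
guessed = guess_field_type(".45")

# A later title cannot be parsed with the guessed numeric type.
later_title = "¿Quién es el señor López?"
later_type = guess_field_type(later_title)
```

Defining the name field explicitly as a text type before posting avoids depending on the first value.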
  • 35. Re-Creating the Core Deleting core “films” $ bin/solr delete -c films Deleting core 'films' using command: http://localhost:8983/solr/admin/cores?action=UNLOAD&core=films&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true Creating core “films” $ bin/solr create -c films Created new core 'films' Creating the field “name” for the core “films” http://127.0.0.1:8983/solr/#/films/schema
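The slide creates the field through the Admin UI. As an alternative sketch, the same field can be described as an `add-field` command for Solr's Schema API. The request below is only constructed, not sent; the `text_general` type and the attribute values are assumptions based on Solr's default configset, and `add_field_request` is an illustrative helper.

```python
import json
from urllib.request import Request


def add_field_request(core, field_name, field_type="text_general"):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = {
        "add-field": {
            "name": field_name,
            "type": field_type,    # assumed text type from the default configset
            "multiValued": False,
            "stored": True,
        }
    }
    return Request(
        url=f"http://localhost:8983/solr/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = add_field_request("films", "name")
# With a running Solr instance it could be sent via urllib.request.urlopen(request).
```

Scripting the schema change makes it repeatable whenever the core is deleted and re-created.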
  • 36. Posting Data 2 $ bin/post -c films example/films/films.json Posting files to [base] url http://localhost:8983/solr/films/update... POSTing file films.json (application/json) to [base]/json/docs 1 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/films/update... Time spent: 0:00:00.417 http://127.0.0.1:8983/solr/films/select?indent=on&q=*:*&wt=json
  • 37. Exploring SOLR Analyzers • Solr analyzes both index content and query input before matching the results • The live analysis can be observed by using the “Analysis” option from the Solr Admin UI
  • 38. Exploring SOLR Analyzers • Using the right locale will produce better results
  • 39. Searching q = genre:Fantasy directed_by:"Robert Zemeckis" • This query matches films that are of genre Fantasy or directed by Robert Zemeckis (OR is the default operator); films matching both clauses score higher
  • 40. Filtering q = genre:Fantasy fq = initial_release_date:[NOW-12YEAR TO *] • This query searches for films of genre Fantasy released in the last 12 years
  • 41. Sorting q = *:* sort = initial_release_date desc • This query orders all the films by release date in descending order
  • 42. Fuzzy Edit q = directed_by:Zemeckis q = directed_by:Zemekis~1 q = directed_by:Zemequis~2
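The number after `~` is the maximum edit distance allowed between the query term and an indexed term. A sketch with plain Levenshtein distance shows why `~1` and `~2` are enough for the two misspellings above. Note the assumption: Lucene's fuzzy queries use a Damerau-Levenshtein variant that also counts a transposition as one edit, which this simplified version does not.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


one_edit = edit_distance("zemeckis", "zemekis")    # missing "c"
two_edits = edit_distance("zemeckis", "zemequis")  # "ck" typed as "qu"
```

`Zemekis` is one edit away from `Zemeckis` and `Zemequis` is two edits away, so the second query only matches with `~2`.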
  • 43. Faceting q = *:* fq = genre:epic facet = on facet.field = directed_by_str http://127.0.0.1:8983/solr/films/select?facet.field=directed_by_str&facet=on&facet.mincount=1&fq=genre:epic&indent=on&q=*:*&wt=json
  • 44. Faceting Multiple fields for faceting http://127.0.0.1:8983/solr/films/select?facet.field=directed_by_str&facet.field=genre&facet=on&indent=on&q=*:*&wt=json
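Faceting on several fields works by repeating the `facet.field` parameter. A minimal sketch of building such a URL from a list of pairs, since a plain dictionary cannot hold the same key twice; `facet_url` is a hypothetical helper and the field names come from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def facet_url(core, query, facet_fields):
    """Build a /select URL that facets on every field in facet_fields."""
    params = [("q", query), ("facet", "on"), ("wt", "json")]
    # One facet.field entry per field; the key is repeated in the query string.
    params.extend(("facet.field", field) for field in facet_fields)
    return f"http://127.0.0.1:8983/solr/{core}/select?{urlencode(params)}"


url = facet_url("films", "*:*", ["directed_by_str", "genre"])
decoded = parse_qs(urlsplit(url).query)
```

Each faceted field is reported separately in the facet section of the response.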
  • 45. Highlighting q = genre:epic hl = on hl.fl = genre
  • 46. Indexing Documents Create a new collection $ bin/solr create -c files -d example/files/conf Posting Word and PDF Documents $ bin/post -c files ../Documents Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log Entering recursive mode, max depth=999, delay=0s Indexing directory ../Documents (3 files, depth=0) POSTing file Non-text-searchable.pdf (application/pdf) to [base]/extract POSTing file Sample-Document.pdf (application/pdf) to [base]/extract POSTing file Sample-Document-scoring.docx (application/vnd.openxmlformats-officedocument.wordprocessingml.document) to [base]/extract 3 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/files/update... Time spent: 0:00:06.338
  • 48. Documents : ExtractingUpdateRequestHandler The magic happens in files/conf/solrconfig.xml <!-- Solr Cell Update Request Handler http://wiki.apache.org/solr/ExtractingRequestHandler --> <requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler" > <lst name="defaults"> <str name="xpath">/xhtml:html/xhtml:body/descendant:node()</str> <str name="capture">content</str> <str name="fmap.meta">attr_meta_</str> <str name="uprefix">attr_</str> <str name="lowernames">true</str> </lst> </requestHandler>
  • 50. Alfresco uses an Angular app to get results from SOLR Diagram labels: User • ADF Angular App • Repository REST API • SOLR • Indexes • Files • DB
  • 53. Quick References SOLR • https://lucene.apache.org/solr/resources.html#documentation • https://www.manning.com/books/solr-in-action • https://github.com/treygrainger/solr-in-action “Let’s Build an Inverted Index: Introduction to Apache Lucene/Solr” by Sease • https://www.slideshare.net/SeaseLtd/lets-build-an-inverted-index-introduction-to-apache-lucenesolr Source code • https://github.com/apache/lucene-solr • https://cwiki.apache.org/confluence/display/solr/HowToContribute This presentation • https://www.slideshare.net/angelborroy/a-practical-introduction-to-apache-solr