Slides from the talk given on 25 October at the 2019 Plone Conference.
Elasticsearch allows Plone to rely on a search engine that is scalable and performant, compared to the regular search feature.
After an introduction about the strengths of the Elastic Stack, you will see how to take advantage of analyzers, tokenizers, custom and boost scoring, geo-search and.
Faster and better search results with Elasticsearch
1. FASTER AND BETTER SEARCH RESULTS WITH ELASTICSEARCH
TAKE YOUR SITE-WIDE SEARCHES TO THE NEXT LEVEL
5. WEB SITE SEARCH
Search across different fields (title, content,...);
show relevant results first;
categorize results;
filter by various attributes;
withstand user typos;
treat synonyms as the same word;
be scalable;
be fault tolerant;
be easy to deploy.
9. PLONE SITE SEARCH
ZCatalog:
fully integrated in Plone;
no advanced features (like synonym support);
not very scalable.
Apache Solr: collective.solr, alm.solrindex
based on the Java search library Apache Lucene;
better results ranking;
advanced features;
more configurable;
some clustering support (using ZooKeeper).
Elasticsearch: collective.elasticsearch
based (again) on Lucene;
search features similar to Solr's;
great scalability;
less XML, more JSON.
10. ELASTIC STACK
Also known as ELK:
Elasticsearch,
Logstash,
Kibana,
Beats.
Two main classes of use cases:
Almost static data: search engines,
Time series data: logs and metrics.
13. INDEX A DOCUMENT
POST plone/_doc
{
  "title": "Getting started with plone and Elasticsearch",
  "author": "Enrico Polesel",
  "content": "We want to index the entire content of our Plone website into elasticsearch...",
  "tags": ["plone", "search", "elasticsearch", "cluster", "performance", "high availability"],
  "date": "2019-10-25T11:50:00+0200"
}

{
  "_index" : "plone",
  "_type" : "_doc",
  "_id" : "Y0MZ7W0B3-sU3YTrncfM",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
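The indexing request above can also be prepared from Python. What follows is a minimal sketch using only the standard library. The base URL http://localhost:9200 assumes a local node listening on the default HTTP port, and the helper name build_index_request is hypothetical. The function only builds the request; sending it (for example with urllib.request.urlopen) is left to the caller.

```python
import json
import urllib.request


def build_index_request(base_url, index, document):
    """Build (but do not send) a POST <index>/_doc request.

    base_url is an assumption: a node reachable over plain HTTP.
    """
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same document as in the slide.
document = {
    "title": "Getting started with plone and Elasticsearch",
    "author": "Enrico Polesel",
    "content": "We want to index the entire content of our Plone website into elasticsearch...",
    "tags": ["plone", "search", "elasticsearch", "cluster", "performance", "high availability"],
    "date": "2019-10-25T11:50:00+0200",
}

request = build_index_request("http://localhost:9200", "plone", document)
print(request.get_method(), request.full_url)
```

Because the request uses POST without an explicit id, Elasticsearch generates the document id itself, as seen in the "_id" field of the response.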
14. DATA TYPES
short, long,
float, double
ip
geo_point
range types (integer_range, date_range, ...)
keyword (not analyzed strings),
text (analyzed strings),
object, array, nested object,
...
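To make the types concrete, here is a sketch of an explicit mapping for the example document, written as a Python dictionary that could be sent as the body when creating the plone index. The choice of keyword for author and tags is an assumption (exact-match filtering rather than full-text search), and the location field is hypothetical, added only to show geo_point.

```python
import json

# Sketch of an explicit mapping; field type choices are assumptions.
plone_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed string
            "author": {"type": "keyword"},    # not analyzed, exact match
            "content": {"type": "text"},      # analyzed string
            "tags": {"type": "keyword"},      # any field may hold an array of values
            "date": {"type": "date"},
            "location": {"type": "geo_point"},  # hypothetical field
        }
    }
}

print(json.dumps(plone_mapping, indent=2))
```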
22. ANALYZERS
1. Char filters
  convert HTML escape codes
  normalize unicode symbols
  replace patterns
2. Tokenizer
  separate on whitespaces
  separate on punctuation
  may be grammar based
  may generate partial words
  special tokenizer for special strings (like paths)
3. Token filters
  normalize tokens
  stemming
  remove stopwords
  translate synonyms
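The three stages above map directly onto a custom analyzer definition. The following is a sketch of index settings as a Python dictionary, using built-in Elasticsearch components (the html_strip char filter, the standard tokenizer, and the lowercase, stop, synonym and porter_stem token filters). The names plone_text and plone_synonyms and the synonym rule itself are hypothetical, chosen only for illustration.

```python
import json

# Sketch of index settings with a custom analyzer; names are hypothetical.
analysis_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "plone_synonyms": {
                    "type": "synonym",
                    "synonyms": ["site, website"],  # illustrative rule
                }
            },
            "analyzer": {
                "plone_text": {
                    "type": "custom",
                    "char_filter": ["html_strip"],  # 1. char filters
                    "tokenizer": "standard",        # 2. tokenizer
                    "filter": [                     # 3. token filters, applied in order
                        "lowercase",
                        "stop",
                        "plone_synonyms",
                        "porter_stem",
                    ],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "content": {"type": "text", "analyzer": "plone_text"}
        }
    },
}

print(json.dumps(analysis_settings, indent=2))
```

Token filters run in the order listed, so lowercasing happens before stopword removal and synonym expansion, and stemming comes last.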
24. QUERY - FUZZY MATCH
With distance 1 we have:
Changing a character (box → fox)
Removing a character (black → lack)
Inserting a character (sic → sick)
Transposing two adjacent characters (act → cat)
GET plone/_search
{
  "query": {
    "match": {
      "content": {
        "query": "ploMe",
        "fuzziness": 1
      }
    }
  }
}
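The four edit operations listed above can be checked with a small distance function. This is an illustrative sketch of the optimal string alignment variant of the Damerau-Levenshtein distance, not Elasticsearch's own implementation; the function name edit_distance is hypothetical.

```python
def edit_distance(a, b):
    """Edit distance counting substitutions, deletions, insertions
    and transpositions of two adjacent characters as one step each."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # removing a character
                d[i][j - 1] + 1,         # inserting a character
                d[i - 1][j - 1] + cost,  # changing a character
            )
            # transposing two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for source, target in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(source, "->", target, edit_distance(source, target))
```

Each of the four pairs from the slide is at distance 1, as is the lowercased query term "plome" from the indexed word "plone".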
25. QUERY - MULTI MATCH
Matches in the title field will be boosted!
GET plone/_search
{
  "query": {
    "multi_match": {
      "query": "plome",
      "fields": [ "title^2", "content" ],
      "fuzziness": 1
    }
  }
}
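A boost is written as the field name followed by ^ and a factor. Below is a small sketch of a helper that builds such a query body from a mapping of field names to boost factors; the helper name multi_match_query is hypothetical.

```python
import json


def multi_match_query(text, boosts, fuzziness=None):
    """Build a multi_match query body.

    boosts maps field names to boost factors; a factor of 1 means no boost.
    """
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    body = {"query": {"multi_match": {"query": text, "fields": fields}}}
    if fuzziness is not None:
        body["query"]["multi_match"]["fuzziness"] = fuzziness
    return body


query = multi_match_query("plome", {"title": 2, "content": 1}, fuzziness=1)
print(json.dumps(query, indent=2))
```

Note that a misspelled field name in the fields list does not necessarily raise an error; the field may simply never match, so building the list from known field names helps avoid silent mistakes.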
32. RUNNING ELASTICSEARCH
config/elasticsearch.yml
config/jvm.options
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.0-linux-x86_64.tar.gz
$ tar -xf elasticsearch-7.4.0-linux-x86_64.tar.gz
$ cd elasticsearch-7.4.0
$ bin/elasticsearch
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-7.4.0-linux-x86_64.tar.gz
$ tar -xf kibana-7.4.0-linux-x86_64.tar.gz
$ cd kibana-7.4.0-linux-x86_64
$ bin/kibana
Docker, yum/apt, Windows and macOS also supported! See https://www.elastic.co/downloads/
33. CLUSTERING
Need high availability? Install two data nodes! (replicas are enabled by default)
Need more space? Increase the number of nodes! (and of indices/shards)
Need more search performance? Increase the number of replicas!
Have disks of different types (fast/slow)? Use a hot-cold architecture!
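The relationship between shards, replicas and nodes can be sketched with a little arithmetic. The helper names below are hypothetical. The settings keys number_of_shards and number_of_replicas are the standard index settings, and the node count relies on the fact that Elasticsearch does not place two copies of the same shard on one node.

```python
def index_settings(primary_shards, replicas):
    """Index settings body with the given shard and replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas)


def min_nodes_for_full_allocation(replicas):
    """Copies of the same shard must live on different nodes."""
    return replicas + 1


# One primary shard with one replica.
print(index_settings(1, 1))
print(total_shard_copies(1, 1))          # 2 copies in total
print(min_nodes_for_full_allocation(1))  # 2 data nodes needed
```

With one primary and one replica there are 2 shard copies, which is consistent with the "_shards" section of the indexing response shown earlier (total 2, successful 1), presumably obtained on a single node where the replica could not be allocated.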