99 Problems, But
The Search Ain’t One
Andrei Zmievski • PHP UK • Feb 25, 2011
who am I?
 curl http://localhost:9200/speaker/info/andrei


{"name":       "Andrei Zmievski",
 "projects":   ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
 "likes":      ["coding", "beer", "brewing", "photography"],
 "twitter":    "@a",
 "email":      "andrei@zmievski.org"}
what is elasticsearch?

a search engine for the NoSQL generation

  domain-driven

  distributed

  RESTful

  Hitchhiker’s Guide to the Galaxy (no, really)
document model


document-oriented

JSON-based

schema-free
engine


based on Lucene

multi-tenancy

distributed, out of the box
nomenclature

index

type

document

  _id

node
3 easy steps
1. index
request

           curl -XPOST http://localhost:9200/conf/speaker/1 -d'
           {
               "name": "Andrei Zmievski",
               "talk": "99 Problems, but the Search Ain'\''t One",
               "likes": ["coding", "beer", "photography"],
               "twitter": "a",
               "height": 1??
           }'

response

           {
               "ok": true,
               "_index": "conf",
               "_type": "speaker",
               "_id": "1"
           }
2. search
request

           curl http://localhost:9200/conf/speaker/_search?q=beer

response

           { "took" : ?,
             "_shards" : {
               "total" : 1,
               "successful" : 1,
               "failed" : 0
             },
             "hits" : {
               "total" : 1,
               "max_score" : 0.?90??09,
               "hits" : [ {
                 "_index" : "conf",
                 "_type" : "speaker",
                 "_id" : "2",
                 "_score" : 0.?90??09,
                 "_source" :
           {
               "name": "Andrei Zmievski",
               "talk": "99 Problems, but the Search Ain't One",
               "likes": ["coding", "beer", "photography"],
               "twitter": "a",
               "height": 1??
           } } ] } }
2. search
request

           curl http://localhost:9200/conf/speaker/_search?q=beer

response

           { "took" : ?,                          the execution time
             "_shards" : {
               "total" : 1,
               "successful" : 1,
               "failed" : 0
             },
             "hits" : {
               "total" : 1,                       total number of hits
               "max_score" : 0.?90??09,
               "hits" : [ {
                 "_index" : "conf",               the index of the doc
                 "_type" : "speaker",             the type of the doc
                 "_id" : "2",                     the id of the doc
                 "_score" : 0.?90??09,            the hit score
                 "_source" :                      the original source
           {
               "name": "Andrei Zmievski",
               "talk": "99 Problems, but the Search Ain't One",
               "likes": ["coding", "beer", "photography"],
               "twitter": "a",
               "height": 1??
           } } ] } }
3. profit


that’s up to you
demo
distributed model


provides:

  performance

  resiliency (high-availability)
shards
a portion of the document space

each one is a separate Lucene index

  thus, many per-index settings are available

document is sharded by its _id value

  but can be assigned (routed) to a shard
  deterministically
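
The routing rule on this slide can be sketched as a hash of the routing key taken modulo the shard count. This is only an illustration: `zlib.crc32` is a stand-in hash chosen for the example (not elasticsearch's own hash function), and `pick_shard` is a hypothetical helper name.

```python
import zlib

def pick_shard(doc_id, number_of_shards, routing=None):
    """Map a document to a shard number in range(number_of_shards)."""
    # An explicit routing value, when given, replaces the _id as hash input,
    # so every document sharing that value lands on the same shard.
    key = routing if routing is not None else doc_id
    # crc32 is a stand-in hash chosen for this sketch.
    return zlib.crc32(key.encode("utf-8")) % number_of_shards

if __name__ == "__main__":
    for doc_id in ("1", "2", "3"):
        print(doc_id, "-> shard", pick_shard(doc_id, 2))
```

Because the shard number depends on the shard count, this kind of scheme also suggests why the number of shards is fixed when the index is created.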
zero-conf discovery


zen (multicast and unicast)

cloud (EC2 via API)
auto-routing

master node:

  maintains cluster state

  reassigns shards if nodes leave/join cluster

any node can serve as the request router

the query is handled via scatter-gather mechanism
replicas

each shard can have 1 or more replicas

# of replicas can be updated dynamically after
index creation

replicas can be used for querying in parallel
shard allocation
               node 1




       start with a single node
shard allocation
                node 1
                 person1
                 person2




      PUT /person {
         "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
      }}
shard allocation
       node 1          node 2
       person1         person1
       person2         person2




        start the second node
shard allocation
node 1    node 2         node 3   node 4
person1   person1
person2   person2




            start 2 more nodes
shard allocation
node 1    node 2         node 3    node 4
person1                  person1
          person2                  person2




            start 2 more nodes
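
The allocation steps shown on these slides can be approximated with a toy round-robin placement. This is a sketch under assumptions, not the allocator elasticsearch uses: `allocate` is a hypothetical function, and the exact node each copy lands on may differ from what a real cluster picks. The only rule it enforces is that two copies of the same shard never share a node.

```python
def allocate(nodes, shards, number_of_replicas):
    """Return {node: [shard, ...]} placing primary and replica copies round-robin."""
    # A copy that cannot get a node of its own stays unassigned,
    # which is why a single node holds only one copy of each shard.
    copies = min(number_of_replicas + 1, len(nodes))
    allocation = {node: [] for node in nodes}
    slot = 0
    for shard in shards:
        for _ in range(copies):
            # Consecutive slots map to distinct nodes because copies <= len(nodes).
            allocation[nodes[slot % len(nodes)]].append(shard)
            slot += 1
    return allocation

if __name__ == "__main__":
    shards = ["person1", "person2"]
    for count in (1, 2, 4):
        nodes = ["node %d" % n for n in range(1, count + 1)]
        print(allocate(nodes, shards, number_of_replicas=1))
```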
document sharding
node 1    node 2         node 3     node 4
person1                   person1
          person2                   person2




            PUT /person/info/1
            {…}
document sharding
     node 1         node 2         node 3     node 4
     person1                        person1
                    person2                   person2




                      hashed to shard 1

                      PUT /person/info/1
                      {…}
document sharding
node 1    node 2         node 3      node 4
person1                   person1
          person2                    person2




                        replicated

            PUT /person/info/1
            {…}
document sharding
node 1    node 2         node 3     node 4
person1                   person1
          person2                   person2




            PUT /person/info/2
            {…}
document sharding
node 1         node 2            node 3     node 4
person1                           person1
               person2                      person2




hashed to shard 2
                    PUT /person/info/2
                    {…}
document sharding
node 1    node 2         node 3      node 4
person1                   person1
          person2                    person2




                                    replicated

            PUT /person/info/2
            {…}
scatter-gather
node 1          node 2        node 3          node 4
person1                        person1
                person2                       person2




          GET /person/_search?q=name:thomas
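
The scatter-gather step can be sketched as follows: the node that receives the request sends the query to one copy of every shard, then merges the per-shard hit lists into one ranked list. The shard contents, scores, and the `search_shard` stand-in are invented for illustration; a real shard would run a Lucene search.

```python
import heapq
from itertools import islice

# Hypothetical per-shard results: (score, doc_id) pairs, sorted by descending score.
SHARD_HITS = {
    "person1": [(0.9, "1"), (0.4, "7")],
    "person2": [(0.7, "2"), (0.1, "4")],
}

def search_shard(shard, query):
    """Stand-in for a shard-local search; it ignores the query text."""
    return SHARD_HITS[shard]

def scatter_gather(query, size):
    # Scatter: ask one copy of every shard for its best hits.
    partial = [search_shard(shard, query) for shard in SHARD_HITS]
    # Gather: merge the already-sorted lists and keep the global top `size`.
    merged = heapq.merge(*partial, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))

if __name__ == "__main__":
    print(scatter_gather("name:thomas", 3))
```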
transactional model

per-document consistency

no need to commit/flush

uses write-behind transaction log

write consistency (W) can be controlled

  one, quorum, or all
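
One way to read the three write-consistency levels is as a minimum number of active shard copies required before a write proceeds. The sketch below assumes a plain majority definition of quorum; elasticsearch may treat small replica counts differently, and both function names are hypothetical.

```python
def required_copies(consistency, number_of_replicas):
    """Minimum active copies (primary included) needed for a write."""
    total = number_of_replicas + 1  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return total // 2 + 1  # simple majority (an assumption of this sketch)
    if consistency == "all":
        return total
    raise ValueError("unknown consistency level: %r" % consistency)

def write_allowed(consistency, number_of_replicas, active_copies):
    return active_copies >= required_copies(consistency, number_of_replicas)

if __name__ == "__main__":
    for level in ("one", "quorum", "all"):
        print(level, required_copies(level, number_of_replicas=2))
```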
(near) real-time search


1 second refresh rate by default

_refresh API also
index storage

node data considered transient

can be stored in local file system, JVM heap,
native OS memory, or FS & memory combination

persistent storage requires a gateway
gateways
persistent store for cluster state and indices

asynchronous, translog-based write strategy

allows full recovery if a cluster restart is needed

supported gateways:
  local
  shared FS
  Hadoop via HDFS
  S3
mapping
describes document structure to the search
engine

automatically created with sensible defaults

explicit mapping can be provided (generally, a
good idea)

can run into merge conflicts
mapping

important meta fields:

  _source

  _all

  _boost
mapping types

simple:

  string, integer/long, float/double, boolean, and
  null

complex:

  array, object
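
To make automatic mapping and merge conflicts concrete, here is a rough sketch that infers one of the type names above for each field of a JSON document and refuses to merge a document whose field type disagrees with the existing mapping. Choosing `long` and `double` for Python `int` and `float` is an assumption of the example, and `infer_type` and `merge_mapping` are hypothetical names.

```python
def infer_type(value):
    """Pick a mapping type name for a decoded JSON value."""
    if value is None:
        return "null"
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError("not a JSON value: %r" % (value,))

def merge_mapping(mapping, doc):
    """Return mapping extended with doc's fields; raise on a type conflict."""
    merged = dict(mapping)
    for field, value in doc.items():
        new_type = infer_type(value)
        old_type = merged.setdefault(field, new_type)
        if old_type != new_type:
            raise ValueError("merge conflict on %r: %s vs %s" % (field, old_type, new_type))
    return merged

if __name__ == "__main__":
    mapping = merge_mapping({}, {"user": "derick", "tags": ["php"], "priority": 2})
    print(mapping)
```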
sample mapping
document

           {"user":      "derick",
            "title":     "Don't Panic",
            "tags":      ["profiling", "debugging", "php"],
            "postDate":  "2010-12-22T1?:1?:12",
            "priority":  2}

mapping

           {"post": {
             "properties" : {
               "user": {"type": "string", "index": "not_analyzed"},
               "message": {"type": "string", "boost": 1.?},
               "tags": {"type": "string", "include_in_all": "no"},
               "postDate" : {"type" : "date", "store": "no"},
               "priority" : {"type" : "integer"}
           }}}
analyzers
break down (tokenize) and normalize fields during
indexing and query strings at search time

analyzer = tokenizer + token filters (0 or more)
Standard Analyzer =
   Standard Tokenizer +
       Standard Token Filter +
       Lowercase Token Filter +
       Stop Token Filter
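
The "tokenizer + token filters" composition can be sketched as a small pipeline. This is a simplified illustration, not Lucene's implementation: the regular expression and the five-word stop list are assumptions made for the example.

```python
import re

STOP_WORDS = {"a", "an", "and", "but", "the"}  # tiny illustrative list

def standard_tokenizer(text):
    # Simplification: a token is a run of word characters or apostrophes.
    return re.findall(r"[\w']+", text)

def lowercase_filter(tokens):
    return [token.lower() for token in tokens]

def stop_filter(tokens):
    return [token for token in tokens if token not in STOP_WORDS]

def make_analyzer(tokenizer, *filters):
    """Compose one tokenizer with zero or more token filters, applied in order."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze

standard = make_analyzer(standard_tokenizer, lowercase_filter, stop_filter)

if __name__ == "__main__":
    print(standard("99 Problems, but the Search Ain't One"))
```

The same analyzer is meant to run on field values at index time and on query strings at search time, so that both sides produce comparable tokens.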
analyzers
                            analyzers, tokenizers, and filters can be
                            customized
elasticsearch.yml

                            index:
                              analysis:
                                analyzer:
                                  eulang:
                                    type: custom
                                    tokenizer: standard
                                    filter: [standard, lowercase, stop,
                                             asciifolding, porterStem]

mapping

                            …
                            "title": {"type": "string", "analyzer": "eulang"},
                            …
API
API conventions


append ?pretty=true to get readable JSON

boolean values: false/0/off = false, rest is true

JSONP support via callback parameter
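
The boolean convention above can be expressed in a few lines. This sketch assumes the comparison is exact and case-sensitive, which the slide does not specify; `parse_bool` is a hypothetical helper.

```python
FALSE_VALUES = {"false", "0", "off"}

def parse_bool(raw):
    """Treat 'false', '0' and 'off' as False; any other value as True."""
    return raw not in FALSE_VALUES

if __name__ == "__main__":
    for raw in ("true", "false", "0", "off", "yes"):
        print(raw, "->", parse_bool(raw))
```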
API structure

http://host:port/[index]/[type]/[_action/id]

 GET http://es:9200/_status

 GET http://es:9200/twitter/_status

 POST http://es:9200/twitter/tweet/1

 GET http://es:9200/twitter/tweet/1
API structure
http://host:port/[index]/[type]/[_action/id]

 GET http://es:9200/twitter/tweet/_search

 GET http://es:9200/twitter/user/_search

 GET http://es:9200/twitter/tweet,user/_search

 GET http://es:9200/twitter,facebook/_search

 GET http://es:9200/_search
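
The URL pattern can be assembled mechanically from optional lists of indices and types. The helper below is a hypothetical sketch that only builds the search URLs shown on this slide; it does not send any request.

```python
def search_url(host, indices=(), types=()):
    """Build http://host/[index]/[type]/_search, joining several names with commas."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    if types:
        if not indices:
            # The slide only shows types nested under an index.
            raise ValueError("types require at least one index")
        parts.append(",".join(types))
    parts.append("_search")
    return "http://%s/%s" % (host, "/".join(parts))

if __name__ == "__main__":
    print(search_url("es:9200", ["twitter"], ["tweet", "user"]))
    print(search_url("es:9200", ["twitter", "facebook"]))
    print(search_url("es:9200"))
```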
_cluster API structure

GET /_cluster/health

GET /_cluster/health/index1,index2

GET /_cluster/nodes/stats

GET /_cluster/nodes/nodeId1,nodeId2/stats
API {core}
index             search

bulk               query

delete             from/size paging

delete by query    sort

get                highlighting

count              selective fields
API {indices}
create                   optimize

delete                   snapshot

open/close               update settings

get/put/delete mapping   analyze

refresh                  status

                         flush
API {cluster}

health

state

nodes info

nodes stats

nodes shutdown
Query DSL
term / terms   query_string

range            default_operator

prefix            analyzer

bool             phrase_slop

fuzzy            etc

wildcard
filters


share some similar features with queries (term,
range, etc)

why use a filter?
filters
faster than queries

cached (depends on the filter)

  the cache is used for different queries against
  the same filter

no scoring

more useful ones: term, terms, range, prefix, and,
or, not, exists, missing, query
facets

provide aggregated data based on the search
request

terms, histogram, date histogram, range,
statistical, and more
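
A terms facet is essentially a count of field values across the documents that matched the search. The sketch below illustrates the idea on in-memory documents; `terms_facet` and the sample data are hypothetical.

```python
from collections import Counter

def terms_facet(hits, field, size=10):
    """Count the values of `field` over the matching documents, most frequent first."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field)
        # A field may hold a single value or an array of values.
        values = value if isinstance(value, list) else [value]
        counts.update(v for v in values if v is not None)
    return counts.most_common(size)

if __name__ == "__main__":
    hits = [
        {"likes": ["coding", "beer"]},
        {"likes": ["beer"]},
        {"name": "no likes here"},
    ]
    print(terms_facet(hits, "likes"))
```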
geo search

implemented as filters (and a facet)

  geo_distance

  geo_bounding_box

  geo_polygon
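
The idea behind the distance and bounding-box filters can be sketched with the haversine great-circle formula. This is an illustration only: the Earth radius constant, the flat `lat`/`lon` document layout, and the function names are assumptions, and the bounding box here ignores boxes that cross the date line.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_distance(docs, lat, lon, max_km):
    """Keep documents within max_km of the given point."""
    return [d for d in docs if haversine_km(lat, lon, d["lat"], d["lon"]) <= max_km]

def geo_bounding_box(docs, top, left, bottom, right):
    """Keep documents inside the box (no date-line handling)."""
    return [d for d in docs if bottom <= d["lat"] <= top and left <= d["lon"] <= right]

if __name__ == "__main__":
    docs = [
        {"name": "London", "lat": 51.5074, "lon": -0.1278},
        {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
    ]
    print([d["name"] for d in geo_distance(docs, 51.5, -0.12, 50)])
```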
interfaces
REST

  including memcached

Java / Groovy

Language clients (REST/Thrift):

  pyes, PHP (standalone and symfony), Ruby, Perl

Flume sink implementation
elastica

similar to the other PHP ElasticSearch client

API naming is consistent with Zend Framework

can be extended for new filters, facets, etc

still under development
elastica
example

          // client for the elasticsearch node at host 'vm', port 9200
          $es = new Elastica_Client('vm', 9200);
          // create the 'test' index
          $index = new Elastica_Index($es, 'test');
          $index->create(array(), true);
          // index one document of type 'person' with id 1
          $type = new Elastica_Type($index, 'person');
          $doc = new Elastica_Document(1, array('name' => 'Andrei Zmievski',
                                                 'email' => 'andrei@test.com',
                                                 'username' => 'andrei',
                                                 'bills' => array(2, 3, 5)));
          $type->addDocument($doc);

          // run a query string search for 'andrei' and print the number of hits
          $qs = new Elastica_Query_QueryString('andrei');
          $query = new Elastica_Query($qs);
          $resultSet = $type->search($query);
          print $resultSet->count();
data import

ES is not the primary data store (usually)

to import/synchronize data:

  write an agent (Gearman, message queues, etc)

  use rivers (CouchDB, RabbitMQ, Twitter)
10 more features
versioning                   load balancing nodes

index aliases                plugins

parent/child docs            more_like_this

scripting                    multi_field mapping

dynamic mapping templates    percolation
References

http://github.com/elasticsearch/elasticsearch

http://www.elasticsearch.org/community/forum

IRC: #elasticsearch on irc.freenode.net

twitter: @elasticsearch


             HTTP://ZMIEVSKI.ORG/TALKS

99 Problems, But The Search Ain't One

  • 1. 99 Problems, But The Search Ain’t One Andrei Zmievski • PHP UK • Feb 25, 2011
  • 2. who am I? curl http://localhost:9200/speaker/info/andrei {"name": "Andrei Zmievski", "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"], "likes": ["coding", "beer", "brewing", "photography"], "twitter": "@a", "email": "andrei@zmievski.org"}
  • 3. what is elasticsearch? a search engine for the NoSQL generation domain-driven distributed RESTful Hitchhiker’s Guide to the Galaxy (no, really)
  • 8. 1. index
       request: curl -XPOST http://localhost:9200/conf/speaker/1 -d' { "name": "Andrei Zmievski", "talk": "99 Problems, but the Search Ain''t One", "likes": ["coding", "beer", "photography"], "twitter": "a", "height": … }'
       response: { "ok": true, "_index": "conf", "_type": "speaker", "_id": "1" }
  • 9. 2. search
       request: curl http://localhost:9200/conf/speaker/_search?q=beer
       response: { "took" : …, "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : …, "hits" : [ { "_index" : "conf", "_type" : "speaker", "_id" : "2", "_score" : …, "_source" : { "name": "Andrei Zmievski", "talk": "99 Problems, but the Search Ain't One", "likes": ["coding", "beer", "photography"], "twitter": "a", "height": … } } ] } }
  • 10. 2. search (response, annotated) "total" : 1 is the total number of hits
  • 11. 2. search (response, annotated) "_index" : "conf" is the index of the doc
  • 12. 2. search (response, annotated) "_type" : "speaker" is the type of the doc
  • 13. 2. search (response, annotated) "_id" : "2" is the id of the doc
  • 14. 2. search (response, annotated) "_id" : "2" is the id of the doc
  • 15. 2. search (response, annotated) "_score" is the hit score
  • 16. 2. search (response, annotated) "_source" is the original source: { "name": "Andrei Zmievski", "talk": "99 Problems, but the Search Ain't One", "likes": ["coding", "beer", "photography"], "twitter": "a", "height": … }
  • 17. 2. search (response, annotated) "took" is the execution time
  • 19. demo
  • 20. distributed model provides: performance; resiliency (high-availability)
  • 21. shards: a portion of the document space; each one is a separate Lucene index, thus many per-index settings are available; a document is sharded by its _id value, but can be assigned (routed) to a shard deterministically
  • 22. zero-conf discovery: zen (multicast and unicast); cloud (EC2 via API)
  • 23. auto-routing: the master node maintains cluster state and reassigns shards if nodes leave/join the cluster; any node can serve as the request router; the query is handled via a scatter-gather mechanism
  • 24. replicas: each shard can have 1 or more replicas; the number of replicas can be updated dynamically after index creation; replicas can be used for querying in parallel
  • 25. shard allocation node 1 start with a single node
  • 26. shard allocation node 1 person1 person2 PUT /person { "index": { "number_of_shards": 2, "number_of_replicas": 1 }}
  • 27. shard allocation node 1 node 2 person1 person1 person2 person2 start the second node
  • 28. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 start 2 more nodes
  • 29. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 start 2 more nodes
  • 30. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 PUT /person/info/1 {…}
  • 31. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 PUT /person/info/1 hashed to shard 1 {…}
  • 32. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 replicated PUT /person/info/1 {…}
  • 33. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 PUT /person/info/2 {…}
  • 34. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 hashed to shard 2 PUT /person/info/2 {…}
  • 35. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 replicated PUT /person/info/2 {…}
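The document sharding slides above show each document being hashed by its _id to one of the two shards. A minimal Python sketch of that idea follows. The slides do not say which hash elasticsearch uses, so the CRC32 hash, the modulo step, and the `shard_for` helper are all assumptions made for illustration.

```python
import zlib
from typing import Optional

def shard_for(doc_id: str, number_of_shards: int, routing: Optional[str] = None) -> int:
    """Pick a shard deterministically from the routing value (defaults to the _id)."""
    key = routing if routing is not None else doc_id
    # CRC32 is an assumption for illustration; any stable hash would do.
    return zlib.crc32(key.encode("utf-8")) % number_of_shards

if __name__ == "__main__":
    for doc_id in ("1", "2"):
        print(doc_id, "->", shard_for(doc_id, number_of_shards=2))
```

Passing the same hypothetical `routing` value for several documents places them on the same shard, which is the "assigned (routed) to a shard deterministically" case.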
  • 36. scatter-gather node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
  • 37. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
  • 38. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
  • 39. shard allocation node 1 node 2 node 3 node 4 person1 person1 person2 person2 GET /person/_search?q=name:thomas
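The search request above is fanned out to the shards and the partial results are merged. A toy Python sketch of scatter-gather, assuming in-memory dictionaries as shards and a made-up score (this is not Lucene scoring, and `search_shard` and `scatter_gather` are hypothetical names):

```python
import heapq

def search_shard(shard, term):
    """Return (score, doc_id) pairs for docs in one shard whose name contains the term."""
    hits = []
    for doc_id, doc in shard.items():
        words = doc["name"].lower().split()
        if term in words:
            # Toy score: shorter names score higher. Not Lucene scoring.
            hits.append((1.0 / len(words), doc_id))
    return hits

def scatter_gather(shards, term, size=10):
    """Query every shard (scatter), then merge the partial results by score (gather)."""
    partial = [hit for shard in shards for hit in search_shard(shard, term)]
    return heapq.nlargest(size, partial)

shards = [
    {"1": {"name": "Thomas Anderson"}},
    {"2": {"name": "Thomas"}, "3": {"name": "Maria Lopez"}},
]
print(scatter_gather(shards, "thomas"))
```

In a real cluster each shard query would run on a different node in parallel; the sequential loop here only shows the merge step.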
  • 40. transactional model: per-document consistency; no need to commit/flush; uses a write-behind transaction log; write consistency (W) can be controlled: one, quorum, or all
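To make the three write consistency levels concrete, here is a small Python sketch of how many shard copies would need to be available for a write. It assumes quorum means a simple majority of the primary plus its replicas; the exact rule elasticsearch applies is not given in the slides, and `required_copies` is a hypothetical helper.

```python
def required_copies(consistency: str, number_of_replicas: int) -> int:
    """Number of shard copies that must be available before a write proceeds."""
    copies = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return copies // 2 + 1  # simple majority (an assumption)
    if consistency == "all":
        return copies
    raise ValueError("unknown consistency level: %r" % consistency)

for level in ("one", "quorum", "all"):
    print(level, required_copies(level, number_of_replicas=2))
```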
  • 41. (near) real-time search: 1 second refresh rate by default; a _refresh API is also available
  • 42. index storage: node data is considered transient; it can be stored in the local file system, JVM heap, native OS memory, or an FS & memory combination; persistent storage requires a gateway
  • 43. gateways: persistent store for cluster state and indices; asynchronous, translog-based write strategy; allows full recovery if a cluster restart is needed; supported gateways: local, shared FS, Hadoop via HDFS, S3
  • 44. mapping: describes document structure to the search engine; automatically created with sensible defaults; an explicit mapping can be provided (generally, a good idea); can run into merge conflicts
  • 45. mapping: important meta fields: _source, _all, _boost
  • 46. mapping types: simple: string, integer/long, float/double, boolean, and null; complex: array, object
  • 47. sample mapping
       document: { "user": "derick", "title": "Don’t Panic", "tags": ["profiling", "debugging", "php"], "postDate": "2010-12-22T…", "priority": 2 }
       mapping: { "post": { "properties" : { "user": {"type": "string", "index": "not_analyzed"}, "message": {"type": "string", "boost": …}, "tags": {"type": "string", "include_in_all": "no"}, "postDate" : {"type" : "date", "store": "no"}, "priority" : {"type" : "integer"} }}}
  • 48. analyzers: break down (tokenize) and normalize fields during indexing and query strings at search time; analyzer = tokenizer + token filters (0 or more); Standard Analyzer = Standard Tokenizer + Standard Token Filter + Lowercase Token Filter + Stop Token Filter
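The "analyzer = tokenizer + token filters" composition can be sketched in a few lines of Python. This is a toy pipeline under stated assumptions: the regex tokenizer, the tiny stop-word list, and the function names are illustrative and do not reproduce Lucene's Standard Analyzer.

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "but"}  # tiny illustrative list

def tokenize(text):
    """Tokenizer: split the text into runs of word characters."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]

def stop(tokens):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, tokenizer=tokenize, filters=(lowercase, stop)):
    """Analyzer: one tokenizer followed by zero or more token filters, in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("99 Problems, But The Search Ain't One"))
```

The same `analyze` function would be applied to field values at index time and to query strings at search time, which is why both sides end up with comparable tokens.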
  • 49. analyzers: analyzers, tokenizers, and filters can be customized
       elasticsearch.yml:
         index:
           analysis:
             analyzer:
               eulang:
                 type: custom
                 tokenizer: standard
                 filter: [standard, lowercase, stop, asciifolding, porterStem]
       mapping: … "title": {"type": "string", "analyzer": "eulang"}, …
  • 50. API
  • 51. API conventions: append ?pretty=true to get readable JSON; boolean values: false/0/off = false, the rest is true; JSONP support via the callback parameter
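The boolean convention above fits in a one-line Python helper. The `parse_bool` name is hypothetical, and comparing the raw string exactly (no case folding or trimming) is an assumption, since the slide does not say how the values are matched.

```python
def parse_bool(value: str) -> bool:
    """Treat "false", "0" and "off" as false; everything else is true."""
    return value not in ("false", "0", "off")

for raw in ("false", "0", "off", "true", "1", "yes"):
    print(raw, parse_bool(raw))
```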
  • 52. API structure http://host:port/[index]/[type]/[_action/id] GET http://es:9200/_status GET http://es:9200/twitter/_status POST http://es:9200/twitter/tweet/1 GET http://es:9200/twitter/tweet/1
  • 53. API structure http://host:port/[index]/[type]/[_action/id] GET http://es:9200/twitter/tweet/_search GET http://es:9200/twitter/user/_search GET http://es:9200/twitter/tweet,user/_search GET http://es:9200/twitter,facebook/_search GET http://es:9200/_search
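The URL pattern http://host:port/[index]/[type]/[_action/id] can be assembled mechanically. A short Python sketch, assuming a hypothetical `es_url` helper in which every path part is optional and lists are joined with commas, as in the multi-index and multi-type examples above:

```python
def es_url(host, port=9200, index=None, doc_type=None, tail=None):
    """Build a request URL; index and doc_type may be a string or a list of names."""
    def join(part):
        return ",".join(part) if isinstance(part, (list, tuple)) else part
    # Skip the parts that were not supplied.
    segments = [join(p) for p in (index, doc_type, tail) if p]
    return "http://%s:%d/%s" % (host, port, "/".join(segments))

print(es_url("es", index="twitter", doc_type=["tweet", "user"], tail="_search"))
print(es_url("es", index=["twitter", "facebook"], tail="_search"))
print(es_url("es", tail="_search"))
```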
  • 54. _cluster API structure GET /_cluster/health GET /_cluster/health/index1,index2 GET /_cluster/nodes/stats GET /_cluster/nodes/nodeId1,nodeId2/stats
  • 55. API {core}: index, bulk, delete, delete by query, get, count, search (query, from/size paging, sort, highlighting, selective fields)
  • 56. API {indices}: create, delete, open/close, get/put/delete mapping, refresh, optimize, snapshot, update settings, analyze, status, flush
  • 58. Query DSL: term / terms, range, prefix, bool, fuzzy, wildcard, query_string (default_operator, analyzer, phrase_slop, etc)
  • 59. filters: share some similar features with queries (term, range, etc); why use a filter?
  • 60. filters: faster than queries; cached (depends on the filter), and the cache is used for different queries against the same filter; no scoring; more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
  • 61. facets: provide aggregated data based on the search request; terms, histogram, date histogram, range, statistical, and more
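A terms facet is, in essence, a count of field values over the documents that matched the search. A minimal Python sketch of that aggregation, assuming the hits are plain dictionaries (`terms_facet` is a hypothetical helper, not an elasticsearch API):

```python
from collections import Counter

def terms_facet(hits, field, size=10):
    """Count the values of one field across the hits and return the most frequent."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field, [])
        # A field may hold a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)

hits = [
    {"likes": ["coding", "beer"]},
    {"likes": ["beer", "photography"]},
]
print(terms_facet(hits, "likes", size=1))
```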
  • 62. geo search: implemented as filters (and a facet): geo_distance, geo_bounding_box, geo_polygon
  • 63. interfaces: REST (including memcached); Java / Groovy; language clients (REST/Thrift): pyes, PHP (standalone and symfony), Ruby, Perl; Flume sink implementation
  • 64. elastica: similar to the other PHP ElasticSearch client; API naming is consistent with Zend Framework; can be extended for new filters, facets, etc; still under development
  • 65. elastica example
       $es = new Elastica_Client('vm', 9200);
       $index = new Elastica_Index($es, 'test');
       $index->create(array(), true);
       $type = new Elastica_Type($index, 'person');
       $doc = new Elastica_Document(1, array('name' => 'Andrei Zmievski', 'email' => 'andrei@test.com', 'username' => 'andrei', 'bills' => array(2, 3, 5)));
       $type->addDocument($doc);
       $qs = new Elastica_Query_QueryString('andrei');
       $query = new Elastica_Query($qs);
       $resultSet = $type->search($query);
       print $resultSet->count();
  • 66. data import: ES is not the primary data store (usually); to import/synchronize data: write an agent (Gearman, message queues, etc) or use rivers (CouchDB, RabbitMQ, Twitter)
  • 67. 10 more features: versioning, index aliases, parent/child docs, scripting, dynamic mapping templates, load balancing nodes, plugins, more_like_this, multi_field mapping, percolation