ElasticSearch is the new kid on the search block. Built on top of Lucene and adhering to the best concepts of the so-called NoSQL movement, ElasticSearch is a distributed, highly available, fast RESTful search engine, ready to be plugged into Web applications.
21. shards
a portion of the document space
each one is a separate Lucene index
thus, many per-index settings are available
document is sharded by its _id value
but can be assigned (routed) to a shard
deterministically
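The deterministic routing described on this slide can be sketched as a hash of the routing value (the document _id by default) taken modulo the number of primary shards. This is a minimal illustration, not ElasticSearch's implementation: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in for whatever hash function the engine uses internally.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document _id) to a shard number."""
    # crc32 is a stand-in hash, NOT the function ElasticSearch uses internally.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# The same routing value always lands on the same shard.
shard_a = pick_shard("user-42", 5)
shard_b = pick_shard("user-42", 5)
print(shard_a == shard_b)  # True
```

Because the result depends on the shard count, a scheme like this only stays stable while the number of primary shards is fixed.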
23. auto-routing
master node:
maintains cluster state
reassigns shards if nodes leave/join cluster
any node can serve as the request router
the query is handled via scatter-gather mechanism
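As a rough picture of scatter-gather: the node that receives the request asks every shard for its own top hits, then merges the partial results into one global ranking. The sketch below simulates this in memory under loud assumptions: each "shard" is a plain dict of precomputed scores, and `search_shard` and `scatter_gather` are hypothetical names, not ElasticSearch APIs.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def search_shard(shard_docs: Dict[str, float], size: int) -> List[Hit]:
    """Scatter phase: one shard returns its own top `size` hits."""
    hits = [(score, doc_id) for doc_id, score in shard_docs.items()]
    return heapq.nlargest(size, hits)


def scatter_gather(shards: List[Dict[str, float]], size: int) -> List[Hit]:
    """Gather phase: merge the per-shard results into a global top `size`."""
    partial = [hit for shard in shards for hit in search_shard(shard, size)]
    return heapq.nlargest(size, partial)


# Three simulated shards with made-up scores.
shards = [
    {"doc1": 0.9, "doc2": 0.2},
    {"doc3": 0.7},
    {"doc4": 0.8, "doc5": 0.1},
]
top = scatter_gather(shards, size=3)
print(top)  # [(0.9, 'doc1'), (0.8, 'doc4'), (0.7, 'doc3')]
```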
24. replicas
each shard can have 1 or more replicas
# of replicas can be updated dynamically after
index creation
replicas can be used for querying in parallel
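Changing the replica count after index creation goes through the index settings endpoint. The sketch below only builds the request (method, path, JSON body) and does not send it; `replica_settings_request` is a hypothetical helper, and the index name `twitter` is borrowed from the API examples elsewhere in this deck.

```python
import json


def replica_settings_request(index: str, replicas: int):
    """Return (method, path, body) for changing the replica count of an index."""
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


method, path, body = replica_settings_request("twitter", 2)
print(method, path, body)
```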
42. index storage
node data considered transient
can be stored in local file system, JVM heap,
native OS memory, or FS & memory combination
persistent storage requires a gateway
43. gateways
persistent store for cluster state and indices
asynchronous, translog-based write strategy
allows full recovery if a cluster restart is needed
supported gateways:
local
shared FS
Hadoop via HDFS
S3
44. mapping
describes document structure to the search
engine
automatically created with sensible defaults
explicit mapping can be provided (generally, a
good idea)
can run into merge conflicts
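To make the merge-conflict point concrete, here is a simplified sketch of an explicit mapping and of one way two mappings can clash: the same field declared with two different types. The field names are made up for illustration, `type_conflicts` is a hypothetical helper, and the real merge rules in ElasticSearch cover more than the type check shown here.

```python
import json

# An explicit mapping for a hypothetical "tweet" type.
existing = {"properties": {"message": {"type": "string"},
                           "posted": {"type": "date"}}}
# A later update that redefines "posted" with a different type.
update = {"properties": {"posted": {"type": "string"},
                         "user": {"type": "string"}}}


def type_conflicts(old: dict, new: dict) -> list:
    """List fields whose declared type differs between two mappings."""
    old_props, new_props = old["properties"], new["properties"]
    return sorted(
        name for name in old_props.keys() & new_props.keys()
        if old_props[name]["type"] != new_props[name]["type"]
    )


# Request body in the shape used for a typed mapping: {"<type>": {...}}
put_mapping_body = json.dumps({"tweet": existing})
print(type_conflicts(existing, update))  # ['posted']
```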
48. analyzers
break down (tokenize) and normalize fields during
indexing and query strings at search time
analyzer = tokenizer + token filters (0 or more)
Standard Analyzer =
   Standard Tokenizer +
       Standard Token Filter +
       Lowercase Token Filter +
       Stop Token Filter
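The "analyzer = tokenizer + token filters" idea can be sketched as a small function pipeline. This is a toy model, not Lucene's implementation: the regex tokenizer, the tiny stop-word list, and all function names are assumptions made for illustration.

```python
import re
from typing import Callable, Iterable, List

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}  # tiny illustrative list


def tokenize(text: str) -> List[str]:
    """Crude stand-in for a tokenizer: pull out runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens: Iterable[str]) -> List[str]:
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def make_analyzer(tokenizer: Callable, filters: List[Callable]) -> Callable:
    """analyzer = tokenizer + zero or more token filters, applied in order."""
    def analyze(text: str) -> List[str]:
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyze = make_analyzer(tokenize, [lowercase, remove_stop_words])
print(analyze("The Quick Brown Fox"))  # ['quick', 'brown', 'fox']
```

Filter order matters in this sketch: the stop-word filter only matches lowercase words, so it has to run after the lowercase filter.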
49. analyzers
analyzers, tokenizers, and filters can be
customized
elasticsearch.yml:
index:
  analysis:
    analyzer:
      eulang:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, stop,
                 asciifolding, porterStem]
mapping:
...
"title": {"type": "string", "analyzer": "eulang"},
...
53. API structure
http://host:port/[index]/[type]/[_action/id]
GET http://es:9200/twitter/tweet/_search
GET http://es:9200/twitter/user/_search
GET http://es:9200/twitter/tweet,user/_search
GET http://es:9200/twitter,facebook/_search
GET http://es:9200/_search
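The URL pattern above is regular enough to generate. A minimal sketch, assuming a hypothetical helper `search_url`; it joins multiple indices or types with commas, and falls back to the `_all` index name when types are given without an index.

```python
from typing import Sequence


def search_url(base: str, indices: Sequence[str] = (),
               types: Sequence[str] = ()) -> str:
    """Build a _search URL following /[index]/[type]/_search."""
    parts = [base.rstrip("/")]
    if indices or types:
        # A type segment needs an index segment in front of it.
        parts.append(",".join(indices) if indices else "_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/".join(parts)


print(search_url("http://es:9200", ["twitter"], ["tweet", "user"]))
# http://es:9200/twitter/tweet,user/_search
print(search_url("http://es:9200", ["twitter", "facebook"]))
# http://es:9200/twitter,facebook/_search
print(search_url("http://es:9200"))
# http://es:9200/_search
```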
54. _cluster API structure
GET /_cluster/health
GET /_cluster/health/index1,index2
GET /_cluster/nodes/stats
GET /_cluster/nodes/nodeId1,nodeId2/stats
55. API {core}
index
bulk
delete
delete by query
get
count
search
query
from/size paging
sort
highlighting
selective fields
56. API {indices}
create
delete
open/close
get/put/delete mapping
status
refresh
flush
optimize
snapshot
update settings
analyze
60. filters
faster than queries
cached (depends on the filter)
the cache is used for different queries against
the same filter
no scoring
more useful ones: term, terms, range, prefix, and,
or, not, exists, missing, query
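A filter is typically combined with a scoring query so that only the query part contributes to relevance. The sketch below builds such a request body in the `filtered` query shape used by the ElasticSearch releases of this deck's era (later releases changed this syntax). `filtered_search` is a hypothetical helper, and the field and values are borrowed from the elastica example in this deck.

```python
import json


def filtered_search(query_text: str, field: str, value) -> dict:
    """Wrap a query_string query in a non-scoring term filter."""
    return {
        "query": {
            "filtered": {
                # Scored part: contributes to relevance ranking.
                "query": {"query_string": {"query": query_text}},
                # Unscored part: only includes or excludes documents.
                "filter": {"term": {field: value}},
            }
        }
    }


body = filtered_search("andrei", "username", "andrei")
print(json.dumps(body, indent=2))
```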
61. facets
provide aggregated data based on the search
request
terms, histogram, date histogram, range,
statistical, and more
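For intuition, a terms facet reports how many matching documents contain each distinct value of a field. The sketch below shows a request body in the `facets` shape of this deck's era, next to a local simulation of the counting. Both helper names and the sample documents are assumptions, and the local version ignores analysis and sharding entirely.

```python
from collections import Counter


def terms_facet_request(name: str, field: str, size: int = 10) -> dict:
    """Request body asking for a terms facet over all documents."""
    return {
        "query": {"match_all": {}},
        "facets": {name: {"terms": {"field": field, "size": size}}},
    }


def terms_facet_locally(docs: list, field: str, size: int = 10) -> list:
    """Count field values across documents, most frequent first."""
    counts = Counter(value for doc in docs for value in doc.get(field, []))
    return counts.most_common(size)


docs = [
    {"tags": ["php", "search"]},
    {"tags": ["search"]},
    {"tags": ["lucene", "search"]},
]
print(terms_facet_locally(docs, "tags", size=1))  # [('search', 3)]
```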
63. interfaces
REST
including memcached
Java / Groovy
Language clients (REST/Thrift):
pyes, PHP (standalone and symfony), Ruby, Perl
Flume sink implementation
64. elastica
similar to the other PHP ElasticSearch client
API naming is consistent with Zend Framework
can be extended for new filters, facets, etc
still under development
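65. elastica example
// connect to the ElasticSearch node at host 'vm', port 9200
$es = new Elastica_Client('vm', 9200);
// create the 'test' index; passing true recreates it if it already exists
$index = new Elastica_Index($es, 'test');
$index->create(array(), true);
// index one document of type 'person' with id 1
$type = new Elastica_Type($index, 'person');
$doc = new Elastica_Document(1, array(
    'name' => 'Andrei Zmievski',
    'email' => 'andrei@test.com',
    'username' => 'andrei',
    'bills' => array(2, 3, 5)));
$type->addDocument($doc);
// refresh so the new document is visible to the search that follows
$index->refresh();
// run a query_string search for 'andrei' and print the number of hits
$qs = new Elastica_Query_QueryString('andrei');
$query = new Elastica_Query($qs);
$resultSet = $type->search($query);
print $resultSet->count();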
66. data import
ES is not the primary data store (usually)
to import/synchronize data:
write an agent (Gearman, message queues, etc)
use rivers (CouchDB, RabbitMQ, Twitter)
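A hand-written import agent usually batches rows from the primary store into bulk requests. The sketch below only prepares the newline-delimited bulk bodies (an action line followed by a source line per document) and leaves the HTTP call out. `bulk_payloads` is a hypothetical helper, the rows are made up, and each row is assumed to carry an `id` key.

```python
import json
from typing import Dict, Iterable, Iterator


def bulk_payloads(rows: Iterable[Dict], index: str, doc_type: str,
                  batch_size: int = 500) -> Iterator[str]:
    """Yield bulk request bodies, each holding at most `batch_size` documents."""
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(row))     # document source line
        if len(lines) >= 2 * batch_size:
            yield "\n".join(lines) + "\n"  # bulk bodies end with a newline
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


rows = [
    {"id": 1, "name": "first"},
    {"id": 2, "name": "second"},
    {"id": 3, "name": "third"},
]
payloads = list(bulk_payloads(rows, "test", "person", batch_size=2))
print(len(payloads))  # 2
```

Each yielded string would be sent as the body of one request to the `_bulk` endpoint; the batch size trades request count against request size.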
67. 10 more features
versioning
index aliases
parent/child docs
scripting
dynamic mapping templates
load balancing nodes
plugins
more_like_this
multi_field mapping
percolation