Chapter 3
Query logic
Elasticsearch is a distributed search engine, so every piece of functionality it
provides must be distributed by nature. Querying is no exception. Before we
discuss more advanced topics on how to control the query process, we first
need to know how it works.
By default, the query process consists of two phases, as
shown in the following diagram:
[Diagram: an Application sends a Query to an Elasticsearch Cluster made of two Elasticsearch nodes holding Shard 1 and Shard 2; the Scatter phase is followed by the Gather phase, which returns the Results.]
When we send a query, we send it to one of the Elasticsearch nodes. What happens
next is the so-called scatter phase: the query is distributed to all the shards
that our index is built of. For example, if it is built of five shards and one replica,
then five physical shards will be queried (we don't need to query both a shard and
its replica because they contain the same data). Each of the queried shards will
return only the document identifiers and the scores of the matching documents. The node
that sent the scatter query will wait for all the shards to complete their task, gather the
results, and sort them appropriately (in this case, from the top scoring to the lowest
scoring ones).
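The scatter step and the merge that follows it can be sketched in a few lines of Python. This is only an illustrative simulation, not Elasticsearch code: the in-memory shards with precomputed scores, the `query_shard` and `scatter_gather` functions, and the `size` limit are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "shards": each maps a document identifier to a
# precomputed score for one query. Real shards score documents themselves.
Shard = Dict[str, float]
Hit = Tuple[str, float]  # (document identifier, score)


def query_shard(shard: Shard, size: int) -> List[Hit]:
    """Return only (doc_id, score) pairs for the shard's top `size` hits."""
    ranked = sorted(shard.items(), key=lambda hit: hit[1], reverse=True)
    return ranked[:size]


def scatter_gather(shards: List[Shard], size: int) -> List[Hit]:
    """Query every shard, then merge and sort the partial results."""
    partial_results: List[Hit] = []
    for shard in shards:  # scatter: one request per shard (sequential here)
        partial_results.extend(query_shard(shard, size))
    # Sort the gathered hits from the top scoring to the lowest scoring.
    partial_results.sort(key=lambda hit: hit[1], reverse=True)
    return partial_results[:size]


shards = [
    {"doc1": 0.9, "doc4": 0.2},
    {"doc2": 0.7, "doc5": 0.4},
]
top_hits = scatter_gather(shards, size=3)
print(top_hits)  # [('doc1', 0.9), ('doc2', 0.7), ('doc5', 0.4)]
```

Note that only identifiers and scores travel between the shards and the node in this step; no document content is involved yet.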
After that, a new request will be sent to build the search results. This time, however,
the request will be sent only to those shards that held the documents to build the