Elasticsearch - a search and real-time analytics server based on Apache Lucene - is gaining a lot of popularity lately, and is being used world-wide to power many sophisticated systems. While many use it for the "standard" stuff (that is, simple full-text search and real-time log analysis), there are some really interesting usage patterns that can prove useful in many real-world scenarios. In this talk we will briefly talk about Elasticsearch and its common use-cases, and then showcase some less common use-cases leveraging Elasticsearch in an interesting and often times innovating ways.
8. DocumentsTerm
<6>and
<2> <3>big
<6>dark
<4>did
<2>gown
<3>had
<2> <3>house
<1> <2> <3> <5> <6>in
<1> <3> <5>keep
<1> <4> <5>keeper
<1> <5> <6>keeps
<6>light
<4>never
<1> <4> <5>night
<1> <2> <3> <4>old
<4>sleep
<6>sleeps
<1> <2> <3> <4> <5> <6>the
<1> <3>town
<4>where
The index:
Dictionary and
posting lists
6 documents to index
Example from:
Justin Zobel , Alistair Moffat,
Inverted files for text search engines,
ACM Computing Surveys (CSUR)
v.38 n.2, p.6-es, 2006
The old night keeper keeps the keep in the town1
In the big old house in the big old gown.2
The house in the town had the big old keep3
Where the old night keeper never did sleep.4
The night keeper keeps the keep in the night5
And keeps in the dark and sleeps in the light.6
Full-text Search 101:
The inverted index
9. Full-text Search 101:
The inverted index
DocumentsTerm
<6>and
<2> <3>big
<6>dark
<4>did
<2>gown
<3>had
<2> <3>house
<1> <2> <3> <5> <6>in
<1> <3> <5>keep
<1> <4> <5>keeper
<1> <5> <6>keeps
<6>light
<4>never
<1> <4> <5>night
<1> <2> <3> <4>old
<4>sleep
<6>sleeps
<1> <2> <3> <4> <5> <6>the
<1> <3>town
<4>where
The index:
Dictionary and
posting lists
6 documents to index
The old night keeper keeps the keep in the town1
In the big old house in the big old gown.2
The house in the town had the big old keep3
Where the old night keeper never did sleep.4
The night keeper keeps the keep in the night5
And keeps in the dark and sleeps in the light.6
User queries for “keeper”
13. Relevance
• Precision
The fraction of the retrieved
documents that are relevant
• Recall
The fraction of the relevant
documents that are retrieved
• Order of results
14. Challenges with search
• Relevance
• Getting the tokens right
– Tokenization
– Stemming
• Multi-lingual content
– Or other cross-cutting search concerns
• Tolerance
35. #6: Debugging a distributed system
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif
HTTP/1.0" 200 2326 "http://www.example.com/start.html"
"Mozilla/4.08 [en] (Win98; I ;Nav)"
System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at AjaxControlToolkit.ToolkitScriptManager.GetScriptCombineAttributes(Assembly assembly)
at AjaxControlToolkit.ToolkitScriptManager.IsScriptCombinable(ScriptEntry scriptEntry)
at AjaxControlToolkit.ToolkitScriptManager.OnResolveScriptReference(ScriptReferenceEventArgs e)
at System.Web.UI.ScriptManager.RegisterScripts()
at System.Web.UI.ScriptManager.OnPagePreRenderComplete(Object sender, EventArgs e)
at System.Web.UI.Page.OnPreRenderComplete(EventArgs e)
at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean
includeStagesAfterAsyncPoint)
36. #7: Distributed git storage
• PoC in C# using libgit2sharp
• https://github.com/synhershko/libgit2sharp.El
asticsearch
• Kudos @nulltoken
37. Putting this to practice
• Search on your data
– Data doesn’t have to be structured to be queried
• Use your logs to gain insight
– Metrics
– Establish a baseline
– Investigate on unexpected / unfamiliar behaviors