Elasticsearch indexes support search, analytics, and visualization. An inverted index maps each term to the documents that contain it, which makes retrieving the documents that match a query fast and makes term-frequency statistics cheap to compute. Columnar indexes store per-document fields, such as timestamps, keyed by document ID; combined with the inverted index, they enable real-time analytics over structured and unstructured data, for example a histogram of term usage over time. These capabilities make Elasticsearch well suited to visualization and analytics in distributed, cloud-based environments.
11. 11
Electronic search engines have been around for a long time
1928 – patent application by Emanuel Goldberg for a “Statistical Machine”
http://www.google.com/patents/US1838389
Basically an optical version of grep that predates almost everything
12. 12
Timeline, in no way complete
• 7th Century B.C.E. ? – library catalogs
• 1928 – Goldberg “Statistical Machine”
– Optical search on microfilm
• 1945 – Vannevar Bush “microfilm rapid selector”; “Memex”
• 1960s – SMART Information Retrieval System (Cornell U.)
• 1973 – grep first appears in Unix v4
• 1990s – WWW search engines
• 1999 – Doug Cutting Lucene search indexer
13. 13
Inverted Indexes
• Pay the cost at indexing time (insertion time)
• Reap the benefits at retrieval time
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term | Postings List | Statistics (count)
quick | 1 | 1
brown | 1, 2, 3 | 3
fox | 1, 2 | 2
forest | 2 | 1
bear | 3 | 1
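The index on this slide can be reproduced with a short sketch. This is a minimal illustration, not how Lucene or Elasticsearch implement it: the `STOP_WORDS` set and the function name `build_inverted_index` are assumptions, chosen so that the output matches the slide (which omits "the" and "in").

```python
from collections import defaultdict

# Assumed stop-word list, picked only so the result matches the slide.
STOP_WORDS = {"the", "in"}

DOCS = {
    1: "the quick brown fox",
    2: "brown fox in the forest",
    3: "brown bear",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # The indexing cost is paid here, once per document.
        for term in text.lower().split():
            if term not in STOP_WORDS:
                postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(DOCS)
for term, ids in index.items():
    # The statistic shown on the slide is the length of the postings list.
    print(term, ids, len(ids))
```

All the tokenizing and grouping happens at insertion time; a later lookup is a single dictionary access.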
14. 14
Pretty Good At Retrieval
Find documents mentioning “foxes”?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term | Postings List | Statistics (count)
quick | 1 | 1
brown | 1, 2, 3 | 3
fox | 1, 2 | 2
forest | 2 | 1
bear | 3 | 1
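One plausible reading of this slide is that an exact lookup for "foxes" finds nothing, because the index only contains the term "fox"; the query has to be normalized first. The sketch below illustrates that with a toy plural-stripping function. `naive_stem` and `lookup` are hypothetical names, and the suffix rule is an assumption for illustration, far cruder than the stemmers real analyzers use.

```python
# The inverted index from the slide, written out as a literal.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def naive_stem(term):
    """Toy normalizer: lowercase and strip a trailing 'es' or 's'."""
    term = term.lower()
    for suffix in ("es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def lookup(term, normalize=False):
    """Return the postings list for a term, optionally stemming the query."""
    key = naive_stem(term) if normalize else term.lower()
    return INDEX.get(key, [])


exact = lookup("foxes")                    # no such term in the index
stemmed = lookup("foxes", normalize=True)  # "foxes" is reduced to "fox"
print(exact, stemmed)
```

For this to work in general, the same normalization would have to be applied to document text at indexing time as well as to queries.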
15. 15
Excellent at Search
Find documents mentioning “quick” AND “fox”?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term | Postings List | Statistics (count)
quick | 1 | 1
brown | 1, 2, 3 | 3
fox | 1, 2 | 2
forest | 2 | 1
bear | 3 | 1
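An AND query reduces to intersecting postings lists. The sketch below uses the classic two-pointer merge over sorted lists; the function names `intersect` and `search_and` are illustrative assumptions, not an Elasticsearch API.

```python
# The inverted index from the slide, with sorted postings lists.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def intersect(a, b):
    """Intersect two sorted postings lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, *terms):
    """Return ids of documents that contain every one of the given terms."""
    # Start with the shortest list so intermediate results stay small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


print(search_and(INDEX, "quick", "fox"))  # [1]
```

Only Document (1) appears in both postings lists, so it is the only match.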
16. 16
Excellent at Real Time Analytics
What was the most commonly mentioned term?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term | Postings List | Statistics (count)
quick | 1 | 1
brown | 1, 2, 3 | 3
fox | 1, 2 | 2
forest | 2 | 1
bear | 3 | 1
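The question on this slide can be answered from the statistics column alone, without rereading any document. A minimal sketch, assuming the count is simply the length of each postings list (that is, the number of documents containing the term):

```python
# The inverted index from the slide.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}

# Per-term statistic: how many documents mention the term.
counts = {term: len(ids) for term, ids in INDEX.items()}

# The most commonly mentioned term is the one with the largest count.
top_term = max(counts, key=counts.get)
print(top_term, counts[top_term])  # brown 3
```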
17. 17
Histogram about the mention of foxes over time:
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term | Postings List | Statistics (count)
quick | 1 | 1
brown | 1, 2, 3 | 3
fox | 1, 2 | 2
forest | 2 | 1
bear | 3 | 1
18. 18
Columnar Indexes
Document (1): text: “the quick brown fox”; date: Monday
Document (2): text: “brown fox in the forest”; date: Tuesday
Document (3): text: “brown bear”; date: Monday
Doc id | Date
1 | Monday
2 | Tuesday
3 | Monday
Term | Postings List | Statistics (count)
quick | 1 | 1
brown | 1, 2, 3 | 3
fox | 1, 2 | 2
forest | 2 | 1
bear | 3 | 1
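With a date column keyed by document id, the histogram of mentions over time becomes a two-step lookup: the inverted index gives the matching document ids, and the column gives each id's date. The sketch below is a minimal illustration using plain dictionaries; `DATE_COLUMN` and `histogram` are assumed names, and real columnar storage is laid out very differently on disk.

```python
from collections import Counter

# The inverted index from the slide: term -> document ids.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}

# The columnar index from the slide: document id -> date.
DATE_COLUMN = {1: "Monday", 2: "Tuesday", 3: "Monday"}


def histogram(term):
    """Count the documents mentioning a term, bucketed by date."""
    doc_ids = INDEX.get(term, [])
    return Counter(DATE_COLUMN[doc_id] for doc_id in doc_ids)


print(dict(histogram("fox")))    # {'Monday': 1, 'Tuesday': 1}
print(dict(histogram("brown")))  # {'Monday': 2, 'Tuesday': 1}
```

The document text is never touched: both steps read only the two index structures.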
19. 19
Now do it in parallel
• Distributed
• Non-blocking
• Read / Write
• Commodity hardware
• Fault-tolerance
• High Availability
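The "distributed" bullet can be illustrated with a scatter-gather sketch: documents are split across shards, each shard keeps its own inverted index, and a query is sent to every shard in parallel before the partial results are merged. Everything here is an assumption for illustration: the shard count, the modulo routing in `shard_for`, and the use of threads to stand in for separate nodes. It does not model replication, fault tolerance, or how Elasticsearch actually routes documents.

```python
from concurrent.futures import ThreadPoolExecutor

DOCS = {
    1: "the quick brown fox",
    2: "brown fox in the forest",
    3: "brown bear",
}

NUM_SHARDS = 2  # hypothetical shard count


def shard_for(doc_id):
    """Assumed routing rule: document id modulo the number of shards."""
    return doc_id % NUM_SHARDS


def build_shards(docs):
    """Build one small inverted index (term -> doc ids) per shard."""
    shards = [{} for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        local = shards[shard_for(doc_id)]
        for term in set(text.lower().split()):
            local.setdefault(term, []).append(doc_id)
    return shards


def search(shards, term):
    """Scatter the query to every shard, then gather and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard.get(term, []), shards))
    return sorted(doc_id for part in partials for doc_id in part)


shards = build_shards(DOCS)
print(search(shards, "fox"))    # [1, 2]
print(search(shards, "brown"))  # [1, 2, 3]
```

Each shard answers from its own index only, so adding shards spreads both the indexing and the query work.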