1
Search, Time Series, and Graph Analysis in the Cloud
Dave Erickson
dave@elastic.co
Visualizing Data in Elasticsearch
2
Dave Erickson – Developer
• Biotech
• Electronic Archives & Libraries
• Geospatial
• Healthcare
• Air Traffic Control
• Financial Services
3
4
Elastic Stack: Real Time Search & Analytics at Scale
• Elastic Cloud
• Kibana: User Interface
• Elasticsearch: Store, Index, & Analyze
• Ingest: Logstash, Beats
• X-Pack: Security, Alerting, Monitoring, Reporting, Graph
5
6
Visualization is Important
https://www.reddit.com/r/dataisugly/
7
Visualization in the Cloud
• Qualities We Want:
‒ Parallel
‒ Highly Available
‒ Platform Independent
‒ Multi-tenancy
‒ Extensible
• Use Cases:
‒ Search, Discovery, & Analytics
‒ Metrics & Time Series Data
‒ Structured & Unstructured
‒ Security Analytics
8
Wait …
Why would you use a search engine for analytics?
9
Search indexes have been around for a long time
10
Scaled, distributed search indexes have been around for a long time
11
Electronic search engines have been around for a long time
1928 – patent application by Emanuel Goldberg for a “Statistical Machine”
http://www.google.com/patents/US1838389
Basically an optical version of grep that predates almost everything
12
Timeline, in no way complete
• 7th Century B.C.E. ? – library catalogs
• 1928 – Goldberg “Statistical Machine”
– Optical search on microfilm
• 1945 – Vannevar Bush “microfilm rapid selector”; “Memex”
• 1960s – SMART Information Retrieval System (Cornell U.)
• 1973 – grep first appears in Unix v4
• 1990s – WWW search engines
• 1999 – Doug Cutting Lucene search indexer
13
Inverted Indexes
• Pay the cost at indexing time (insertion time)
• Reap the benefits at retrieval time
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term   | Postings List | Statistics (count)
quick  | 1             | 1
brown  | 1, 2, 3       | 3
fox    | 1, 2          | 2
forest | 2             | 1
bear   | 3             | 1
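A minimal sketch of how the table above could be built from the three documents. The stop-word set is an assumption, chosen only because the slide's table leaves out "the" and "in"; a real search engine does far more analysis than this.

```python
from collections import defaultdict

# The three documents from the slide, keyed by document id.
DOCS = {
    1: "the quick brown fox",
    2: "brown fox in the forest",
    3: "brown bear",
}

# Assumption: common words are dropped, since the slide's table omits them.
STOP_WORDS = {"the", "in"}


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in STOP_WORDS:
                postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(DOCS)

# Print term, postings list, and document count, as in the slide's table.
for term, ids in index.items():
    print(f"{term:<8}{ids!s:<12}{len(ids)}")
```

All of the work happens once, at insertion time; every later question is answered from `index` without re-reading the documents.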
14
Pretty Good At Retrieval
Find documents mentioning “foxes”?
Term   | Postings List | Statistics (count)
quick  | 1             | 1
brown  | 1, 2, 3       | 3
fox    | 1, 2          | 2
forest | 2             | 1
bear   | 3             | 1
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
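One way to read this slide: the query says "foxes" but the index holds "fox", so the query term has to be normalized before the lookup. The sketch below assumes a deliberately crude plural-stripping rule (the `normalize` helper is hypothetical and stands in for a real stemmer or analyzer); after that, retrieval is a single dictionary lookup.

```python
# Inverted index from the slide: term -> postings list of document ids.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def normalize(term):
    """Crude plural stripping; a hypothetical stand-in for a real stemmer."""
    term = term.lower()
    if term.endswith("es") and len(term) > 4:
        return term[:-2]
    if term.endswith("s") and len(term) > 3:
        return term[:-1]
    return term


def lookup(term):
    """Return the postings list for a query term, or an empty list."""
    return INDEX.get(normalize(term), [])


print(lookup("foxes"))
print(lookup("wolf"))
```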
15
Excellent at Search
Find documents mentioning “quick” AND “fox”?
Term   | Postings List | Statistics (count)
quick  | 1             | 1
brown  | 1, 2, 3       | 3
fox    | 1, 2          | 2
forest | 2             | 1
bear   | 3             | 1
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
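An AND query comes down to intersecting postings lists. The sketch below assumes the lists are kept sorted by document id, so two lists can be merged in one linear pass; the function names are illustrative, not any library's API.

```python
# Inverted index from the slide: term -> sorted postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def intersect(a, b):
    """Intersect two sorted postings lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, *terms):
    """Return ids of documents that contain every one of the given terms."""
    # Start with the shortest list so the running result stays small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


print(search_and(INDEX, "quick", "fox"))
```

Only Document (1) contains both terms, so the query returns `[1]`.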
16
Excellent at Real Time Analytics
What was the most commonly mentioned term?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term   | Postings List | Statistics (count)
quick  | 1             | 1
brown  | 1, 2, 3       | 3
fox    | 1, 2          | 2
forest | 2             | 1
bear   | 3             | 1
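The statistics column already answers this question. A small sketch, assuming the count is the number of documents containing each term (the length of its postings list):

```python
# Inverted index from the slide: term -> postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}

# Document count per term, i.e. the slide's "Statistics (count)" column.
doc_freq = {term: len(ids) for term, ids in INDEX.items()}

# Rank terms by count, breaking ties alphabetically for a stable order.
ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
top_term, top_count = ranked[0]

print(top_term, top_count)
```

No document is re-read: the ranking comes entirely from data the index stored at insertion time.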
17
Histogram of mentions of foxes over time:
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term   | Postings List | Statistics (count)
quick  | 1             | 1
brown  | 1, 2, 3       | 3
fox    | 1, 2          | 2
forest | 2             | 1
bear   | 3             | 1
18
Columnar Indexes
Document (1): text: “the quick brown fox”; date: Monday
Document (2): text: “brown fox in the forest”; date: Tuesday
Document (3): text: “brown bear”; date: Monday
Doc id | Date
1      | Monday
2      | Tuesday
3      | Monday
Term   | Postings List | Statistics (count)
quick  | 1             | 1
brown  | 1, 2, 3       | 3
fox    | 1, 2          | 2
forest | 2             | 1
bear   | 3             | 1
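Combining the two structures gives the histogram: the inverted index finds which documents mention a term, and the column maps each of those document ids to its date. A minimal sketch, with plain dictionaries standing in for both structures (the names are illustrative):

```python
from collections import Counter

# Inverted index from the slide: term -> postings list.
POSTINGS = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}

# Columnar index from the slide: document id -> date.
DATE_COLUMN = {1: "Monday", 2: "Tuesday", 3: "Monday"}


def date_histogram(term):
    """Count the documents mentioning a term, bucketed by date."""
    matching_ids = POSTINGS.get(term, [])
    return Counter(DATE_COLUMN[doc_id] for doc_id in matching_ids)


print(date_histogram("fox"))
print(date_histogram("brown"))
```

For "fox" this yields one mention on Monday and one on Tuesday; for "brown", two on Monday and one on Tuesday.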
19
Now do it in parallel
• Distributed
• Non-blocking
• Read / Write
• Commodity hardware
• Fault-tolerance
• High Availability
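A rough sketch of the "in parallel" idea, assuming the documents are split across two shards: each shard counts its own terms independently, and the partial counts are merged at the end. The shard layout, the stop-word set, and the use of a thread pool are all illustrative assumptions; this is not how Elasticsearch itself is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumption: the slide's three documents split across two shards.
SHARDS = [
    {1: "the quick brown fox", 3: "brown bear"},
    {2: "brown fox in the forest"},
]

# Assumption: the same common words as before are ignored.
STOP_WORDS = {"the", "in"}


def count_terms(shard):
    """Count, per term, the documents in one shard that contain it."""
    counts = Counter()
    for text in shard.values():
        counts.update(set(text.lower().split()) - STOP_WORDS)
    return counts


def scatter_gather(shards):
    """Run the per-shard counts in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(count_terms, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


totals = scatter_gather(SHARDS)
print(totals.most_common(1))
```

The merged counts match the single-index table, with "brown" still on top at 3.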
20
Use Cases
21
22
23
24
25
26
Thank You!
dave@elastic.co

Visualization and Analytics in Elasticsearch


Editor's Notes

  1. You’ve done the work ahead of time by building the index. Access is fast.
  2. Incredibly flexible ad-hoc query, structured or unstructured
  3. Wait … Did we just do analytics?
  4. Wait … Did we just do analytics?