SlideShare une entreprise Scribd logo
1  sur  20
Elasticsearch as (only)
datastore
How-to integrate it to existing project (and leave
everything else)
Tomas Sirny
@junckritter
www.reactor.am
Me
• Python developer
• DevOps role
• using/learning Elasticsearch for 3 years (version 0.16)
• Synopsi.tv, Reactor
Synopsi.tv
• movie recommendations service
• database of movies, TV shows
• need for search - Elasticsearch
• search-box on every page (prefix search)
• advanced search (search + facets)
• PostgreSQL as main datastore
• import to ES with script, hooks on add/update
Synopsi.tv - lessons
• Good for search
• Mappings are powerful
• need for reindexing - format/mapping change
• sometimes missing documents (Bruce Willis) - in index but not
searchable
• probably not yet suitable as only datastore
Reactor
• service for communication with users of your application
• send data about users and their activity (events)
• filter users, define segments
• set rules for reactions - email, webhook, SMS, etc.
Data structure
• small and simple pieces of data - JSON
• simple relations (application <-> user <-> field)
• “noSQL” more suitable, but doable in e.g. PostgreSQL
Backends - theory
• application saves data in several forms:
• raw (as they come)
• cleaned and sanitized
• formatted for specific datastore
• save method iterates over configured set of datastore backends and
send same data to them
• different backends for different operations - get, filter, analytics
• slight duplication of functionality in application
Backends - practice
• Mongo/DynamoDB - raw data, cold storage
• PostgreSQL - working data
• ElasticSearch - added later, working data, analytics
• different sets in different environments (devel machine,
production)
Source of truth
• one backend is trusted by definition - cold storage, need for
repopulation of data in other backends
• should be simple (hard to break) and scalable (probably in cloud)
• possible forms:
• JSON files, Hadoop, etc.
• noSQL database (Mongo, Cassandra, DynamoDB)
• Elasticsearch - different format (indices, nodes) from working
data
Input to ElasticSearch
• regularly run import/update script - if you do not need (almost-)
live data
• logic in application (our case)
• river - input channel from other source to ElasticSearch
(CouchDB, MongoDB, Hadoop)
ElasticSearch pros
• easy start
• easy scaling (e.g. with AWS plugin can new node automatically join
cluster and be ready in short time)
• search capabilities
• analytics with facets/aggregations (Kibana plugin)
• easy backup with snapshots
• easy to deploy - just one service
• highly tweakable yet sane defaults
ElasticSearch cons
• only one type of relation between documents - parent/child
(nested)
• higher need for reindexing - repopulate data from scratch
(change of format, new mappings for fields)
Lessons learned
Clusters
• Possible to use several clusters - one for data, one for
monitoring
• Can talk to each other via tribe nodes - nodes from cluster1
send monitoring data (Marvel plugin) to tribe node to save them
to cluster2
Nodes
• Use data and non-data nodes
• Data nodes for storage
• Non-data nodes for admin, local nodes on web server, etc.
• Use tags
Indices
• Split data to different indices (per client, segments, etc.) -
reactor_app-id
• Use aliases (migrations, time windows, etc.) - reactor_app-id as
alias for reactor_app-id_timestamp1 and reactor_app-
id_timestamp2
• Possible to allocate bigger indices to more performant servers -
node.tag and index.routing.allocation.include.tag
Shards
• Use several shards (default 5)
• Utilize more machines
• Immutable for existing index - plan your infrastructure
Indexing
• If possible, use batch indexing
• Use update script wisely - too big batch with updates can slow
down application
• Send indexing traffic to (local) non-data node
• Play with mapping, use not_analyzed, field_data
Machines
• Use local SSD, not network storage
• Replicate, make snapshots
• Make benchmark - few bigger machines can outperform more
small machines (network delays, cluster management)
• If in cloud, use ElasticSearch plugins for that cloud provider
(cluster discovery, snapshots on cloud storage)
Thanks.

Contenu connexe

Tendances

Scalable real-time processing techniques
Scalable real-time processing techniquesScalable real-time processing techniques
Scalable real-time processing techniquesLars Albertsson
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceHBaseCon
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Zhenxiao Luo
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017Zhenxiao Luo
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@GrabShubham Tagra
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series WorldMapR Technologies
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...ScyllaDB
 
Openstack For Beginners
Openstack For BeginnersOpenstack For Beginners
Openstack For Beginnerscpallares
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtimearupmalakar
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Roopa Tangirala
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics TokyoAdam Gibson
 
"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at Elastic
"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at Elastic"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at Elastic
"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at ElasticDataconomy Media
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And TechniquesKnoldus Inc.
 

Tendances (20)

Scalable real-time processing techniques
Scalable real-time processing techniquesScalable real-time processing techniques
Scalable real-time processing techniques
 
Lighthouse
LighthouseLighthouse
Lighthouse
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@Grab
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series World
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
Openstack For Beginners
Openstack For BeginnersOpenstack For Beginners
Openstack For Beginners
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtime
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til Piffl
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
 
Bigdata : Big picture
Bigdata : Big pictureBigdata : Big picture
Bigdata : Big picture
 
"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at Elastic
"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at Elastic"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at Elastic
"Databases - The Choice is Yours", Philipp Krenn, Developer Advocate at Elastic
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And Techniques
 

Similaire à ElasticSearch as (only) datastore

Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Anubhav Kale
 
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...Ashnikbiz
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyBozhidar Bozhanov
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
Project Deimos
Project DeimosProject Deimos
Project DeimosSimon Suo
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Emprovise
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2Thomas Segismont
 
PostgreSQL as an Alternative to MSSQL
PostgreSQL as an Alternative to MSSQLPostgreSQL as an Alternative to MSSQL
PostgreSQL as an Alternative to MSSQLAlexei Krasner
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherObjectRocket
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol ValidationBIOVIA
 
Beyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forksBeyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forksSameer Kumar
 

Similaire à ElasticSearch as (only) datastore (20)

Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
 
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and Multitenancy
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Project Deimos
Project DeimosProject Deimos
Project Deimos
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2
 
PostgreSQL as an Alternative to MSSQL
PostgreSQL as an Alternative to MSSQLPostgreSQL as an Alternative to MSSQL
PostgreSQL as an Alternative to MSSQL
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
Beyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forksBeyond Postgres: Interesting Projects, Tools and forks
Beyond Postgres: Interesting Projects, Tools and forks
 
Sql Server2008
Sql Server2008Sql Server2008
Sql Server2008
 

Dernier

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 

Dernier (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 

ElasticSearch as (only) datastore

  • 1. Elasticsearch as (only) datastore How-to integrate it to existing project (and leave everything else) Tomas Sirny @junckritter www.reactor.am
  • 2. Me • Python developer • DevOps role • using/learning Elasticsearch for 3 years (version 0.16) • Synopsi.tv, Reactor
  • 3. Synopsi.tv • movie recommendations service • database of movies, TV shows • need for search - Elasticsearch • search-box on every page (prefix search) • advanced search (search + facets) • PostgreSQL as main datastore • import to ES with script, hooks on add/update
  • 4. Synopsi.tv - lessons • Good for search • Mappings are powerful • need for reindexing - format/mapping change • sometimes missing documents (Bruce Willis) - in index but not searchable • probably not yet suitable as only datastore
  • 5. Reactor • service for communication with users of your application • send data about users and their activity (events) • filter users, define segments • set rules for reactions - email, webhook, SMS, etc.
  • 6. Data structure • small and simple pieces of data - JSON • simple relations (application <-> user <-> field) • “noSQL” more suitable, but doable in e.g. PostgreSQL
  • 7. Backends - theory • application saves data in several forms: • raw (as they come) • cleaned and sanitized • formatted for specific datastore • save method iterates over configured set of datastore backends and send same data to them • different backends for different operations - get, filter, analytics • slight duplication of functionality in application
  • 8. Backends - practice • Mongo/DynamoDB - raw data, cold storage • PostgreSQL - working data • ElasticSearch - added later, working data, analytics • different sets in different environments (devel machine, production)
  • 9. Source of truth • one backend is trusted by definition - cold storage, need for repopulation of data in other backends • should be simple (hard to break) and scalable (probably in cloud) • possible forms: • JSON files, Hadoop, etc. • noSQL database (Mongo, Cassandra, DynamoDB) • Elasticsearch - different format (indices, nodes) from working data
  • 10. Input to ElasticSearch • regularly run import/update script - if you do not need (almost-) live data • logic in application (our case) • river - input channel from other source to ElasticSearch (CouchDB, MongoDB, Hadoop)
  • 11. ElasticSearch pros • easy start • easy scaling (e.g. with AWS plugin can new node automatically join cluster and be ready in short time) • search capabilities • analytics with facets/aggregations (Kibana plugin) • easy backup with snapshots • easy to deploy - just one service • highly tweakable yet sane defaults
  • 12. ElasticSearch cons • only one type of relation between documents - parent/child (nested) • higher need for reindexing - repopulate data from scratch (change of format, new mappings for fields)
  • 14. Clusters • Possible to use several clusters - one for data, one for monitoring • Can talk to each other via tribe nodes - nodes from cluster1 send monitoring data (Marvel plugin) to tribe node to save them to cluster2
  • 15. Nodes • Use data and non-data nodes • Data nodes for storage • Non-data nodes for admin, local nodes on web server, etc. • Use tags
  • 16. Indices • Split data to different indices (per client, segments, etc.) - reactor_app-id • Use aliases (migrations, time windows, etc.) - reactor_app-id as alias for reactor_app-id_timestamp1 and reactor_app- id_timestamp2 • Possible to allocate bigger indices to more performant servers - node.tag and index.routing.allocation.include.tag
  • 17. Shards • Use several shards (default 5) • Utilize more machines • Immutable for existing index - plan your infrastructure
  • 18. Indexing • If possible, use batch indexing • Use update script wisely - too big batch with updates can slow down application • Send indexing traffic to (local) non-data node • Play with mapping, use not_analyzed, field_data
  • 19. Machines • Use local SSD, not network storage • Replicate, make snapshots • Make benchmark - few bigger machines can outperform more small machines (network delays, cluster management) • If in cloud, use ElasticSearch plugins for that cloud provider (cluster discovery, snapshots on cloud storage)