SlideShare une entreprise Scribd logo
1  sur  45
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jon Handler – Principal Solutions Architect, Search Services
March 21, 2018
Amazon Elasticsearch Service
Deep Dive
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A tale of a small workload
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Source: TechCrunch survey of popular open source software from April’17
Open source
Fast time to value
Easy ingestion
Easy visualization
High performance and distributed
Best analytics and search
Elasticsearch can help
Rank Project Name
Overall Project
Rating
1 Linux 100.00
2 Git 31.10
3 MySQL 25.23
4 Node.js 22.75
5 Docker 22.61
6 Hadoop 16.19
7 Elasticsearch 15.72
8 Spark 14.99
9 MongoDB 14.68
10 Selenium 12.81
11 NPM 12.31
12 Redis 11.61
B E N E F I T S
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
With Elasticsearch, you can run your business AND monitor it as
well
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Elasticsearch Service is a fully
managed service that makes it easy to
deploy, manage, and scale Elasticsearch
and Kibana in the AWS cloud
Amazon Elasticsearch Service
+
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tightly Integrated with
Other AWS Services
Seamless data ingestion, security,
auditing and orchestration
Benefits of Amazon Elasticsearch Service
Supports Open-Source APIs
and Tools
Drop-in replacement with no need to
learn new APIs or skills
Easy to Use
Deploy a production-ready
Elasticsearch cluster in minutes
Scalable
Resize your cluster with a few
clicks or a single API call
Secure
Deploy into your VPC and restrict
access using security groups and IAM
policies
Highly Available
Replicate across Availability Zones,
with monitoring and automated
self-healing
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Service Architecture
AWS SDK
AWS CLI
AWS CloudFormation
Elasticsearch
data nodes
Elasticsearch
master nodes
Elastic Load
Balancing
IAM
Amazon
CloudWatch
AWS
CloudTrail
Amazon Elasticsearch Service domain
Key Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Security
• Public endpoints – IAM
• Private endpoints – IAM
and security groups
• Encryption
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Use IAM for public endpoints
{ "Version": "2012-10-17",
"Statement": [ {
"Effect": "Allow",
"Principal": {
"AWS":
"arn:aws:iam::XXXX:root”
},
"Action": "es:*",
"Resource":
"arn:aws:es:
us-west-2:XXXX:domain/YYYY/*”
}
]
}
• To grant access for Kibana,
use a CIDR Condition
• To grant read-only access to
an account or role, specify an
es:HttpGet Action
• To limit a user to a specific
index, or API add it to the
Resource
• For Admin access, specify
administrative es:* Actions
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Add security groups for private endpoints
Specify a subnet and security group
to apply CIDR restrictions on
inbound/outbound traffic
security group
security group
Amazon Elasticsearch Service
Data Master
Data
Master
IAM
IAM
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Encrypt your data
• Encrypted data at rest on
Amazon ES instances
• Both EBS and ephemeral store
• Encrypted automatic snapshots
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The front end
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Architecture: Web Serving
Public
Subnet
Internet
Gateway
ALB
Private
Subnet
Httpd
Amazon ES
ENI
Amazon ES
Domain
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Application search
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Each document has
a set of fieldsKey Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
{ "title": "Star Wars",
"plot": "Luke Skywalker joins forces with a
Jedi Knight, a cocky pilot, a wookiee and two
droids to save the universe from the Empire's
world-destroying battle-station, while also
attempting to rescue Princess Leia from the evil
Darth Vader.",
"year": 1977,
"actors": [
"Mark Hamill",
"Harrison Ford",
"Carrie Fisher”
],
"directors": [
"George Lucas”
],
"rating": 8.7,
"genres" : [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi”
]
}
Elasticsearch works with
structured JSON containing
fields and values
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Elasticsearch creates an index for each field
Doc
Name Value
Name Value
Name Value
Name Value
Name Value
Name Value
Fields Analysis
Field indices
Term 1
Term 2
Term 3
Term 4
Term 5
Term 6
Term 7
1,
4,
8,
12,
30,
42,
58,
100.
..
Posting lists
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data is stored in indexes, distributed across shards
Amazon Elasticsearch Service domain
ID
Field: value
Field: value
Field: value
Field: value
Index
Shards
Instances
• Shards are primary or
replica
• Primary shard count can’t
be changed
• Elasticsearch distributes
shards to instances
elastically
• Primary and replica are
distributed to different
instances
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How many instances?
• Index size is approximately source size
• Double this if you are deploying an index replica
• Instance count based on storage
requirements
• Either local storage or up to 1.5 TB of Amazon
Elastic Block Store (EBS) per instance
Example: a 2 TB corpus will need 4 instances
Assuming a replica and using EBS
Given 1.5 TB of storage per instance, this gives 6TB of storage
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Which instance type?
Instance Max Storage* Workload
T2 3.5TB You want to do dev/QA
M3, M4 150TB Your data and queries are “average”
R3, R4 150TB You have higher request volumes, larger
documents, or are using aggregations heavily
C4 150TB You need to support high concurrency
I2, I3 1.5 PB You have XL storage needs
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Shards are units of storage and work
• Set primary shard count based on
storage, 40GB per primary
• Example: set shard count = 50 for a 2TB
corpus (2TB / 40GB = 50 shards)
• Always use at least 1 replica in production
• Set shard count so that shards per
instance ~= number of CPUs
• Keep shard sizes as equivalent as possible
Key Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Use templates to set shard count
*PUT <endpoint>/_template/template1
{
"index_patterns": [”movies*"],
"settings": {
"number_of_shards": 50,
"number_of_replicas": 1
}
}
• All new indexes which match the index pattern receive the settings
• You can also specify a mapping for all indexes
*Note: ES 6.0+ syntax
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Query your data
{
"query": {
"match": {
"title": "iron man"
}
}
}
Title Score
Iron Man 10.56436
Iron Man 2 8.631084
Iron Man 3 8.631084
Iron Sky 6.387543
The Man with the Iron Fists 6.1855173
The Man in the Iron Mask 6.1855173
The Iron Giant 5.218624
The Iron Lady 5.218624
77 hits
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
You can search for values by field, with Boolean expressions
to get scored, sorted results
1,
4,
8,
12,
30,
42,
58,
100.
...
1,
4,
8,
12,
30,
42,
58,
100.
...
Term 1
Term 2
Term 3
Term 4
Term 5
Term 6
Term 7
Analysis
1,
4,
8,
12,
30,
42,
58,
100
...
Posting lists
Field1:value
Field2:value
Field3:value
Term 1
Term 2
Term 3
Term 4
Term 5
Term 6
Term 7
Term 1
Term 2
Term 3
Term 4
Term 5
Term 6
Term 7
1,
12,
58,
100
58,
12,
1,
100
Result
Key Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
You can analyze field values to get statistics and
build visualizations
58,
12,
1,
100,
115
123,
214,
947
Result
GET
GET
POST
GET
PUT
GET
GET
POST
Field
Data
Buckets
GET
POST
PUT
5
2
1
Counts
Key Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Full text search
CASE STUDY: MirrorWeb
Make the UK Government and UK Parliament’s web archives searchable
Large scale ingestion scenario: 120 TB of data (1.2 MM 100MB files), duplicates and
bad data, Warc format
P R O B L E M
Scalability: Started on a 9-node, R4.4Xlarge cluster for fast ingest, reduced to 6 R4.Xlarge instances for search. Able to
reconfigure the cluster with no down time
Cost effective: Indexed 1.4 billion documents for $337
Fast: 146 MM docs per hour indexed. 14x faster than the previous best for this data set (using Hadoop)
B E N E F I T S
S O L U T I O N
Amazon
S3 (Storage)
Amazon
EC2
(Filtering)
Amazon EC2
(Extraction)
Amazon ES
(Search)
For more on this case, see http://tinyurl.com/ybqwbolq
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Log transport
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Architecture: Log transport
Public
Subnet
Internet
Gateway
ALB
Private
Subnet
Filebeat
Redis
ENI
Kibana
Amazon ES
ENI
Amazon ES
Domain
Redis
Cluster
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Each log line or other event constitutes a search document
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Each log line has various fields that contain data you can
search and analyzeKey Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Build an ingest pipeline that
completes these tasks
Data source Collect Transform Buffer Deliver
Key Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kinesis Firehose
CloudWatch Logs
Logstash
Amazon Lambda
Transform
Kinesis Firehose/
Streams
CloudWatch Logs
Amazon Elasticache/Redis
Kafka Rabbit MQLogstash
Amazon S3
Buffer
Kinesis Firehose/
Streams
Logstash
Worker nodes
Deliver
Kinesis Agent
CloudWatch Logs Agent
Beats
Fluentd
Application
Logstash
Collect
CloudWatch Logs
Amazon Lambda
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Lambda architectures for dynamic updating
Files
S3 Events
Amazon S3 AWS Lambda
Function
DynamoDB
streams
Amazon
DynamoDB
Table
AWS Lambda
Function
Amazon Elasticsearch Service
Amazon KinesisData Producers AWS Lambda
Function
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
When you use Elasticsearch for log analysis, split data into
daily indexes
• On ingest, create indexes with a root string,
e.g. logs_
• Depending on volume, rotate at regular
intervals – normally daily
• Daily indexes simplify index management.
Delete the oldest index to create more space
on your cluster
logs_01.21.2018
logs_01.22.2018
logs_01.23.2018
logs_01.24.2018
logs_01.25.2018
logs_01.26.2018
logs_01.27.2018
logs_01.28.2018
logs_01.29.2018
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Log analysis
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Architecture: Log analytics
Public
Subnet
Private
Subnet
Bastion
Kibana
Amazon ES
ENI
Amazon ES
Domain
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Demo
Log Analysis
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Application Monitoring & Root-cause Analysis
CASE STUDY: EXPEDIA
Logs, lots and lots of logs. How to cost effectively monitor logs?
Require centralized logging infrastructure
Did not have the man power to manage infrastructure
P R O B L E M
Quick insights: Able to identify and troubleshoot issues in real-time
Secure: Integrated w/ AWS IAM
Scalable: Cluster sizes are able to grow to accommodate additional log sources
B E N E F I T S
Streaming AWS CloudTrail logs, application logs, and Docker startup logs to
Elasticsearch
Created centralized logging service for all team members
Using Kibana for visualizations and for Elasticsearch queries
S O L U T I O N
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Loose
Ends
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Use 3 dedicated master instances in production
Master instances orchestrate and
make your cluster more stable
Amazon ES domain
Key Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dedicated master node recommendations
Number of data nodes Master node instance type
< 10 m3.medium+
< 20 m4.large+
<= 50 c4.xlarge+
50-100 c4.2xlarge+
• In production, use 3 dedicated masters
• Use smaller instances than data instances
• Size based on instance count, index count, and schema size
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Use zone awareness in production
100% data redundancy in 2 zones
makes your cluster more highly
available
Amazon ES domain
Availability Zone 2
Availability Zone 1
Key Idea
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Set CloudWatch alarms for at least these metrics
Name Metric Threshold Periods
ClusterStatus.red Maximum >= 1 1
ClusterIndexWritesBlocked Maximum >= 1 1
CPUUtilization/MasterCPUUtilization Average >= 80% 3
JVMMemoryPressure/Master... Maximum >= 80% 3
FreeStorageSpace Minimum <= (25% of
avail space)
1
AutomatedSnapshotFailure Maximum >= 1 1
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Wrap up
• Whether you’re monitoring your infrastructure, or searching
your application data, Amazon Elasticsearch Service makes it
easy to locate the information you need
• The service offers ease-of-use, scalability, security, high
availability, and compatibility with existing open source
software
• With its tight integration with other AWS services you can get
your data in and start working
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!

Contenu connexe

Tendances

Disaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudDisaster Recovery with the AWS Cloud
Disaster Recovery with the AWS Cloud
Amazon Web Services
 

Tendances (20)

Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCache
 
Understand AWS Pricing
Understand AWS PricingUnderstand AWS Pricing
Understand AWS Pricing
 
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
ElastiCache & Redis
ElastiCache & RedisElastiCache & Redis
ElastiCache & Redis
 
Tinder and DynamoDB: It's a Match! Massive Data Migration, Zero Down Time - D...
Tinder and DynamoDB: It's a Match! Massive Data Migration, Zero Down Time - D...Tinder and DynamoDB: It's a Match! Massive Data Migration, Zero Down Time - D...
Tinder and DynamoDB: It's a Match! Massive Data Migration, Zero Down Time - D...
 
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS SummitAmazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
Amazon RDS: Deep Dive - SRV310 - Chicago AWS Summit
 
Disaster Recovery with the AWS Cloud
Disaster Recovery with the AWS CloudDisaster Recovery with the AWS Cloud
Disaster Recovery with the AWS Cloud
 
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
 
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
 
From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...
 
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
 
Getting Started with AWS Database Migration Service
Getting Started with AWS Database Migration ServiceGetting Started with AWS Database Migration Service
Getting Started with AWS Database Migration Service
 
SRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon AuroraSRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon Aurora
 
Migrating Oracle to PostgreSQL
Migrating Oracle to PostgreSQLMigrating Oracle to PostgreSQL
Migrating Oracle to PostgreSQL
 
Getting Started with Amazon Database Migration Service
Getting Started with Amazon Database Migration ServiceGetting Started with Amazon Database Migration Service
Getting Started with Amazon Database Migration Service
 
Deep Dive: Scaling Up to Your First 10 Million Users
Deep Dive: Scaling Up to Your First 10 Million UsersDeep Dive: Scaling Up to Your First 10 Million Users
Deep Dive: Scaling Up to Your First 10 Million Users
 

Similaire à Amazon Elasticsearch Service Deep Dive - AWS Online Tech Talks

Similaire à Amazon Elasticsearch Service Deep Dive - AWS Online Tech Talks (20)

BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
Integrating Amazon Elasticsearch with your DevOps Tooling - AWS Online Tech T...
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
 
Building High Performance Apps with In-memory Data
Building High Performance Apps with In-memory DataBuilding High Performance Apps with In-memory Data
Building High Performance Apps with In-memory Data
 
Building Serverless ETL Pipelines
Building Serverless ETL PipelinesBuilding Serverless ETL Pipelines
Building Serverless ETL Pipelines
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter Dachnowicz
 
Adding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAdding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San Francisco
 
Using Search with a Database: Database Week SF
Using Search with a Database: Database Week SFUsing Search with a Database: Database Week SF
Using Search with a Database: Database Week SF
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Big Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeBig Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_Singapore
 
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
 
WildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing WorkshopWildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing Workshop
 
ElastiCache: Deep Dive Best Practices and Usage Patterns - AWS Online Tech Talks
ElastiCache: Deep Dive Best Practices and Usage Patterns - AWS Online Tech TalksElastiCache: Deep Dive Best Practices and Usage Patterns - AWS Online Tech Talks
ElastiCache: Deep Dive Best Practices and Usage Patterns - AWS Online Tech Talks
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWS
 

Plus de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Amazon Elasticsearch Service Deep Dive - AWS Online Tech Talks

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jon Handler – Principal Solutions Architect, Search Services March 21, 2018 Amazon Elasticsearch Service Deep Dive
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. A tale of a small workload
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Source: TechCrunch survey of popular open source software from April’17 Open source Fast time to value Easy ingestion Easy visualization High performance and distributed Best analytics and search Elasticsearch can help Rank Project Name Overall Project Rating 1 Linux 100.00 2 Git 31.10 3 MySQL 25.23 4 Node.js 22.75 5 Docker 22.61 6 Hadoop 16.19 7 Elasticsearch 15.72 8 Spark 14.99 9 MongoDB 14.68 10 Selenium 12.81 11 NPM 12.31 12 Redis 11.61 B E N E F I T S
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. With Elasticsearch, you can run your business AND monitor it as well
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Elasticsearch Service is a fully managed service that makes it easy to deploy, manage, and scale Elasticsearch and Kibana in the AWS cloud Amazon Elasticsearch Service +
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tightly Integrated with Other AWS Services Seamless data ingestion, security, auditing and orchestration Benefits of Amazon Elasticsearch Service Supports Open-Source APIs and Tools Drop-in replacement with no need to learn new APIs or skills Easy to Use Deploy a production-ready Elasticsearch cluster in minutes Scalable Resize your cluster with a few clicks or a single API call Secure Deploy into your VPC and restrict access using security groups and IAM policies Highly Available Replicate across Availability Zones, with monitoring and automated self-healing
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Service Architecture AWS SDK AWS CLI AWS CloudFormation Elasticsearch data nodes Elasticsearch master nodes Elastic Load Balancing IAM Amazon CloudWatch AWS CloudTrail Amazon Elasticsearch Service domain Key Idea
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Security • Public endpoints – IAM • Private endpoints – IAM and security groups • Encryption
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use IAM for public endpoints { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::XXXX:root” }, "Action": "es:*", "Resource": "arn:aws:es: us-west-2:XXXX:domain/YYYY/*” } ] } • To grant access for Kibana, use a CIDR Condition • To grant read-only access to an account or role, specify an es:HttpGet Action • To limit a user to a specific index, or API add it to the Resource • For Admin access, specify administrative es:* Actions
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Add security groups for private endpoints Specify a subnet and security group to apply CIDR restrictions on inbound/outbound traffic security group security group Amazon Elasticsearch Service Data Master Data Master IAM IAM
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Encrypt your data • Encrypted data at rest on Amazon ES instances • Both EBS and ephemeral store • Encrypted automatic snapshots
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The front end
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Architecture: Web Serving Public Subnet Internet Gateway ALB Private Subnet Httpd Amazon ES ENI Amazon ES Domain
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Application search
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Each document has a set of fieldsKey Idea
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. { "title": "Star Wars", "plot": "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a wookiee and two droids to save the universe from the Empire's world-destroying battle-station, while also attempting to rescue Princess Leia from the evil Darth Vader.", "year": 1977, "actors": [ "Mark Hamill", "Harrison Ford", "Carrie Fisher” ], "directors": [ "George Lucas” ], "rating": 8.7, "genres" : [ "Action", "Adventure", "Fantasy", "Sci-Fi” ] } Elasticsearch works with structured JSON containing fields and values
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Elasticsearch creates an index for each field Doc Name Value Name Value Name Value Name Value Name Value Name Value Fields Analysis Field indices Term 1 Term 2 Term 3 Term 4 Term 5 Term 6 Term 7 1, 4, 8, 12, 30, 42, 58, 100. .. Posting lists
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data is stored in indexes, distributed across shards Amazon Elasticsearch Service domain ID Field: value Field: value Field: value Field: value Index Shards Instances • Shards are primary or replica • Primary shard count can’t be changed • Elasticsearch distributes shards to instances elastically • Primary and replica are distributed to different instances
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How many instances? • Index size is approximately source size • Double this if you are deploying an index replica • Instance count based on storage requirements • Either local storage or up to 1.5 TB of Amazon Elastic Block Store (EBS) per instance Example: a 2 TB corpus will need 4 instances Assuming a replica and using EBS Given 1.5 TB of storage per instance, this gives 6TB of storage
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Which instance type? Instance Max Storage* Workload T2 3.5TB You want to do dev/QA M3, M4 150TB Your data and queries are “average” R3, R4 150TB You have higher request volumes, larger documents, or are using aggregations heavily C4 150TB You need to support high concurrency I2, I3 1.5 PB You have XL storage needs
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Shards are units of storage and work • Set primary shard count based on storage, 40GB per primary • Example: set shard count = 50 for a 2TB corpus (2TB / 40GB = 50 shards) • Always use at least 1 replica in production • Set shard count so that shards per instance ~= number of CPUs • Keep shard sizes as equivalent as possible Key Idea
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use templates to set shard count *PUT <endpoint>/_template/template1 { "index_patterns": [”movies*"], "settings": { "number_of_shards": 50, "number_of_replicas": 1 } } • All new indexes which match the index pattern receive the settings • You can also specify a mapping for all indexes *Note: ES 6.0+ syntax
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Query your data { "query": { "match": { "title": "iron man" } } } Title Score Iron Man 10.56436 Iron Man 2 8.631084 Iron Man 3 8.631084 Iron Sky 6.387543 The Man with the Iron Fists 6.1855173 The Man in the Iron Mask 6.1855173 The Iron Giant 5.218624 The Iron Lady 5.218624 77 hits
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. You can search for values by field, with Boolean expressions to get scored, sorted results 1, 4, 8, 12, 30, 42, 58, 100. ... 1, 4, 8, 12, 30, 42, 58, 100. ... Term 1 Term 2 Term 3 Term 4 Term 5 Term 6 Term 7 Analysis 1, 4, 8, 12, 30, 42, 58, 100 ... Posting lists Field1:value Field2:value Field3:value Term 1 Term 2 Term 3 Term 4 Term 5 Term 6 Term 7 Term 1 Term 2 Term 3 Term 4 Term 5 Term 6 Term 7 1, 12, 58, 100 58, 12, 1, 100 Result Key Idea
  • 25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. You can analyze field values to get statistics and build visualizations 58, 12, 1, 100, 115 123, 214, 947 Result GET GET POST GET PUT GET GET POST Field Data Buckets GET POST PUT 5 2 1 Counts Key Idea
  • 26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Full text search CASE STUDY: MirrorWeb Make the UK Government and UK Parliament’s web archives searchable Large scale ingestion scenario: 120 TB of data (1.2 MM 100MB files), duplicates and bad data, Warc format P R O B L E M Scalability: Started on a 9-node, R4.4Xlarge cluster for fast ingest, reduced to 6 R4.Xlarge instances for search. Able to reconfigure the cluster with no down time Cost effective: Indexed 1.4 billion documents for $337 Fast: 146 MM docs per hour indexed. 14x faster than the previous best for this data set (using Hadoop) B E N E F I T S S O L U T I O N Amazon S3 (Storage) Amazon EC2 (Filtering) Amazon EC2 (Extraction) Amazon ES (Search) For more on this case, see http://tinyurl.com/ybqwbolq
  • 27. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Log transport
  • 28. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Architecture: Log transport Public Subnet Internet Gateway ALB Private Subnet Filebeat Redis ENI Kibana Amazon ES ENI Amazon ES Domain Redis Cluster
  • 29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Each log line or other event constitutes a search document
  • 30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Each log line has various fields that contain data you can search and analyzeKey Idea
  • 31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Build an ingest pipeline that completes these tasks Data source Collect Transform Buffer Deliver Key Idea
  • 32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Kinesis Firehose CloudWatch Logs Logstash Amazon Lambda Transform Kinesis Firehose/ Streams CloudWatch Logs Amazon Elasticache/Redis Kafka Rabbit MQLogstash Amazon S3 Buffer Kinesis Firehose/ Streams Logstash Worker nodes Deliver Kinesis Agent CloudWatch Logs Agent Beats Fluentd Application Logstash Collect CloudWatch Logs Amazon Lambda
  • 33. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Lambda architectures for dynamic updating Files S3 Events Amazon S3 AWS Lambda Function DynamoDB streams Amazon DynamoDB Table AWS Lambda Function Amazon Elasticsearch Service Amazon KinesisData Producers AWS Lambda Function
  • 34. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. When you use Elasticsearch for log analysis, split data into daily indexes • On ingest, create indexes with a root string, e.g. logs_ • Depending on volume, rotate at regular intervals – normally daily • Daily indexes simplify index management. Delete the oldest index to create more space on your cluster logs_01.21.2018 logs_01.22.2018 logs_01.23.2018 logs_01.24.2018 logs_01.25.2018 logs_01.26.2018 logs_01.27.2018 logs_01.28.2018 logs_01.29.2018
  • 35. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Log analysis
  • 36. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Architecture: Log analytics Public Subnet Private Subnet Bastion Kibana Amazon ES ENI Amazon ES Domain
  • 37. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Demo Log Analysis
  • 38. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Application Monitoring & Root-cause Analysis CASE STUDY: EXPEDIA Logs, lots and lots of logs. How to cost effectively monitor logs? Require centralized logging infrastructure Did not have the man power to manage infrastructure P R O B L E M Quick insights: Able to identify and troubleshoot issues in real-time Secure: Integrated w/ AWS IAM Scalable: Cluster sizes are able to grow to accommodate additional log sources B E N E F I T S Streaming AWS CloudTrail logs, application logs, and Docker startup logs to Elasticsearch Created centralized logging service for all team members Using Kibana for visualizations and for Elasticsearch queries S O L U T I O N
  • 39. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Loose Ends
  • 40. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use 3 dedicated master instances in production Master instances orchestrate and make your cluster more stable Amazon ES domain Key Idea
  • 41. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Dedicated master node recommendations Number of data nodes Master node instance type < 10 m3.medium+ < 20 m4.large+ <= 50 c4.xlarge+ 50-100 c4.2xlarge+ • In production, use 3 dedicated masters • Use smaller instances than data instances • Size based on instance count, index count, and schema size
  • 42. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use zone awareness in production 100% data redundancy in 2 zones makes your cluster more highly available Amazon ES domain Availability Zone 2 Availability Zone 1 Key Idea
  • 43. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Set CloudWatch alarms for at least these metrics Name Metric Threshold Periods ClusterStatus.red Maximum >= 1 1 ClusterIndexWritesBlocked Maximum >= 1 1 CPUUtilization/MasterCPUUtilization Average >= 80% 3 JVMMemoryPressure/Master... Maximum >= 80% 3 FreeStorageSpace Minimum <= (25% of avail space) 1 AutomatedSnapshotFailure Maximum >= 1 1
  • 44. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Wrap up • Whether you’re monitoring your infrastructure, or searching your application data, Amazon Elasticsearch Service makes it easy to locate the information you need • The service offers ease-of-use, scalability, security, high availability, and compatibility with existing open source software • With its tight integration with other AWS services you can get your data in and start working
  • 45. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!