This document provides an agenda and overview for an AWS Builders' Day event on analyzing web and application logs using Amazon S3, CloudFront, Lambda, and Amazon Elasticsearch Service. The agenda covers hosting sites in S3, scaling with CloudFront, analyzing logs, using Elasticsearch Service, and a demo. The document then discusses in more detail how to enable hosting and scaling with S3 and CloudFront, covers logging and log analysis with Elasticsearch Service and Kibana, and gives examples of using Elasticsearch Service for log analytics and search, and with Kinesis and Lambda. It concludes with best practices and a demo of using Lambda to ingest logs into Elasticsearch.
5. Why hosting in S3 is cool!
• Simple static website hosting with a simple workflow
• 2 clicks to enable
• Scales
• Fast
• Supports your own FQDN
• You can extend it with Lambda functions for dynamic content
[Diagram: a developer pushes code to S3; users resolve DNS and fetch the site directly from S3]
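Enabling website hosting programmatically is a one-call operation. A minimal sketch of the configuration, assuming a pre-existing bucket (the name "my-site-bucket" is a placeholder) and the AWS SDK for Python:

```python
# Website configuration for S3 static hosting: serve index.html,
# fall back to error.html on errors.
website_config = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}

# Applying it to an existing bucket would look like this:
# import boto3
# boto3.client("s3").put_bucket_website(
#     Bucket="my-site-bucket",
#     WebsiteConfiguration=website_config,
# )
```

The same configuration is what the console's "2 clicks" write on your behalf.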
7. Going global
• Simple to enable
• Speeds up your site for international users
• Choice of regions
• Allows you to enter multiple CNAMEs
• Integrated with Route 53 alias records
• Supports SSL certs from AWS Certificate Manager
8. Amazon Global Network
• Redundant 100GbE network
• Redundant private capacity between all Regions except China
• Over 160 global CloudFront PoPs
• 89 Direct Connect locations
• AWS Regions: 20 Regions, 61 AZs
• First 5 years: 4 regions; 2016–2020: 13 regions; next 5 years: 7 regions
[Map: AWS Regions worldwide, with additions including Paris, Sweden, AWS GovCloud East, Milan, and Cape Town]
14. What produces data?
• Metering records
• Mobile apps
• IoT sensors
• Web clickstream
• Enterprise documents
• Application logs
Example application log line:
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
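Before a line like this is indexed, it is typically split into fields. A minimal sketch of a parser for this classic Apache error-log layout (the field names are my own choice):

```python
import re

# The sample error-log line from the slide, joined onto one line.
line = ("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
        "client denied by server configuration: /export/home/live/ap/htdocs/test")

# Layout: [timestamp] [level] [client IP] free-text message
pattern = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>\w+)\] "
    r"\[client (?P<client>[^\]]+)\] "
    r"(?P<message>.*)")

entry = pattern.match(line).groupdict()
# entry["level"] == "error", entry["client"] == "127.0.0.1"
```

Each named group becomes a field in the document that gets indexed.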
15. Logs, logs and more logs
Logs are important:
• Debugging
• Working out user flow
• Monitoring
Centralising those logs is even more important
16. Amazon Elasticsearch Service is a cost-effective managed service that makes it easy to deploy, manage, and scale open-source Elasticsearch for log analytics, full-text search, and more.
17. Amazon Elasticsearch Service Benefits
Easy to Use
• Deploy a production-ready Elasticsearch cluster in minutes
• Simplifies time-consuming management tasks such as software patching, failure recovery, backups, and monitoring
Open
• Get direct access to the Elasticsearch open-source API
• Fully compatible with the open-source Elasticsearch API, for all code and applications
Secure
• Secure Elasticsearch clusters with AWS Identity and Access Management (IAM) policies providing fine-grained access control for users and endpoints
• Automatically applies security patches without disruption, keeping Elasticsearch environments secure
Available
• Provides high availability using Zone Awareness, which replicates data between two Availability Zones
• Monitors the health of clusters and automatically replaces failed nodes, without service disruption
AWS Integrated
• Integrates with Amazon Kinesis Firehose, AWS IoT, and Amazon CloudWatch Logs for seamless data ingestion
• AWS CloudTrail for auditing, AWS Identity and Access Management (IAM) for security, and AWS CloudFormation for cloud orchestration
Scalable
• Scale clusters from a single node up to 20 nodes
• Configure clusters to meet performance requirements by selecting from a range of instance types and storage options, including SSD-powered EBS volumes
19. Amazon Elasticsearch Service leading use cases
Log Analytics & Operational Monitoring
• Monitor the performance of applications, web servers, and hardware
• Easy to use, powerful data visualization tools to detect issues quickly
• Dig into logs in an intuitive, fine-grained way
• Kibana provides fast, easy visualization
Search
• Application or website provides search capabilities over diverse documents
• Tasked with making this knowledge base searchable and accessible
• Text matching, faceting, filtering, fuzzy search, autocomplete, highlighting, and other search features
• Query API to support application search
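Several of the search features listed above map directly onto the Elasticsearch query DSL. A sketch of a request body combining fuzzy matching with hit highlighting; the index and field names ("products", "title") are illustrative:

```python
# Fuzzy match on a (misspelled) term, with highlighting of the hits.
query_body = {
    "query": {
        "match": {
            "title": {"query": "elasticsearhc", "fuzziness": "AUTO"}
        }
    },
    "highlight": {"fields": {"title": {}}},
}

# Sent to the domain's search endpoint, e.g.:
# POST https://<domain-endpoint>/products/_search
```

The same body works against the open-source API and an Amazon ES domain alike, since the service exposes the standard Elasticsearch API.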
20. Leading enterprises trust Amazon Elasticsearch Service for their search and analytics applications
[Customer logos grouped by segment: Media & Entertainment, Online Services, Technology, Other]
21. Adobe Developer Platform (Adobe I/O)
PROBLEM
• Cost-effective monitoring for an XL amount of log data
• Over 200,000 API calls per second at peak - destinations, response times, bandwidth
• Integrate seamlessly with other components of the AWS ecosystem
SOLUTION
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using Kibana
• Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges
BENEFITS
• Management and operational simplicity
• Flexibility to try out different cluster configs during dev and test
[Diagram: data sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service]
23. Easy to use and scalable
[Diagram: a developer uses the AWS SDK or AWS CLI, with Amazon Cognito for authentication, against an Amazon Elasticsearch Service domain made up of Elasticsearch master nodes and data nodes]
25. Data pattern
• One index per day: logs_01.21.2017 through logs_01.27.2017
• Each index has multiple shards (e.g., Shard 1, Shard 2, Shard 3)
• Each shard contains a set of documents
• Each document contains a set of fields and values (host, ident, auth, timestamp, etc.)
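The one-index-per-day pattern is usually produced by deriving the index name from the event date. A small sketch matching the naming scheme on the slide:

```python
from datetime import date, timedelta

def daily_index(day):
    # e.g. date(2017, 1, 21) -> "logs_01.21.2017"
    return day.strftime("logs_%m.%d.%Y")

# The seven indices shown for the week starting 2017-01-21:
week = [daily_index(date(2017, 1, 21) + timedelta(days=i)) for i in range(7)]
```

Daily indices make retention trivial: dropping a day of logs is a single delete-index call rather than a costly delete-by-query.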
26. Deployment of indices to a cluster
• Index 1
  – Shard 1
  – Shard 2
  – Shard 3
• Index 2
  – Shard 1
  – Shard 2
  – Shard 3
[Diagram: primary and replica copies of each shard distributed across Instance 1 (master), Instance 2, and Instance 3 of the Amazon ES cluster]
27. How many instances?
The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica
Size based on storage requirements
• Either local storage or up to 1.5 TB of Amazon Elastic Block Store (EBS) per instance
• Example: a 2 TB corpus will need 4 instances
  – Assuming a replica and using EBS
  – Given 1.5 TB of storage per instance, this gives 6 TB of storage
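The sizing rule above is simple arithmetic. A sketch with the 1.5 TB EBS limit as the default; note this returns the storage minimum (3 nodes for the 2 TB example), which the slide rounds up to 4 instances for headroom:

```python
import math

def min_data_nodes(corpus_tb, replicas=1, storage_per_node_tb=1.5):
    # Index size ~= corpus size; each replica adds another full copy.
    total_tb = corpus_tb * (1 + replicas)
    return math.ceil(total_tb / storage_per_node_tb)
```

Keeping spare capacity beyond the minimum leaves room for indexing overhead and growth.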
29. Instance type recommendations
Instance | Workload
T2       | Entry point. Dev and test. OK for dedicated masters of small clusters.
M3, M4   | Equal read and write volumes.
R3, R4   | Read-heavy or workloads with high memory demands (e.g., aggregations).
C4       | High concurrency/indexing workloads.
I2, I3   | Up to 1.6 TB of SSD instance storage.
31. Cluster with no dedicated masters
[Diagram: Amazon ES cluster of three instances, with Instance 1 also acting as master; primary and replica copies of shards 1–3 are spread across all three instances]
32. Cluster with dedicated masters
[Diagram: Amazon ES cluster with dedicated master nodes separate from the data nodes; the data nodes (Instances 1–3) handle queries and updates, with primary and replica copies of shards 1–3 spread across them]
33. Master node recommendations
Number of data nodes | Master node instance type
< 10                 | m3.medium+
< 20                 | m4.large+
<= 50                | c4.xlarge+
50-100               | c4.2xlarge+
Always use an odd number of masters, >= 3
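When scripting cluster creation, the table above can be encoded as a simple lookup. A sketch using the instance types as given in this deck:

```python
def master_instance_type(data_nodes):
    # Smallest recommended dedicated-master type for a given data-node count.
    if data_nodes < 10:
        return "m3.medium"
    if data_nodes < 20:
        return "m4.large"
    if data_nodes <= 50:
        return "c4.xlarge"
    return "c4.2xlarge"
```

The "+" in the table means these are minimums; larger types are always acceptable.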
35. Cluster with zone awareness
[Diagram: Amazon ES cluster spanning Availability Zone 1 (Instances 1 and 2) and Availability Zone 2 (Instances 3 and 4), with primary and replica copies of shards 1–3 replicated across the two zones]
36. Small use cases
• Logstash co-located on the application instance
• SigV4 signing via provided output plugin
• Up to 200 GB of data
• m3.medium + 100 GB EBS data nodes
• 3x m3.medium master nodes
37. Large use cases
• Data flows from instances and applications via Lambda
• SigV4 signing via Lambda/roles
• Up to 5 TB of data
• r3.2xlarge + 512 GB EBS data nodes
• 3x m3.medium master nodes
38. XL use cases
• Ingest supported through high-volume technologies like Spark or Kinesis
• Up to 60 TB of data today
• r3.8xlarge + 640 GB data nodes
• 3x m3.xlarge master nodes
39. Best practices
• Data nodes = storage needed / storage per node
• Use GP2 EBS volumes
• Use 3 dedicated master nodes for production deployments
• Enable Zone Awareness
• Set indices.fielddata.cache.size = 40 (i.e., 40% of the JVM heap)
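These best practices translate directly into the domain configuration. A sketch in the shape boto3's legacy `es` client expects; the instance types, counts, and volume size are illustrative, not prescriptive:

```python
# Cluster configuration following the best practices above: 3 dedicated
# masters, zone awareness, GP2 EBS, and the fielddata cache cap.
cluster_config = {
    "InstanceType": "r3.2xlarge.elasticsearch",
    "InstanceCount": 4,
    "DedicatedMasterEnabled": True,
    "DedicatedMasterType": "m3.medium.elasticsearch",
    "DedicatedMasterCount": 3,
    "ZoneAwarenessEnabled": True,
}
ebs_options = {"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 512}
advanced_options = {"indices.fielddata.cache.size": "40"}  # percent of heap

# Creating the domain would then look like:
# import boto3
# boto3.client("es").create_elasticsearch_domain(
#     DomainName="logs",
#     ElasticsearchClusterConfig=cluster_config,
#     EBSOptions=ebs_options,
#     AdvancedOptions=advanced_options)
```

The same structure can be expressed in CloudFormation for repeatable deployments.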
43. Permissions
IAM roles with permissions to allow:
• Access to ES (get granular and lock it down to a single cluster)
• Access to S3 (read-only permissions to one bucket)
• CloudWatch Logs (push logs from the Lambda app)
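Expressed as an IAM policy document for the Lambda role, those three permissions could look like the sketch below; the region, account ID, domain name, and bucket name are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut"],
      "Resource": "arn:aws:es:eu-west-1:123456789012:domain/my-domain/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-log-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```

Scoping the `es:*` actions to a single domain ARN is what "lock it down to a single cluster" means in practice.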