This document provides an agenda and overview for an AWS Builders' Day event on analyzing web and application logs using Amazon S3, CloudFront, Lambda, and Amazon Elasticsearch Service. The agenda covers hosting sites in S3, scaling with CloudFront, analyzing logs, using Elasticsearch Service, and a demo. The document then discusses in more detail how to enable hosting and scaling with S3 and CloudFront, covers logging and log analysis with Elasticsearch Service and Kibana, and gives examples of using Elasticsearch Service for log analytics and search, and with Kinesis and Lambda. It concludes with best practices and a demo of using Lambda to ingest logs into Elasticsearch.
5. Why hosting in S3 is cool!
• Simple static website hosting with a simple workflow
• 2 clicks to enable
• Scales
• Fast
• Supports your own FQDN
• You can extend it with Lambda functions for dynamic content
[Diagram: a developer pushes code to S3; users resolve DNS and fetch the site directly from S3]
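Enabling website hosting programmatically is a one-call operation. A minimal sketch of the configuration, assuming a pre-existing bucket (the name "my-site-bucket" is a placeholder) and the AWS SDK for Python:

```python
# Website configuration for S3 static hosting: serve index.html,
# fall back to error.html on errors.
website_config = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}

# Applying it to an existing bucket would look like this:
# import boto3
# boto3.client("s3").put_bucket_website(
#     Bucket="my-site-bucket",
#     WebsiteConfiguration=website_config,
# )
```

The same configuration is what the console's "2 clicks" write on your behalf.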
7. Going global
• Simple to enable
• Speeds up your site for international users
• Choice of regions
• Allows you to enter multiple CNAMEs
• Integrated with Route 53 alias records
• Supports SSL certs from AWS Certificate Manager
8. Amazon Global Network
• Redundant 100GbE network
• Redundant private capacity between all Regions except China
• Over 160 global CloudFront PoPs
• 89 Direct Connect locations
• AWS Regions: 20 Regions, 61 AZs
• First 5 years: 4 regions; 2016–2020: 13 regions; next 5 years: 7 regions
[Map: AWS Regions worldwide, with additions including Paris, Sweden, AWS GovCloud East, Milan, and Cape Town]
14. What produces data?
• Metering records
• Mobile apps
• IoT sensors
• Web clickstream
• Enterprise documents
• Application logs
Example application log line:
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
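Before a line like this is indexed, it is typically split into fields. A minimal sketch of a parser for this classic Apache error-log layout (the field names are my own choice):

```python
import re

# The sample error-log line from the slide, joined onto one line.
line = ("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
        "client denied by server configuration: /export/home/live/ap/htdocs/test")

# Layout: [timestamp] [level] [client IP] free-text message
pattern = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>\w+)\] "
    r"\[client (?P<client>[^\]]+)\] "
    r"(?P<message>.*)")

entry = pattern.match(line).groupdict()
# entry["level"] == "error", entry["client"] == "127.0.0.1"
```

Each named group becomes a field in the document that gets indexed.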
15. Logs, logs and more logs
Logs are important:
• Debugging
• Working out user flow
• Monitoring
Centralising those logs is even more important
16. Amazon Elasticsearch Service is a cost-effective managed service that makes it easy to deploy, manage, and scale open-source Elasticsearch for log analytics, full-text search, and more.
17. Amazon Elasticsearch Service Benefits
Easy to Use
• Deploy a production-ready Elasticsearch cluster in minutes
• Simplifies time-consuming management tasks such as software patching, failure recovery, backups, and monitoring
Open
• Get direct access to the Elasticsearch open-source API
• Fully compatible with the open-source Elasticsearch API, for all code and applications
Secure
• Secure Elasticsearch clusters with AWS Identity and Access Management (IAM) policies providing fine-grained access control for users and endpoints
• Automatically applies security patches without disruption, keeping Elasticsearch environments secure
Available
• Provides high availability using Zone Awareness, which replicates data between two Availability Zones
• Monitors the health of clusters and automatically replaces failed nodes, without service disruption
AWS Integrated
• Integrates with Amazon Kinesis Firehose, AWS IoT, and Amazon CloudWatch Logs for seamless data ingestion
• AWS CloudTrail for auditing, AWS Identity and Access Management (IAM) for security, and AWS CloudFormation for cloud orchestration
Scalable
• Scale clusters from a single node up to 20 nodes
• Configure clusters to meet performance requirements by selecting from a range of instance types and storage options, including SSD-powered EBS volumes
19. Amazon Elasticsearch Service leading use cases
Log Analytics & Operational Monitoring
• Monitor the performance of applications, web servers, and hardware
• Easy to use, powerful data visualization tools to detect issues quickly
• Dig into logs in an intuitive, fine-grained way
• Kibana provides fast, easy visualization
Search
• Application or website provides search capabilities over diverse documents
• Tasked with making this knowledge base searchable and accessible
• Text matching, faceting, filtering, fuzzy search, autocomplete, highlighting, and other search features
• Query API to support application search
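Several of the search features listed above map directly onto the Elasticsearch query DSL. A sketch of a request body combining fuzzy matching with hit highlighting; the index and field names ("products", "title") are illustrative:

```python
# Fuzzy match on a (misspelled) term, with highlighting of the hits.
query_body = {
    "query": {
        "match": {
            "title": {"query": "elasticsearhc", "fuzziness": "AUTO"}
        }
    },
    "highlight": {"fields": {"title": {}}},
}

# Sent to the domain's search endpoint, e.g.:
# POST https://<domain-endpoint>/products/_search
```

The same body works against the open-source API and an Amazon ES domain alike, since the service exposes the standard Elasticsearch API.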
20. Leading enterprises trust Amazon Elasticsearch Service for their search and analytics applications
[Customer logos grouped by segment: Media & Entertainment, Online Services, Technology, Other]
21. Adobe Developer Platform (Adobe I/O)
PROBLEM
• Cost-effective monitoring for an XL amount of log data
• Over 200,000 API calls per second at peak - destinations, response times, bandwidth
• Integrate seamlessly with other components of the AWS ecosystem
SOLUTION
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using Kibana
• Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges
BENEFITS
• Management and operational simplicity
• Flexibility to try out different cluster configs during dev and test
[Diagram: data sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service]
23. Easy to use and scalable
[Diagram: a developer uses the AWS SDK or AWS CLI, with Amazon Cognito for authentication, against an Amazon Elasticsearch Service domain made up of Elasticsearch master nodes and data nodes]
25. Data pattern
• One index per day: logs_01.21.2017 through logs_01.27.2017
• Each index has multiple shards (e.g., Shard 1, Shard 2, Shard 3)
• Each shard contains a set of documents
• Each document contains a set of fields and values (host, ident, auth, timestamp, etc.)
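The one-index-per-day pattern is usually produced by deriving the index name from the event date. A small sketch matching the naming scheme on the slide:

```python
from datetime import date, timedelta

def daily_index(day):
    # e.g. date(2017, 1, 21) -> "logs_01.21.2017"
    return day.strftime("logs_%m.%d.%Y")

# The seven indices shown for the week starting 2017-01-21:
week = [daily_index(date(2017, 1, 21) + timedelta(days=i)) for i in range(7)]
```

Daily indices make retention trivial: dropping a day of logs is a single delete-index call rather than a costly delete-by-query.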
26. Deployment of indices to a cluster
• Index 1
  – Shard 1
  – Shard 2
  – Shard 3
• Index 2
  – Shard 1
  – Shard 2
  – Shard 3
[Diagram: primary and replica copies of each shard distributed across Instance 1 (master), Instance 2, and Instance 3 of the Amazon ES cluster]
27. How many instances?
The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica
Size based on storage requirements
• Either local storage or up to 1.5 TB of Amazon Elastic Block Store (EBS) per instance
• Example: a 2 TB corpus will need 4 instances
  – Assuming a replica and using EBS
  – Given 1.5 TB of storage per instance, this gives 6 TB of storage
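The sizing rule above is simple arithmetic. A sketch with the 1.5 TB EBS limit as the default; note this returns the storage minimum (3 nodes for the 2 TB example), which the slide rounds up to 4 instances for headroom:

```python
import math

def min_data_nodes(corpus_tb, replicas=1, storage_per_node_tb=1.5):
    # Index size ~= corpus size; each replica adds another full copy.
    total_tb = corpus_tb * (1 + replicas)
    return math.ceil(total_tb / storage_per_node_tb)
```

Keeping spare capacity beyond the minimum leaves room for indexing overhead and growth.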
29. Instance type recommendations
Instance | Workload
T2       | Entry point. Dev and test. OK for dedicated masters of small clusters.
M3, M4   | Equal read and write volumes.
R3, R4   | Read-heavy or workloads with high memory demands (e.g., aggregations).
C4       | High concurrency/indexing workloads.
I2, I3   | Up to 1.6 TB of SSD instance storage.
31. Cluster with no dedicated masters
[Diagram: Amazon ES cluster of three instances, with Instance 1 also acting as master; primary and replica copies of shards 1–3 are spread across all three instances]
32. Cluster with dedicated masters
[Diagram: Amazon ES cluster with dedicated master nodes separate from the data nodes; the data nodes (Instances 1–3) handle queries and updates, with primary and replica copies of shards 1–3 spread across them]
33. Master node recommendations
Number of data nodes | Master node instance type
< 10                 | m3.medium+
< 20                 | m4.large+
<= 50                | c4.xlarge+
50-100               | c4.2xlarge+
Always use an odd number of masters, >= 3
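When scripting cluster creation, the table above can be encoded as a simple lookup. A sketch using the instance types as given in this deck:

```python
def master_instance_type(data_nodes):
    # Smallest recommended dedicated-master type for a given data-node count.
    if data_nodes < 10:
        return "m3.medium"
    if data_nodes < 20:
        return "m4.large"
    if data_nodes <= 50:
        return "c4.xlarge"
    return "c4.2xlarge"
```

The "+" in the table means these are minimums; larger types are always acceptable.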
35. Cluster with zone awareness
[Diagram: Amazon ES cluster spanning Availability Zone 1 (Instances 1 and 2) and Availability Zone 2 (Instances 3 and 4), with primary and replica copies of shards 1–3 replicated across the two zones]
36. Small use cases
• Logstash co-located on the application instance
• SigV4 signing via provided output plugin
• Up to 200 GB of data
• m3.medium + 100 GB EBS data nodes
• 3x m3.medium master nodes
37. Large use cases
• Data flows from instances and applications via Lambda
• SigV4 signing via Lambda/roles
• Up to 5 TB of data
• r3.2xlarge + 512 GB EBS data nodes
• 3x m3.medium master nodes
38. XL use cases
• Ingest supported through high-volume technologies like Spark or Kinesis
• Up to 60 TB of data today
• r3.8xlarge + 640 GB data nodes
• 3x m3.xlarge master nodes
39. Best practices
• Data nodes = storage needed / storage per node
• Use GP2 EBS volumes
• Use 3 dedicated master nodes for production deployments
• Enable Zone Awareness
• Set indices.fielddata.cache.size = 40 (i.e., 40% of the JVM heap)
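These best practices translate directly into the domain configuration. A sketch in the shape boto3's legacy `es` client expects; the instance types, counts, and volume size are illustrative, not prescriptive:

```python
# Cluster configuration following the best practices above: 3 dedicated
# masters, zone awareness, GP2 EBS, and the fielddata cache cap.
cluster_config = {
    "InstanceType": "r3.2xlarge.elasticsearch",
    "InstanceCount": 4,
    "DedicatedMasterEnabled": True,
    "DedicatedMasterType": "m3.medium.elasticsearch",
    "DedicatedMasterCount": 3,
    "ZoneAwarenessEnabled": True,
}
ebs_options = {"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 512}
advanced_options = {"indices.fielddata.cache.size": "40"}  # percent of heap

# Creating the domain would then look like:
# import boto3
# boto3.client("es").create_elasticsearch_domain(
#     DomainName="logs",
#     ElasticsearchClusterConfig=cluster_config,
#     EBSOptions=ebs_options,
#     AdvancedOptions=advanced_options)
```

The same structure can be expressed in CloudFormation for repeatable deployments.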
43. Permissions
IAM roles with permissions to allow:
• Access to ES (get granular and lock it down to a single cluster)
• Access to S3 (read-only permissions to one bucket)
• CloudWatch Logs (push logs from the Lambda app)
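Expressed as an IAM policy document for the Lambda role, those three permissions could look like the sketch below; the region, account ID, domain name, and bucket name are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut"],
      "Resource": "arn:aws:es:eu-west-1:123456789012:domain/my-domain/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-log-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```

Scoping the `es:*` actions to a single domain ARN is what "lock it down to a single cluster" means in practice.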