Using AWS To Build
A Scalable Machine Data Analytics Service
Christian Beedgen
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Who Am I
•  Co-Founder & CTO, Sumo Logic since 2010
–  Cloud-based Machine Data Analytics Service
–  Applications, Operations, Security

•  Server guy, Chief Architect, ArcSight, 2001-2009
–  Major SIEM player in the enterprise space
–  Log Management for security & compliance
Everything You Know Is Wrong
Agenda
•  Introduction To Logs & Logging
•  Why We Are Building This Service
•  Architecture Of The Service
•  Deployment Automation
•  Loosely Coupled Components
•  Lessons Learned
•  Cost & Business Value
Introduction To Logs & Logging
What Is Machine Data?
•  Actually, Machine Generated Data
Curt Monash:
“Data that was produced
entirely by machines OR
data that is more about
observing humans than
recording their choices.”

Daniel Abadi:
"Machine-generated data is
data that is generated as a
result of a decision of an
independent computational
agent or a measurement of
an event that is not caused
by a human action."
Examples Of Machine Data
•  Computer, network, and other equipment logs
•  Satellite and similar telemetry (espionage or science)
•  Location data, RFID chip readings, GPS system output
•  Temperature and other environmental sensor readings
•  Sensor readings from factories, pipelines, etc.
•  Output from many kinds of medical devices
What Are Logs?
•  Logs are a kind of Machine Data
•  Time-stamped bits and pieces of text
•  Whispers & utterances of your infrastructure
•  Written to disk to a log file by applications
•  Sent over the network by devices
A Wealth Of Information
•  Like Twitter for your infrastructure
•  Machine data analytics…
•  …is sentiment analysis for machines
•  Free data of tremendous value
•  Don’t forget to manage and analyze it
Or Else…
Anatomy Of A Log

•  Timestamp with time zone!
•  Log level
•  Host ID & module name (process/service)
•  Code location or class
•  Authentication context
•  Key-value pairs
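
To make the anatomy concrete, here is a minimal Python sketch that parses one log line carrying the fields listed above. The line layout, field names, and sample values are hypothetical and chosen only for illustration; every application uses its own layout.

```python
import re
from datetime import datetime

# Hypothetical layout: timestamp with zone, level, then key=value pairs that
# carry host, module, class, authentication context, and other fields.
LINE = ("2013-11-13 10:15:30,123 -0800 INFO host=prod-index-5 module=index "
        "class=BlockWriter user=alice@example.com bytes=2048 elapsed_ms=17")

PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+) (?P<rest>.*)$"
)

def parse_line(line):
    """Split one log line into timestamp, level, and key-value fields."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError("line does not match the assumed layout")
    # %z keeps the time zone offset, so the timestamp is unambiguous.
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f %z")
    fields = dict(token.split("=", 1) for token in m.group("rest").split())
    return {"timestamp": ts, "level": m.group("level"), "fields": fields}

record = parse_line(LINE)
```

Keeping the zone offset in the parsed timestamp is the point of the first bullet: without it, the same text could refer to several different instants.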
Use Cases
•  Availability & Performance
–  Prevent downtime by proactive analytics, alerting
–  Reduce MTTR by having all required data at your fingertips

•  Application Release
–  Derive metrics from development and staging systems pre-deploy
–  Baselining before a deploy and comparing after it quickly shows errors

•  Security & Compliance
–  Compliance starts with having all security-related logs in one place
–  Analytics across all data facilitates detecting breaches and problems
Customer Metrics
•  Security & Compliance
–  Apigee reduced compliance audit costs by ~50%

•  Availability and Performance
–  Ink saves nearly $500K annually

•  Application Release
–  Intaact reduced errors by 4X
Machine Data Is Big Data
•  Volume
–  Machine Data is voluminous and will continue to grow
–  Our own application easily creates 1 TB of logs per week

•  Velocity
–  Machine Data occurs in real-time, and it is time-stamped
–  Needs to be processed in real-time as well

•  Variety
–  Machine Data is unstructured, or poly-structured at best
–  Some standard schemas exist, but sure enough not for your applications
Why We Are Building This Service
We Need To Evolve
Legacy Products Fall Short
•  Volume leads to scalability issues
–  Every Log Management system will fail – I have seen it
–  Why should you bother with scaling yet one more system?

•  Velocity challenges processing pipelines
–  What good are dashboards if they are not real-time?
–  Streaming query engines are an absolute must

•  Variety isn’t being embraced
–  All data should be allowed into the system
–  No vendor will ever know your application’s log schema
AWS Enables Innovation
•  Attending Werner’s talk at Stanford in 2008
•  First parking lot discussion
•  This can apply to our space!
•  Datacenter as API
•  Massive power up to scraggly devs
AWS Enables Sumo Logic
•  Entering an existing market
–  Existing & established competition, some of it huge
–  Catch up & differentiate at the same time

•  A Big Data service
–  Scaling on premise is hard and leaves the hard part to the customer
–  Now we build one single system to deal with all customers

•  This data is important
–  Regulatory compliance is among the big drivers for collecting it
–  HA & DR concerns all over the place → Amazon S3
Deployment Architecture - Before
Deployment Architecture - After
Architecture Of The System
Development Approach
•  Developed in Scala because we like it
•  Many small cohesive modules, low coupling
•  Maven-based build system
•  Layers of modules combined into applications
•  Different applications for different concerns
•  Internal Service-Oriented Architecture
•  Communication via documented protocols
Basic Concerns
•  Data ingestion
–  Receiving data
–  Raw storage
–  Full-text indexing

•  Data analysis
–  Interactive analytics
–  Scheduled queries
–  Machine learning

–  Continuous query evaluation
Concerns Map To Clusters
•  A cluster is multiple instances of the same application
•  Deployed on multiple Amazon EC2 instances
•  Deployed across multiple availability zones
•  Instances within a cluster are oblivious of each other
•  Receive from upstream, talk to downstream
•  Receive from message bus, or talk RPC
Ingestion Path
[Diagram: ingestion path with Receiver, Bus, Raw, Index, CQ, and S3]
Receiver
•  HTTPS endpoint behind Elastic Load Balancing
•  Decompress messages from Collector
•  Extract timestamps from messages
•  Aggregate messages per-customer into blocks
•  Flush blocks to message bus
•  Ack to Collector
•  “Statelessly stateful” / “Statefully stateless”
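
The aggregate-then-flush step can be sketched in a few lines of Python. This is an illustration only, not the Receiver's actual code: the class name, the size threshold, and the list standing in for the message bus are all assumptions.

```python
from collections import defaultdict

class BlockAggregator:
    """Groups incoming messages per customer and flushes them as blocks."""

    def __init__(self, flush, max_block_size=3):
        self._flush = flush            # callable(customer_id, list_of_messages)
        self._max = max_block_size     # hypothetical flush threshold
        self._pending = defaultdict(list)

    def add(self, customer_id, message):
        block = self._pending[customer_id]
        block.append(message)
        if len(block) >= self._max:
            self.flush_customer(customer_id)

    def flush_customer(self, customer_id):
        block = self._pending.pop(customer_id, [])
        if block:
            self._flush(customer_id, block)

    def flush_all(self):
        for customer_id in list(self._pending):
            self.flush_customer(customer_id)


bus = []  # stand-in for the message bus
agg = BlockAggregator(lambda cust, block: bus.append((cust, block)), max_block_size=2)
agg.add("acme", "m1")
agg.add("globex", "m2")
agg.add("acme", "m3")   # second message for acme triggers a flush
agg.flush_all()         # flush what is left before acking to the Collector
```

Acking only after the flush means the only state held in memory is data the Collector can resend, which is one way to read “statelessly stateful”.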
Raw
•  Receive message blocks from message bus
•  Encrypt message blocks
•  Different key for every day for every customer
•  Flush encrypted message blocks to Amazon S3
•  Copy blocks as CSV to customer’s Amazon S3 bucket
•  Ack to message bus
•  Fully stateless
Index
•  Receive message blocks from message bus
•  Cache message block on disk and ack to message bus
•  Add message blocks to Lucene indexes
•  Deal with wildly varying timestamps
•  Flush index shards to Amazon S3
•  Update metadata database with index shard info
•  Stateful
Continuous Query
•  Receive message blocks from message bus
•  Evaluate each message against all search expressions
•  Push matching messages into respective pipelines
•  Ack to message bus
•  Flush results periodically for pickup by client
•  Persist checkpoints periodically to Amazon S3
•  Stateful, with checkpoint recovery
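
The matching step above inverts the usual search: queries are standing, and data flows past them. A minimal Python sketch follows, using regular expressions as a stand-in for the real search expressions; the query names and sample messages are made up.

```python
import re
from collections import defaultdict

# Hypothetical standing queries: name -> compiled search expression.
queries = {
    "errors": re.compile(r"\bERROR\b"),
    "logins": re.compile(r"login (succeeded|failed)"),
}

pipelines = defaultdict(list)  # query name -> messages pushed into its pipeline

def evaluate_block(block):
    """Check every message in the block against every search expression."""
    for message in block:
        for name, expression in queries.items():
            if expression.search(message):
                pipelines[name].append(message)

evaluate_block([
    "INFO login succeeded user=alice",
    "ERROR login failed user=bob",
    "DEBUG heartbeat",
])
```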
Analytics Path
[Diagram: analytics path with Query, Service, S3, and CQ]
•  Fully distributed streaming query engine
•  Materialize messages matching search expression
•  Push messages through a pipeline of operators
•  First stage – non-aggregation operators
•  Second stage – aggregation operators
•  Present both raw message results as well as aggregates
•  Results update periodically for interactive UI experience
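
The two-stage pipeline can be illustrated with Python generators. The operator names (`where`, `extract`, `count_by_value`) and the sample messages are hypothetical; this is a single-process sketch of the idea, not the distributed engine described above.

```python
from collections import Counter

def where(predicate):
    """Non-aggregation operator: keeps messages that satisfy the predicate."""
    def op(messages):
        return (m for m in messages if predicate(m))
    return op

def extract(key):
    """Non-aggregation operator: pulls one key=value field out of each message."""
    def op(messages):
        for m in messages:
            fields = dict(t.split("=", 1) for t in m.split() if "=" in t)
            if key in fields:
                yield fields[key]
    return op

def count_by_value(values):
    """Aggregation operator: counts occurrences of each value."""
    return Counter(values)

def run(messages, stage_one, aggregate):
    """Stream messages through the first stage, then aggregate the result."""
    stream = iter(messages)
    for op in stage_one:
        stream = op(stream)
    return aggregate(stream)

messages = [
    "ERROR status=500 host=a",
    "INFO status=200 host=a",
    "ERROR status=500 host=b",
    "ERROR status=503 host=a",
]
result = run(
    messages,
    [where(lambda m: m.startswith("ERROR")), extract("status")],
    count_by_value,
)
```

Because the first stage is lazy, messages stream through one at a time, and the aggregate could be read at any point to refresh an interactive view.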
Deployment Automation
Why Deployment Automation
•  Add 1 part developers, 1 part Datacenter-as-API, stir…
•  Aim for fully integrated continuous deployment
•  Checkin → unit test → integration test → deployment
•  Jenkins automates it all – using AWS instances
•  Deployment doesn’t mean production
•  Nite → Stag → Long → Prod deployments
•  There are humans involved as well!
Automation Enables Scale
•  The goal is 100% - accept no less
•  Why you need automation
–  Number of deployments grows (staging, per-developer)
–  Number of AWS resources per deployment grows
–  Number of operators/developers grows
–  Frequency of deployments, changes increases
Current Deployment Stats
•  4 Deployments running 24/7, 50 for development
•  20+ clusters per deployment
•  25+ software components deployed
•  Hundreds of instances in production
•  Less than 10 minutes to deploy from scratch
•  Less than 4 minutes to restart hundreds of components
dsh: Another AWS deployment tool
•  Model-driven, describe desired state, run to make it so
•  High performance due to parallelization
•  Covers all layers of the stack – AWS, OS, Sumo Logic
•  Easy to use and extend, scriptable CLI
•  Developer-friendly, Scala-based, high-level APIs
Example session
She’s A Model & She’s Looking Good
•  Model contains concepts
–  Deployment
–  Cluster
–  AWS Resources (Amazon S3, Amazon Elastic Load Balancing, Amazon DynamoDB, Amazon RDS, etc.)
–  Software assemblies
–  AWS configuration (IAM users, security groups, etc.)

•  Human-readable names: prod-index-5
Model Snippet
Differential Deployment
•  Start by finding existing resources
–  Use tagging where it is available
–  Name prefixes (“prod_xxx”) where it isn’t (security groups, IAM, …)

•  Fix differences to model
–  Start “missing” instances
–  Change security group rules, missing IAM users

•  Proceed with caution
–  Never delete anything that holds data
–  Amazon EBS, Amazon DynamoDB, Amazon S3, Amazon RDS
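
The differential approach can be sketched as a plain set comparison between the model and what discovery finds. Everything in this Python example is hypothetical: the resource names, the kind labels, and in particular the assumption that surplus resources which hold no data may be deleted.

```python
# Resource kinds that hold data and must never be deleted automatically.
DATA_HOLDING = {"ebs", "dynamodb", "s3", "rds"}

def plan(model, existing):
    """Compare desired resources (model) with discovered ones (existing).

    Both arguments map a resource name to its kind,
    for example "prod-index-5" -> "instance".
    """
    create = sorted(name for name in model if name not in existing)
    extra = [name for name in existing if name not in model]
    # Assumption: only surplus resources that hold no data are safe to delete.
    delete = sorted(n for n in extra if existing[n] not in DATA_HOLDING)
    review = sorted(n for n in extra if existing[n] in DATA_HOLDING)
    return {"create": create, "delete": delete, "review": review}

model = {"prod-index-1": "instance", "prod-index-2": "instance", "prod-raw": "s3"}
existing = {"prod-index-1": "instance", "prod-old-bucket": "s3", "prod-index-9": "instance"}
steps = plan(model, existing)
```

Anything in `review` is left for a human, which keeps the “proceed with caution” rule outside the automated path.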
Example Of Tag Usage
Making It Fast
•  Parallelize all the things
–  Upload to Amazon S3 while booting instances while creating IAM users while setting up security groups while…
–  Hyper-concurrent rolling restarts
Hyper, Hyper
Making It Fast
•  Fast enough for development
–  Write new code or fix a bug, compile locally
–  Push code to development deployment and make it live

•  Optimize data transfers
–  Use Amazon S3 hashes to only transfer new files
–  Only upload changed JARs
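
The hash comparison behind “only upload changed JARs” might look like the Python sketch below. It assumes the remote side reports a hex MD5 digest per object, as an Amazon S3 ETag does for simple, non-multipart uploads; the file names and the dictionary standing in for the bucket listing are made up.

```python
import hashlib

def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

def files_to_upload(local_files, remote_hashes):
    """Return names whose content differs from, or is missing in, the remote listing.

    local_files:   name -> bytes
    remote_hashes: name -> hex MD5 reported by the object store
    """
    return sorted(
        name for name, data in local_files.items()
        if remote_hashes.get(name) != md5_hex(data)
    )

local_files = {"receiver.jar": b"v2", "index.jar": b"v1", "query.jar": b"v1"}
remote_hashes = {"receiver.jar": md5_hex(b"v1"), "index.jar": md5_hex(b"v1")}
changed = files_to_upload(local_files, remote_hashes)
```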
Making It Reliable
•  Check prerequisites before you even try
–  Does Prod account have room for this many instances?
–  Do I have the required permissions for the AWS APIs?
–  Any model discrepancies I can’t automatically resolve? Too many Amazon EBS volumes?

•  Handle common failures automatically
–  No m1.large in us-east-1b? Move Amazon EBS volumes to us-west-1c and try there
–  Hitting the AWS API rate limit? Throttle and try again
–  SSH didn’t come up on the instance? Kill it and launch another
–  Eventual consistency in AWS? Query until it has the expected state (tags)
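
Two of these failure handlers, throttling on rate limits and polling until an eventually consistent read catches up, can be sketched generically in Python. The `RateLimited` exception and the functions passed in are hypothetical stand-ins, not calls from any AWS SDK.

```python
import time

class RateLimited(Exception):
    """Hypothetical error raised when the API asks the caller to slow down."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Retry operation with exponential backoff while it is rate limited."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

def wait_until(read_state, expected, max_polls=10, delay=0.01, sleep=time.sleep):
    """Poll an eventually consistent read until it returns the expected state."""
    for _ in range(max_polls):
        if read_state() == expected:
            return True
        sleep(delay)
    return False

# Demo: the first two calls are throttled, the third succeeds.
calls = {"n": 0}

def flaky_tag_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "tagged"

delays = []
outcome = call_with_backoff(flaky_tag_call, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the demo instant; a real tool would leave the default in place.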
Making It Secure
•  Different AWS accounts
–  Per developer
–  Production

•  account.xml
–  All credentials for one AWS account (AWS keys, SSH keys)
–  Password-protected

•  IAM
–  One user per Sumo component
–  Minimal IAM policy
–  Inject AWS credentials

•  Security Groups
–  Part of the model
–  Minimal privileges
Making It Safe
•  Let mistakes happen at most once
•  Add safeguards to prevent operator mistakes
•  Type in the deployment name before deleting anything
•  Disallow risky operations in production (shutdown Prod)
•  Don’t allow -SNAPSHOT code to be deployed in production
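
Safeguards like these are cheap to express in code. The Python sketch below assumes a deployment literally named "prod" and Maven-style version strings; both function names are hypothetical.

```python
def check_deploy(deployment, artifact_versions):
    """Refuse to deploy -SNAPSHOT builds to production."""
    if deployment == "prod":
        snapshots = [v for v in artifact_versions if v.endswith("-SNAPSHOT")]
        if snapshots:
            raise ValueError("SNAPSHOT builds not allowed in prod: " + ", ".join(snapshots))

def confirm_delete(deployment, typed_name):
    """Require the operator to type the deployment name; never allow it for prod."""
    if deployment == "prod":
        return False
    return typed_name == deployment
```

A caller would run `check_deploy` before pushing any code and `confirm_delete` before tearing anything down, so an operator mistake fails early instead of reaching AWS.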
Making It Easy
•  Automate best practices
–  Distribute instances over availability zones evenly
–  Register instances in Elastic Load Balancing and match AZs to instances
–  Tag all resources consistently

•  Consistent naming
–  Generate SSH with logical names
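
Even distribution over availability zones can be as simple as round-robin assignment. In this Python sketch the instance names and zone names are illustrative only.

```python
from itertools import cycle

def assign_zones(instance_names, zones):
    """Round-robin instances over zones so per-zone counts differ by at most one."""
    return dict(zip(instance_names, cycle(zones)))

names = ["prod-index-%d" % i for i in range(1, 6)]
placement = assign_zones(names, ["us-east-1a", "us-east-1b", "us-east-1c"])
```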
Making It Affordable
•  Developers forget to shut stuff down
–  Deployment reaper automatically shuts down deployments
–  Daily cost emails

•  Per-team budgets
–  Manager responsible to keep within budget
Pitfalls
•  Base AMI plus scripted installation prevents auto scaling
•  Security group updates cause TCP disconnects
•  This is fixed in the VPC stack, however
•  Parallelism can cause stampedes (for example, Amazon DynamoDB)
•  Tagging API rate limits are easy to hit
Loosely Coupled Components
Loose Coupling In The Large
•  A deployment is made up of many things
•  Some of these things need to talk to each other
•  Some of these things come and go
•  Don’t pass in a huge list of static dependencies
•  Start each application with one parameter
$ bin/receiver prod.service-registry.sumologic.com
Service Registry
•  Service Registry is a concept, enables discovery
•  A client-side library accessing a Zookeeper cluster
•  Services are abstracted into types
•  Application provides and consumes different services
•  Sumo Logic services (RPC)
•  Third-party services (message bus)
•  AWS services (Amazon ElastiCache, Amazon RDS)
The Perils Of Horizontal Scale
•  Scaling out a multi-tenant processing system
•  1000s of customers, 1000s of machines
•  Parallelism is good, but locality has to be considered
•  1 customer distributed over 1000 machines is bad
•  No single machine getting enough load for that customer
•  Batches & shards will become too small
•  Metadata and in-memory structures grow out of proportion
[Diagram sequence: a grid of Index instances. In the first sequence, customers 1 to 8 are spread across every instance; in the second, each customer is assigned to only a few instances.]
Customer Partitioning
•  Each cluster elects a leader node via Zookeeper
•  Leader runs the partitioning logic
•  Set[Customer], Set[Instance] → Map[Instance, Set[Customer]]
•  Partitioning written to Zookeeper
•  Example: indexer node knows which customer’s message blocks to pull from message bus
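
A function with the shape given above can be sketched in Python. The slides do not describe the actual partitioning logic, so the least-loaded-first strategy and the `replicas` parameter here are assumptions made purely for illustration.

```python
def partition(customers, instances, replicas=2):
    """Map each instance to the set of customers it serves.

    Each customer lands on `replicas` instances (an assumed knob), picked
    least-loaded-first so that load stays roughly even.
    """
    assignment = {instance: set() for instance in instances}
    for customer in sorted(customers):
        # Sort by current load, then by name, to keep the result deterministic.
        targets = sorted(assignment, key=lambda i: (len(assignment[i]), i))[:replicas]
        for instance in targets:
            assignment[instance].add(customer)
    return assignment

result = partition({"c1", "c2", "c3"}, {"index-1", "index-2", "index-3"})
```

A deterministic result matters here: whichever node wins the leader election should compute the same mapping from the same inputs.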
Lessons Learned
Some Tips On AWS S3
•  Use the TransferManager class from the AWS Java SDK
–  Multi-part uploads and downloads
–  Multi-threaded, overall latency reduction

•  Use random prefixes for keynames in Amazon S3 buckets
–  Amazon S3 partitions by keyname prefix
–  http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html

•  Endpoint URL for Amazon S3
–  s3.amazonaws.com might go to Virginia, or Pacific Northwest (!)

–  If you are in us-east, use s3-external-1.amazonaws.com instead
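
One way to get well-spread key prefixes while still being able to recompute a key later is to derive the prefix from a hash of the key itself. This Python sketch is an assumption about how the tip could be applied, not something the slides specify; the key layout is made up.

```python
import hashlib

def prefixed_key(key, prefix_len=4):
    """Prepend a short hash of the key so names spread across prefixes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return digest[:prefix_len] + "/" + key

k = prefixed_key("customer-42/2013-11-13/block-0001")
```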
Elastic Block Store
•  RAID-0 makes Amazon EBS faster
–  Use LVM RAID-0 if heavy I/O is required
–  Align stripe sizes with file system block sizes

•  Snapshotting Amazon EBS volumes
–  Snapshots eat performance
–  Even for volumes with provisioned IOPS

•  Overlapping snapshots
–  Can be scheduled too close together, like every minute
–  I/Os start taking 30+ seconds
Cost & Business Value
Somebody Has To Pay For Lunch
•  On-demand resources are very sexy
•  Automation gives developers their own sandbox
•  Compute is the most easily incurred cost
•  You need an automated reaper
•  Or just raise another round… :)
Elasticity Is Not An Arbitrary Need
•  At least in our system, there’s baseline load
•  At least in our system, the cost is in compute
•  Alert-based scaling can be safe & effective
•  Measure your spend with tools that are out there
•  We actually use Sumo Logic for that!
•  Look for a moving average of resource consumption
•  Buy Reserved Instances, don’t fret the instance types
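
A moving average smooths out short spikes so the baseline load becomes visible. The Python sketch below uses made-up hourly instance counts and an arbitrary window of three samples.

```python
from collections import deque

def moving_average(values, window):
    """Yield the mean of the last `window` values once the window is full."""
    recent = deque(maxlen=window)
    for v in values:
        recent.append(v)
        if len(recent) == window:
            yield sum(recent) / window

hourly_instances = [10, 12, 11, 30, 12, 10]   # made-up usage numbers
baseline = list(moving_average(hourly_instances, 3))
```

A longer window flattens the spike at the fourth sample further, which is the kind of view that helps when sizing a reserved-instance purchase against baseline load.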
One More Thing
Amazon CloudTrail
•  Logs! From AWS! The eagle has landed!
•  Amazon CloudTrail logs your API activity to Amazon S3
•  Sumo Logic will read from Amazon S3, allow analysis
Please give us your feedback on this
presentation

BDT401
As a thank you, we will select prize
winners daily for completed surveys!

Thank You
Chart Example
Category 4
Category 3
Category 2
Category 1
0%

20%
Series 1

40%
60%
Axis Title
Series 2

Series 3

80%
Series 4

100%
Powerpoint Guidelines
Arial

Please do not use gradients, shadows or outlines on shape
elements in your presentation.
PowerPoint Guidelines
When pasting content from another presentation
please paste using “Destination Theme”
Windows

Mac

Note: This works when copying entire slides from other presentations as long as the source presentation is also 16:9
PowerPoint Guidelines
When pasting content Code into a Code template please use the
“Keep Text Only Function” If any additional coloring needs to be done
to your code type please do it after pasting it into your slide.
Windows

Mac
68k Assembly Code Sample
; Syntax Test file for 68k Assembly code
; Some comments about this file
.D0 00000000
MS 2100 00000002
MM 2000;DI
LEA.L $002100,A1
MOVE.L #2,-(A1)
BSR $00002050
MM 2050;DI
MOVE.L (A1)+,D1
MOVE.L (A1),D2
ADD.L D1,D2
Basic text content slide
•  With Content
–  And more content
Title Slide #2
Slide with two columns
Slide with two columns and titles
Slide with space for custom content
Side Content
Description or content with place for
image on the right
Big picture slide
Please give us your feedback on this
presentation

As a thank you, we will select prize
winners daily for completed surveys!

Thank You

More Related Content

What's hot

Migration Recipes for Success - AWS Summit Cape Town 2017
Migration Recipes for Success - AWS Summit Cape Town 2017 Migration Recipes for Success - AWS Summit Cape Town 2017
Migration Recipes for Success - AWS Summit Cape Town 2017 Amazon Web Services
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)Amazon Web Services
 
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...Amazon Web Services
 
Continuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerContinuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerAmazon Web Services
 
Workshop: AWS Lamda Signal Corps vs Zombies
Workshop: AWS Lamda Signal Corps vs ZombiesWorkshop: AWS Lamda Signal Corps vs Zombies
Workshop: AWS Lamda Signal Corps vs ZombiesAmazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
Getting Started with AWS Lambda and the Serverless Cloud
Getting Started with AWS Lambda and the Serverless CloudGetting Started with AWS Lambda and the Serverless Cloud
Getting Started with AWS Lambda and the Serverless CloudAmazon Web Services
 
Adopting DevOps at Scale on AWS with VirtusaPolaris
Adopting DevOps at Scale on AWS with VirtusaPolarisAdopting DevOps at Scale on AWS with VirtusaPolaris
Adopting DevOps at Scale on AWS with VirtusaPolarisAmazon Web Services
 
AWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWSAWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWSAmazon Web Services
 
AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...
AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...
AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...Amazon Web Services
 
AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...
AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...
AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...Amazon Web Services
 
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn Amazon Web Services
 
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...Amazon Web Services
 
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013Amazon Web Services
 
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...Amazon Web Services
 
Azure Serverless with Functions, Logic Apps, and Event Grid
Azure Serverless with Functions, Logic Apps, and Event Grid  Azure Serverless with Functions, Logic Apps, and Event Grid
Azure Serverless with Functions, Logic Apps, and Event Grid WinWire Technologies Inc
 
Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016
Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016
Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016Amazon Web Services
 
Migrating your Databases to Amazon Aurora - AWS April 2016 Webinar Series
Migrating your Databases to Amazon Aurora - AWS April 2016 Webinar SeriesMigrating your Databases to Amazon Aurora - AWS April 2016 Webinar Series
Migrating your Databases to Amazon Aurora - AWS April 2016 Webinar SeriesAmazon Web Services
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)Amazon Web Services
 
Cloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for PartnersCloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for PartnersAmazon Web Services
 

What's hot (20)

Migration Recipes for Success - AWS Summit Cape Town 2017
Migration Recipes for Success - AWS Summit Cape Town 2017 Migration Recipes for Success - AWS Summit Cape Town 2017
Migration Recipes for Success - AWS Summit Cape Town 2017
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
 
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
 
Continuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and DockerContinuous Integration with Amazon ECS and Docker
Continuous Integration with Amazon ECS and Docker
 
Workshop: AWS Lamda Signal Corps vs Zombies
Workshop: AWS Lamda Signal Corps vs ZombiesWorkshop: AWS Lamda Signal Corps vs Zombies
Workshop: AWS Lamda Signal Corps vs Zombies
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Getting Started with AWS Lambda and the Serverless Cloud
Getting Started with AWS Lambda and the Serverless CloudGetting Started with AWS Lambda and the Serverless Cloud
Getting Started with AWS Lambda and the Serverless Cloud
 
Adopting DevOps at Scale on AWS with VirtusaPolaris
Adopting DevOps at Scale on AWS with VirtusaPolarisAdopting DevOps at Scale on AWS with VirtusaPolaris
Adopting DevOps at Scale on AWS with VirtusaPolaris
 
AWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWSAWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWS
 
AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...
AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...
AWS re:Invent 2016: Store and collaborate on content securely with Amazon Wor...
 
AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...
AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...
AWS re:Invent 2016: Turner's cloud native media supply chain for TNT, TBS, Ad...
 
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
 
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
 
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
 
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
 
Azure Serverless with Functions, Logic Apps, and Event Grid
Azure Serverless with Functions, Logic Apps, and Event Grid  Azure Serverless with Functions, Logic Apps, and Event Grid
Azure Serverless with Functions, Logic Apps, and Event Grid
 
Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016
Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016
Simplify Your Database Migration to AWS | AWS Public Sector Summit 2016
 
Migrating your Databases to Amazon Aurora - AWS April 2016 Webinar Series
Migrating your Databases to Amazon Aurora - AWS April 2016 Webinar SeriesMigrating your Databases to Amazon Aurora - AWS April 2016 Webinar Series
Migrating your Databases to Amazon Aurora - AWS April 2016 Webinar Series
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
 
Cloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for PartnersCloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for Partners
 









Using AWS to Build a Scalable Big Data Management & Processing Service (BDT401) | AWS re:Invent 2013

  • 1. Using AWS To Build A Scalable Machine Data Analytics Service Christian Beedgen November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Who Am I
    •  Co-Founder & CTO, Sumo Logic since 2010
      –  Cloud-based Machine Data Analytics Service
      –  Applications, Operations, Security
    •  Server guy, Chief Architect, ArcSight, 2001-2009
      –  Major SIEM player in the enterprise space
      –  Log Management for security & compliance
  • 5. Agenda
    •  Introduction To Logs & Logging
    •  Why We Are Building This Service
    •  Architecture Of The Service
    •  Deployment Automation
    •  Loosely Coupled Components
    •  Lessons Learned
    •  Cost & Business Value
  • 7. What Is Machine Data?
    •  Actually, Machine Generated Data
    Curt Monash: “Data that was produced entirely by machines OR data that is more about observing humans than recording their choices.”
    Daniel Abadi: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action."
  • 8. Examples Of Machine Data
    •  Computer, network, and other equipment logs
    •  Satellite and similar telemetry (espionage or science)
    •  Location data, RFID chip readings, GPS system output
    •  Temperature and other environmental sensor readings
    •  Sensor readings from factories, pipelines, etc.
    •  Output from many kinds of medical devices
  • 9. What Are Logs?
    •  Logs are a kind of Machine Data
    •  Time-stamped bits and pieces of text
    •  Whispers & utterances of your infrastructure
    •  Written to disk to a log file by applications
    •  Sent over the network by devices
  • 10.
  • 11. A Wealth Of Information •  Like Twitter for your infrastructure •  Machine data analytics… •  …is sentiment analysis for machines •  Free data of tremendous value •  Don’t forget to manage and analyze it
  • 19. Anatomy Of A Log •  Timestamp with time zone! •  Log level •  Host ID & module name (process/service) •  Code location or class •  Authentication context •  Key-value pairs
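The fields listed on the "Anatomy Of A Log" slide can be illustrated with a small parser. This is only a sketch: the log layout, the sample line, and all field names below are invented for illustration and are not taken from the talk.

```python
import re
from datetime import datetime

# Hypothetical layout, invented for this sketch:
# timestamp level host module [class] key=value ...
LINE = ("2013-11-13T10:15:30-0800 ERROR host-17 receiver "
        "[com.example.Receiver] user=alice customer=42 bytes=1024")

PATTERN = re.compile(
    r"(?P<ts>\S+) (?P<level>\S+) (?P<host>\S+) (?P<module>\S+) "
    r"\[(?P<cls>[^\]]+)\] (?P<rest>.*)"
)

def parse(line):
    """Split one log line into the fields named on the slide."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError("line does not match the assumed layout")
    fields = m.groupdict()
    # %z keeps the UTC offset, so the timestamp stays unambiguous.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%dT%H:%M:%S%z")
    # Remaining tokens are key=value pairs, including the auth context (user=...).
    fields["kv"] = dict(tok.split("=", 1) for tok in fields.pop("rest").split())
    return fields

record = parse(LINE)
```

Keeping the time zone offset in the timestamp is what makes records from different hosts comparable once they land in one system.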
  • 20. Use Cases •  Availability & Performance –  Prevent downtime by proactive analytics, alerting –  Reduce MTTR by having all required data at your fingertips •  Application Release –  Derive metrics from development and staging systems pre-deploy –  Baselining pre-deploy and comparing post-deploy quickly shows errors •  Security & Compliance –  Compliance starts with having all security related logs in one place –  Analytics across all data facilitates detecting breaches and problems
  • 21. Customer Metrics (Use Case: Customer Example / Metric) •  Security & Compliance: Apigee reduced compliance audit costs by ~50% •  Availability and Performance: Ink saves nearly $500K annually •  Application Release: Intacct reduced errors by 4X
  • 22. Machine Data Is Big Data •  Volume –  Machine Data is voluminous and will continue to grow –  Our own application easily creates 1 TB of logs per week •  Velocity –  Machine Data occurs in real-time, and it is time-stamped –  Needs to be processed in real-time as well •  Variety –  Machine Data is unstructured, or poly-structured at best –  Some standard schema, but sure enough not for your applications
  • 23. Why We Are Building This Service
  • 24. We Need To Evolve
  • 25. We Need To Evolve
  • 26. Legacy Products Fall Short •  Volume leads to scalability issues –  Every Log Management system will fail – I have seen it –  Why should you bother with scaling yet one more system? •  Velocity challenges processing pipelines –  What good are dashboards if they are not real-time? –  Streaming query engines are an absolute must •  Variety isn’t being embraced –  All data should be allowed into the system –  No vendor will ever know your application’s log schema
  • 27. AWS Enables Innovation •  Attending Werner’s talk at Stanford in 2008 •  First parking lot discussion •  This can apply to our space! •  Datacenter as API •  Massive power up to scraggly devs
  • 28. AWS Enables Sumo Logic •  Entering an existing market –  Existing & established competition, some of it huge –  Catch up & differentiate at the same time •  A Big Data service –  Scaling on premise is hard and leaves the hard part to the customer –  Now we build one single system to deal with all customers •  This data is important –  Regulatory compliance is among the big drivers for collecting it –  HA & DR concerns all over the place → Amazon S3
  • 32. Development Approach •  Developed in Scala because we like it •  Many small cohesive modules, low coupling •  Maven-based build system •  Layers of modules combined into applications •  Different applications for different concerns •  Internal Service-Oriented Architecture •  Communication via documented protocols
  • 33. Basic Concerns •  Data ingestion –  Receiving data –  Raw storage –  Full-text indexing •  Data analysis –  Interactive analytics –  Scheduled queries –  Machine learning –  Continuous query evaluation
  • 34. Concerns Map To Clusters •  A cluster is multiple instances of the same application •  Deployed on multiple Amazon EC2 instances •  Deployed across multiple availability zones •  Instances within a cluster are oblivious of each other •  Receive from upstream, talk to downstream •  Receive from message bus, or talk RPC
  • 36. Receiver •  HTTPS endpoint behind Elastic Load Balancing •  Decompress messages from Collector •  Extract timestamps from messages •  Aggregate messages per-customer into blocks •  Flush blocks to message bus •  Ack to Collector •  “Statelessly stateful”/“Statefully stateless”
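The Receiver's "aggregate messages per-customer into blocks, flush blocks to message bus" steps could look roughly like the sketch below. The class name, the size-based flush trigger, and the callback standing in for the message bus are all assumptions made for illustration; the slides do not describe how blocks are sized or flushed.

```python
from collections import defaultdict

class BlockAggregator:
    """Groups messages per customer and flushes a block once it is full."""

    def __init__(self, flush, max_messages=3):
        self.flush = flush                # callback standing in for the message bus
        self.max_messages = max_messages  # assumed size-based flush trigger
        self.pending = defaultdict(list)

    def add(self, customer, message):
        block = self.pending[customer]
        block.append(message)
        if len(block) >= self.max_messages:
            self.flush_customer(customer)

    def flush_customer(self, customer):
        block = self.pending.pop(customer, [])
        if block:
            self.flush(customer, block)

    def flush_all(self):
        # Flush whatever is left, e.g. on a timer or at shutdown.
        for customer in list(self.pending):
            self.flush_customer(customer)

sent = []
agg = BlockAggregator(lambda c, b: sent.append((c, b)), max_messages=2)
for customer, msg in [("a", "m1"), ("b", "m2"), ("a", "m3"), ("b", "m4"), ("a", "m5")]:
    agg.add(customer, msg)
agg.flush_all()
```

In a real receiver the ack to the Collector would presumably wait until the corresponding block has been handed to the bus; that ordering is not modeled here.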
  • 37. Raw •  Receive message blocks from message bus •  Encrypt message blocks •  Different key for every day for every customer •  Flush encrypted message blocks to Amazon S3 •  Copy blocks as CSV to customer’s Amazon S3 bucket •  Ack to message bus •  Fully stateless
  • 38. Index •  Receive message blocks from message bus •  Cache message block on disk and ack to message bus •  Add message blocks to Lucene indexes •  Deal with wildly varying timestamps •  Flush index shards to Amazon S3 •  Update meta data database with index shard info •  Stateful
  • 39. Continuous Query (CQ) •  Receive message blocks from message bus •  Evaluate each message against all search expressions •  Push matching messages into respective pipelines •  Ack to message bus •  Flush results periodically for pickup by client •  Persist checkpoints periodically to Amazon S3 •  Stateful, with checkpoint recovery
  • 41. Query •  Fully distributed streaming query engine •  Materialize messages matching search expression •  Push messages through a pipeline of operators •  First stage – non-aggregation operators •  Second stage – aggregation operators •  Present both raw message results as well as aggregates •  Results update periodically for interactive UI experience
  • 43. Why Deployment Automation •  Add 1 part developers, 1 part Datacenter-as-API, stir… •  Aim for fully integrated continuous deployment •  Checkin → unit test → integration test → deployment •  Jenkins automates it all – using AWS instances •  Deployment doesn’t mean production •  Nite → Stag → Long → Prod deployments •  There are humans involved as well!
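The two-stage pipeline on the Query slide (non-aggregation operators first, aggregation operators second) can be sketched with plain generators. The substring match, the `status=` field, and the function names are assumptions for illustration only, not the engine's actual operators.

```python
from collections import Counter

def matching(messages, needle):
    """Materialize messages that match the search expression (a substring here)."""
    return (m for m in messages if needle in m)

def extract_status(messages):
    """Stage 1, non-aggregating: turn each message into a field value."""
    for m in messages:
        # Assumes a trailing 'status=<code>' token, invented for this sketch.
        yield m.rsplit("status=", 1)[1]

def count_by(values):
    """Stage 2, aggregating: count occurrences of each value."""
    return Counter(values)

logs = [
    "GET /a status=200",
    "GET /b status=500",
    "POST /a status=200",
    "healthcheck ok",
]
raw = list(matching(logs, "status="))        # raw message results
aggregate = count_by(extract_status(raw))    # aggregates over the same messages
```

Keeping both `raw` and `aggregate` mirrors the slide's point that raw message results and aggregates are presented side by side.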
  • 44. Automation Enables Scale •  The goal is 100% - accept no less •  Why you need automation –  Number of deployments grows (staging, per-developer) –  Number of AWS resources per deployment grows –  Number of operators/developers grows –  Frequency of deployments, changes increases
  • 45. Current Deployment Stats •  4 Deployments running 24/7, 50 for development •  20+ clusters per deployment •  25+ software components deployed •  Hundreds of instances in production •  Less than 10 minutes to deploy from scratch •  Less than 4 minutes to restart hundreds of components
  • 46. dsh: Another AWS deployment tool •  Model-driven, describe desired state, run to make it so •  High performance due to parallelization •  Covers all layers of the stack – AWS, OS, Sumo Logic •  Easy to use and extend, scriptable CLI •  Developer-friendly, Scala-based, high-level APIs
  • 48. Sie Ist Ein Model & Sie Sieht Gut Aus •  Model contains concepts –  Deployment –  Cluster –  AWS Resources (Amazon S3, Amazon Elastic Load Balancing, Amazon DynamoDB, Amazon RDS, etc.) –  Software assemblies –  AWS configuration (IAM users, security groups, etc.) •  Human-readable names: prod-index-5!
  • 51. Differential Deployment •  Start by finding existing resources –  Use tagging where it is available –  Name prefixes (“prod_xxx”) where it isn’t (security groups, IAM, …) •  Fix differences to model –  Start “missing” instances –  Change security group rules, missing IAM users •  Proceed with caution –  Never delete anything that holds data –  Amazon EBS, Amazon DynamoDB, Amazon S3, Amazon RDS
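The "find existing resources, fix differences to model, never delete anything that holds data" idea can be sketched as a set difference. This is a minimal sketch, not how dsh works internally: the resource names, the `holds_data` predicate, and the notion of removal candidates are assumptions for illustration.

```python
def plan(desired, existing, holds_data):
    """Return (to_create, removal_candidates, kept_for_safety) as sorted name lists."""
    desired, existing = set(desired), set(existing)
    to_create = sorted(desired - existing)   # "missing" resources to start
    extra = existing - desired               # present but not in the model
    removal_candidates = sorted(r for r in extra if not holds_data(r))
    kept = sorted(r for r in extra if holds_data(r))  # never delete data holders
    return to_create, removal_candidates, kept

# Hypothetical resource names; the 'ebs-' prefix marks data-bearing resources here.
desired = {"prod-index-1", "prod-index-2", "prod-receiver-1"}
existing = {"prod-index-1", "prod-old-1", "ebs-prod-old-1"}
create, candidates, kept = plan(desired, existing, lambda r: r.startswith("ebs-"))
```

Because the plan is computed from the current state each time, running it again after a partial failure simply picks up the remaining differences.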
  • 52. Example Of Tag Usage
  • 55. Making It Fast •  Parallelize all the things –  Upload to Amazon S3 while booting instances while creating IAM users while setting up security groups while… –  Hyper-concurrent rolling restarts •  Fast enough for development –  Write new code or fix a bug, compile locally –  Push code to development deployment and make it live •  Optimize data transfers –  Use Amazon S3 hashes to only transfer new files –  Only upload changed JARs
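"Use hashes to only transfer new files" amounts to comparing a local digest with the one already stored remotely. The sketch below assumes the remote side exposes an MD5 hex digest per file name (here a plain dictionary); how those digests are obtained from Amazon S3 is left out, and the file names are hypothetical.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def files_to_upload(local_files, remote_hashes):
    """local_files: name -> bytes; remote_hashes: name -> hex digest already stored."""
    return sorted(
        name for name, data in local_files.items()
        if remote_hashes.get(name) != md5_hex(data)
    )

local = {"app.jar": b"v2", "lib.jar": b"same", "new.jar": b"fresh"}
remote = {"app.jar": md5_hex(b"v1"), "lib.jar": md5_hex(b"same")}
changed = files_to_upload(local, remote)
```

Only the changed and the new JAR end up in `changed`; the unchanged library is skipped.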
  • 56. Making It Reliable •  Check prerequisites before you even try –  Does Prod account have room for this many instances? –  Do I have the required permissions for the AWS APIs? –  Any model discrepancies I can’t automatically resolve? Too many Amazon EBS volumes? •  Handle common failures automatically –  No m1.large in us-east-1b? Move Amazon EBS volumes to us-west-1c and try there –  Hitting the AWS API rate limit? Throttle and try again –  SSH didn’t come up on the instance? Kill it and launch another –  Eventual consistency in AWS – query until it has the expected state (tags)
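Two of the failure handlers above, "throttle and try again" and "query until it has the expected state", are generic enough to sketch without any AWS calls. The `Throttled` exception and the simulated `flaky` API are stand-ins invented for this example; the real error types depend on the SDK in use.

```python
import time

class Throttled(Exception):
    """Stand-in for a rate-limit error from a cloud API."""

def with_backoff(call, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Retry a throttled call, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Throttled:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

def poll_until(read, expected, attempts=10, delay=0.01, sleep=time.sleep):
    """Re-read an eventually consistent value until it matches, or give up."""
    for _ in range(attempts):
        if read() == expected:
            return True
        sleep(delay)
    return False

# Simulated API: throttled twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise Throttled()
    return "ok"

delays = []
result = with_backoff(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.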
  • 57. Making It Secure •  Different AWS accounts –  Per developer –  Production •  account.xml! –  All credentials for one AWS account (AWS keys, SSH keys) –  Password-protected •  IAM –  One user per Sumo component –  Minimal IAM policy –  Inject AWS credentials •  Security Groups –  Part of the model –  Minimal privileges
  • 58. Making It Safe •  Let mistakes happen at most once •  Add safeguards to prevent operator mistakes •  Type in the deployment name before deleting anything •  Disallow risky operations in production (shutdown Prod) •  Don’t allow -SNAPSHOT code to be deployed in production
  • 59. Making It Easy •  Automate best practices –  Distribute instances over availability zones evenly –  Register instances in Elastic Load Balancing and match AZs to instances –  Tag all resources consistently •  Consistent naming –  Generate SSH with logical names
  • 60. Making It Affordable •  Developers forget to shut stuff down –  Deployment reaper automatically shuts down deployments –  Daily cost emails •  Per-team budgets –  Manager responsible to keep within budget
  • 61. Pitfalls •  Base AMI plus scripted installation prevents auto scaling •  Security group updates cause TCP disconnects •  This is fixed in the VPC stack, however •  Parallelism can cause stampedes (for example, Amazon DynamoDB) •  Tagging API rate limits are easy to hit
  • 63. Loose Coupling In The Large •  A deployment is made up of many things •  Some of these things need to talk to each other •  Some of these things come and go •  Don’t pass in a huge list of static dependencies •  Start each application with one parameter: $ bin/receiver prod.service-registry.sumologic.com
  • 64. Service Registry •  Service Registry is a concept, enables discovery •  A client-side library accessing a Zookeeper cluster •  Services are abstracted into types •  Application provides and consumes different services •  Sumo Logic services (RPC) •  Third-party services (message bus) •  AWS services (Amazon ElastiCache, Amazon RDS)
  • 65. The Perils Of Horizontal Scale •  Scaling out a multi-tenant processing system •  1000s of customers, 1000s of machines •  Parallelism is good, but locality has to be considered •  1 customer distributed over 1000 machines is bad •  No single machine getting enough load for that customer •  Batches & shards will become too small •  Metadata and in-memory structures grow out of proportion
  • 66-72. The Perils Of Horizontal Scale [diagram sequence: a grid of Index nodes labeled with customer numbers 1 to 8; in slides 67-69 every customer appears on every node, in slides 70-72 each customer appears on only a subset of nodes]
  • 73. Customer Partitioning •  Each cluster elects a leader node via Zookeeper •  Leader runs the partitioning logic •  Set[Customer], Set[Instance] → Map[Instance, Set[Customer]] •  Partitioning written to Zookeeper •  Example: indexer node knows which customer’s message blocks to pull from message bus
  • 75. Some Tips On AWS S3 •  Use the TransferManager class from the AWS Java SDK –  Multi-part uploads and downloads –  Multi-threaded, overall latency reduction •  Use random prefixes for keynames in Amazon S3 buckets –  Amazon S3 partitions by keyname prefix: http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html •  Endpoint URL for Amazon S3 –  s3.amazonaws.com might go to Virginia, or Pacific Northwest (!) –  If you are in us-east, use s3-external-1.amazonaws.com instead
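The partitioning function on the slide takes a set of customers and a set of instances and returns which customers each instance serves. The sketch below shows one possible shape of such a function; the round-robin placement and the `per_customer` parameter are assumptions, since the slides do not describe the actual partitioning logic the leader runs.

```python
def partition(customers, instances, per_customer=2):
    """Map each instance to the set of customers it serves."""
    ordered = sorted(instances)
    assignment = {inst: set() for inst in ordered}
    for i, customer in enumerate(sorted(customers)):
        # Give each customer a small, fixed number of instances so that its
        # load stays concentrated instead of being spread over the whole cluster.
        for k in range(min(per_customer, len(ordered))):
            inst = ordered[(i * per_customer + k) % len(ordered)]
            assignment[inst].add(customer)
    return assignment

mapping = partition({"c1", "c2", "c3"}, {"index-1", "index-2", "index-3", "index-4"})
```

Sorting both inputs makes the result deterministic, which matters if the leader recomputes the mapping and other nodes read it back from a shared store.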
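One common way to get the "random prefixes for keynames" effect while keeping keys reproducible is to prepend a few characters of a hash of the key. The helper below is a sketch under that assumption; the function name, the prefix length, and the sample key are hypothetical.

```python
import hashlib

def prefixed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash-derived prefix so keys spread across the keyspace."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"

example = prefixed_key("customer-42/2013-11-13/block-0001")
```

Because the prefix is derived from the key itself, a reader can recompute the full object name without a lookup table.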
  • 76. Elastic Block Store •  RAID-0 makes Amazon EBS faster –  Use LVM RAID-0 if heavy I/O is required –  Align stripe sizes with file system block sizes •  Snapshotting Amazon EBS volumes –  Snapshots eat performance –  Even for volumes with provisioned IOPS •  Overlapping snapshots –  Can be scheduled too close together, like every minute –  I/Os start taking 30+ seconds
  • 78. Somebody Has To Pay For Lunch •  On-demand resources are very sexy •  Automation gives developers their own sandbox •  Compute is the most easily incurred cost •  You need an automated reaper •  Or just raise another round… :)
  • 79. Elasticity Is Not An Arbitrary Need •  At least in our system, there’s baseline load •  At least in our system, the cost is in compute •  Alert-based scaling can be safe & effective •  Measure your spend with tools that are out there •  We actually use Sumo Logic for that! •  Look for a moving average of resource consumption •  Buy Reserved Instances, don’t fret the instance types
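"Look for a moving average of resource consumption" can be made concrete with a few lines. The hourly instance counts below are made-up numbers, and treating the lowest smoothed value as the baseline load is an assumption of this sketch, not a rule from the talk.

```python
def moving_average(values, window):
    """Simple moving average over a fixed window, one output per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]   # drop the value leaving the window
        if i >= window - 1:
            out.append(total / window)
    return out

hourly_instances = [10, 12, 11, 30, 12, 11]   # hypothetical usage samples
smoothed = moving_average(hourly_instances, 3)
baseline = min(smoothed)                       # assumed proxy for baseline load
```

A baseline of this kind is the part of the load that reserved capacity would cover, with spikes such as the `30` left to on-demand instances.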
  • 81. Amazon CloudTrail •  Logs! From AWS! The eagle has landed! •  Amazon CloudTrail logs your API activity to Amazon S3 •  Sumo Logic will read from Amazon S3, allow analysis
  • 82.
  • 83. Please give us your feedback on this presentation BDT401 As a thank you, we will select prize winners daily for completed surveys! Thank You