Meeting the demands of ever-changing IT management and security requirements means evolving how you both respond to and resolve incidents. It’s critical for organizations to adopt a scalable DevOps solution that integrates with their current monitoring systems and enables collaboration across development and operations teams, reducing the mean time to resolution. PagerDuty works with AWS services like Amazon CloudWatch to provide rapid incident response with rich, contextual details that allow you to analyze trends and monitor the performance of your applications and AWS environment.
2. DevOps on the AWS Cloud
Thomas Robinson, Solutions Architect, AWS
3. Traditional Development Models are Obsolete
Business is increasingly software-driven
End-users expect both continuous improvement and stability from applications
IT needs to be able to provision infrastructure as rapidly as developers demand it
An organization’s pace of innovation is largely constrained by its ability to develop applications
4. DevOps Can Help
DevOps practices enable companies to innovate at a higher velocity for customers
Decrease:
Length of development cycles
Time to market
Deployment failures and rollbacks
Time to recover upon failure
Operational overhead
Increase:
Business agility
Application stability
Ability to meet customer demand
Time spent on innovation
Security
5. DevOps on AWS
AWS provides on-demand infrastructure resources and tooling built to enable common DevOps practices
Infrastructure as Code
Microservices
Logging and Monitoring
Continuous Integration/Continuous Delivery
6. Infrastructure as Code
Replace traditional infrastructure provisioning and management with code-based techniques
Provision the server, storage, and networking capacity you need on demand
Deploy independently, as a single service, or a group of services
Make configuration changes repeatable and standardized
Build custom templates to provision resources in a controlled and predictable way
Use version control to keep track of all changes made to your infrastructure and application stack
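The idea of building templates that provision resources predictably can be sketched in a few lines of Python. The example below is only an illustration, not something taken from the slides: it assumes AWS CloudFormation as the templating service, and the function names `build_template` and `render`, the resource names `AppBucket` and `AppServer`, and the default instance type are all hypothetical. It builds the template as plain data and serializes it deterministically, so the result can be committed to version control and diffed like any other code.

```python
import json


def build_template(bucket_name: str, instance_type: str = "t3.micro") -> dict:
    """Return a CloudFormation-style template as a plain dictionary.

    Resource names and defaults here are illustrative assumptions.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack: one S3 bucket and one EC2 instance",
        "Parameters": {
            # The AMI is passed in at deploy time rather than hard-coded.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "AppBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"BucketName": bucket_name},
            },
            "AppServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    "ImageId": {"Ref": "ImageId"},
                },
            },
        },
    }


def render(template: dict) -> str:
    """Serialize with sorted keys so repeated runs produce identical text."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_template("example-app-artifacts")))
```

Because the same inputs always produce the same text, a change to the infrastructure shows up as an ordinary diff in a code review.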
7. Microservices
Build applications as a set of small services that communicate with other services through APIs
Build services around the business capabilities you require
Scale up and down as required with virtually no notice
Make configuration code changes repeatable and standardized
API-driven model enables management of infrastructure with language typically used in application code
Free developers from manually configuring operating systems, system applications, and server software
8. Logging and Monitoring
Capture, categorize, and analyze data and logs generated by applications and infrastructure
Maintain visibility and auditability of activity in your application infrastructure
Assess how application and infrastructure performance impact end-user experience
Gain insight into the root causes of problems or unexpected changes
Support services that must be available 24/7 as a result of continuous integration/continuous delivery
Create alerts based on thresholds you define
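Threshold-based alerting can be illustrated with a small sketch. This is a generic example and not the behavior of any specific AWS or PagerDuty feature: the `ThresholdRule` type, the `should_alert` function, and the "N breaching samples in a row" rule are assumptions chosen to show how a user-defined threshold turns a stream of metric samples into an alert decision.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ThresholdRule:
    """A user-defined alert rule (hypothetical structure for illustration)."""
    name: str
    threshold: float
    consecutive: int  # breaching samples in a row required before alerting


def should_alert(rule: ThresholdRule, samples: Sequence[float]) -> bool:
    """Return True when the most recent samples all exceed the threshold."""
    if rule.consecutive <= 0 or len(samples) < rule.consecutive:
        return False  # not enough data to make a decision
    recent = samples[-rule.consecutive:]
    return all(value > rule.threshold for value in recent)


if __name__ == "__main__":
    cpu_rule = ThresholdRule(name="high-cpu", threshold=80.0, consecutive=3)
    print(should_alert(cpu_rule, [50.0, 85.0, 90.0, 95.0]))
```

Requiring several breaching samples in a row is one common way to avoid alerting on a single short spike.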
9. Continuous Integration and Continuous Delivery
Rapidly and reliably build, test, and deploy your applications, while improving quality and reducing time to market.
Model and visualize your own custom release workflow
Automate deployments of new code
Improve developer productivity and deliver updates faster
Find and address bugs more quickly with more frequent and comprehensive testing
Store anything from source code to binaries using existing Git tools
10. Benefits of DevOps on AWS
Get started quickly and pay as you go
Automate systems operations
Scale without infrastructure constraints
Improve visibility and security
Leverage fully managed services
12. PagerDuty At-a-Glance
Trusted by over 8,500 organizations
50 of the Fortune 100
Global Community
80 Countries
200,000+ Users
Founded in 2009
Based in San Francisco
Cloud-based incident resolution
190+ Native Integrations
14. PagerDuty Manages the Complexity
Tools: Collaboration/Resolution, Deployment Tools, Monitoring Tools (App, System, Log, Web, Mobile App), Ticketing Tools, Public Cloud Services
Process: On Call Scheduling, Automatic Escalations, System and User Efficiency
People: Developer, NOC, Help Desk, IT Ops
15. PagerDuty is Built on Best Practice Workflows
Triage: Identify What’s Wrong
Notify: Get on It
Mobilize: Mobilize the Experts
Collaborate: Diagnose the Problem
Resolve: Quick Problem Resolution
Learn: Optimize and Prevent
Commercial response that engages the business
Visibility that leads to operations command
16. How Can PagerDuty Help?
Lower Costs: Leverage your development and operations resources more efficiently
Increase Revenue Growth: Deliver customer experiences more readily and reliably
Manage Your IT Transition: PagerDuty can help you move to a more agile full-service ownership practice to deliver better results
Unleash Your Developers: Our platform helps developers deliver value more quickly and ensures maximum reliability
Get More From Your Existing Platforms: PagerDuty provides full stack visibility to help you optimize your toolchain
17. With PagerDuty, we spend less time worrying about on-call, and more time creating product to impact lives.
- Panasonic
18. How Can PagerDuty Help?
Improvement in MTTR- PICNIC
Achieved 99.9% uptime
- Pantheon
500%
Improvement
For every product
- Jepperson
100%
On-time delivery
19. Common AWS-PagerDuty Use Cases
Use Case 1: Response Workflows and Orchestration
Use Case 2: Operationalize and Monitor AWS Environments
Use Case 3: Accelerate Migration to AWS
20. Use Case 1: Response Workflows and Orchestration
Leverage ChatOps Tools
Integrate with Ticketing Systems
Configure Workflows
21. Use Case 2: Operationalize and Monitor AWS Environments
Identify patterns, trends, and anomalies
View data
Monitor infrastructure health
22. Use Case 3: Accelerate Migration to AWS
Create alarms to monitor any Amazon CloudWatch metric
Initiate event and suppression rules
Automate IT incident response workflows
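The flow from a CloudWatch alarm to an automated incident can be sketched as a small translation step. The example below is a hedged illustration rather than the integration PagerDuty ships: it assumes the alarm arrives as the JSON body CloudWatch publishes to Amazon SNS (fields such as `AlarmName`, `NewStateValue`, and `NewStateReason`) and that the target is the PagerDuty Events API v2 event format. The function name `alarm_to_event`, the prefix-based suppression rule, and the fixed `critical` severity are assumptions made for the example, and the routing key shown is a placeholder.

```python
from typing import Optional, Sequence


def alarm_to_event(
    alarm: dict,
    routing_key: str,
    suppressed_prefixes: Sequence[str] = (),
) -> Optional[dict]:
    """Translate a CloudWatch alarm state change into an Events API v2 body.

    Returns None when the alarm is suppressed or carries no actionable state.
    """
    name = alarm["AlarmName"]

    # Hypothetical suppression rule: drop alarms whose name starts with a
    # listed prefix (for example, non-production environments).
    if any(name.startswith(prefix) for prefix in suppressed_prefixes):
        return None

    state = alarm["NewStateValue"]
    if state == "INSUFFICIENT_DATA":
        return None  # neither opens nor closes an incident in this sketch

    action = "trigger" if state == "ALARM" else "resolve"
    event = {
        "routing_key": routing_key,
        "event_action": action,
        # Reusing the alarm name lets the later OK state resolve the same incident.
        "dedup_key": name,
    }
    if action == "trigger":
        reason = alarm.get("NewStateReason", "alarm state changed")
        event["payload"] = {
            "summary": f"{name}: {reason}"[:1024],
            "source": alarm.get("Region", "aws"),
            "severity": "critical",  # assumption: every alarm is critical
        }
    return event


if __name__ == "__main__":
    sample = {
        "AlarmName": "prod-api-high-cpu",
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold Crossed",
        "Region": "US East (N. Virginia)",
    }
    print(alarm_to_event(sample, routing_key="ROUTING_KEY_PLACEHOLDER"))
```

The resulting dictionary would be sent as JSON in an HTTP POST to the Events API endpoint; that step is left out so the sketch stays runnable without credentials.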
24. Datadog Overview
• SaaS-based infrastructure and app monitoring
• Open Source Agent with 200+ integrations
• Time series data (metrics and events) and Tracing (APM)
• Processing trillions of data points per day
• Intelligent and Actionable Alerting
• Insightful Dashboards
• We’re hiring! (www.datadoghq.com/careers/)
25. Challenges
Building a product while ramping up the number of people involved in running it
Increase in the number of services while shifting incident management responsibility out to teams
Alert fatigue
Global growth while maintaining high reliability expectations
26. How Datadog uses AWS
API-focused to allow custom tooling
Scale up and down as capacity needs change
Integration Dogfooding
27. Why Datadog Chose AWS
Scalability
Born in the cloud
Leverage breadth and flexibility of AWS
Reliability
28. SRE Team Growth
Dedicated SRE Team runs stable services in production
Early days of Datadog (2010): everyone is on-call
As company grows: team leads + directors + senior engineers on-call
Broader Engineering gets involved, team-based on-call
32. Why Datadog Chose PagerDuty
Strong APIs
Easy to get started, automation friendly
Scales with growth of teams, company, customers
Makes custom tooling and analytics trivial
Great integration partnership
Extensive alerting (incident resolution lifecycle) capabilities
Layering teams
No worries about country-specific telco knowledge
Robust monitoring analytics
Allows us to look into patterns and deal with alert fatigue
33. Benefits
Able to efficiently scale operations from tens of engineers to hundreds
Improved productivity through custom alerts and escalation policies
Reduced alert fatigue
Continuously improving customer experience via sanely managed global on-call coverage
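An escalation policy of the kind mentioned above can be modeled as an ordered list of levels, each with its own timeout. The sketch below is a generic illustration and does not reproduce PagerDuty's implementation: the `EscalationLevel` type, the `current_level` function, and the example team names and timeouts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class EscalationLevel:
    """One tier of responders and how long they have to acknowledge."""
    targets: Tuple[str, ...]
    timeout_minutes: int


def current_level(
    levels: Sequence[EscalationLevel],
    minutes_unacknowledged: int,
) -> Optional[EscalationLevel]:
    """Return the level responsible after the given unacknowledged time.

    Returns None once every level's timeout has elapsed.
    """
    elapsed = 0
    for level in levels:
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return level
    return None  # policy exhausted without an acknowledgement


if __name__ == "__main__":
    policy = [
        EscalationLevel(targets=("team-primary",), timeout_minutes=15),
        EscalationLevel(targets=("team-secondary",), timeout_minutes=15),
        EscalationLevel(targets=("engineering-lead",), timeout_minutes=30),
    ]
    print(current_level(policy, minutes_unacknowledged=20))
```

Layering teams this way keeps the first page with the owning team and widens the audience only when an incident goes unacknowledged.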