Monitoring and making sense of infrastructure data can be an arduous process. Handling API calls from more than one million active users every minute is more demanding still. Using Amazon Web Services (AWS) and Datadog, Grindr overcame a series of infrastructure challenges by building and managing highly scalable, highly available, high-performing infrastructure, and by aggregating, analyzing, and acting on key infrastructure KPIs.
2. Traditional development models are obsolete
§ Business is increasingly software-driven
§ End-users expect both continuous improvement and stability from
applications
§ IT needs to be able to provision infrastructure as rapidly as developers
demand it
§ An organization’s pace of innovation is largely constrained by its
ability to develop applications
3. Increase
§ Business agility
§ Application stability
§ Ability to meet customer
demand
§ Time spent on innovation
§ Security
Decrease
§ Length of development cycles
§ Time to market
§ Deployment failures and
rollbacks
§ Time to recover upon failure
DevOps can help
DevOps practices enable companies to innovate at a higher velocity
for customers
4. § Infrastructure as Code
§ Microservices
§ Logging and Monitoring
§ Continuous Integration/Continuous Delivery
DevOps on AWS
AWS provides on-demand infrastructure resources and tooling built to
enable common DevOps practices
5. § Provision the server, storage, and networking capacity you
need on demand
§ Deploy independently, as a single service, or a group of
services
§ Make configuration changes repeatable and standardized
§ Build custom templates to provision resources in a controlled
and predictable way
§ Use version control to keep track of all changes made to your
infrastructure and application stack
Infrastructure as Code
Replace traditional infrastructure provisioning and management with
code-based techniques
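As a rough illustration of the template idea above, the sketch below builds an infrastructure description as plain data and serializes it deterministically so it can be kept under version control. It assumes a CloudFormation-style JSON layout; the slides do not name a specific tool, and the logical name `WebServer`, the helper names, and the image ID `ami-00000000` are hypothetical placeholders.

```python
import json


def build_template(instance_type: str, image_id: str) -> dict:
    """Return a CloudFormation-style template describing one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative single-instance stack",
        "Resources": {
            "WebServer": {  # hypothetical logical resource name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    "ImageId": image_id,  # placeholder, not a real AMI
                },
            }
        },
    }


def render(template: dict) -> str:
    """Serialize with sorted keys so version-control diffs stay small."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_template("c3.xlarge", "ami-00000000")))
```

Because the same inputs always render to the same text, a change to the instance type shows up as a one-line diff in the repository, which is the repeatability the bullets describe.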
6. § Build services around the business capabilities you require
§ Scale up and down as required with virtually no notice
§ Make configuration code changes repeatable and
standardized
§ API-driven model enables management of infrastructure
with language typically used in application code
§ Free developers from manually configuring operating
systems, system applications, and server software
Microservices
Build applications as a set of small services that communicate with other
services through APIs
7. § Maintain visibility and auditability of activity in your
application infrastructure
§ Assess how application and infrastructure performance
impact end-user experience
§ Gain insight into the root causes of problems or
unexpected changes
§ Support services that must be available 24/7 as a result of
continuous integration/continuous delivery
§ Create alerts based on thresholds you define
Logging and Monitoring
Capture, categorize, and analyze data and logs generated by
applications and infrastructure
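One way to picture "alerts based on thresholds you define" is the minimal sketch below. It is not Datadog's or any AWS service's API; the `Threshold` class, the `evaluate` function, and the status strings are hypothetical names chosen for illustration, and averaging over a window of samples is an assumed evaluation rule.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Threshold:
    """User-defined warning and critical levels for one metric."""
    metric: str
    warn: float
    critical: float


def evaluate(threshold: Threshold, samples: Sequence[float]) -> str:
    """Classify the mean of a window of samples against the threshold."""
    if not samples:
        return "NO DATA"
    value = mean(samples)
    if value >= threshold.critical:
        return "CRITICAL"
    if value >= threshold.warn:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    latency = Threshold("api.latency_ms", warn=200.0, critical=500.0)
    print(evaluate(latency, [250, 300, 200]))
```

Averaging over a window rather than reacting to a single sample is one simple way to reduce noisy alerts; other aggregations (max, percentile) would fit the same shape.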
8. § Model and visualize your own custom release workflow
§ Automate deployments of new code
§ Improve developer productivity and deliver updates faster
§ Find and address bugs quicker with more frequent and
comprehensive testing
§ Store anything from source code to binaries using existing
Git tools
Continuous Integration and Continuous Delivery
Rapidly and reliably build, test, and deploy your applications, while
improving quality and reducing time to market.
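The release workflow described above can be sketched as an ordered list of stages that halts at the first failure, so a broken build never reaches deployment. This is a toy model under stated assumptions, not the behavior of any particular AWS service; `run_pipeline` and the stage names are hypothetical.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a step that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first stage that fails.

    Returns (succeeded, names of the stages that completed).
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    stages = [
        ("build", lambda: True),
        ("test", lambda: False),   # simulated failing test suite
        ("deploy", lambda: True),  # never reached in this run
    ]
    print(run_pipeline(stages))
```

Running tests as a gate on every change is what lets bugs surface earlier, as the bullets note.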
9. § Get started quickly and pay as you go
§ Automate systems operations
§ Scale without infrastructure constraints
§ Improve visibility and security
§ Leverage fully managed services
Benefits of DevOps on AWS
12. Speakers
Marc Bittner: Marc is a Site Reliability Engineering Lead at Grindr and brings with him 9 years of
experience in backend architecture and DevOps engineering. Leveraging cutting-edge geospatial,
machine learning, and big data technologies, Grindr’s engineering powerhouse enables the delivery of
highly personalized experiences to more than 1,000,000 concurrently active users across 196 countries.
Ilan Rabinovitch: Ilan is Director of Technical Community at Datadog. Prior to joining
Datadog, Ilan spent a number of years leading infrastructure and reliability engineering teams at
organizations such as Ooyala and Edmunds.com. In addition to his work at Datadog, he is active in the
open-source and DevOps communities, where he is a co-organizer of events such as SCALE, Texas
Linux Fest, DevOpsDays LA and DevOpsDays Silicon Valley.
Michael Ruiz: Mike is an AWS Solutions Architect with more than 20 years of industry experience
spanning healthcare, enterprise, public sector, hybrid cloud, and extreme-scale mobile/gaming verticals.
Mike is a passionate technologist who currently focuses his energies on the technical enablement of AWS
partner network members ranging from garage-based startups to Fortune 100 companies, driving higher
quality solution delivery and superior customer experiences through the use of AWS Cloud technologies.
13. We are a Los Angeles-based
company of upstarts, rebels, and
techies dedicated to finding new
ways for gay men to connect. Our
application, Grindr, uses creative
technology to help users make new
friends based on their geolocation,
similar interests, and traits.
About Grindr
15. Some Stats
10,000
Geospatial database operations
per second
300,000
Profile images uploaded per day
3,000,000
Chat images exchanged per day
196
Active countries
16. 196
Countries around the world
>40M
Downloads across iOS and Android
80%
Of user base comes from North America, Europe and South America
Millions of users
§ Using AWS for highly scalable, highly available cloud architecture
§ Establish regional data centers
§ Establish high availability, federated XMPP cluster
17. Monitoring Challenges:
10k – 20k
API calls per second
85MM
Chat Messages per day
200MM
Chat Images exchanged per day
300,000
Profile Image uploads per day
18. “We pride ourselves in enabling outstanding
responsiveness to our users.
In order to successfully run a highly scalable
and available cloud architecture, we need
the right KPIs on potential issues within very
short lead times.”
- Marc Bittner
19. Before Datadog
§ How do we find the right KPIs that
truly matter?
§ How do we avoid monitoring fatigue?
§ How can we work with leading indicators
(not trailing indicators)?
20. After Datadog
§ Auto discovering metrics
§ Optimizing dashboards and
thresholds without
deployment changes
§ Leading indicators to solve
issues on the fly
24. Monitoring 101: tl;dr Edition
Monitoring 101: Alerting on what matters
More Details at:
http://www.datadoghq.com/blog/monitoring-101-alerting/
40. Asking Better Questions
“Monitor all containers running image web
in region us-west-2 across all availability zones
that use more than 1.5x the average memory
on c3.xlarge”
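To make the quoted question concrete, the sketch below scopes a fleet by tags and then flags containers whose memory exceeds 1.5x the average of that scoped group. It is an illustration only and does not use Datadog's query syntax; the `Container` class, the tag keys (`image`, `region`, `instance_type`), and the choice to compute the average over the same scoped group are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Container:
    name: str
    tags: Dict[str, str]
    memory_mb: float


def matches(container: Container, **wanted: str) -> bool:
    """True when the container carries every requested tag value."""
    return all(container.tags.get(k) == v for k, v in wanted.items())


def heavy_containers(
    containers: List[Container], factor: float = 1.5, **tags: str
) -> List[Container]:
    """Return scoped containers using more than factor x the scoped average."""
    scoped = [c for c in containers if matches(c, **tags)]
    if not scoped:
        return []
    average = mean(c.memory_mb for c in scoped)
    return [c for c in scoped if c.memory_mb > factor * average]


if __name__ == "__main__":
    scope = {"image": "web", "region": "us-west-2", "instance_type": "c3.xlarge"}
    fleet = [
        Container("web-1", {**scope, "az": "us-west-2a"}, 100.0),
        Container("web-2", {**scope, "az": "us-west-2b"}, 100.0),
        Container("web-3", {**scope, "az": "us-west-2c"}, 400.0),
    ]
    for c in heavy_containers(fleet, **scope):
        print(c.name, c.memory_mb)
```

Leaving the availability zone out of the filter is what makes the query span all zones: any tag not named in the scope is simply ignored.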
41. Resources
§ Monitoring 101: Alerting
https://www.datadoghq.com/blog/monitoring-101-alerting/
§ Monitoring 101: Collecting the Right Data
https://www.datadoghq.com/blog/monitoring-101-collecting-data/
§ Monitoring 101: Investigating performance issues
https://www.datadoghq.com/blog/monitoring-101-investigation/
§ The Power of Tagged Metrics
https://www.datadoghq.com/blog/the-power-of-tagged-metrics/