Building Monitoring Framework
Thank you to Ralali, DevOps Indonesia, the IDDevops members, and all the attendees of tonight's meetup.
The presentation is available at: https://www.slideshare.net/isnuryusuf/devops-indonesia-presentation-monitoring-framework
Video recordings can be viewed at:
- https://www.youtube.com/watch?v=cyopfqHxMqU
- https://www.youtube.com/watch?v=V_HYxs6IUxM
2. PAGE3
DEVOPS INDONESIA
DEVOPS INDONESIA HOUSE RULES
100% ATTENTION
TAKE NOTES, NOT CALLS
RECEIVE KNOWLEDGE, NOT MESSAGES
MUTE NOTIFICATIONS FOR SLACK QQ WHATSAPP IMESSAGE EMAIL
TELEGRAM SNAPCHAT FACEBOOK WEIBO HANGOUTS VOXER SIGNAL G+
TWITTER VIBER SKYPE WECHAT LINE SMS ...
7. PAGE9
Background...
One of the biggest challenges facing IT ops teams is the lack of
visibility across the entire infrastructure - physical, virtual and in the
cloud. Making things even more complex, any infrastructure
monitoring solution needs to not only meet the IT team’s needs, but
also the needs of other stakeholders including line of business (LOB)
owners and application developers.
8. PAGE10
Monitoring is essential
• Protecting revenue, brand, and security
• Identification of issues before customers are impacted
• Creating feedback loops and stability
• Gathering information on usage and usability
• Collecting information for future analysis
9. PAGE11
Comprehensive monitoring strategy
• Monitor the components and the whole. System-level, component-level, and
overall application metrics need to be included to get the full picture.
• Analyze first- and third-party performance. Problems with a third party affect the
overall digital experience just as much as problems with first-party content.
• Measure individual pages and multi-step transactions. Users visit more
than a single page, so you should monitor more than the home page.
• Configure alerts to be notified when performance varies from a baseline. Early
identification of issues can help resolve problems before customers are impacted.
• Compare your performance to competitors or industry leaders. Performance is
relative: you are compared to other sites on a daily basis. Do you know how
you stack up?
10. PAGE12
Comprehensive monitoring strategy
• Monitor from the viewpoint of your users. Capture metrics from real users to
get the broadest coverage, and use their locations to decide where to capture
synthetic measurements from.
• Measure performance across multiple connection types. Performance and
availability can vary widely across connection types, so include a representative
sample of your users.
• Align metrics with business objectives. Why should others in the organization
care about a metric? Describe how the monitoring data is relevant to objectives
such as increasing customer loyalty, increasing revenue, or reducing costs.
• Re-evaluate your strategy on a regular basis. As your company grows and your
application changes, your monitoring strategy should be re-evaluated. Are you still
measuring from the geographies that matter? Have new components been
introduced that need to be monitored?
• Look for the anomalies and outliers. We can learn more from the unexpected
than from the everyday occurrences.
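One possible way to look for outliers, sketched in Python under stated assumptions: the samples are response times for a single metric, and an "outlier" is anything outside the Tukey fences (1.5 times the interquartile range beyond the quartiles). The function name and the choice of fence are illustrative, not something the slides prescribe.

```python
import statistics

def find_outliers(samples, k=1.5):
    """Return samples outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: `samples` holds at least two numeric measurements of one metric.
    """
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in samples if x < low or x > high]

# Hypothetical page load times in milliseconds, with one unexpected spike.
load_times_ms = [100, 102, 98, 101, 99, 103, 97, 100, 950]
spikes = find_outliers(load_times_ms)
```

A quartile-based fence is used here because a single extreme value barely moves the quartiles, whereas it would inflate a mean and standard deviation.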
11. PAGE13
Breaking down your strategy
• The first component is collection. Any performance monitoring strategy starts with data
collection. If you can’t monitor it, you can’t manage it. To prevent visibility gaps, your
performance-monitoring platform should be data agnostic, with high-frequency polling
down to the second.
• Building the baseline. Once you’ve collected the broadest set of performance data at the
required granularity, it’s time to establish a baseline for every metric you monitor. It’s
imperative to understand what “normal” conditions look like at any given moment, especially
in dynamic virtualized environments. Baselines then become your basis for an effective
alerting method.
• Setting alerts. In addition to setting static thresholds, it’s important to establish alerts based
on deviation from baseline performance. Beyond a daily alert about high bandwidth usage,
you need to know when an unexpected spike occurs during working hours due to a unique
user-initiated action.
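The collect, baseline, and alert steps above can be sketched roughly as follows. This is a minimal illustration, assuming the baseline is just the mean and standard deviation of past samples and that "deviation" means more than three standard deviations from the mean; the function names, the 3-sigma rule, and the sample values are all assumptions, not part of the original talk.

```python
import statistics

def build_baseline(history):
    """Summarize past samples of one metric as (mean, population std deviation)."""
    return statistics.fmean(history), statistics.pstdev(history)

def check_sample(value, baseline, static_limit, max_sigma=3.0):
    """Return the list of reasons a new sample should raise an alert.

    Combines a static threshold with a deviation-from-baseline check.
    """
    mean, stdev = baseline
    reasons = []
    if value > static_limit:
        reasons.append("static threshold exceeded")
    if stdev > 0 and abs(value - mean) > max_sigma * stdev:
        reasons.append("deviation from baseline")
    return reasons

# Hypothetical bandwidth utilization (percent) collected during working hours.
baseline = build_baseline([40, 42, 38, 41, 39, 40])
print(check_sample(60, baseline, static_limit=90))
```

In this example a value of 60 stays well below the static limit of 90 but is still flagged, because it is far from what the baseline considers normal. A real setup would likely keep separate baselines per hour of day or day of week.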
12. PAGE14
Breaking down your strategy (cont.)
• Creating reports. Canned reports reveal the most utilized interfaces, highest packet loss, and
other key metrics. Yet they don’t allow for the level of manipulation often required to
troubleshoot performance issues.
• Analyzing data. The goal is to find the actionable insight needed to proactively detect and avoid
performance events, understand correlations that can help fine-tune infrastructure, and
make more informed forecasting decisions about the impact infrastructure has on the
business.
• Sharing results. Once armed with the strategic ability to collect, baseline, alert, report, and
analyze your performance data, it’s time to share insights with team members who can truly
benefit from monitoring results.
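As one illustration of "understanding correlations" between metrics, the sketch below computes a Pearson correlation coefficient between two series sampled at the same times. The metric names and values are hypothetical, and Pearson correlation is only one of several measures that could be used here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long series; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return None  # correlation is undefined for a constant series
    return cov / (spread_x * spread_y)

# Hypothetical samples: requests per second vs. CPU utilization (percent).
requests_per_s = [100, 200, 300, 400]
cpu_percent = [20, 40, 60, 80]
r = pearson(requests_per_s, cpu_percent)
```

A coefficient near 1 or -1 suggests the two metrics move together, which can guide tuning or forecasting, although it does not by itself show which one drives the other.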
15. PAGE17
Monitoring, Alerting, and Capacity Planning
Prepare Your Checklist
Column headers: No | Category | Alerting | SIEM
Second header row: Services Availability | Performance Monitoring | Capacity Forecasting
Team columns: Sysops, Network, Data Center (repeated three times), Business/Sales
Rows:
1  Visual Dashboard & Monitoring
2  Public Service URL Monitoring
3  Notification Tools
   1  1st Layer Notification
   2  2nd Layer Notification
   3  Management Escalation
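The three notification layers in the checklist could be modeled as a simple escalation policy. The sketch below is only an assumption about how such a policy might work: the layer names come from the slide, while the delays (0, 15, and 30 minutes without acknowledgement) and the function name are invented for illustration.

```python
# (minutes without acknowledgement, layer to notify); delays are hypothetical.
ESCALATION_POLICY = [
    (0, "1st Layer Notification"),
    (15, "2nd Layer Notification"),
    (30, "Management Escalation"),
]

def layers_to_notify(minutes_unacknowledged):
    """Return every layer that should have been notified by now."""
    return [
        layer
        for delay, layer in ESCALATION_POLICY
        if minutes_unacknowledged >= delay
    ]

print(layers_to_notify(20))
```

Keeping the policy as data rather than code makes it easy to review alongside the checklist and to adjust per service.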
16. PAGE18
Data Center Monitoring Elements
• Asset configuration and change management
• Know data center trends for better capacity planning
• Sensing and monitoring temperature
• Establish precision cooling control
• Fluid and humidity detection
• Integrate the environment with other sensors
• Managing alarms and notifications
• Establish Data center Environmental Monitoring Systems (EMS)
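For the capacity planning element, one simple way to turn a trend into a forecast is to fit a straight line to recent usage and estimate when it reaches the limit. The sketch below assumes one usage reading per day and linear growth, which is a simplification; the function name and the sample figures are hypothetical.

```python
def days_until_full(usage_by_day, capacity):
    """Estimate days from the latest reading until usage reaches `capacity`.

    Fits a least-squares line to daily readings (needs at least two).
    Returns None when usage is flat or shrinking.
    """
    n = len(usage_by_day)
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(usage_by_day) / n
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(days, usage_by_day)
    ) / sum((x - mean_x) ** 2 for x in days)
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))

# Hypothetical rack power draw in kW over five days, with a 100 kW limit.
remaining = days_until_full([50, 52, 54, 56, 58], capacity=100)
```

The same approach could be applied to floor space, cooling load, or storage, as long as the growth is roughly linear over the window being fitted.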
17. PAGE19
Data Center Monitoring Best Practices
• Testing and Maintenance
• Be ready for emergencies
• Have a backup plan ready
• Have an automated recovery plan
19. PAGE25
Application and Platform
Application monitoring is a process that
ensures that a software application processes
and performs in an expected manner and
scope. This technique routinely identifies,
measures and evaluates the performance of
an application and provides the means to
isolate and rectify any abnormalities or
shortcomings.
20. PAGE26
Application and Platform Monitoring Elements
• Application response time
• API performance
• Service Bus performance
• Processing performance
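Response time is usually the easiest of these elements to start measuring. The sketch below shows one possible approach in plain Python: a decorator records how long each call takes, and a nearest-rank percentile summarizes the samples. The names `timed`, `percentile`, and `handle_request` are illustrative; a production system would more likely rely on an APM agent or metrics library.

```python
import math
import time
from functools import wraps

def timed(samples):
    """Decorator factory: append each call's duration (seconds) to `samples`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                samples.append(time.perf_counter() - start)
        return wrapper
    return decorator

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

durations = []

@timed(durations)
def handle_request():
    # Hypothetical stand-in for real application work.
    return sum(range(1000))

for _ in range(20):
    handle_request()
p95 = percentile(durations, 95)
```

Percentiles such as p95 are used here instead of an average because a few slow requests can hide behind a healthy-looking mean.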
21. PAGE27
Database monitoring
Measuring database attributes to monitor application productivity.
• Get comprehensive insight into the health and performance of
your databases
• Track slow queries, expensive statements, response times, failures,
page faults, deadlock details, and many other KPIs.
• Monitor, easily identify and solve database issues that impact
application performance.
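Slow-query tracking can be approximated on the application side by timing each statement. The sketch below uses Python's built-in `sqlite3` module purely so the example is self-contained; the `run_query` helper, the threshold, and the table are hypothetical. Database servers typically offer their own facilities for this (for example, MySQL's slow query log or PostgreSQL's `log_min_duration_statement` setting), which are usually preferable.

```python
import sqlite3
import time

def run_query(conn, sql, params=(), slow_log=None, threshold_s=0.5):
    """Run a query and record it in `slow_log` if it takes >= threshold_s."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if slow_log is not None and elapsed >= threshold_s:
        slow_log.append((sql, elapsed))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders (total) VALUES (19.5), (42.0)")

slow_queries = []
# A threshold of 0 logs every query, which is handy for demonstration only.
rows = run_query(conn, "SELECT id, total FROM orders", slow_log=slow_queries, threshold_s=0.0)
```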
22. PAGE28
Microservices monitoring
Modern microservices are displacing
monolithic application stacks, accelerating
development and deployment speed,
simplifying scaling, and more. For all its
advantages, a microservices approach
increases the complexity of monitoring and
troubleshooting applications.
23. PAGE29
Visualize Microservice Interaction
Monitor’s transparent instrumentation
observes all activity at a system call level.
This helps you instantly see how your
microservices interact and provides key
metrics like response time, network traffic
and resource utilization. Dynamic topology
maps help you identify bottlenecks,
visualize your application flow and drill
down to the process level to understand
what is running and where.
30. PAGE36
Public Service Monitoring
• Monitoring your public service from multiple geographic locations
• Monitoring SSL expiration and domain expiration
• Website Uptime & Performance Monitoring
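SSL expiration monitoring can be sketched with Python's standard `ssl` module. In the example below, `fetch_not_after` reads the `notAfter` field of the certificate a server presents, and `days_until_expiry` converts it into days remaining. Both function names and the 30-day warning window in the usage note are assumptions for illustration.

```python
import socket
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Days left before a certificate's 'notAfter' time.

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". `now` is epoch seconds (default: current time).
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400

def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port over TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A check might then call `days_until_expiry(fetch_not_after("example.com"))` and warn when the result drops below, say, 30 days. Note that with the default context an already expired certificate fails the TLS handshake, so that case shows up as a connection error rather than a negative number.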
33. PAGE39
Security Monitoring
• Intrusion Detection - Detect threats and suspicious activities early
with host, network, and cloud IDS.
• Vulnerability Assessment - Identify vulnerabilities and AWS
configuration issues that put your organization at risk.
• Event Correlation - Automate event correlation and security analysis
with AlienVault Threat Intelligence.
• Log Management - Automate log collection and analysis and securely
store raw logs in the AlienVault Cloud.
• Compliance Reporting - Be audit-ready sooner with pre-built
compliance reporting templates.
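To give a feel for what event correlation means in practice, the sketch below groups authentication events by source address and flags sources with repeated failures. It is a toy illustration only: the event format, the threshold, and the function name are assumptions, and it does not represent how AlienVault or any other product implements correlation.

```python
from collections import Counter

def suspicious_sources(events, max_failures=5):
    """Return source IPs with at least `max_failures` failed logins.

    Assumption: `events` is an iterable of (source_ip, outcome) tuples
    already parsed from logs, where outcome is "success" or "failure".
    """
    failures = Counter(ip for ip, outcome in events if outcome == "failure")
    return sorted(ip for ip, count in failures.items() if count >= max_failures)

# Hypothetical parsed log events.
events = (
    [("10.0.0.1", "failure")] * 5
    + [("10.0.0.2", "failure")] * 2
    + [("10.0.0.2", "success")]
)
flagged = suspicious_sources(events)
```

A realistic rule would also restrict the count to a time window and combine it with other signals, such as a successful login following many failures.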
36. PAGE42
Paging, Alert and Notification
When you are auditing or writing alerting rules, consider these things to keep your
on-call rotation happier:
• Pages should be urgent, important, actionable, and real.
• They should represent either ongoing or imminent problems with your service.
• Err on the side of removing noisy alerts; over-monitoring is a harder problem to
solve than under-monitoring.
• You should almost always be able to classify the problem into one of: availability &
basic functionality; latency; correctness (completeness, freshness and durability of
data); and feature-specific problems.
• Symptoms are a better way to capture more problems more comprehensively and
robustly with less effort.
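The paging criteria above can be expressed as a small routing rule. In this sketch only the four page criteria (urgent, important, actionable, real) come from the slide; the `Alert` class, the fallback "ticket" and "drop" destinations, and the example alerts are assumptions added for illustration.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    name: str
    urgent: bool
    important: bool
    actionable: bool
    real: bool

def route(alert):
    """Page only when all four criteria hold; otherwise ticket or drop."""
    if alert.urgent and alert.important and alert.actionable and alert.real:
        return "page"
    if alert.actionable and alert.real:
        return "ticket"  # worth fixing, but not worth waking someone up
    return "drop"        # noisy alert: a candidate for removal

# Hypothetical alerts: one symptom-based, one low-urgency cause-based.
checkout_errors = Alert("checkout error rate high", True, True, True, True)
disk_70_percent = Alert("disk 70% full", False, True, True, True)
print(route(checkout_errors), route(disk_70_percent))
```

Writing the rule down this way makes it easier to audit existing alerts: anything that routes to "drop" is noise, and anything that pages should map to a user-visible symptom.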