This document discusses the DevOps lifecycle and practices for improving software development and delivery. It outlines three DevOps models: WebOps, NoOps, and Enterprise DevOps. Key challenges discussed include long release cycles, lack of visibility, and inconsistent incident tracking. The proposed solutions reduce mean time to detect and repair issues through improved monitoring, diagnostics, and integration between development and operations teams. Continuous learning is also emphasized, so that investments are prioritized based on data from customer usage and application health.
DevOps accelerates full lifecycle software development and delivery
1.
2. App Lifecycle
DevOps is a full lifecycle investment
DevOps is a team undertaking
DevOps enables better software development and delivery practices
DevOps accelerates the last mile of continuous delivery
3. REQUIREMENTS
BACKLOG
The agile methodologies are accelerating the construction process
Determine next set of investments based on learnings
Disconnects between Development and Operations increase mistakes and MTTR when issues occur
Current ITLM/ITSM “best practices” made the release and operate processes reliable, but not agile
5. Agility performance indicators: increase deployment frequency; reduce change lead-time (react faster to dynamic business needs)
Reliability performance indicators: reduce change fail rate; reduce Mean-Time-To-Detect & Repair (MTTD, MTTR)
6. REQUIREMENTS
BACKLOG
No actionable and contextual info to resolve incidents
Prioritize and validate investments based on qualitative and quantitative data.
Inconsistent tracking and management of incidents across teams and tools
Quickly detect and resolve application issues.
Inconsistent and chaotic releases
7. Problems
Solutions
Value
Shift from long release cycles to monthly, or even daily, without adding unnecessary risks.
Incident management workflows to integrate development and operations
Continuous delivery value
Visibility to the release pipeline to set customer expectations about when features or fixes go live.
Actionable production diagnostics
Proper tracking, managing and approval of releases.
Role-based tools
Consistency, transparency and traceability for all releases
10. End users
How do I know I have a problem?
How do I isolate the problem?
How do I diagnose the problem?
Web servers
Data servers
Application servers
11. Reduce Mean Time to Detect (MTTD)
Problems
Solutions
Value
Visibility to application health
360 degree view to your application health, with relevant metrics to help you identify issues in production.
Minimize outages and customer impact.
Visibility to application outages to minimize customer impact.
Automatic Alerts whenever your application is not responding according to your thresholds
28. • Take advantage of Load Testing on the cloud to make sure that your application can withstand the load.
29. Check the pulse of your application
• Get a 360 degree view of your application’s health, with relevant metrics to help you detect issues in production, with Application Insights.
Set up a view of your application health with metrics that you care about
Identify issues and patterns with your application in production
31. Production incident alert in operations system
• Automatic Alerts whenever your application is not responding according to your thresholds.
Automatic alerts with relevant contextual information
32. Reduce Mean Time to Repair (MTTR)
Problems
Solutions
Value
Uncover root cause of production issues
Detect if the problem is your code or your dependencies.
Low Mean Time to Repair (MTTR)
Quickly resolve code problems detected in production
Incident management workflows to integrate developers and operations.
Improved communication between dev and ops teams
Actionable production diagnostics
Role-based tools
Better information to users
Increased user satisfaction
33. Understand what failed and why by drilling down into failed tests
• Detect if the problem is your code or your dependencies
34. Production incident alert in operations system
Assign code related incidents to development
• Assign production incidents from System Center to the Development team in Visual Studio for investigation and resolution.
Development system incident reference in operations tool
Incident in development system
35. Request additional diagnostics from operations
• Get actionable production diagnostics.
Request for additional diagnostics in operations system
36. Generate IntelliTrace logs from within SCOM
• Get actionable production diagnostics.
IntelliTrace logs collected by operations in development system
Actionable debugging using IntelliTrace logs
38. Valuable data at your fingertips
• Identify systemic issues and trends affecting application and infrastructure health
• Prioritize new features, bug fixes and strategic direction based on qualitative and quantitative data
Validate your investments
DevOps is a relatively new term, and people often use it to refer to individual capabilities that automate the release pipeline. However, DevOps is more than that: it increases the scope of agility and should be viewed as a team undertaking. It requires teams to look at their full lifecycle investments. At its core, DevOps enables better software development and delivery practices, accelerating the last mile of continuous delivery.
What is driving DevOps? Agile methodologies are accelerating the construction process and putting significant pressure on Operations teams to update their existing practices to enable a faster cadence; in other words, to change and adapt existing processes so that they are not only reliable but also support an agile cadence.
In our internal teams and with some of our enterprise customers, we noticed that once a team is able to accelerate the construction phase consistently, the next step is to ensure that applications remain available and perform as expected and, when they don't, that both Operations and Development teams have access to information and tools that allow them to diagnose and fix issues quickly.
As these teams become more sophisticated and mature, they want access to customer usage information so they can use quantitative and qualitative data to determine the next set of investments and enable continuous learning.
We observe three DevOps flavors:
WebOps: companies and teams that have high levels of automation and deliver incremental updates and value very frequently, often hourly (Xbox Live and Bing are good internal examples).
NoOps: applies to small teams or start-up teams where there isn’t a dedicated operations team; instead, the developers perform operational work.
Enterprise DevOps: where there are dedicated Operations and Development teams, driving the need for great team collaboration.
Companies looking at implementing DevOps practices are balancing two important performance indicators. Agility: their ability to increase deployment frequency and reduce change lead time to react to dynamic business needs. And Reliability: reducing the change fail rate and the time it takes to detect and repair production issues. These metrics are very hard to balance and create friction across teams.
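As a rough illustration of how these indicators could be computed, here is a minimal Python sketch. The `Deployment` and `Incident` records, their field names, and the sample values are all hypothetical and not taken from any tool mentioned in this deck. The sketch also assumes MTTR is measured from detection to resolution; some teams measure it from the start of the incident instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when the change reached production
    failed: bool             # True if the change caused a production failure


@dataclass
class Incident:
    started_at: datetime     # when the problem began
    detected_at: datetime    # when a person or an alert noticed it
    resolved_at: datetime    # when service was restored


def agility_indicators(deployments, period_days):
    """Return (deployments per day, mean change lead time in hours)."""
    frequency = len(deployments) / period_days
    lead_time_h = mean(
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    )
    return frequency, lead_time_h


def reliability_indicators(deployments, incidents):
    """Return (change fail rate, MTTD in minutes, MTTR in minutes)."""
    fail_rate = sum(d.failed for d in deployments) / len(deployments)
    mttd = mean((i.detected_at - i.started_at).total_seconds() / 60 for i in incidents)
    # Assumption: repair time is counted from detection, not from incident start.
    mttr = mean((i.resolved_at - i.detected_at).total_seconds() / 60 for i in incidents)
    return fail_rate, mttd, mttr


# Illustrative data only, not measurements from any real team.
t0 = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(hours=4), failed=False),
    Deployment(t0, t0 + timedelta(hours=8), failed=True),
]
incidents = [
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=40)),
    Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=80)),
]
print(agility_indicators(deployments, period_days=2))
print(reliability_indicators(deployments, incidents))
```

Tracking both tuples side by side makes the tension visible: pushing deployment frequency up is only a win if the change fail rate, MTTD and MTTR do not rise with it.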
Looking at these friction points, we identify the top 5 impediments to DevOps:
Inconsistent and chaotic releases: how to shift from quarterly or monthly releases to a more frequent cadence, for example daily. When you have multiple teams releasing daily, it is hard to keep track of what is going to production and who approved it.
Quickly detect and resolve application issues: as the team increases its cadence and components run in hybrid environments, it becomes more difficult to diagnose issues in production without proper tools that make this easier for developers.
Inconsistent tracking and management of incidents across teams and tools: developers and operations use their own tools to manage their own work. While these tools serve different purposes, they need to be integrated so that there is consistency, traceability and transparency around managing incidents, with tools that enable collaboration without adding unnecessary overhead.
Prioritize and validate investments based on qualitative and quantitative data: allowing teams to be in continuous learning mode.
No actionable and contextual info to resolve incidents: production is often a unique environment, and reproducing issues in pre-production environments can be challenging. To remove friction and increase efficiency, developers need access to rich diagnostics and information that allow them to resolve production issues quickly.
So let’s take a look at each area, and talk about problems, solutions and customer value…
How to manage multiple releases.
If you are on a continuous agile release cadence, your teams are building, deploying and validating features multiple times a day. This can become chaotic as the number of features or the team size increases. You need to bring consistency and make sure those processes are repeatable (automated) and traceable.
Release Management does 3 main things:
It automates deployments directly from TFS to all the environments, including production. Part of the deployment procedure may include things like taking backups, generating test data, provisioning servers on Azure or executing your automated tests: basically everything you need to start working on a given stage.
It ensures that all deployments are done the same way from the same binaries, so that by the time you deploy your application to production, your deployment procedure has been tested over and over, removing a lot of release-related risks and headaches.
It automates the approval workflow through all the environments, reducing delays and coordination issues to a minimum. Testers receive a notification when a new version is ready for them so they can either confirm that the application meets the stage requirements or stop the release of that specific version.
Along the way, Release Management provides tracking of each attempted release.
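The idea of promoting the same binaries through every stage, with an approval gate and an audit trail, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Release Management product or its API: `Release`, `promote`, and the `deploy` and `approve` callbacks are hypothetical names introduced for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Release:
    version: str
    binaries: bytes                               # the single build artifact promoted everywhere
    history: list = field(default_factory=list)   # audit trail of (stage, outcome, checksum)


def promote(release, stages, deploy, approve):
    """Deploy the same binaries to each stage in order; stop at the first rejection.

    deploy(stage, binaries) stands in for the real deployment steps (hypothetical).
    approve(stage, version) returns True if the stage owner accepts the version.
    """
    # The checksum recorded at every stage shows that identical binaries were used.
    checksum = hashlib.sha256(release.binaries).hexdigest()
    for stage in stages:
        deploy(stage, release.binaries)
        if not approve(stage, release.version):
            release.history.append((stage, "rejected", checksum))
            return False
        release.history.append((stage, "approved", checksum))
    return True


# Usage: QA rejects the version, so it never reaches production.
deployed_to = []
release = Release("1.0", b"compiled build output")
reached_production = promote(
    release,
    stages=["dev", "qa", "production"],
    deploy=lambda stage, binaries: deployed_to.append(stage),
    approve=lambda stage, version: stage != "qa",
)
print(reached_production, deployed_to)
```

Because every attempt appends to `history`, a rejected release is still traceable: you can see which stage stopped it and which build it was.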
Is my application working…?
Instructor Note: Do demo.
Management packs tell System Center how to extract specific metrics or enable tasks particular to a given system or product.
Download management packs from the online gallery.
Then, import them for use.
From the Authoring tab, select a management pack (e.g. .NET, Windows Service, web transaction, etc.)
Find and select the applications to monitor across your managed devices
Apply an environment tag to selected objects to separate production environment monitors from staging, QA, etc.
Set up some basic monitoring or choose to use the advanced settings. Here, I’ve enabled monitoring for exceptions and for performance events crossing my 15-second performance threshold. I’ve also said that I want to configure client-side monitoring. If you selected multiple applications in the previous pages, here’s where you could give them different monitor settings and/or select different exceptions to watch for.
This page allows you to choose how to report performance and exception events from the client. This injects JavaScript into your application pages.
On the Monitoring tab, you can choose a particular view, like this Active Alerts view, and select individual alerts to get more details.
With an alert selected, the Operations person can click Health Explorer, which gives them a description of the problem, possible causes and resolutions. Create your own knowledge articles to help others resolve similar problems in the future.
You can get more details on the performance event. For instance, here you can see the slowest node was the ProductList page which took 15+ seconds with almost 3 seconds of that being a SQL query. You can even see the parameter value! You can hover over the duration numbers to get the start and end times as well. Best of all – NO special code to return that to the monitor!!
This view shows all of the problems (performance and exceptions) reported by the entire application.
The Distributed Chains view can monitor a transaction across components. Here you can see the transaction from the end user’s browser through to the web server and then to the web services layer.
Load Testing on the cloud
To reduce mean time to detect (MTTD), you need access to dashboards that help you detect issues in production before your customers know. With Application Insights, you can set up a view of your application’s health with the metrics that are important to you and your business.
System Center, backed by Visual Studio, allows your operations team to monitor your product in production and flag any issues that need attention from developers.
Receive alerts when an application isn’t performing according to your thresholds.
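The threshold-based alerting described above can be illustrated with a small Python sketch. It does not use the System Center or Application Insights APIs; `Sample` and `alerts_for` are hypothetical names, and the 15-second default simply mirrors the threshold used in the demo notes.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    application: str
    environment: str     # environment tag, e.g. "production" or "staging"
    page: str
    duration_s: float    # measured response time in seconds


def alerts_for(samples, threshold_s=15.0, environment="production"):
    """Return one alert message per sample that exceeds the threshold in the given environment."""
    return [
        f"{s.application}/{s.page}: {s.duration_s:.1f}s exceeds {threshold_s:.1f}s"
        for s in samples
        if s.environment == environment and s.duration_s > threshold_s
    ]


# Illustrative samples only.
samples = [
    Sample("Store", "production", "ProductList", 15.4),
    Sample("Store", "production", "Home", 0.8),
    Sample("Store", "staging", "ProductList", 20.0),
]
for message in alerts_for(samples):
    print(message)
```

Filtering on the environment tag keeps staging and QA noise out of the production alert view, which is the reason for tagging monitored objects in the first place.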
If my application is not performing, why?
By reusing your test scripts you can detect if the problem is your code or your dependencies.
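One way to picture this triage is sketched below: rerun an existing test and, if it fails, probe each dependency to decide where the fault most likely lies. This is an assumption about how such a check could be wired up, not a description of the Visual Studio tooling; `classify_failure` and the probe callables are hypothetical.

```python
def classify_failure(test, dependency_checks):
    """Run a test; on failure, probe each dependency to decide where the fault lies.

    test: callable that raises an exception when the scenario fails.
    dependency_checks: dict mapping a dependency name to a callable that
        raises if that dependency is unhealthy.
    Returns (verdict, broken_dependencies), where verdict is "passed",
    "dependency" or "code".
    """
    try:
        test()
        return "passed", []
    except Exception:
        broken = []
        for name, check in dependency_checks.items():
            try:
                check()
            except Exception:
                broken.append(name)
        # If every dependency is healthy, the failure most likely sits in our own code.
        return ("dependency" if broken else "code"), broken


def healthy():
    pass


def unhealthy():
    raise RuntimeError("simulated failure")


# Usage: the test fails and the database probe fails too, so suspect the dependency.
print(classify_failure(unhealthy, {"database": unhealthy, "payment_api": healthy}))
```

The verdict is a heuristic: a healthy probe does not prove the dependency behaved correctly for the failing request, so treat "code" as the starting point for investigation rather than a conclusion.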
Operations is empowered to create work items in Visual Studio and assign them to developers, ensuring that communication flows quickly and easily between the two teams. The ops team can collect data and attach it to work items for use by the dev team.
Communication is a two-way street. Just as operations can use TFS to communicate with developers, the dev team can use TFS to request information, logs and other diagnostic data from operations. As developers work through the issues, TFS keeps operations informed of the status so everyone is always on the same page. Each tool is role-specific, allowing dev and ops to perform their individualized work, but both are backed by TFS, ensuring transparency and collaboration.
Communication is a two-way street. Just as operations can create a work item in Visual Studio to communicate with developers, the dev team can use Visual Studio to request additional information, logs and other diagnostic data from operations. As developers work through the issues, Visual Studio keeps operations informed of the status so everyone is always on the same page. Each tool is role-specific, allowing dev and ops to perform their individualized work, but both are backed by TFS, ensuring transparency and collaboration.
Too often, determining the next set of investments is difficult. Without insight into what customers are REALLY doing and experiencing, planning becomes mostly guessing, which increases the probability of making poor decisions. By having visibility into usage data, companies can prioritize and validate investment decisions, enabling continuous learning.
By bringing data from different sources into one dashboard, development teams can identify systemic issues and trends affecting applications and overall infrastructure health. They can learn from usage data to make informed decisions based on qualitative and quantitative data.
By implementing DevOps principles and tools, teams can increase efficiency, lower operational costs, increase quality and ensure compliance.