If you are in this room and genuinely interested in coming to the cloud, you are a leader on this subject in your organization. You have to be able to communicate and argue for the benefits of the cloud.
There is a lot of talk about agility, innovation and security. But it is not enough to understand technically how to do things; you also have to show the value of this to your organization.
https://pt.wikipedia.org/wiki/Yo-Yo_Ma
This might seem like an unusual quote to start with, hopefully I’ll explain shortly how this relates to Operations…
We hear a lot about the benefits enabled by the use of Cloud: Innovation, Agility, Security, Reliability. But there’s much more to getting those benefits than just signing up for the service. Along with the migration to Cloud comes the requirement to align the way your organization works with the new environment, to really embed that agility and flexibility in the way we work.
So we're going to be talking about how the way that we approach our first steps to Cloud Migration should be viewed through the lens of a change in the way we work. Today we’re going to give you a number of practical recommendations about how you should lay out your AWS accounts, some principles around good Cloud Architecture, and we’re also going to talk about how these new processes and principles interact with the culture in your teams and what we can do to ensure that our journey to cloud is well understood by our colleagues across our organisations.
And this is where I want to come back to the quote, in order to make this journey successfully we don’t just need to be technically able to implement the changes. That doesn’t get us where we need to go. At the same time as building our Landing Zone and planning our migration we also need to be preparing our organization for our new way of working so that we can take proper advantage of what Cloud has to offer and in order to do this we need to think a little bit like musicians. We need to understand our goals and methods well enough, and communicate them well enough, that we can enthuse those around us and build the environments that will allow us to succeed.
JL to open…
Thank you for joining us today, thanks for taking the time to come and listen, hopefully we can learn something together!
Our aim is simple today. To better understand what OE looks like for you and how to plan for success
We’re going to have a look at some Principles around how we do Ops in the Cloud. Then we’ll spend a little bit of time talking about what we’ve called Organisational Architecture, and what we mean by this is the way our organisations do things, and some of the practical things we can do to create an appropriate culture and processes to support Operational Excellence. Then we’re going to finish up with some recommendations for your Landing Zone, how we’re going to lay out your accounts to provide reliability, monitoring and ease of management.
Paul 4-11
You start your journey with one account
And suddenly this starts to get very complicated
Most customers are somewhere in between.
But it’s a spectrum and you need to lay the foundations for where you will live at different parts of your journey
Talk about the basic requirements
James 12-16
Organisational structures change as your business grows
Mergers, movement of responsibilities, changes in finances all affect how teams and BUs interact
Accounts are isolated from one another
As you grow your applications will also bump into each other.
For some workloads isolation reduces the blast-radius of an event
preventing a knock on effect to other services in your stack
Implement security controls and policies independently
Atomic security controls mean additional separation of duty
Allows for a second pair of eyes to see what’s happening more easily
Dev, Test, different projects, different cost centres, etc.
This is visibility that is very hard to get on-premises, because of shared resources
Accounts can reflect business processes and procedures more readily
Billing can be complex in some organisations.
Separating out by AWS accounts means that you contain who spends what and where
Deep insights are easier to access
Paul 17-18
Read list
You have to think about how the accounts communicate with each other, if they communicate at all. There are multiple accounts to be managed and orchestrated.
But all of these problems can be addressed with automation. And automation is where we want to be.
Read list
Cons are a good thing because they are addressed by automation and that is where you want to be.
James 19-22
Well-Architected is:
A consistent approach to reviewing architectures
It provides you with best practices and design principles for building cloud-native architectures
By using the Well-Architected framework, you will understand both the risks in your architecture and ways to mitigate them
Ideally through this education you will influence your future architectures for the better
The Operational Excellence pillar focuses on establishing the foundations that will enable long term operations success
Operational Excellence was made the first well-architected pillar in 2017 because its foundational nature influences decisions made in the other 4 pillars: Security, Reliability, Performance, and Cost Optimization
Doing this limits the amount of human error
It also allows tasks to be executed in response to triggers/events -> event-driven operations
There are several customer cases where processes were cut from weeks to days, and from days to hours. Think of everything you can do for your business with the time that is now freed up.
Let’s talk about the design principles of Operational Excellence
The perspective that Well-Architected and OE take is from the workload down through operations.
What WA and OE are trying to achieve is to:
set the customer up for long term operations success through the focus areas
and to get the customer to think differently about operations in the cloud through 6 design principles
Perform Operations as Code
limiting human error
and enabling operations procedures to be automatically executed in response to triggering events in the environment
A UK Government agency (here at the Summit, and Paul’s customer) has reduced deployment times from 2 days to an hour, with End-to-end environment build reduced from 4 weeks to 2 days, and Dev environments going from 2 weeks to 4 hours.
All delivered as code
All bringing value into the platform much more quickly.
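To make "operations as code" a little more concrete, here is a minimal sketch in Python. It assumes a made-up event format and made-up runbook names (nothing here is an AWS API): each routine response is a plain function, and a small dispatcher runs the matching one when an event arrives, escalating to a person when no runbook exists.

```python
from typing import Callable, Dict, List

# Hypothetical event shape for illustration: {"type": "...", "resource": "..."}.
Event = Dict[str, str]
Runbook = Callable[[Event], str]

# Registry mapping an event type to its scripted response.
RUNBOOKS: Dict[str, Runbook] = {}


def runbook(event_type: str) -> Callable[[Runbook], Runbook]:
    """Register a function as the scripted response to one event type."""
    def register(func: Runbook) -> Runbook:
        RUNBOOKS[event_type] = func
        return func
    return register


@runbook("disk_almost_full")
def expand_volume(event: Event) -> str:
    # A real runbook would call your own tooling here; this only reports the action.
    return f"expand volume on {event['resource']}"


@runbook("instance_unhealthy")
def replace_instance(event: Event) -> str:
    return f"replace instance {event['resource']}"


def handle(event: Event) -> str:
    """Run the registered runbook, or escalate to a human if none exists."""
    action = RUNBOOKS.get(event["type"])
    if action is None:
        return f"escalate: no runbook for {event['type']}"
    return action(event)


if __name__ == "__main__":
    events: List[Event] = [
        {"type": "disk_almost_full", "resource": "vol-example"},
        {"type": "unknown_alarm", "resource": "n/a"},
    ]
    for e in events:
        print(handle(e))
```

Because the responses live in code, they can be reviewed, versioned and tested like any other change, which is where the reduction in human error comes from.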
Annotate documentation
which enables human and system generated documentation to be used as an input to and output from automated processes
Make frequent small reversible changes
increasing the flow of beneficial changes into the environment
and enabling identification and resolution of issues introduced through change
Getting value into the environment faster
Shrinking risk
As you use operations procedures, look for opportunities to improve them. As you evolve your workload, evolve your procedures appropriately. Set up regular Game Days to review and validate that all procedures are effective and that teams are familiar with them
Refine operations procedures frequently
to maximize the benefits of experience and learning and to increase the effectiveness and efficiency of operations
Share insights from the execution of operations
Allows value to get into the environment as quickly as possible
This has a cultural impact
Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed or mitigated. Test your failure scenarios and validate your understanding of their impact. Test your response procedures to ensure they are effective and that teams are familiar with their execution. Set up regular Game Days to test workload and team responses to simulated events.
Anticipate failure
so that sources of failure can be removed or mitigated where possible
so that procedures can be developed to respond to failure
and so that teams can practice responding to failure so that when it happens they are prepared
We call it Pre-mortem
Process to respond
Game Day
Team building
Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.
Learn from all operational failures
These previous principles
Talk about 5 whys
Operations teams need to understand the business and customer needs to effectively and efficiently support business outcomes.
Everything continues to change—your business context, business priorities, customer needs, etc. —so it’s important to design operations to support evolution over time in response to change.
Your teams need to have a shared understanding of your entire workload, their role in it, and shared business goals in order to set the priorities that will enable business success
You also need to consider external regulatory and compliance requirements that may influence your priorities
You Operate
On AWS, you can create temporary duplicates of environments, lowering the risk, effort, and cost of experimentation and testing
Your teams need to have a shared understanding of your entire workload, their role in it, and shared business goals in order to set the priorities that will enable business success.
Don’t forget external regulatory and compliance requirements that may influence your priorities.
Use your priorities to focus your operational improvement efforts where they will have the greatest impact
For example, developing team skills, improving workload performance, automating runbooks, or enhancing monitoring.
Update your priorities as needs change.
AWS can help you educate your teams about AWS and its services to increase their understanding of how operations choices can have an impact on your workload.
The design of your workload should include how it will be deployed, updated, and operated.
You will want to implement engineering practices that align with defect reduction, and quick and safe fixes.
To understand what is happening inside your architecture, you will need to enable observation with logging, instrumentation, and insightful business and technical metrics. Not just logging, but creating actionable information.
This should be as natural as any other part of your design process.
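One way to picture "actionable information" rather than just logging is a structured log line per metric. This is a rough sketch only: the field names and the business/technical split are assumptions for illustration, not a prescribed format.

```python
import json
import time
from typing import Any, Dict


def metric_record(name: str, value: float, kind: str, **dimensions: Any) -> str:
    """Build one JSON log line for a metric.

    kind is "business" or "technical" so the two can be separated later.
    The field names are illustrative, not a standard.
    """
    if kind not in ("business", "technical"):
        raise ValueError(f"unknown metric kind: {kind}")
    record: Dict[str, Any] = {
        "timestamp": int(time.time()),
        "metric": name,
        "value": value,
        "kind": kind,
        "dimensions": dimensions,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical metrics for a hypothetical "shop" workload.
    print(metric_record("orders_placed", 3, "business", workload="shop"))
    print(metric_record("request_latency_ms", 120.5, "technical", workload="shop"))
```

Emitting both kinds in the same shape makes it easier to line up customer behaviour against what the platform was doing at the time.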
In AWS, you can view your entire workload (applications, infrastructure, policy, governance, and operations) as code. It can all be defined in, and updated using, code.
Click….
This means you can apply the same engineering discipline that you use for application code to every element of your stack.
You should apply metadata using tags to enable identification of your resources for operations activities.
Tags help in so many situations, use them as an added dimension to your estate.
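As a small illustration of tags as an added dimension, the sketch below checks resources against a tagging standard and selects them by tag. The required keys and the resource records are hypothetical; choose keys that suit your own organisation.

```python
from typing import Dict, List

# Hypothetical tagging standard, for illustration only.
REQUIRED_TAGS = ("Owner", "Environment", "CostCentre")


def missing_tags(tags: Dict[str, str]) -> List[str]:
    """Return the required tag keys that are absent or empty."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


def select_by_tag(resources: List[dict], key: str, value: str) -> List[str]:
    """Return the ids of resources that carry the given tag value."""
    return [r["id"] for r in resources if r.get("tags", {}).get(key) == value]


if __name__ == "__main__":
    estate = [
        {"id": "web-1", "tags": {"Owner": "shop", "Environment": "dev", "CostCentre": "42"}},
        {"id": "db-1", "tags": {"Owner": "shop", "Environment": "prod"}},
    ]
    for resource in estate:
        print(resource["id"], "missing:", missing_tags(resource["tags"]))
    print("dev resources:", select_by_tag(estate, "Environment", "dev"))
```

A check like this can run as part of a pipeline so that untagged resources are caught before they reach production.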
Ensure that you publish business metrics as well as technical metrics because these will help you understand your user or customers’ behaviours and their impact on your estate
You should use a consistent process, automate where you can. Have runbooks, playbooks, and checklists. This way you know when you are ready to go live with your workload.
Runbooks document your routine activities
Playbooks guide your processes for issue resolution.
Think about people factors, you need to have enough team members to cover all your operational activities. And this includes on-call.
Think Cloud Adoption Framework
Click….
Training is a critical part of this story. Training on AWS, training on your workload, and training on your operations tools.
AWS allows you to treat your operations as code, script your runbook and playbook activities to reduce the risk of human error.
Don’t forget to tag, tag, tag.
Don’t forget that you can spin up environments on demand. This means that you can thoroughly test both applications and processes.
In this space Game Days are essential aids. This has a big input into how people behave
James 35-48
I want to run through some key tenets that can help you make decisions around how to go about working on your organisation’s culture and values.
DevOps is about people, to a large degree we cannot force or proceduralise the changes required, but we can work to ensure that the right conditions exist within the organization to allow this culture to develop. Focusing on the service you are trying to deliver, demonstrating a willingness to change process as well as technical design, working with touchpoints from other teams to streamline handovers, these are all great ways to start building the right kind of environment where you’ll be able to really start to deliver some progress. Creating a safe environment in which to experiment is key, failures are lessons from which we can gain. <Kay to add example> Ensuring that staff feel “safe” to fail is one of the first steps towards encouraging experimentation and creativity.
There are some technical things we can do here too: make sure everything is in CloudFormation so you can build up and tear down quickly and try new ideas cheaply and safely. Use techniques like Blue/Green deployments to keep your Production environment agile.
And do the same with your culture, experiment, be open to change and remember ”everything fails all of the time” applies here too! Don’t be discouraged if things don’t go entirely according to plan.
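For the technical side of this, here is a minimal sketch of a throwaway CloudFormation template, built as a Python dictionary and printed as JSON. The parameter name, logical ID and the choice of an S3 bucket as a stand-in workload are all assumptions for illustration; a real template would describe your own resources.

```python
import json


def experiment_template(description: str) -> dict:
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": description,
        "Parameters": {
            # Hypothetical parameter: the same template can build either environment.
            "EnvName": {"Type": "String", "AllowedValues": ["blue", "green"]},
        },
        "Resources": {
            # Hypothetical logical ID; a bucket stands in for a real workload.
            "ExperimentBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "Tags": [{"Key": "Environment", "Value": {"Ref": "EnvName"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(experiment_template("Throwaway experiment stack"), indent=2))
```

Creating a stack from a template like this and deleting it afterwards is the build-up and tear-down cycle: the experiment costs little, and nothing is left behind to maintain.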
Often, we will find that our new Agile processes will run up against organisational obstacles from non-tech teams who (very unreasonably in my opinion) have their own processes and practices that need to be adhered to. We need to pre-empt this by ensuring that all teams who have a stake in the services that we operate are as informed and up to date as we are. Run education sessions on basic cloud principles for these teams (AWS Business Essentials), bring them on board and make them part of the team! If your supporting teams are on board they’ll be much more understanding when you really need to move fast. Remember to think about the whole organisation when considering training opportunities.
As I’ve mentioned previously, organizational culture is all about people, and people love silos. We all feel most comfortable with those we work with often and know well, but being the individual who is willing to stick their neck out and go meet the other team is catching…
Silos are endemic in most organisations, whether between departments, teams or in some cases individuals in the same team
When trying to break down these silos the importance of bringing silo’d teams together cannot be overstated. Take advantage of every opportunity to schedule joint meetings: rather than debating which team a certain task should sit with over email or Slack, get teams together face to face. We’ve actually done this with some fairly geographically dispersed teams (I won’t name any names here for obvious reasons, but teams on different continents in some cases) and the results have been absolutely amazing, simply because the teams (who had never met) found it much easier to begin to understand each other’s frustrations when meeting face to face.
If you’ve been tasked with modernizing your Operations function then a large part of your time should be devoted to ensuring that teams have as much cross-organizational contact as possible. More contact fosters greater understanding of the wider issues within the business and will improve teams’ ability to prioritise based on business need. Again, there are some good technical things we can do here as well. AWS Services are built around this model, and things like sharing AMIs cross account, centralized document repositories and cross-organization config are all good ways to start people working in a collaborative fashion.
Now, whenever you go into an organisation with these aims there are always variable levels of enthusiasm for the changes. It’s important to remember that your most vocal detractors are in a way some of your biggest allies. DevOps is all about removing unnecessary blockers, and individuals who raise obstacles can be a huge help in finding organisational blockers. Lavish your time and attention on the most vocal individuals, these people will be your loudest advocates once you have built trust by understanding their problems. Bring them into meetings with other teams, showing them how collaboration can work to resolve issues can work wonders. Make these people your focus. Your role here will sometimes be to act as an ombudsman (nice!) or a referee (not so nice…!), ensure you’re seen as even handed and fair in these discussions.
The purpose of all this collaboration and education is to build trust, and I think this is actually the most important point here. Where we have trust between teams then, really, much of the hard work is done (although if you can find me an organization where every team trusts every other team 100% I’d be surprised!)
If people get defensive they put up barriers. That’s the exact opposite of what we’re trying to achieve here, we need the organization and our colleagues to understand that our motivation is the best possible service. Don’t put people on the defensive, defensive people put up barriers, we want to break them down.
Trust allows us to delegate authority to the areas where it’s most useful for quick decision-making, for example if our compliance team trusts that the right processes are in place and that the Service Desk will follow them properly then the need for a whole bunch of approvals goes away. It’s getting this trust that unlocks the potential of new ways of working, using guardrails not gates and empowering our delivery teams to work at their best.
Talk about ownership and responsibility
The most successful cultural changes are those introduced incrementally. We can be Agile about changing the culture of our organisations, and in fact from experience this is the most effective method. Introducing new practices incrementally reduces the “shock” factor and also allows trust to build. Certainly I’ve found that DevOps makes my life supporting these applications much easier, and once people begin to see that these changes benefit them as individuals, as well as producing better business outcomes you’ll find there’s a bit of a snowball effect and when you see that, you know that you’ve been successful!
NO BABY STEPS
Your culture is made up of your values and your behaviours. Culture is critical to laying foundations to success.
Your values only have meaning when backed up by the proper behaviours, and behaviour is the domain of the individual. In order to achieve the right organisational behaviours we need to have the correct individual behaviours.
Successful cultures focus on hiring skilled people who are then empowered to own products
Behaviours need to be consistent across the organisation, from the leadership down
Things like having infrastructure as code, continuous delivery, and developers who are on call to resolve issues that arise all create ownership, and it starts to look a lot like DevOps.
So what does this look like from a practical perspective?
Let’s take a look at some practical first steps that allow you to build a framework that supports preparing to live and grow with AWS.
We’re going to start building out by using AWS Organizations
This is the master account
Don’t have DX/VPN connections going to it from your DCs
Implement SCPs (Service Control Policies, a feature of AWS Organizations) for future accounts
Use it for your consolidated billing
Don’t put anything in there that isn’t essential to your business’s successful deployment of AWS accounts. Not production or dev services, only what absolutely has to be there.
Limit access to break-glass. Once up and running, lock down not only who has access, but how they get it too.
Delete Orgs role. Really
Step one is built with security in mind
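To show what an SCP guardrail for future accounts might look like, here is a minimal sketch that builds a deny-style policy document in Python. The helper name, the statement ID and the choice of actions are illustrative assumptions, not a recommended baseline; check the actions against your own requirements before using anything like it.

```python
import json
from typing import List


def deny_policy(sid: str, actions: List[str]) -> dict:
    """Build a Service Control Policy document that denies the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": sid, "Effect": "Deny", "Action": actions, "Resource": "*"},
        ],
    }


# Illustrative guardrail: member accounts cannot switch off audit logging
# or remove themselves from the organization.
GUARDRAIL = deny_policy(
    "ProtectAuditTrail",
    [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "organizations:LeaveOrganization",
    ],
)

if __name__ == "__main__":
    print(json.dumps(GUARDRAIL, indent=2))
```

Keeping policies like this in code means that every new account can start with the same guardrails, and that changes to them go through review.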
After that we’re looking at two foundational accounts
Start with a place to send all your logs for audit purposes
Then build out an account that is for security tooling
Make sure access to logging account is break-glass only
And make sure that only those who should be in the security account have access to it
These are separate accounts – call out.
Separate out networking services
Make this the anchor point for your DX links
Once again limit access only to those that need it.
Think least privilege
It should be used by the billing team
James talk to the deck
Paul Pick up here and run to the end
This is the starting point. These are the foundational accounts.
Pause 1, 2, 3….
For individual developers
These are subject to the controls placed on the overall environment
Here is where we start to build out the cost allocation, the isolation, and security
You build out your BUs
Here is where you build out your pipeline
NOT Sandbox
You can have multiple devs if need be
This is not the same as the developer Sandbox
This is about innovation for the BU
This is the cookie cutter approach for a BU, replicate across your business
Drawing down on the core accounts in a repeatable way
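The cookie-cutter idea can be sketched as a small function that stamps out the same set of accounts for each BU. The account names and stages below are hypothetical labels chosen for illustration; the notes name the roles of the accounts rather than exact names, so adapt them to your own layout.

```python
from typing import Dict, List

# Hypothetical names for the foundational accounts described above.
CORE_ACCOUNTS = ["master", "logging", "security", "network"]

# Hypothetical stages for each BU beyond its dev accounts.
BU_STAGES = ("pre-prod", "prod", "sandbox")


def accounts_for(business_unit: str, dev_count: int = 1) -> List[str]:
    """Stamp out the account names for one business unit."""
    if dev_count < 1:
        raise ValueError("each business unit needs at least one dev account")
    devs = [f"{business_unit}-dev-{n}" for n in range(1, dev_count + 1)]
    others = [f"{business_unit}-{stage}" for stage in BU_STAGES]
    return devs + others


def landing_zone(business_units: List[str]) -> Dict[str, List[str]]:
    """Combine the shared core accounts with one account set per BU."""
    layout: Dict[str, List[str]] = {"core": list(CORE_ACCOUNTS)}
    for bu in business_units:
        layout[bu] = accounts_for(bu)
    return layout


if __name__ == "__main__":
    for group, accounts in landing_zone(["retail", "finance"]).items():
        print(group, accounts)
```

The point is repeatability: adding a new BU is one more entry in a list, not a new design exercise.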
Joe Healy has a session on LandingZone Session 195342