Jon Todd, Chief Architect
ECS and Docker @Okta
August 2, 2016 @JonToddDotCom
Background
Millions of people use Okta every day
Thousands of enterprises use Okta to
connect to Adobe’s Creative Cloud
jim@designer.com
Thousands of Enterprise Customers
• Ed, Gov, Non-Profit
• Services
• Media
• Consumer
• Technology
• Manufacturing, Energy
• Finance
• Cloud
• Health
Okta IT & Platform products
Okta Application Network
• Universal Directory: Extensible Profiles, Attribute Transformations, Directory Integration and AD Password Management
• Single Sign On: Secure SSO for All Your Web Apps, On-prem and Cloud, with Flexible Policy, from Any Device
• Adaptive MFA: Contextual Access Policies, Modern Factors, Adaptive Authentication, Integrations for Apps and VPNs
• Provisioning: Lifecycle Management, Cloud & On-prem App Integration, Mastering from Apps, Directory Provisioning, Rules, Workflow, Reporting
• Mobility Management: Tight User Identity Integration, Device Based Contextual Access, Light-weight Management
The most reliable IDaaS available
• Never taken offline for upgrades
• Redundant and scalable
• okta.com/trust
A Platform Architecture For Scale
[Diagram: load balancers, app servers, and data tier, each shown as A, B, C, in datacenters DC1 and DC2]
Global Datacenters
Our stack
stackshare.io/okta/okta
The Problem
Defining a pattern for micro-services
https://www.pinterest.com/pin/205828645447534387/
http://www.bennysbaker.com/poop-emoji-cupcakes/
DevOps abstraction layer
Inspired by: http://dev2ops.org/2010/02/what-is-devops/
Dev | Wall of turmoil | Ops
Dev: “I want change”
Ops: “I want stability”
Domain boundary
Repeatability through immutability
• Same runtime environment in dev / test / prod
• Runtime versioned w/ code
• Easy reproducibility
• All changes use same release process
Additional requirements
• 0-downtime deployments
• Support for our multi-AZ & multi-region architecture
• Compliance – SOC 2 Type 2, HIPAA, ISO 27001
• Separation of duties – a.k.a. no developer access to production hosts
• Push button deployment
• Rollback and canary support
Technology Selection
Building blocks
[Diagram: building blocks between Dev (“I want change”) and Ops (“I want stability”)]
• Container frameworks
• Cluster scheduler
• Continuous integration
Options
• Container frameworks: LXC
• Cluster schedulers: Amazon EC2 Container Service
[Logos of the other options are not recoverable from the slide]
Our problems solved
Docker
• Repeatability
  • Declarative & composable Dockerfile
  • Images are immutable
• Stability
  • Massive community with production adoption
  • Initial release > 3 years ago
EC2 Container Service
• Compliant
  • ECS isn’t in flow, EC2 is already compliant
• DevOps Abstraction
  • Hosts and underlying resources abstracted away
  • Task Definition allows developers to schedule deploys
• Stability
  • 0-downtime services
  • Fully managed!
• Works with existing AWS tooling
ECS Refresher
Source: All Things Distributed – a.k.a Werner Vogels
Additional concepts
• Task Definitions define one or more containers to run
• Services define a long-running task and run inside a cluster
• Clusters define a set of EC2 resources that can be shared by more than one service
• Auto Scaling groups can be used to define the size and launch configuration of a cluster
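To make the relationship between these concepts concrete, here is a minimal sketch in Python of a task definition and a service that references it. The field names follow the ECS API request shapes; the service name, image, registry host, cluster name, and sizes are hypothetical and not taken from the talk.

```python
import json

# Hypothetical values throughout; only the field names mirror the ECS API.
task_definition = {
    "family": "example-service",
    "containerDefinitions": [
        {
            "name": "app",
            "image": "registry.internal.example.com/example-service:1.4.2-ab12cd3",
            "memory": 512,
            "portMappings": [{"containerPort": 8080, "hostPort": 8080}],
            "essential": True,
        }
    ],
}

# A service keeps a desired number of copies of one task definition
# running inside one cluster.
service = {
    "cluster": "example-cluster",
    "serviceName": "example-service",
    "taskDefinition": task_definition["family"],
    "desiredCount": 4,
}


def describe(task_def, svc):
    """Summarize how the service, task definition, and cluster relate."""
    containers = ", ".join(c["name"] for c in task_def["containerDefinitions"])
    return (
        f"service {svc['serviceName']} keeps {svc['desiredCount']} copies of task "
        f"{task_def['family']} ({containers}) running in cluster {svc['cluster']}"
    )


if __name__ == "__main__":
    print(json.dumps(task_definition, indent=2))
    print(describe(task_definition, service))
```

The cluster itself is only named here; its EC2 capacity would come from an Auto Scaling group, as the last bullet notes.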
CI with ECS Tasks
CI Workflow
Artifactory
(Maven, NPM, Docker, YUM)
Topic builds – topic repo
Promoted builds – release repo
CI Workflow
Why ECS – Isolation & Versioning
Why ECS – Dynamic worker scaling
1. Lambda: task which scales the cluster based on the queue
2. Lambda: inspects running tasks and bin packs new tasks where possible
• This is one of the changes we had to make in order to use ECS for long-running tasks, rather than long-running services spread across many stateless instances
• Disconnects unneeded nodes from the cluster, allowing them to terminate themselves when they are idle
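The two Lambda functions described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Okta's code: the sizing rule (a fixed number of jobs per instance), the use of memory as the only packing dimension, and the first-fit-decreasing strategy are all assumptions, and every name is hypothetical.

```python
import math


def desired_instances(queued_jobs, jobs_per_instance, min_size, max_size):
    """Cluster size needed to drain the queue, clamped to [min_size, max_size]."""
    needed = math.ceil(queued_jobs / jobs_per_instance)
    return max(min_size, min(max_size, needed))


def bin_pack(task_memory, free_memory):
    """First-fit decreasing: put each task on the first instance with room.

    task_memory: dict of task id -> MiB required
    free_memory: dict of instance id -> MiB currently free
    Returns (placement, unplaced): task id -> instance id, plus tasks that did not fit.
    """
    free = dict(free_memory)  # work on a copy, leave the caller's dict alone
    placement, unplaced = {}, []
    # Largest tasks first, so big tasks are not squeezed out by small ones.
    for task, mem in sorted(task_memory.items(), key=lambda kv: kv[1], reverse=True):
        for inst in sorted(free):  # fixed order fills the same instances first
            if free[inst] >= mem:
                free[inst] -= mem
                placement[task] = inst
                break
        else:
            unplaced.append(task)
    return placement, unplaced


def idle_instances(free_memory, placement):
    """Instances that received no task (assumes they were empty to begin with)."""
    return sorted(set(free_memory) - set(placement.values()))
```

Packing tasks onto as few instances as possible is what leaves whole instances idle, which in turn lets them be disconnected from the cluster and terminated.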
Dynamic Scaling
Cost Savings With Spot Instances
Feature Requests
• Ability to have spot and on-demand in same Auto Scaling Group (ASG)
• Built-in bin packing scheduler
• Give ASG a termination policy based on ECS status
• i.e. prefer instances with no running tasks
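The last request, a termination policy that is aware of ECS status, amounts to a selection rule like the one below. It is a sketch of the requested behavior, not an existing Auto Scaling policy; the function name and input shape are hypothetical.

```python
def pick_instance_to_terminate(running_tasks):
    """Prefer the instance with the fewest running ECS tasks.

    running_tasks: dict of instance id -> number of running tasks.
    Ties are broken by instance id so the choice is deterministic.
    """
    if not running_tasks:
        return None
    return min(sorted(running_tasks), key=lambda inst: running_tasks[inst])
```

For example, given `{"i-a": 3, "i-b": 0, "i-c": 1}` the rule selects `i-b`, so scaling in does not interrupt any task.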
Termination policy
• OldestInstance. Auto Scaling terminates the oldest instance in the group. This option is useful when you're upgrading the instances in the Auto Scaling group to a new EC2 instance type, so you can gradually replace instances of the old type with instances of the new type.
• NewestInstance. Auto Scaling terminates the newest instance in the group. This policy is useful when you're testing a new launch configuration but don't want to keep it in production.
• OldestLaunchConfiguration. Auto Scaling terminates instances that have the oldest launch configuration. This policy is useful when you're updating a group and phasing out the instances from a previous configuration.
• ClosestToNextInstanceHour. Auto Scaling terminates instances that are closest to the next billing hour. This policy helps you maximize the use of your instances and manage costs.
• Default. Auto Scaling uses its default termination policy. This policy is useful when you have more than one scaling policy associated with the group.
Takeaways
• ECS is running well for us in a 150+ instance cluster
• Bake AMI with large files and common images into host machines
• Spot instances give a 2-minute warning, so keep jobs short
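One way a worker could act on the Spot warning is sketched below. EC2 exposes the scheduled termination time through the instance metadata path shown in `TERMINATION_URL`, which returns 404 until a termination is scheduled. The HTTP fetch itself is left out, and the helper names and the decision rule are assumptions for illustration.

```python
from datetime import datetime, timezone

# Instance metadata path for the Spot termination notice (fetching is omitted here).
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def seconds_until_termination(body, now):
    """Parse a metadata response such as '2016-08-02T18:02:00Z'.

    Returns the seconds remaining, or None if the body is not a timestamp
    (for example the 404 page returned when no termination is scheduled).
    """
    try:
        when = datetime.strptime(body.strip(), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None
    return (when.replace(tzinfo=timezone.utc) - now).total_seconds()


def should_start_job(expected_job_seconds, seconds_left):
    """Only start a job that is expected to finish before the instance goes away."""
    if seconds_left is None:  # no termination scheduled
        return True
    return expected_job_seconds < seconds_left
```

With at most two minutes of notice, only jobs well under that length can be started safely once a notice appears, which is the reason for keeping jobs short.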
Micro-services with ECS Services
Due diligence
0-Downtime Testing
https://github.com/jontodd/aries
Test Assumptions
• ECS config
  • Agent version 1.11.0
  • Docker version 1.11.2
• Cluster config
  • 8 instances backed by ASG
• ASG config
  • 8 instances across 3 AZs
  • Default termination policy
  • 5 min health check grace period
• ELB
  • Timeout 4s
  • Interval 5s
  • Unhealthy threshold 2
  • Healthy threshold 10
  • Connection draining enabled, 300s timeout
• Load generation
  • 16 threads
  • Throughput
    • Interactive: 490 r/s
    • 10s long poll: 1.5 r/s
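The ELB settings above determine roughly how long a failing task keeps receiving traffic. The sketch below uses the key names of the classic ELB health check configuration. The `Target` value is hypothetical, and the timing formula is an approximation that ignores the check timeout and where in the interval a failure begins.

```python
# Key names follow the classic ELB health check shape; Target is hypothetical.
health_check = {
    "Target": "HTTP:8080/health",
    "Interval": 5,            # seconds between checks
    "Timeout": 4,             # seconds before a check counts as failed
    "UnhealthyThreshold": 2,  # consecutive failures before removal
    "HealthyThreshold": 10,   # consecutive successes before (re)adding
}


def seconds_to_mark_unhealthy(check):
    """Approximate time until a failing target stops receiving traffic."""
    return check["Interval"] * check["UnhealthyThreshold"]


def seconds_to_mark_healthy(check):
    """Approximate time until a new or recovered target starts receiving traffic."""
    return check["Interval"] * check["HealthyThreshold"]
```

Under these assumptions a failing target is removed after about 10 seconds and a new one is added after about 50 seconds, which is why the health check tuning matters for error counts during failures.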
Operation | Interactive errors (~70ms latency, 490rps) | Long poll errors (~10s latency, 1.5rps)
Upsize ECS service 4 → 8 | 0 | 0
Downsize ECS service 8 → 4 | 0 | 0
Deploy ECS service – 50% min healthy | 0 | 0
Stop task* | 0 | 0
Downsize Auto Scaling Group (ASG) | 0 | 0
Terminate EC2 instance | 0 | 0
Stop Docker daemon (service docker stop)* | 0 | 0
Stop EC2 instance** | 0 | 0
Kill Docker container (docker kill <containerId>)* | 2 | 2
Fail health check | 450 | 5
* No intention of running operation in practice
** Caused inconsistent state
Our architecture
Workflow
[Diagram: deployment workflow; component labels recovered from the slide]
• Dev: Git Repo (Dockerfile, docker_compose.yml), CI Pipeline, promoted artifacts
• Test / Preview / Production: Docker Registry (Artifactory), Conductor (Bastion ECS Controller), Application YAML, ECS Cluster with ECS Service and ECS Canary Service, ELB, Auto Scaling Group, Launch Config, EC2
• Annotations: “Images pulled when tasks start”, “Deploy new version”
Application definition
• Developers define YAML for their application
• Deploy-time configuration is supplied to the ECS task definition
• Secrets are pulled by the application at startup
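The talk does not show the YAML schema, so the following is only a sketch of the idea: an application definition (represented here as an already-parsed dict with hypothetical keys) is combined with deploy-time configuration to produce an ECS task definition. The `environment` list of name/value pairs is the ECS container definition format; the secret-name guard is an illustrative assumption.

```python
# Hypothetical markers used to keep secrets out of deploy-time configuration.
SECRET_MARKERS = ("SECRET", "PASSWORD", "TOKEN")


def render_task_definition(app, deploy_config):
    """Combine an application definition with deploy-time config.

    app: dict with hypothetical keys name, image, version, memory.
    deploy_config: dict of environment variable name -> value.
    Secrets are not accepted here; the application pulls them itself at startup.
    """
    for key in deploy_config:
        if any(marker in key.upper() for marker in SECRET_MARKERS):
            raise ValueError(f"{key} looks like a secret; do not pass it as config")
    container = {
        "name": app["name"],
        "image": f"{app['image']}:{app['version']}",
        "memory": app["memory"],
        "environment": [
            {"name": key, "value": str(value)}
            for key, value in sorted(deploy_config.items())
        ],
    }
    return {"family": app["name"], "containerDefinitions": [container]}
```

Keeping the image reference in the application definition and the environment in the deploy step means the same immutable image can be promoted from test to preview to production unchanged.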
Security conventions
• Container repository
  • Only allow containers from internal repository
• IAM separation per service
  • Either service per cluster or use new IAM for ECS functionality
• Security scanning of containers: JFrog Xray
• Process monitoring on Docker host: cAdvisor from Google
• Secrets or any form of config NEVER baked into containers
• Start from minimal, audited base OS
• Run container as non-privileged user w/ user namespaces (Docker 1.10+)
• Monitor alas.aws.amazon.com for critical updates
Source Conventions
• 3 categories of container definitions
1. “Library” definitions used as the basis for building other images
2. Third-party service definitions e.g. Zookeeper or Elasticsearch
3. Internal service definitions
• Repo per internal service
• Dockerfile in same repo => image versioned with code
• Docker compose for running dependent services
• Pegged versions (no builds)
• Single repo for library and third-party service definitions
Build Conventions
• Integration tests run against code running in container
• Build owns creating immutable version and publishing to artifact server
• Strict rules around “FROM” clause
• Must point at internal artifact server
• Must be tagged following SEMVER-SHORT_SHA convention
• Never allow a missing tag or the “latest” tag, so that builds stay repeatable
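The rules around the “FROM” clause lend themselves to an automated check in the build. The sketch below is one possible reading of them, not the check Okta runs: the registry host is hypothetical, the tag pattern is an assumed interpretation of SEMVER-SHORT_SHA (for example 1.4.2-ab12cd3), and options such as `--platform` before the image name are not handled.

```python
import re

INTERNAL_REGISTRY = "registry.internal.example.com"  # hypothetical host
# Assumed reading of SEMVER-SHORT_SHA: MAJOR.MINOR.PATCH-<7 to 40 hex chars>
TAG_PATTERN = re.compile(r"^\d+\.\d+\.\d+-[0-9a-f]{7,40}$")


def check_from_lines(dockerfile_text):
    """Return a list of rule violations found in the FROM lines of a Dockerfile."""
    errors = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        image = parts[1]
        if not image.startswith(INTERNAL_REGISTRY + "/"):
            errors.append(f"line {number}: {image} is not from {INTERNAL_REGISTRY}")
        _, sep, tag = image.rpartition(":")
        if not sep or "/" in tag:  # no colon, or the colon belongs to a registry port
            errors.append(f"line {number}: {image} has no tag")
        elif tag == "latest":
            errors.append(f"line {number}: {image} uses the latest tag")
        elif not TAG_PATTERN.match(tag):
            errors.append(f"line {number}: tag {tag} is not SEMVER-SHORT_SHA")
    return errors
```

A build would fail when the returned list is non-empty, which keeps every image traceable to a pinned, internally hosted base.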
Logging and monitoring
• Logging
• All output streams pipe to STDOUT/STDERR of the running process
• Log forwarding is provided by underlying host
• Log entries contain
• Host
• Container Id
• Image name & version
• Request Id
• Metrics
• Host level, generic container metrics provided by host
• App level metrics published directly to well defined endpoints
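A log entry carrying the fields listed above might be emitted as in the sketch below. The JSON layout and the environment variable names (`HOST_NAME`, `CONTAINER_ID`, `IMAGE_NAME`, `IMAGE_VERSION`) are assumptions for illustration; the talk does not specify how these values reach the process.

```python
import json
import os
import socket
import sys


def log_entry(message, request_id, env=os.environ):
    """Build one structured log entry with the fields the slide lists."""
    return {
        "host": env.get("HOST_NAME", socket.gethostname()),
        "container_id": env.get("CONTAINER_ID", "unknown"),
        "image": env.get("IMAGE_NAME", "unknown"),
        "image_version": env.get("IMAGE_VERSION", "unknown"),
        "request_id": request_id,
        "message": message,
    }


def log(message, request_id, stream=sys.stdout, env=os.environ):
    """Write the entry as one JSON line to STDOUT; the host forwards it from there."""
    stream.write(json.dumps(log_entry(message, request_id, env), sort_keys=True) + "\n")
```

Writing one line per entry to STDOUT keeps the container free of any log-shipping logic, which matches the convention that forwarding is the host's job.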
Feature requests
• ELB
• Dynamic port mapping to containers
• Fail health based on HTTP return code
• Different health endpoint for adding vs removing
• Service level security groups
• Service discovery w/o ELB
• Ability to mark container instances as un-schedulable
• Remove sharp edges around the stopped state
• Give ASG ability to set EC2 “shutdown behavior”
• Periodic cleanup process in ECS to deregister stopped instances
Takeaways
• /etc/ecs/ecs.config
  • ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr)
  • ECS_LOGLEVEL=debug
• Beware of running services in the same cluster that use the same ports
• Tune ELB health check
• Docker 1.10 for security enhancements
• Canary & Blue/Green: separate service attached to same ELB
  • Rollback is trivial
• ECS is incredibly easy to get up and running
• The ecosystem is changing quickly, we are moving cautiously
• ECS team has made a lot of improvements
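For the canary takeaway, running the canary as a separate ECS service behind the same ELB means its share of traffic is set by task counts. The sketch below assumes the ELB spreads requests evenly across healthy tasks, which is only an approximation, and the function name is hypothetical.

```python
def canary_traffic_share(stable_tasks, canary_tasks):
    """Approximate fraction of requests that reach the canary service."""
    total = stable_tasks + canary_tasks
    if total == 0:
        raise ValueError("no tasks registered with the load balancer")
    return canary_tasks / total
```

With 8 stable tasks and 1 canary task the canary sees about one ninth of requests, and setting the canary service's desired count to 0 returns the share to zero, which is one reading of why rollback is trivial.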
Dev | Wall of turmoil → Automated pipeline of awesomeness | Ops
Questions?
Thank You
Follow me @JonToddDotCom
Join us @Okta - www.okta.com/company/careers/

ECS and Docker at Okta

  • 1. Jon Todd, Chief Architect ECS and Docker @Okta August 2, 2016 @JonToddDotCom
  • 3. Millions of people use Okta every dayMillions of people use Okta every day
  • 4. Thousands of enterprises use Okta to connect to Adobe’s Creative Cloud jim@designer.com
  • 5. Thousands of Enterprise Customers Ed, Gov, Non-Profit Services Media ConsumerTechnology Manufacturing, Energy FinanceCloudHealth
  • 6. Okta Application Network Mobility ManagementSingle Sign On Adaptive MFA Provisioning Universal Directory Extensible Profiles, Attribute Transformations, Directory Integration and AD Password Management Secure SSO for All Your Web Apps, On-prem and Cloud, with Flexible Policy, from Any Device Contextual Access Policies, Modern Factors, Adaptive Authentication, Integrations for Apps and VPNs Lifecycle Management, Cloud & On-prem App Integration, Mastering from Apps, Directory Provisioning, Rules, Workflow, Reporting Tight User Identity Integration, Device Based Contextual Access, Light-weight Management Okta IT & Platform products
  • 7. The most reliable IDaaS available Never taken offline for upgrades Redundant and scalable A B C A B C DC2 DC1 okta.com/trust A Platform Architecture For Scale DATA TIER A B C LOAD BALANCERS APP SERVERS
  • 11. Defining a pattern for micro-services https://www.pinterest.com/pin/205828645447534387/ http://www.bennysbaker.com/poop-emoji-cupcakes/
  • 12. DevOps abstraction layer Inspired by: http://dev2ops.org/2010/02/what-is-devops/ Dev OpsWall of turmoil Dev Ops I want stabilityI want change Domain boundary
  • 13. Repeatability through immutability • Same runtime environment dev / test / prod • Runtime versioned w/ code • Easy reproducibility • All changes use same release process
  • 14. Additional requirements • 0-downtime deployments • Support for our multi-az & multi-region architecture • Compliance – SOC2 type 2, HIPAA, ISO 27001 • Separation of duties – a.k.a. no developer access to production hosts • Push button deployment • Rollback and canary support
  • 16. Building blocks Dev Ops I want stabilityI want changeContainer frameworks Cluster schedulerDev Ops Continuous integration
  • 17. Options Container frameworks Cluster schedulers Amazon EC2 Container Service LXC
  • 18. Our problems solved • Repeatability • Declarative & composable Dockerfile • Images are immutable • Stability • Massive community with production adoption • Initial release > 3 years ago • Compliant • ECS isn’t in flow, EC2 is already compliant • DevOps Abstraction • Hosts and underlying resources abstracted away • Task Definition allows developers to schedule deploys • Stability • 0-downtime services • Fully managed! • Works with existing AWS tooling Docker EC2 Container Service
  • 20. Source: All Things Distributed – a.k.a Werner Vogels
  • 21. Additional concepts • Task Definitions define one or more containers to run. • Services define a long running task and run inside a cluster • Clusters define a set of EC2 resources that can be shared by more than one service • Auto scaling groups can be used to define size and launch configuration of a cluster
  • 22. CI with ECS Tasks
  • 23. CI Workflow Artifactory (Maven, NPM, Docker, YUM) Topic builds – topic repo Promoted builds– release repo
  • 25. Why ECS – Isolation & Versioning
  • 26. 1. Lambda: Task which scales cluster based on queue 2. Lambda: Inspect running tasks an bin pack new tasks where possible • This is one of the changes we had to make in order to use ECS for long running tasks, rather than long running services spread across many stateless instances • Disconnects unneeded nodes from cluster allowing themselves to self terminate when they are idle Why ECS - Dynamic worker scaling VS
  • 28. Cost Savings With Spot Instances
  • 29. Feature Requests • Ability to have spot and on-demand in same Auto Scaling Group (ASG) • Built-in bin packing scheduler • Give ASG a termination policy based on ECS status • i.e. prefer instances with no running tasks
  • 30. Termination policy • OldestInstance. Auto Scaling terminates the oldest instance in the group. This option is useful when you're upgrading the instances in the Auto Scaling group to a new EC2 instance type, so you can gradually replace instances of the old type with instances of the new type. • NewestInstance. Auto Scaling terminates the newest instance in the group. This policy is useful when you're testing a new launch configuration but don't want to keep it in production. • OldestLaunchConfiguration. Auto Scaling terminates instances that have the oldest launch configuration. This policy is useful when you're updating a group and phasing out the instances from a previous configuration. • ClosestToNextInstanceHour. Auto Scaling terminates instances that are closest to the next billing hour. This policy helps you maximize the use of your instances and manage costs. • Default. Auto Scaling uses its default termination policy. This policy is useful when you have more than one scaling policy associated with the group.
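
None of the built-in policies looks at ECS state. The requested behaviour ("prefer instances with no running tasks") could look roughly like the sketch below. It is hypothetical: no such Auto Scaling policy is described in the source, and the function name and inputs are invented for illustration.

```python
def pick_instance_to_terminate(task_counts, launch_order):
    """Choose the instance with the fewest running tasks; break ties by age.

    task_counts: instance id -> number of running ECS tasks
    launch_order: instance ids, oldest first
    """
    return min(
        launch_order,
        key=lambda inst: (task_counts.get(inst, 0), launch_order.index(inst)),
    )


# Hypothetical cluster state: i-b and i-c are idle, i-b is older.
print(pick_instance_to_terminate({"i-a": 3, "i-b": 0, "i-c": 0}, ["i-a", "i-b", "i-c"]))
```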
  • 31. Takeaways • ECS is running well for us in a 150+ instance cluster • Bake AMI with large files and common images into host machines • Spot instances give 2 min warning. Keep jobs short
  • 36. Test Assumptions • ECS config: agent version 1.11.0, Docker version 1.11.2 • Cluster config: 8 instances backed by ASG • ASG config: 8 instances across 3 AZs, default termination policy, 5 min health check grace period • ELB: timeout 4s, interval 5s, unhealthy threshold 2, healthy threshold 10, connection draining enabled with 300s timeout • Load generation: 16 threads • Throughput: interactive 490 r/s, 10s long poll 1.5 r/s
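
As a rough worked example of what the ELB settings imply, the time to change a target's state is approximately the check interval multiplied by the relevant threshold. This ignores the 4s timeout and the phase of the check cycle, so treat the results as estimates rather than measured values.

```python
def seconds_to_state_change(interval_s, threshold):
    """Approximate time for consecutive health checks to flip a target's state."""
    return interval_s * threshold


INTERVAL_S = 5           # from the ELB settings above
UNHEALTHY_THRESHOLD = 2
HEALTHY_THRESHOLD = 10

to_unhealthy = seconds_to_state_change(INTERVAL_S, UNHEALTHY_THRESHOLD)
to_healthy = seconds_to_state_change(INTERVAL_S, HEALTHY_THRESHOLD)
print(f"marked unhealthy after ~{to_unhealthy}s, healthy after ~{to_healthy}s")
```

With these settings a failing target keeps receiving traffic for about 10 seconds, while a new target waits about 50 seconds before it is put into service.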
  • 37.
    Operation                                        Interactive errors           Long poll errors
                                                     (~70ms latency, 490rps)      (~10s latency, 1.5rps)
    Upsize ECS service 4 to 8                        0                            0
    Downsize ECS service 8 to 4                      0                            0
    Deploy ECS service – 50% min healthy             0                            0
    Stop task*                                       0                            0
    Downsize Auto Scaling Group (ASG)                0                            0
    Terminate EC2 instance                           0                            0
    Stop Docker daemon (service docker stop)*        0                            0
    Stop EC2 instance**                              0                            0
    Kill Docker container (docker kill <containerId>)*  2                         2
    Fail health check                                450                          5
    * No intention of running operation in practice
    ** Caused inconsistent state
  • 39. Workflow [diagram labels: Auto Scaling Group; Launch Config; EC2; ECS Cluster; ECS Service; ECS Canary Service; Application YAML; Docker Registry (Artifactory); ELB; Images pulled when tasks start; Conductor (Bastion ECS Controller); CI Pipeline; Git Repo; Promoted artifacts; Dockerfile; docker_compose.yml; Test / Preview / Production; Dev; Deploy new version]
  • 40. Application definition • Developers define YAML for their application • Deploy time configuration is supplied to the ECS task definition • Secrets are pulled by the application at startup
  • 41. Security conventions • Container repository • Only allow containers from internal repository • IAM separation per service • Either service per cluster or use new IAM for ECS functionality • Security scanning of containers - JFrog Xray • Process monitoring on Docker host – cAdvisor from Google • Secrets or any form of config NEVER baked into containers • Start from minimal, audited base OS • Run container as non-privileged user w/ user namespaces (Docker 1.10+) • Monitor alas.aws.amazon.com for critical updates
  • 42. Source Conventions • 3 categories of container definitions 1. “Library” definitions used as the basis for building other images 2. Third-party service definitions e.g. Zookeeper or Elasticsearch 3. Internal service definitions • Repo per internal service • Dockerfile in same repo => image versioned with code • Docker compose for running dependent services • Pegged versions (no builds) • Single repo for library and third-party service definitions
  • 43. Build Conventions • Integration tests run against code running in container • Build owns creating immutable version and publishing to artifact server • Strict rules around “FROM” clause • Must point at internal artifact server • Must be tagged following SEMVER-SHORT_SHA convention • Never allow missing or use of “latest” tag for repeatable builds
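
The "FROM" rules lend themselves to an automated check in the build. The sketch below is one possible way to enforce them, not the check Okta runs: the registry host `registry.internal.example` and the exact tag pattern (three dotted numbers, a hyphen, and a 7-character short SHA) are assumptions made for illustration.

```python
import re

INTERNAL_REGISTRY = "registry.internal.example"  # hypothetical internal artifact server
TAG_PATTERN = re.compile(r"^\d+\.\d+\.\d+-[0-9a-f]{7}$")  # assumed SEMVER-SHORT_SHA shape


def check_from_lines(dockerfile_text):
    """Return a list of (line number, problem) for FROM lines that break the rules."""
    problems = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=... to find the image reference.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if not image.startswith(INTERNAL_REGISTRY + "/"):
            problems.append((lineno, "base image is not from the internal registry"))
        last_segment = image.rsplit("/", 1)[-1]
        if ":" not in last_segment:
            problems.append((lineno, "missing tag"))
            continue
        tag = last_segment.split(":", 1)[1]
        if tag == "latest":
            problems.append((lineno, "'latest' tag is not allowed"))
        elif not TAG_PATTERN.match(tag):
            problems.append((lineno, "tag does not follow SEMVER-SHORT_SHA"))
    return problems


good = "FROM registry.internal.example/base/java:8.0.1-ab12cd3\nRUN echo hi\n"
bad = "FROM ubuntu:latest\n"
print(check_from_lines(good))
print(check_from_lines(bad))
```

Failing the build on any reported problem keeps unpinned or external base images out of promoted artifacts.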
  • 44. Logging and monitoring • Logging • All output streams pipe to STDOUT/STDERR of the running process • Log forwarding is provided by underlying host • Log entries contain • Host • Container Id • Image name & version • Request Id • Metrics • Host level, generic container metrics provided by host • App level metrics published directly to well defined endpoints
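
A log entry carrying the fields listed above might be emitted as one JSON line on STDOUT. This is a sketch under assumptions: the source does not say whether the application or the host adds these fields, and the JSON layout, field names and sample values are invented for the example.

```python
import json
import socket
import sys


def log_entry(message, container_id, image, request_id, host=None):
    """Serialize one log entry with the context fields as a single JSON line."""
    return json.dumps(
        {
            "host": host or socket.gethostname(),
            "container_id": container_id,
            "image": image,  # image name and version
            "request_id": request_id,
            "message": message,
        },
        sort_keys=True,
    )


# Hypothetical values; the entry goes to STDOUT for the host to forward.
entry = log_entry(
    "request handled",
    container_id="0123456789ab",
    image="example-api:1.0.0-abc1234",
    request_id="req-42",
    host="host-1",
)
print(entry, file=sys.stdout)
```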
  • 45. Feature requests • ELB • Dynamic port mapping to containers • Fail health based on HTTP return code • Different health endpoint for adding vs removing • Service level security groups • Service discovery w/o ELB • Ability to mark container instances as un-schedulable • Remove sharp edges around the stopped state • Give ASG ability to set EC2 ”shutdown behavior” • Periodic cleanup process in ECS to deregister stopped instances
  • 46. Takeaways • /etc/ecs/ecs.config • ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr) • ECS_LOGLEVEL=debug • Beware of running services in same cluster that use the same ports • Tune ELB health check • Docker 1.10 for security enhancements • Canary & Blue/Green separate service attached to same ELB • Rollback is trivial • ECS is incredibly easy to get up and running • The ecosystem is changing quickly, we are moving cautiously • ECS team has made a lot of improvements
  • 47. Dev | Ops • Wall of turmoil • Automated pipeline of awesomeness
  • 49. Thank You Follow me @JonToddDotCom Join us @Okta - www.okta.com/company/careers/

Editor's notes

  1. How many have heard of Okta? Used it?
  2. This is the full set of Okta IT and Platform products, 100% cloud-based and integrated. Each of these is a full-featured product you could use to replace CA, RSA, or Airwatch.
  3. Java backend. JS front end. Entirely hosted in the cloud in AWS. In general we like using and giving back to open source.
  4. Same environment in dev / test / prod. Environment should be versioned with code. Problems with Chef mutating production with a bad or incorrect version of config. Easy reproducibility. Security audit can be done on the artifact, and then you just monitor the runtime for the correct version.
  5. All together we get a PATTERN FOR MICROSERVICES
  6. - We run on ecs optimized - Reduced packages - Upgrade is easier
  7. Developers can run all CI tests on any topic branch. Master is locked down; Bacon is the gatekeeper. Jenkins is used for job definition and lifecycle. Slave pool is ECS! ECS runs them as short-lived tasks. Each day we get between 100 & 150 containers at peak load.
  8. This is bacon
  9. From before: the main goal is repeatability and immutability. Not only are the artifact and its runtime immutable, but the build that produces the artifact for testing also runs in a container. This solves a classic problem: changes to the environment in CI.
  10. Who has the knowledge about sizing?
  11. We presently respond to Spot price termination notices (you get 2 minutes' warning) by placing tasks running on a node that is about to be terminated back into the queue, where they are immediately picked up by another node. We are currently working on recognizing spot price instance pool cascades so we can switch to on-demand. There is no ability to have both spot and on-demand in the same ASG. Something to worry about: if prices spike and cause large outages, what is the availability of on-demand instances?
  12. Each day we auto scale up to around 150 instances and back down to under 20. Preload Maven, NPM, & git repositories. This saved us about 4 minutes on container start time.
  13. The integration point between CI and deployment is Artifactory. Any sign-off or approval happens there. Auto Scaling groups control the pool of EC2 instances. The launch config sets environment variables for ECS config, like cluster and ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION. ECS cluster per service due to IAM issues; looking forward to using the new feature. 1 or more services are registered to a service ELB, which supports canary. Conductor is the bastion service. It allows non-operators to perform deployments.
  14. Software is grouped into applications, which may have multiple components. YAML defines all components. In this case we have an application with a single backend running in ECS.
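
The requeue behaviour described in this note can be sketched with an in-memory queue. This is an illustration only: how the termination notice is detected is out of scope, `notice_received` is a hypothetical flag, and the real system uses an external queue rather than a Python `deque`.

```python
from collections import deque


def requeue_on_termination(notice_received, running_tasks, queue):
    """Move tasks from a node about to be terminated back to the front of the queue.

    Returns the list of tasks that were requeued.
    """
    if not notice_received:
        return []
    moved = list(running_tasks)
    # extendleft reverses its input, so reverse first to keep the original order.
    queue.extendleft(reversed(moved))
    running_tasks.clear()
    return moved


queue = deque(["t3"])
running = ["t1", "t2"]
moved = requeue_on_termination(True, running, queue)
print(moved, list(queue))
```

Putting requeued tasks at the front means interrupted work is picked up before new work, which matters when there are only two minutes of warning.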
  15. 2016-07-26T18:56:08Z [INFO] Redundant container state change for task op1-sage:15 arn:aws:ecs:us-east-1:011750033084:task/8f9920cf-a289-44bb-ac43-e436d6fb84d7, Status: (RUNNING->RUNNING) Containers: [op1-sage-app (RUNNING->RUNNING),]: op1-sage-app(docker.aue1d.saasure.com/okta-sage:1_1_0_029796_ec67fd3) (RUNNING->RUNNING) to RUNNING, but already RUNNING